[Micro task] Explore NYC taxi trips dataset #6

Open
pavithraes opened this issue Mar 2, 2023 · 46 comments · May be fixed by #11
Labels
outreachy-may-2023 Internship projects for Outreachy's May 2023 round

Comments

@pavithraes
Member

pavithraes commented Mar 2, 2023

The New York City TLC taxi trips records data is frequently used for creating examples and tutorials for Python data science workflows. You can access the dataset through any of the following ways:

Note that the actual dataset is quite large, so please use a subset of the data or consider reducing it.

To complete this micro-task, download and explore a subset of the dataset with Bokeh plots. You can share your Jupyter Notebooks with us as a GitHub gist. As per Bryan's comment here, please open separate issues/PRs with your work, so that we can share feedback individually.

@robinokwanma

robinokwanma commented Mar 7, 2023

Hi, I was approved for the initial stage of Outreachy. I noticed the datasets are in parquet format. I need some clarity and guidance: can Bokeh read the parquet files directly? I was able to read them using pandas. @pavithraes

@BhaswatiRoy

Hello @pavithraes
In this issue, will we mainly focus on cleaning and preprocessing the data, as well as visualizing it using Bokeh with as many important plots as possible?

@akanshajais

Hello @pavithraes, I am exploring this data for the project Create a blog post series: "Fundamentals of Data Visualization in Bokeh". I will use some Python libraries to wrangle and process the data, summarize and analyze it, and then visualize it as the project requires.

@AnishereMariam

Hi @robinokwanma, open the link to the dataset website, scroll down, and you will see a hyperlink, "Working with PARQUET format", right under the "Data Dictionary and MetaData" subtitle. There are details on how to work with the format in there and full details in the "trip record user guide".

@robinokwanma

Thank you @AnishereMariam. I'm taking a look now.

@robinokwanma

Hi @pavithraes @AnishereMariam
Does this work? https://gist.github.com/robinokwanma/cc81d1a9f491377f963216848c036d26

That's the link to my GitHub gist for this micro-task. Please review.

@Soot3

Soot3 commented Mar 7, 2023

@pavithraes started with this https://gist.github.com/Soot3/9eaf170fa2048e373e05046222350f54

@oluwaseun-tech

@AnishereMariam thank you for answering @robinokwanma's question. I was about to ask the same thing.

@oluwaseun-tech

@pavithraes I noticed that the dataset is published on a monthly basis. Can we download more than one month's data for the exploration?

@AnishereMariam

AnishereMariam commented Mar 7, 2023

@robinokwanma, I opened the file and noticed that although the code seems fine, some variables are wrongly placed. Please fix that.

@oluwaseun-tech, you are most welcome. Each month has over a million rows of data. If you are sure you can handle multiple months, that's great, but I suggest you use a subset of the data. That is just my opinion.
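
One way to follow the subset suggestion is to take a random sample or a date slice with pandas. The sketch below uses a small synthetic DataFrame as a stand-in for one month of trips. The column names `tpep_pickup_datetime` and `trip_distance` are assumed to match the yellow taxi files, and all values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one month of trips (made-up values, not real TLC data).
rng = np.random.default_rng(0)
n = 10_000
offsets = pd.to_timedelta(rng.integers(0, 31 * 24 * 3600, n), unit="s")
trips = pd.DataFrame({
    "tpep_pickup_datetime": pd.Timestamp("2023-01-01") + offsets,
    "trip_distance": rng.uniform(0.1, 20.0, n).round(2),
})

# Option 1: a reproducible 10% random sample of the rows.
sample = trips.sample(frac=0.10, random_state=42)

# Option 2: keep only trips picked up during the first week of the month.
first_week = trips[trips["tpep_pickup_datetime"] < "2023-01-08"]
```

A random sample keeps the overall shape of the month, while a date slice keeps every trip in a short window, so the better choice probably depends on the plot you have in mind.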

@robinokwanma

Thanks, I have made the changes.

@oluwaseun-tech

Oh! Okay thank you

@oluwaseun-tech

@pavithraes please take a look at what I have done so far: https://gist.github.com/oluwaseun-tech/ef413dd9658b2123bfc7240652bae90b

@BhaswatiRoy

BhaswatiRoy commented Mar 9, 2023

Hello @pavithraes @AnishereMariam
Here is my work on the analysis of NYC Taxi data on Jupyter Notebook.
I have also attached pictures of the output after the code. Reviews would be appreciated.

https://github.com/BhaswatiRoy/Data-Analysis-Projects/tree/main/Bokeh_Plots

@JoyclynUjunwaOgbonna

@robinokwanma I had a similar problem. You can use pd.read_parquet to load the dataset.
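
As a minimal sketch of that suggestion: as far as I know Bokeh has no parquet reader of its own, but it accepts a pandas DataFrame through `ColumnDataSource`, so the usual route is parquet to pandas to Bokeh. The helper names `load_trips` and `to_source` and the file name are hypothetical, and the column names are assumed to match the yellow taxi files.

```python
import pandas as pd
from bokeh.models import ColumnDataSource


def load_trips(path, columns=None):
    """Read a parquet file into a DataFrame.

    pandas needs a parquet engine (pyarrow or fastparquet) installed for this.
    Passing `columns` loads only those columns, which keeps memory use down.
    """
    return pd.read_parquet(path, columns=columns)


def to_source(df):
    """Wrap a DataFrame so Bokeh glyphs can refer to its columns by name."""
    return ColumnDataSource(df)


# Usage (the file name is a placeholder for whichever month you downloaded):
# trips = load_trips("yellow_tripdata.parquet", columns=["trip_distance", "fare_amount"])
# source = to_source(trips)
# p.scatter(x="trip_distance", y="fare_amount", source=source)
```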

@AnishereMariam

Hi @BhaswatiRoy, your choice of visualizations is really cool.

@AnishereMariam

Hi @JoyclynUjunwaOgbonna, have you been able to solve that via the solutions I suggested earlier?

@BhaswatiRoy

BhaswatiRoy commented Mar 10, 2023

> Hi @BhaswatiRoy, your choice of visualizations is really cool.

Thanks @AnishereMariam for the feedback. I am working on adding more visualizations!

@AnishereMariam

That is perfect @BhaswatiRoy

@AnishereMariam

@pavithraes, the link to my work on NYC Data Exploration on GitHub gist is below:
https://gist.github.com/anisheremariam/e5f4cb9f46f05f7ba5aa35d449922f53
I would appreciate any reviews and comments on it. Thank you.

@Faith-Nchifor

Hello @pavithraes, @bryevdv, everyone.
I have an issue: my line plot does not display as expected. What can I do? Here is the link to my notebook: https://www.kaggle.com/faithnchifor/nyc-trips-viz

@JoyclynUjunwaOgbonna

JoyclynUjunwaOgbonna commented Mar 10, 2023

@Faith-Nchifor the link to your notebook is showing a 404 error ("I can't find this page"). This usually happens when a Kaggle notebook is set to private. Could you check whether yours is? If it is, you might want to make it public so people can access it.

@Faith-Nchifor

I'm sorry about that @JoyclynUjunwaOgbonna. It's now public.

@AnishereMariam

@Faith-Nchifor, I think it is the method you used. The line followed the irregular order of the index. Would you consider using the groupby method?
[Images: "group by", "use groupby"]

@Faith-Nchifor

@AnishereMariam your method is good. I realized that my plot behaved the way it did because I never sorted the data. It looks just like this one now. Thanks for your input.
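
The two fixes discussed here (grouping, then sorting on the x column) can be sketched as follows. A Bokeh line glyph connects points in the order they appear in the data, so unsorted x values produce a zigzag. The data and the `pickup_hour` column are made up for illustration and are not taken from the notebook in question.

```python
import pandas as pd
from bokeh.plotting import figure

# Made-up, deliberately unsorted data with repeated x values.
trips = pd.DataFrame({
    "pickup_hour": [17, 8, 8, 23, 17, 12, 8],
    "fare_amount": [14.0, 9.0, 11.0, 20.0, 16.0, 10.0, 10.0],
})

# Collapse to one value per hour, then sort on the x column so the
# line is drawn from left to right.
hourly = (
    trips.groupby("pickup_hour", as_index=False)["fare_amount"]
    .mean()
    .sort_values("pickup_hour")
)

p = figure(title="Mean fare by pickup hour",
           x_axis_label="Hour of day", y_axis_label="Mean fare")
p.line(hourly["pickup_hour"], hourly["fare_amount"], line_width=2)
```

`groupby` already returns its keys in sorted order by default, so the explicit `sort_values` is only there to make the intent visible.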

@BhaswatiRoy linked a pull request on Mar 11, 2023 that will close this issue

@Faith-Nchifor

Hello @bryevdv, @pavithraes
Here is the link to my gist: https://gist.github.com/Faith-Nchifor/b57ee2140e2dd1ea110d5f17c54626ee
My project of interest is Create a blog post series: "Fundamentals of Data Visualization in Bokeh"

@Ajoke23

Ajoke23 commented Mar 12, 2023

Hi @Faith-Nchifor well done

@Ajoke23

Ajoke23 commented Mar 12, 2023

Hi @BhaswatiRoy, nice analysis, and your choice of visualization is really great.

@Ajoke23

Ajoke23 commented Mar 12, 2023

If you are having any challenges with the project, ask on this channel. I will be glad to assist anyone.

@BhaswatiRoy

Thanks @Ajoke23 for the reviews.

@Isaakkamau

Hello, @pavithraes @AnishereMariam please take a look at my first assignment on the analysis of NYC Taxi data on Jupyter Notebook.
https://gist.github.com/Isaakkamau/358d2ccff3612d95496972fa67842021

@anushka-png

Hello everyone, my name is Anushka Sharma and I have made my contribution to the #1 project.
@pavithraes @bryevdv please have a look at my assignment.
Here is my gist link: https://gist.github.com/anushka-png/ffd9d83d2b6b46d169c5e510dc4123d9

I have tried to work with two different datasets: the first is the TLC Driver 24 hour course and the second is the yellow taxi dataset for October and November. For reference, I have also attached a PDF containing my outputs and other relevant data. This is my first time contributing to a project. I would appreciate any reviews and comments on it.
Thank you

@Faith-Nchifor

#6 (comment)
Thank you @Ajoke23
Do you have an idea on how I can make my plots to show in my notebook on GitHub gist ?

@Azaya89
Member

Azaya89 commented Mar 12, 2023

Hi, here is my submission for the microtask on the project, Create a blog post series: "Fundamentals of Data Visualization in Bokeh."
https://github.com/Azaya89/Bokeh-microtask

Attached in a separate images folder are the plots that were generated inline. For some reason, they do not appear inline in the notebook here on github.

@Ajoke23

Ajoke23 commented Mar 12, 2023

> #6 (comment)
> Thank you @Ajoke23
> Do you have an idea on how I can make my plots to show in my notebook on GitHub gist ?

To show a plot, call show() on the variable you assigned when creating the plot.

@AnishereMariam

> Hello, @pavithraes @AnishereMariam please take a look at my first assignment on the analysis of NYC Taxi data on Jupyter Notebook. https://gist.github.com/Isaakkamau/358d2ccff3612d95496972fa67842021

@Isaakkamau, that is fine work. Keep it up.

@Isaakkamau

@AnishereMariam thanks a lot, but how many visualizations are we supposed to have? I decided to do one first, and I can add others if needed.

@PatChizzy

Hello @pavithraes @AnishereMariam

Please find my contribution for task 1 here
Your feedback would be appreciated.

I added the visualizations as comments since GitHub gist can't render them from my notebook.

@Ajoke23

Ajoke23 commented Mar 13, 2023

> Hi, here is my submission for the microtask on the project, Create a blog post series: "Fundamentals of Data Visualization in Bokeh."
> https://github.com/Azaya89/Bokeh-microtask
>
> Attached in a separate images folder are the plots that were generated inline. For some reason, they do not appear inline in the notebook here on github.

Well done @Azaya89, you did great work.

@Ajoke23

Ajoke23 commented Mar 13, 2023

> Hello @pavithraes @AnishereMariam
>
> Please find my contribution for task 1 here
> Your feedback would be appreciated.
>
> I added the visualizations as comments since GitHub gist can't render them from my notebook.

You did great work. Well done @PatChizzy. Unique and creative visualization.

@bryevdv
Member

bryevdv commented Mar 13, 2023

Hi all, thanks for the submissions so far! This is our first time doing Outreachy, so this is a learning experience for us as well! One thing that has become apparent is that it is a bit confusing and difficult to provide individualized comments when all the submissions are mixed together in one place like this! I'd like to ask everyone who has submitted here to open a new issue that has any relevant links, images, etc. for your work. This will allow us to have 1-1 conversations with everyone on their own issue :)

@Ajoke23

Ajoke23 commented Mar 13, 2023

> Hi all, thanks for the submissions so far! This is our first time doing Outreachy, so this is a learning experience for us as well! One thing that has become apparent is that it is a bit confusing and difficult to provide individualized comments when all the submissions are mixed together in one place like this! I'd like to ask everyone who has submitted here to open a new issue that has any relevant links, images, etc. for your work. This will allow us to have 1-1 conversations with everyone on their own issue :)

For those who might be having trouble figuring this out, you can follow these steps:
1. Visit the link to the project on GitHub: #6
2. If you are using a desktop, click the "New issue" button on the right-hand side of the page.
3. Write a descriptive title and a detailed description in the "Write" section. Include the link to your notebook in the description.
4. Click "Submit new issue".

That's all.
I hope this helps someone.

@Azaya89
Member

Azaya89 commented Mar 13, 2023

> > #6 (comment)
> > Thank you @Ajoke23
> > Do you have an idea on how I can make my plots to show in my notebook on GitHub gist ?
>
> To show a plot, call show() on the variable you assigned when creating the plot.

I think the issue here is not the code written. What I've been able to figure out is that calling output_notebook() in the Jupyter notebook is what renders the plots inline, but exporting the notebook to GitHub won't render them, since output_notebook() is not running on GitHub. So it's best to also post the plots as separate images.

I hope this helps.
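
Besides posting images, another option is to save each plot as a standalone HTML file that keeps its interactivity when opened in a browser. The sketch below is one possible approach, not something taken from the notebooks in this thread. It uses `bokeh.embed.file_html`, and the plot data and title are made up.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# A tiny example plot with made-up data.
p = figure(title="Example plot")
p.line([1, 2, 3, 4], [3, 1, 4, 2], line_width=2)

# Render the plot to a complete HTML document as a string. BokehJS is
# loaded from the CDN, so the file needs internet access when opened.
html = file_html(p, resources=CDN, title="Example plot")

# To keep the result, write the string to a file (the name is a placeholder):
# with open("example_plot.html", "w", encoding="utf-8") as f:
#     f.write(html)
```

For static PNG images, `bokeh.io.export_png` is also available, but it needs Selenium and a browser driver installed, so a screenshot may be simpler.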

@Faith-Nchifor

> > #6 (comment)
> > Thank you @Ajoke23
> > Do you have an idea on how I can make my plots to show in my notebook on GitHub gist ?
>
> To show a plot, call show() on the variable you assigned when creating the plot.

This works fine in my Python environment. However, once the notebook has been downloaded, the images do not show.

@Faith-Nchifor

> > > #6 (comment)
> > > Thank you @Ajoke23
> > > Do you have an idea on how I can make my plots to show in my notebook on GitHub gist ?
> >
> > To show a plot, call show() on the variable you assigned when creating the plot.
>
> I think the issue here is not the code written. What I've been able to figure out is that calling output_notebook() in the Jupyter notebook is what renders the plots inline, but exporting the notebook to GitHub won't render them, since output_notebook() is not running on GitHub. So it's best to also post the plots as separate images.
>
> I hope this helps.

Okay @Azaya89. I'm gonna try it out. Thanks

@Azaya89
Member

Azaya89 commented Mar 13, 2023

> Okay @Azaya89. I'm gonna try it out. Thanks

You're welcome.
