02-dataframe.ipynb notebook - Solutions for checkpoint using df_dask which currently does not exist? #1

Open
billglennon opened this issue Jun 14, 2021 · 2 comments


@billglennon

In the 02-dataframe.ipynb notebook, you have the following:
df = dd.read_csv("data/yellow_tripdata_2019-.csv")
df
and
df = dd.read_csv("data/yellow_tripdata_2019-.csv",
                 dtype={'RatecodeID': 'float64',
                        'VendorID': 'float64',
                        'passenger_count': 'float64',
                        'payment_type': 'float64'
                        })

However, both checkpoint solutions use df_dask, which is never defined in the notebook.

Solution 1

std_tip = df_dask.groupby("passenger_count").tip_amount.std().compute()

Solution 2

mean_total = df_dask.total_amount.mean()
std_total = df_dask.total_amount.mean()

dask.compute(mean_total, std_total)

I recommend naming the Dask dataframe df_dask throughout the Dask dataframe section and changing the existing code to use that name.
There are a few places that use df.xxx when going over the Dask dataframe.
For example:
%%time

mean_tip_amount = df.groupby("passenger_count").tip_amount.mean()
mean_tip_amount

Also, I would change Solution 1 to the following (if you use df_dask) so that it shows the output (second line added):
std_tip = df_dask.groupby("passenger_count").tip_amount.std().compute()
std_tip

Thanks for putting this course together and the notebooks! Very much enjoyed it!

@pavithraes
Contributor

Thank you for sharing this issue! We'll make the updates soon!

Also, we're glad you liked the course :)

@selasley

selasley commented Jan 9, 2022

Should the standard deviation calculation in Solution 2
std_total = df_dask.total_amount.mean()
actually be
std_total = df_dask.total_amount.std()
