-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix rounding of Pandas datetimes in ICON CMORizer #2529
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2529 +/- ##
=======================================
Coverage 94.77% 94.77%
=======================================
Files 249 249
Lines 14094 14095 +1
=======================================
+ Hits 13358 13359 +1
Misses 736 736 ☔ View full report in Codecov by Sentry. |
ayay - check out the devel test with bleeding edge deps:
cftime plugging in some 0.997s out of thin air, they'd get absolutely destroyed if they worked in motorsports 😁 |
Have you tried this suggestion to work around the issue? pandas-dev/pandas#57002 (comment) |
Yes, it don't work, no Pd rounding works, including the one in the cmorizer 🍺 |
This also doesn't sound like a workaround to me... However, what DOES work is the hint in the description of that issue: # Round just date works
dates[0].round("30min")
# >Timestamp('2022-01-11 12:00:00') See here: >>> datetimes = year_month_day + day_float
>>> print(datetimes)
0 2004-11-04 13:59:59.997120
dtype: datetime64[ns]
>>> rounded_datetimes = pd.Series(dt.round('s') for dt in datetimes)
>>> print(rounded_datetimes)
0 2004-11-04 14:00:00
dtype: datetime64[ns] |
Ahh wait, I just realized that I have an old version of pandas, sorry, will update you again once that is updated... |
You need to update ninja bear to 2.2.3 Manu 😁 |
Yes!! It does work with pandas v2.2.3: >>> import pandas as pd
>>> pd.__version__
'2.2.3'
>>> time_points = [20041104.5833333]
>>> time_str = pd.Series(time_points, dtype=str)
>>> year_month_day_str = time_str.str.extract(r'(\d*)\.?\d*', expand=False)
>>> year_month_day = pd.to_datetime(year_month_day_str, format='%Y%m%d')
>>> day_float_str = time_str.str.extract(
r'\d*(\.\d*)',... r'\d*(\.\d*)', expand=False
... ).fillna('0.0')
>>> day_float = pd.to_timedelta(day_float_str.astype(float), unit='D')
>>> datetimes = year_month_day + day_float
>>> print(datetimes)
0 2004-11-04 13:59:59.997120
dtype: datetime64[ns]
>>> # Old version:
>>> rounded_datetimes = datetimes.round('s')
>>> print(rounded_datetimes)
0 2004-11-04 13:59:59.997120
dtype: datetime64[ns]
>>> # New version:
>>> rounded_datetimes = pd.Series(dt.round('s') for dt in datetimes)
>>> print(rounded_datetimes)
0 2004-11-04 14:00:00
dtype: datetime64[ns] |
You want to plug this in @valeriupredoi? You can simply replace the existing l.496 with rounded_datetimes = pd.Series(dt.round('s') for dt in datetimes) and should work 😁 I can also do it if you want, already got a local copy with this change. |
So why is the rounding done in the ICON cmorizer don't work then, and all the attempts I tried two days ago? You mind plopping the correct syntax in this PR, and get rid of the peasant rounding then plss, bud 🍺 |
That'd be fab, cheers, bud! 🍺 |
In the pandas issue they say that only rounding the rounded_datetimes = datetimes.round('s') This is what we currently have; hence, it fails. However, rounding individual elements, i.e., rounded_datetimes = pd.Series(dt.round('s') for dt in datetimes) works. Please don't ask me why... |
that'd be why then - I completely missed the conversion to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cheers 🍻
(not sure if that approval counts since I contributed to the code 😁)
Did you time some representative test case @schlunma? Looping over every element of an array in Python and then doing something with it can be really slow. |
No, but I don't think it's critical here. All ICON output I've seen so far is scattered across many files with only very few time points (sometimes only one time step is stored). Even the high-res output (20 mins) we have only has 72 time points per file. However, I am open to any better solution. All solutions I could think of involved Python loops in one way or another. |
we can always revert to the original implementation when Pandas finally fix their silly bug. Thanks a lot, Manu! @bouweandela would you be a trooper and merge this, pls 🍺 |
Well, my peasant approach didn't - it involved a list not an array 🤣 |
But you still had a |
An even worse one too 🤣 |
I did some timings (with pandas 2.1.4) and for 100 points the time it takes is quite marginal, even if the new approach is more than 10 times slower:
|
Many thanks, gents 🍺 |
Description
We have been pinning Pandas since at least three minor versions ago, see #2394 due to a bug in Pandas that they seem to be taking forever to fix issue pandas-dev/pandas#57002 and now we are yet again faced with the same issue with the new version of Pandas 2.2.3 - see failed nightly GA. The problem is simple: Pandas are simply unable to round to the next minute eg 12H:12M:39S should be 12H:13M:00S (for daily data), so I got fed up and wrote our own rounding function (that took me surprisingly long to get it right - about an hour, rounded to the next minute :rofl: I think this should not affect the result when Pandas finally fix their bug - but I'll let @schlunma decide 🍺
Closes #2530
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
To help with the number pull requests: