-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add benchmark for climatology #1552
Conversation
tests/geospatial/test_climatology.py
Outdated
.expand_dims(hour=hours, dayofyear=daysofyear) | ||
.assign_coords(hour=hours, dayofyear=daysofyear) | ||
) | ||
working = working.map_blocks(compute_hourly_climatology, template=template) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if people typically compute climatology with complex-enough calculations that warrant the use of map_blocks
or typically use calculations that leverage Xarray's higher-level API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
People use rechunk+map_blocks because it usually would not work otherwise. It's basically "manual fusion of tasks" to not overwhelm the scheduler. And convenient because you can use Xarray API in there.
I strongly recommend running a version of this without those steps. Ideally, these tricks should not be necessary for this calculation.
Notice that the workload rechunks back to pancakes... The chunking to pencils is purely to get this to work with dask+distributed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure, what I'm hearing you say is: The computations people generally execute for calculating climatology, be it mean
, std
, quantile
(or SEEP
?) can be expressed in Xarray's high-level API and do not fundamentally require a custom function mapped over all blocks.
In that case, I fully agree that we should avoid map_blocks
but express these calculations in Xarray. We should probably benchmark both a computation based on a decomposable aggregation like mean
and one based on a holistic aggregation like quantile
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The computations people generally execute for calculating climatology,
Yes. generally. The trick is to look at the function being mapped: in this case it's all Xarray ops and no for loops. That indicates the map_blocks is a workaround for some Xarray and/or dask/distributed inefficiency.
We should probably benchmark both a computation based on a decomposable aggregation like mean and one based on a holistic aggregation like quantile.
Yes totally agree with adding quantile. Dask will force a rechunk :)
PS: What is SEEP
?
PPS: I notice this is based on a Xarray-beam workload, it may be that rechunk+map_blocks is the only way to express this in that framework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. generally. The trick is to look at the function being mapped: in this case it's all Xarray ops and no for loops. That indicates the map_blocks is a workaround for some Xarray and/or dask/distributed inefficiency.
Thanks for the clarification!
PS: What is
SEEP
?
All I can offer is https://github.com/google-research/weatherbench2/blob/47d72575cf5e99383a09bed19ba989b718d5fe30/scripts/compute_climatology.py#L147-L175. Maybe @shoyer can elaborate on SEEP
and how common similar calculations are? Just looking at the code, this can probably be expressed in Xarray's high-level API as well.
PPS: I notice this is based on a Xarray-beam workload, it may be that rechunk+map_blocks is the only way to express this in that framework.
Yes, that's what it's based on, but not what we should be limited by.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my experience, less experienced users usually try without rechunk+map_blocks first, and then grudgingly resort to that. It'd be nice to model that interaction and improve things if possible.
Yes, rechunk+map_blocks is definitely not the obvious solution. It would be quite nice if we could automatically optimize high-level Xarray code to use a rechunk + map_blocks style implementation when it will be more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we won't beat hand-optimized code. However, a major goal of these benchmarks is to understand how we can improve the performance of code that relies on high-level APIs to improve the end-user experience. We all seem to agree that it would be great if users wouldn't have to worry about rewriting their high-level code in a rechunk+map_blocks
-style code if it can be avoided. So, wherever possible, I think we should try and benchmark code without optimization and analyze where and how this falls apart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll also add that we want rechunk+map_blocks
to always succeed, even if slow, so that it can be a dependable fallback, particularly when the algorithms get more complicated. This would be a major improvement over the dask.array experience today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dcherian: Could you elaborate on that last comment? In what situations does that combination not succeed today?
I can imagine this failing if the block sizes or the temporary memory footprint of the mapped function end up being too large. However, this seems like a situation that is beyond the control of higher-level APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will fail whenever rechunking fails... agree that after that, the blockwise is pretty stable, which is why people use this solution in combination with some kind of batching to avoid the rechunking failure. I don't have an example at hand. This is anecdotal experience from looking at lot of non-expert code at NCAR.
This should be good for a review now. (I expect that we will iterate on this once we have improved on some things within Xarray/Dask.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hendrikmakait
Closes #1549
This benchmark doesn't even run at the small scale yet.