Make preprocessor lazy #674

Open
56 of 62 tasks
bouweandela opened this issue Jun 12, 2020 · 7 comments
Labels
enhancement New feature or request preprocessor Related to the preprocessor

Comments

@bouweandela
Member

bouweandela commented Jun 12, 2020

Overview issue with laziness status of preprocessor functions:

Checked means lazy; unchecked means not lazy or only partially lazy. A question mark after the preprocessor name means that it is unknown whether this preprocessor function is lazy.

Note that the *_statistics preprocessor functions are lazy except when computing the median. A workaround is to use operator: percentile with percent: 50.
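The reason the percentile workaround gives the same answer can be sketched in plain Python. This is only an illustration, not ESMValCore code: the `percentile` helper below is hypothetical and assumes linear interpolation between closest ranks (NumPy's default); whether the preprocessor's percentile operator interpolates the same way is an assumption.

```python
import statistics


def percentile(values, percent):
    """Percentile with linear interpolation between closest ranks.

    Hypothetical helper for illustration only, not part of ESMValCore.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    # Fractional index of the requested percentile in the sorted data.
    position = (len(ordered) - 1) * percent / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


odd = [3.0, 1.0, 4.0, 1.0, 5.0]
even = [2.0, 7.0, 1.0, 8.0]

# With linear interpolation, the 50th percentile matches the median
# for both odd and even sample sizes.
odd_matches = percentile(odd, 50) == statistics.median(odd)
even_matches = percentile(even, 50) == statistics.median(even)
```

Under that interpolation assumption the two statistics coincide, so swapping the operator should not change results, only whether the computation can stay lazy.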

It would be great if we could make more preprocessor functions lazy. The laziness status should also be indicated in the docstrings.

Related to #51

@bouweandela bouweandela added enhancement New feature or request preprocessor Related to the preprocessor labels Jun 12, 2020
@valeriupredoi
Contributor

good call!! I reckon this is a good first serious feature for v2.1 🍺

@bouweandela
Member Author

bouweandela commented Jan 31, 2021

To get an idea of the priority of the preprocessor functions, here is a rough count of the number of recipes in the ESMValTool that they are used in:

regrid 60
extract_region 30
derive 26
extract_levels 24
mask_landsea 21
area_statistics 20
climate_statistics 18
multi_model_statistics 16
mask_fillvalues 15
annual_statistics 8
convert_units 8
weighting_landsea_fraction 7
anomalies 7
extract_point 6
zonal_statistics 5
extract_time 4
extract_shape 4
amplitude 4
extract_season 3
detrend 3
extract_month 2
extract_transect 2
depth_integration 2
volume_statistics 2
mask_landseaice 1
extract_volume 1
extract_trajectory 1
extract_named_regions 1
meridional_statistics 1
daily_statistics 1
decadal_statistics 1
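A count like the one above could be produced roughly as follows. This is a sketch under assumptions, not the script that was actually used: it assumes each recipe has already been parsed into a dictionary whose top-level `preprocessors` section maps preprocessor names to their steps, and the function name `count_preprocessor_usage` and the example settings are made up for illustration.

```python
from collections import Counter


def count_preprocessor_usage(recipes):
    """Count in how many recipes each preprocessor step name appears.

    Hypothetical helper; `recipes` is an iterable of already-parsed
    recipe dictionaries.
    """
    usage = Counter()
    for recipe in recipes:
        used = set()
        # Each named preprocessor maps step names to their settings.
        for steps in recipe.get("preprocessors", {}).values():
            used.update(steps or {})
        # A step used several times in one recipe still counts once.
        usage.update(used)
    return usage


# Made-up recipes, for illustration only.
example_recipes = [
    {"preprocessors": {
        "prep_map": {
            "regrid": {"target_grid": "1x1"},
            "climate_statistics": {"operator": "mean"},
        },
        "prep_series": {
            "regrid": {"target_grid": "2x2"},
            "area_statistics": {"operator": "mean"},
        },
    }},
    {"preprocessors": {
        "prep_levels": {
            "regrid": {"target_grid": "1x1"},
            "extract_levels": {"levels": [85000]},
        },
    }},
    {"diagnostics": {}},  # a recipe without a preprocessors section
]

usage = count_preprocessor_usage(example_recipes)
for name, count in usage.most_common():
    print(name, count)
```

Note that this sketch counts every key inside a preprocessor, so non-function settings in that section would be counted too and would need filtering in a real script.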

@Peter9192
Contributor

Two points related to this:

@bouweandela
Member Author

Now I'm inclined to do ad-hoc rechunking whenever I need it

I think it makes sense to do that, because in most cases there is no need to rechunk. I would expect this is needed only for preprocessor functions that dramatically increase chunk size, and even then it might be best to try and leave that to iris.

@Peter9192
Contributor

Some things we ran into:

  • We are starting to see that the timing of log messages becomes confusing with lazy evaluation.
  • We should consider the interplay between the multiprocessing on tasks and the multithreading by dask.
  • Maybe we should consider setting up some global dask configuration, or a way to tune easily.
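To make the second and third points concrete: if every task runs in its own process and each process starts its own thread pool, the total number of threads is the product of the two, which can easily exceed the number of cores. A minimal sketch of one possible budgeting rule follows; the helper `threads_per_task` is hypothetical and not an existing ESMValCore or dask function, and splitting the CPUs evenly is just one assumed policy.

```python
import os


def threads_per_task(max_parallel_tasks, available_cpus=None):
    """Split the available CPUs evenly over parallel task processes.

    Hypothetical helper for illustration; returns at least one thread.
    """
    if max_parallel_tasks < 1:
        raise ValueError("max_parallel_tasks must be at least 1")
    if available_cpus is None:
        # os.cpu_count() may return None on some platforms.
        available_cpus = os.cpu_count() or 1
    return max(1, available_cpus // max_parallel_tasks)


# 16 CPUs shared by 4 task processes: 4 threads each, 16 in total.
balanced = threads_per_task(4, available_cpus=16)
# More tasks than CPUs: fall back to a single thread per task.
oversubscribed = threads_per_task(8, available_cpus=4)
```

The resulting number could then be passed to whatever global configuration mechanism ends up being chosen, so that tasks and threads together stay within the machine's core count.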

@remi-kazeroni
Contributor

The recipe_collins13ipcc is one of the most memory-demanding recipes that we have in ESMValTool; see the log file from a test run for the v2.5 release. Would this be a case that benefits from preprocessor laziness?

@bouweandela
Member Author

Yes, it would benefit. Some progress has been made recently by @zklaus on lazy regridding for ocean data, though I think this is not yet enabled by default: to use the new functionality, you need to change the recipe and install an extra package manually. Lazy vertical interpolation would be a good next candidate to tackle.

Projects
Status: In Progress
Development

No branches or pull requests

4 participants