Use dask whenever possible in preprocessor to keep memory intake low #32

Closed
2 of 3 tasks
valeriupredoi opened this issue Apr 24, 2019 · 2 comments
Labels: enhancement (New feature or request), preprocessor (Related to the preprocessor)

Comments

@valeriupredoi
Contributor

valeriupredoi commented Apr 24, 2019

This is a follow-up to a lot of suggestions and work (mainly done by @bouweandela and @jvegasbsc). Examples of high memory usage can be seen in various issues such as #810 or #922; work is also underway in PRs such as #1001, or has been initiated by issues such as #949. There are also issues related to inherent changes in iris and its handling of lazy data, e.g. see #887. So far, the actual work done is as follows:

Active work as PR

Let's add pull requests here that address the use of dask in other preprocessor modules, and gradually close all the issues and PRs listed above (upon acceptance and good code behaviour with respect to memory) 🍺

@mattiarighi mattiarighi transferred this issue from ESMValGroup/ESMValTool Jun 11, 2019
@mattiarighi mattiarighi added the enhancement, preprocessor and paper labels Jun 11, 2019
@bjlittle
Contributor

bjlittle commented Jul 29, 2019

@valeriupredoi and @bouweandela You guys should be aware of SciTools/iris#3357

In a nutshell, if a netCDF variable has an UNLIMITED dimension, netCDF automatically applies its own chunking to that variable in the file. In most cases this is detrimental to the performance of dask within iris: the chunks chosen by netCDF are really small, often too small for dask to handle efficiently, so dask incurs a massive overhead when dealing with files that have tiny chunks.

A fix on our side will resolve this... just sayin 😉
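To make the overhead concrete, here is a rough, stdlib-only sketch of the arithmetic involved. It is not iris's actual fix; `count_chunks` and `grow_chunks` are hypothetical helpers. It assumes, for illustration only, ten years of daily float32 data on a 180 x 360 grid stored with one time step per netCDF chunk along the UNLIMITED time axis, and a 128 MiB target chunk size borrowed from dask's default `array.chunk-size` setting. Since dask builds roughly one task per chunk, the chunk count is a fair proxy for scheduling overhead.

```python
import math


def count_chunks(shape, chunks):
    """Number of blocks of size `chunks` needed to tile an array of `shape`."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


def grow_chunks(shape, file_chunks, itemsize, target_bytes):
    """Hypothetical helper: enlarge the on-disk chunk shape by whole multiples,
    leading dimension first, until the byte budget is used up or the array
    shape is reached. Chunks already larger than the budget are left as-is."""
    chunks = list(file_chunks)
    for dim, size in enumerate(shape):
        chunk_bytes = itemsize * math.prod(chunks)
        # How many copies of the current chunk still fit into the byte budget.
        factor = max(1, target_bytes // chunk_bytes)
        chunks[dim] = min(size, chunks[dim] * factor)
    return tuple(chunks)


if __name__ == "__main__":
    # Assumed example: 10 years of daily float32 data on a 180 x 360 grid,
    # one time step per netCDF chunk along the UNLIMITED time dimension.
    shape = (3650, 180, 360)
    file_chunks = (1, 180, 360)
    target = 128 * 2**20  # 128 MiB, assumed dask default array.chunk-size

    print(count_chunks(shape, file_chunks))  # 3650 chunks of about 253 KiB each
    grown = grow_chunks(shape, file_chunks, itemsize=4, target_bytes=target)
    print(grown, count_chunks(shape, grown))  # (517, 180, 360) 8
```

Growing by whole multiples of the file chunks keeps each dask chunk aligned with the on-disk chunks, so no netCDF chunk has to be read by more than one dask task, while the task count drops from thousands to a handful.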

@mattiarighi mattiarighi removed the paper label Jan 7, 2020
@bouweandela
Member

Created a new overview issue: #674
