Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor multi-model statistics code to facilitate ensemble stats and lazy evaluation #949

Merged
merged 9 commits into from
Jan 27, 2021

Conversation

Peter9192
Copy link
Contributor

@Peter9192 Peter9192 commented Jan 18, 2021

Description

This PR refactors the multimodel statistics code. Specifically, it separates the core functionality (computing statistics across cubes) from the intermediate layer that deals with the handling of ESMValCore products.

This facilitates re-use of the core function in e.g. an ensemble statistics function (see #673) and also it makes it easier to make the core functionality lazy (see #781 and #950).


Before you get started

Checklist

If you make backwards incompatible changes to the recipe format:

  • N/A Update ESMValTool and link the pull request(s) in the description

To help with the number pull requests:

Separate the ESMValCore internals, dealing with products, from
the core function operating on cubes.

This makes it easier to add a new preprocessor for ensemble statistics,
and also to make a new core function (lazily) compute statistics
across cubes.
Copy link
Contributor

@stefsmeets stefsmeets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @Peter9192 , all tests are passing, so it looks good to me! Separating how products are processed from cubes will make the code much easier to work with.

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cheers @Peter9192 for this! I am still to read the changes to the documentation, the functionality seems to be nicely restructured, but am not quite happy with the docstring changes - they have become too technical and I don't think this is a good idea, given who we cater for 👍 EDIT I have now gone through the documentation changes and I like'em, job well done there - but still confused as to why multimodel -> multi-model 🍺

doc/recipe/preprocessor.rst Show resolved Hide resolved
doc/recipe/preprocessor.rst Show resolved Hide resolved
esmvalcore/preprocessor/_multimodel.py Outdated Show resolved Hide resolved
esmvalcore/preprocessor/_multimodel.py Outdated Show resolved Hide resolved
esmvalcore/preprocessor/_multimodel.py Show resolved Hide resolved
@Peter9192
Copy link
Contributor Author

Thanks @valeriupredoi I'll look into your suggestions! I appreciate the point about the audience for the docstrings.

return statistics_products


def multi_model_statistics(products: set,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this function is part of the public API. Changing it so it no longer accepts cubes is a breaking change that will probably break diagnostic code. Would it be possible to make it work again that if you put cubes in, you get cubes out? And of course the docstring needs to be good as @valeriupredoi already pointed out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shucks! I totally missed that - very good point @bouweandela - yes, @Peter9192 please see this and revise the approach 👍

Copy link
Contributor Author

@Peter9192 Peter9192 Jan 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I realized this as well but I'm struggling to find an appropriate solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Peter9192 Didn't we talk about making multi_model_statistics cube only (not to mess with the public API) and simply pass the arguments to multi_cube_statistics. Then use _multi_model_statistics with products for internal use? Or am I confusing 2 PRs now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a lot of contemplating I have accepted that the public API function will have to accept both products and cubes as possible inputs. I had rather hidden those products from the public altogether, but this becomes very ugly very quickly. So I have now restored the public multi_model statistics as the single entry point that returns different functions based on its input type.

as entry point for both multi-cube and multi-product statistics. This
is to maintain the alignment between the recipe API and the public
preprocessor documentation.
@Peter9192
Copy link
Contributor Author

All green! @valeriupredoi I think I got everything covered, could you please have another look? Thanks 🍻

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, development tests pass too (not cached ones on Circle, actual branch and ESMValTool dev ones), cheers @Peter9192 🍺 @bouweandela did you want to have another look yerself?

Copy link
Member

@bouweandela bouweandela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, more clean. Just a minor comment on the unit tests.

@Peter9192
Copy link
Contributor Author

Now the test fails because of iris docs change or smt like that 😞

@bouweandela
Copy link
Member

Now the test fails because of iris docs change or smt like that disappointed

That should be fixed by #964

@bouweandela bouweandela merged commit b632f9a into master Jan 27, 2021
@bouweandela bouweandela deleted the mmstats_refactor branch January 27, 2021 14:26
@jvegreg jvegreg added the preprocessor Related to the preprocessor label Feb 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preprocessor Related to the preprocessor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants