Clean new config #269

Merged · 341 commits · Apr 14, 2023

Conversation

@huard (Collaborator) commented Mar 8, 2023

This is a big refactor of the way models are configured and run.
This PR does not remove any of the original code, but creates a new_config directory with the new architecture.

It also drastically simplifies the user interface; see ravenpy/ravenpy.py for the two main functions users will use:

  • run to run the model on an existing configuration
  • parse to read the outputs and create Python objects

See tests.conftest.gr4jcn_config for an example of the configuration with the refactor:

import datetime as dt

# Module paths assumed from this PR's layout (ravenpy/new_config/):
from ravenpy.new_config import commands as rc
from ravenpy.new_config.emulators.gr4jcn import GR4JCN

# `f`, `alt_names` and `salmon_hru` are test fixtures defined in tests/conftest.py.
m = GR4JCN(
    params=[0.529, -3.396, 407.29, 1.072, 16.9, 0.947],
    Gauge=rc.Gauge.from_nc(
        f,
        data_type=["PRECIP", "TEMP_MIN", "TEMP_MAX"],
        alt_names=alt_names,
        extra={1: {"elevation": salmon_hru["land"]["elevation"]}},
    ),
    ObservationData=rc.ObservationData.from_nc(f, alt_names="qobs"),
    HRUs=[salmon_hru["land"]],
    StartDate=dt.datetime(2000, 1, 1),
    EndDate=dt.datetime(2002, 1, 1),
    RunName="test",
    CustomOutput=rc.CustomOutput("YEARLY", "AVERAGE", "PRECIP", "ENTIRE_WATERSHED"),
    GlobalParameter={"AVG_ANNUAL_RUNOFF": 208.480},
)

Writing the config files to disk is then done with m.write(path).
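
For the full loop, a minimal sketch of how these pieces fit together; the exact run and parse signatures are assumptions (their argument names are still under discussion below):

import ravenpy

m.write(workdir)                               # write the RV files to `workdir`
ravenpy.run("test", workdir)                   # "test" matches RunName above; signature assumed
out = ravenpy.parse(workdir, run_name="test")  # read outputs into Python objects; signature assumed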

The configuration supports symbolic expressions, e.g.

uniform_initial_conditions: Dict[str, float] = Field(
    {"SOIL[0]": P.GR4J_X1 * 1000 / 2, "SOIL[1]": 15},
    alias="UniformInitialConditions",
)
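
Here P.GR4J_X1 is a symbolic placeholder: once numerical params are supplied (as in the GR4JCN call above, where the first value is 0.529), the expression would resolve to 0.529 * 1000 / 2 = 264.5 in the written RV files.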

I think the objective now is to pinpoint non-intuitive parts of the new configuration and fix them. Once we're happy, let's merge this PR and then open others to add the remaining emulators and functionality we want to port.

Ping @Mayetea

This PR fixes #272

@huard (Collaborator, Author) commented Mar 9, 2023

The failing tests are due to a mistake on my end. I did an automated rename and thought it applied only to one file, when it actually applied to the whole repo... will fix this.

@coveralls commented Mar 9, 2023

Pull Request Test Coverage Report for Build 4409663436

  • 1084 of 1155 (93.85%) changed or added relevant lines in 11 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+1.9%) to 86.204%

Changes Missing Coverage:

  File                               Covered Lines   Changed/Added Lines   %
  ravenpy/new_config/base.py         127             129                   98.45%
  ravenpy/utilities/calibration.py   46              48                    95.83%
  ravenpy/new_config/utils.py        52              58                    89.66%
  ravenpy/ravenpy.py                 94              102                   92.16%
  ravenpy/new_config/rvs.py          139             150                   92.67%
  ravenpy/new_config/commands.py     363             405                   89.63%

Files with Coverage Reduction:

  File                               New Missed Lines   %
  ravenpy/models/base.py             2                  93.3%

Totals:

  • Change from base Build 4358668933: +1.9%
  • Covered Lines: 4830
  • Relevant Lines: 5603

💛 - Coveralls

@huard (Collaborator, Author) commented Mar 9, 2023

So, a few questions to guide the review:

  • Should ravenpy.run return something? At the moment it raises errors if something's wrong; otherwise, if the simulation completes, it returns None.
  • In ravenpy.__init__, I'm exposing run and parse, but should other parsers be made available at the top level (e.g. parse_diagnostics)?
  • Should we rename run and parse to something else? run could also be raven.
  • In run, the file name is called identifier, while in parse it's run_name. This should probably be made uniform. By default, the configuration uses run_name for the RV configuration files, so I could replace identifier with run_name.

@Zeitsperre (Member):

  • Should ravenpy.run return something? At the moment it raises errors if something's wrong; otherwise, if the simulation completes, it returns None.

If we do not want to raise errors, could we return an object that describes whether the simulation failed (and how)? That would be good information to pass along via a WPS process / logging.

  • Should we rename run and parse to something else? run could also be raven.

I like run; it's less confusing when we already have ravenpy and ravenwps.

@richardarsenault (Collaborator):

Good questions.

  1. Maybe it should return the standard "0"? I'm not a good enough developer to provide more in-depth comments on this.
  2. Ideally we should have a method like "build" where we build the model before running it, and maybe set_parameters, etc. That would allow accessing everything before we actually launch a run; ideal for calibration and assimilation, for example.
  3. I suggest keeping run and parse. Calling "ravenpy.raven" seems odd to me.
  4. I agree on making it uniformly run_name.

Trying to set up my env to help evaluate, debug and contribute to this!

@Mayetea (Collaborator) commented Mar 10, 2023

I agree with Trevor's and Richard's comments on how we should structure the code. Also, shouldn't we remove the Ostrich part of base.py? I never installed its binary since we were supposed to rip it out. Right now, if I don't comment out the Ostrich code, 60 tests fail; if I do comment it out, I still get failures in the Ostrich unit tests (16 failed tests) plus 2 others that could be related but don't raise Ostrich errors.

So what do we do with this? Do we include the removal in this PR or in another PR?

@Zeitsperre (Member):

Also, shouldn't we remove the Ostrich part of base.py?
...
So what do we do with this? Do we include the removal in this PR or in another PR?

If we can get those changes underway, that removes one of the blockers for #264 (the other is being worked on currently).

@huard (Collaborator, Author) commented Mar 13, 2023

This PR is only about the new config, so yes, Ostrich is still there. I'll open a new PR to plug SPOTPY into the new config once this one is merged, and then we'll be able to remove Ostrich.

The "build" step is shown in the PR description: configure the model with m = GR4JCN(...), then write it to disk with m.write(path).

We can return True if the simulation has completed, or the path of the output directory.

@Mayetea (Collaborator) commented Mar 13, 2023

I think what Richard meant is that we should have a dedicated class to do the Raven job, something like this:

model = GR4JCN(...)
raven_runner = raven.build(model, path)
raven_runner.run()
raven_runner.write()  # Writes the results to a file.

It might be impossible to refactor this way now, but I think it's the approach he was talking about.

@huard (Collaborator, Author) commented Mar 13, 2023

That's very easy; I can whip something up for discussion.

@huard (Collaborator, Author) commented Mar 13, 2023

See the latest commit for a new Emulator class with methods:

  • build
  • run
  • parse
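
For illustration, a minimal usage sketch; the method names come from the list above, while the constructor arguments and import location are assumptions:

from ravenpy.new_config.emulators.gr4jcn import GR4JCN
from ravenpy.ravenpy import Emulator  # import path is an assumption

config = GR4JCN(...)           # configured as in the PR description
e = Emulator(config, workdir)  # constructor takes the config and a working path (assumed)
e.build()                      # write the RV files to disk
e.run()                        # execute Raven on the written configuration
outputs = e.parse()            # read the outputs back into Python objects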

@Mayetea (Collaborator) commented Mar 14, 2023

I've tested yesterday's push, and the only failures I get (unrelated to Ostrich) are in test_read_from_netcdf and test_open_dataset_false_cache.

In test_read_from_netcdf, we assert that the message is not None, but we receive None.

In test_open_dataset_false_cache, I get a "Permission denied" error when accessing a file.

@huard (Collaborator, Author) commented Mar 15, 2023

Weird; no clear idea where this is coming from. Bugs notwithstanding, what I'm interested in is feedback on the API: the naming of functions and arguments, the workflow, etc.

@Mayetea (Collaborator) commented Mar 16, 2023

I'm currently trying to use the new_config in the spotpy test. I have two comments so far.

First, I think the model import should be as clear as before: from ravenpy.models import GR4JCN, not from ravenpy.new_config.emulators.gr4jcn import GR4JCN. We could then import all models from .emulators, or import a specific one by name (as for GR4JCN).

Second, I don't understand why Emulator.build() has overwrite defaulting to True while Config.build() has overwrite defaulting to False.

I'll let you know if I have more comments as soon as I make it work with spotpy.

@huard (Collaborator, Author) commented Mar 16, 2023

Agreed on your first point, but this poses the problem of how to deal with the old and new versions side by side. Maybe once this PR is completed we wipe out the old versions and new_config becomes config.

Good catch; changed overwrite to False in Emulator.build.

@huard (Collaborator, Author) commented Mar 16, 2023

test_spotpy_calibration should now run the calibration for gr4j. To run the other models that have a new config, we only need to add the low and high parameter bounds (see the sketch below).
The calibration test for the new config should probably be moved to tests/new_config.
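
For context, a rough sketch of what such a calibration run could look like; the SpotSetup arguments (config plus low/high bounds) are assumptions based on this thread, and the bound values are placeholders, not recommended ranges:

import spotpy

from ravenpy.utilities.calibration import SpotSetup

spot_setup = SpotSetup(
    config=model,                             # a GR4JCN instance, as in the PR description
    low=(0.01, -15.0, 10.0, 0.0, 1.0, 0.0),   # hypothetical lower bounds for the 6 params
    high=(2.5, 10.0, 700.0, 7.0, 30.0, 1.0),  # hypothetical upper bounds
)
sampler = spotpy.algorithms.dds(spot_setup, dbname="gr4j_calib", dbformat="ram")
sampler.sample(50)  # 50 model evaluations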

@Mayetea (Collaborator) commented Mar 16, 2023

Ah yes, I understand the issue. I'll try spotpy with your new push and let you know how it goes!

@Mayetea (Collaborator) commented Mar 16, 2023

I don't see any forcing file in the new config. Is it the "path" param in the Emulator constructor?

@huard (Collaborator, Author) commented Mar 16, 2023

Check tests/new_config/emulators.

There is a pytest fixture called emulator that instantiates all the emulators listed in names. It is then used in tests/new_config/test_emulators to test every emulator with a single parametrized test, instead of writing one test per emulator. A sketch of the pattern follows.
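
A minimal sketch of that fixture pattern; the emulator names and the lookup helper are hypothetical stand-ins for what the test suite actually does:

import pytest

names = ["GR4JCN", "HMETS"]  # hypothetical list; the real one lives in the test suite

@pytest.fixture(params=names)
def emulator(request):
    # Instantiate each configured emulator in turn.
    return get_emulator_config(request.param)  # hypothetical lookup helper

def test_emulator(emulator):
    # A single parametrized test exercises every emulator.
    ...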

@huard (Collaborator, Author) commented Mar 16, 2023

Maybe I misunderstood the question, but calibration.py has a SpotSetup class using the new config.
It's exercised in test_spotpy_calibration, using the same emulator fixture.

@Mayetea (Collaborator) commented Mar 16, 2023

I was talking about the netCDF file we feed in for the training, but I think I found how you pass it through the fixtures. I'll try to make it work without fixtures, but it looks good. It was a misinterpretation on my end!

@Mayetea (Collaborator) commented Mar 17, 2023

I've managed to make spotpy work without fixtures, but it doesn't rerun Raven for each evaluation. It keeps returning the same Nash value, -0.117301, so it isn't adjusting to the params. I'm looking into the issue right now.

@Mayetea (Collaborator) commented Mar 17, 2023

While debugging, I found that the params are changing, but the Nash is always the same (-0.117301), as stated in the other comment. I'll let you know if I find other info that could help you track down the source of the issue!

@huard (Collaborator, Author) commented Mar 17, 2023

I think I know what's wrong. We need to instantiate the model at each iteration, not update the parameters.

@Mayetea (Collaborator) commented Mar 17, 2023

Shouldn't this be the job of the build function? We do:

self.config.params = list(x)
self.build(self.path / f"c{self._iteration:03}")
self.run()

While before we were doing this:

self.model.config.update("params", np.array(x))
self.model._execute(self.ts)

The only difference I see is in the execute function, where self.setup() is called before run().

I'll try adding it to the simulation too and see what we get!

@huard (Collaborator, Author) commented Mar 17, 2023

But the config object underneath is completely different.
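
To make the suggestion concrete, a sketch of the spotpy simulation step with re-instantiation; make_config is a hypothetical stand-in for whatever rebuilds the emulator configuration from scratch, and the Emulator arguments are assumed:

def simulation(self, x):
    # Re-instantiate the configuration with the new parameter set instead of
    # mutating the existing one, so the symbolic config is resolved afresh.
    config = make_config(params=list(x))  # hypothetical rebuild helper
    e = Emulator(config, self.path / f"c{self._iteration:03}")  # assumed constructor
    e.build()
    e.run()
    ...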

@richardarsenault (Collaborator):

I've been testing and trying to run this new_config, and I have hit a few snags. I've been able to overcome some of them, but others are blocking.

1- Gauge.from_nc: It seems it can only handle a single file with all variables merged. However, users relying on the ERA5 extraction tool (or who have different files for different variables) would have them separated. Is it possible to pass a list of files, as we could before with the ts forcing? Or maybe I just failed and it's already possible? I got errors saying it expects a str or path object, not a set or tuple, etc.

2- Gauge.from_nc (again): When I take the spatial mean of the ERA5 weather data, I average over latitude and longitude, so those dimensions disappear. The code then fails because it expects a gauge lat/lon. Is this necessary? I think it was not required before. Could we have a system where, if there is only one station, it just takes an arbitrary value (since it will not impact the results)? I can force it in the extras, but that seems like a redundant step if it's not used in the actual code.

3- GR4JCN has a parameter defined as AVG_ANNUAL_RUNOFF, but I don't think this is what GR4JCN actually needs. The parameter is G50 (unless it is something else entirely, in which case I am at a loss as to what it is) and represents the median annual snowpack depth.

4- We need to expose more clearly how to run a model from rv / nc files in a folder that a user would upload. Try as I might, I was unsuccessful; I was only able to run models that I had previously built.

5- One notebook (02_Extract_geographical_watershed_properties.ipynb) fails for obscure reasons; to investigate.

I'll add more as I progress!

@huard (Collaborator, Author) commented Mar 26, 2023

  1. You can call Gauge.from_nc multiple times and concatenate the results (see the sketch below):
    gauges = Gauge.from_nc(fn1, ...) + Gauge.from_nc(fn2, ...)

  2. Lon and lat are necessary in Gauge commands (docs page 195). They were not required before because we took the lats and lons from the HRU, but that was a hack. You can pass extra Gauge arguments using the extra keyword: Gauge.from_nc(fn, ..., extra={"latitude": 56})

  3. ? You mean there's a bug in the template?

  4. ravenpy.run
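
Putting points 1 and 2 together, a hedged sketch of building gauges from per-variable ERA5 files; the file names and alt_names mappings are hypothetical:

from ravenpy.new_config import commands as rc  # module path assumed for this PR

extra = {"latitude": 56, "longitude": -72, "elevation": 100}  # explicit lat/lon, per point 2

gauges = rc.Gauge.from_nc(
    "era5_precip.nc",  # hypothetical per-variable file
    data_type=["PRECIP"],
    alt_names={"PRECIP": "tp"},  # hypothetical variable-name mapping
    extra=extra,
) + rc.Gauge.from_nc(
    "era5_temp.nc",
    data_type=["TEMP_MIN", "TEMP_MAX"],
    alt_names={"TEMP_MIN": "tmin", "TEMP_MAX": "tmax"},
    extra=extra,
)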

@Zeitsperre (Member):

@richardarsenault If you want to maintain a specific set of sections in the imports of a notebook: https://pycqa.github.io/isort/docs/configuration/action_comments.html#isort-split

@huard merged commit 64a607d into master on Apr 14, 2023
@huard deleted the clean_new_config branch on Apr 14, 2023 at 20:23
Merging this pull request closes: Missing packages to run notebooks using fresh RavenPy install