Clean new config #269

Merged · 341 commits · Apr 14, 2023

Conversation

@huard (Collaborator) commented Mar 8, 2023

This is a big refactor of the way models are configured and run.
This PR does not remove any of the original code, but creates a new_config directory with the new architecture.

It also drastically simplifies the user interface; see ravenpy/ravenpy.py for the two main functions users will use:

  • run to run the model on an existing configuration
  • parse to read the outputs and create Python objects

See tests.conftest.gr4jcn_config for an example of the configuration with the refactor:

import datetime as dt

# Module paths assumed from this PR's layout (ravenpy/new_config/):
from ravenpy.new_config import commands as rc
from ravenpy.new_config.emulators.gr4jcn import GR4JCN

# `f`, `alt_names` and `salmon_hru` are test fixtures defined in tests/conftest.py.
m = GR4JCN(
    params=[0.529, -3.396, 407.29, 1.072, 16.9, 0.947],
    Gauge=rc.Gauge.from_nc(
        f,
        data_type=["PRECIP", "TEMP_MIN", "TEMP_MAX"],
        alt_names=alt_names,
        extra={1: {"elevation": salmon_hru["land"]["elevation"]}},
    ),
    ObservationData=rc.ObservationData.from_nc(f, alt_names="qobs"),
    HRUs=[salmon_hru["land"]],
    StartDate=dt.datetime(2000, 1, 1),
    EndDate=dt.datetime(2002, 1, 1),
    RunName="test",
    CustomOutput=rc.CustomOutput("YEARLY", "AVERAGE", "PRECIP", "ENTIRE_WATERSHED"),
    GlobalParameter={"AVG_ANNUAL_RUNOFF": 208.480},
)

Writing the config files to disk is then done with m.write(path).
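
For the full loop, a minimal sketch of how these pieces fit together; the exact run and parse signatures are assumptions (their argument names are still under discussion below):

import ravenpy

m.write(workdir)                               # write the RV files to `workdir`
ravenpy.run("test", workdir)                   # "test" matches RunName above; signature assumed
out = ravenpy.parse(workdir, run_name="test")  # read outputs into Python objects; signature assumed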

The configuration supports symbolic expressions, e.g.

uniform_initial_conditions: Dict[str, float] = Field(
    {"SOIL[0]": P.GR4J_X1 * 1000 / 2, "SOIL[1]": 15},
    alias="UniformInitialConditions",
)
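
Here P.GR4J_X1 is a symbolic placeholder: once numerical params are supplied (as in the GR4JCN call above, where the first value is 0.529), the expression would resolve to 0.529 * 1000 / 2 = 264.5 in the written RV files.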

I think the objective now is to pinpoint non-intuitive parts of the new configuration and fix them. Once we're happy, let's merge this PR and then open others to add the remaining emulators and functionality we want to port.

Ping @Mayetea

This PR fixes #272

@huard (Collaborator, Author) commented Mar 9, 2023

The failing tests are due to a mistake on my end. I did an automated rename and thought it applied only to one file, when it actually applied to the whole repo... will fix this.

@coveralls commented Mar 9, 2023

Pull Request Test Coverage Report for Build 4409663436

  • 1084 of 1155 (93.85%) changed or added relevant lines in 11 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+1.9%) to 86.204%

Changes Missing Coverage:

  File                               Covered Lines   Changed/Added Lines   %
  ravenpy/new_config/base.py         127             129                   98.45%
  ravenpy/utilities/calibration.py   46              48                    95.83%
  ravenpy/new_config/utils.py        52              58                    89.66%
  ravenpy/ravenpy.py                 94              102                   92.16%
  ravenpy/new_config/rvs.py          139             150                   92.67%
  ravenpy/new_config/commands.py     363             405                   89.63%

Files with Coverage Reduction:

  File                               New Missed Lines   %
  ravenpy/models/base.py             2                  93.3%

Totals:

  • Change from base Build 4358668933: +1.9%
  • Covered Lines: 4830
  • Relevant Lines: 5603

💛 - Coveralls

@huard (Collaborator, Author) commented Mar 9, 2023

So, a few questions to guide the review:

  • Should ravenpy.run return something? At the moment it raises errors if something's wrong; otherwise, if the simulation completes, it returns None.
  • In ravenpy.__init__, I'm exposing run and parse, but should other parsers be made available at the top level (e.g. parse_diagnostics)?
  • Should we rename run and parse to something else? run could also be raven.
  • In run, the file name is called identifier, while in parse it's run_name. This should probably be made uniform. By default, the configuration uses run_name for the RV configuration files, so I could replace identifier with run_name.

@Zeitsperre (Member):

  • Should ravenpy.run return something? At the moment it raises errors if something's wrong; otherwise, if the simulation completes, it returns None.

If we do not want to raise errors, could we return an object that describes whether the simulation failed (and how)? That would be good information to pass along via a WPS process / logging.

  • Should we rename run and parse to something else? run could also be raven.

I like run; it's less confusing when we already have ravenpy and ravenwps.

@richardarsenault (Collaborator):

Good questions.

  1. Maybe it should return the standard "0"? I'm not a good enough developer to provide more in-depth comments on this.
  2. Ideally we should have a method like "build" where we build the model before running it, and maybe set_parameters, etc. That would allow accessing everything before we actually launch a run; ideal for calibration and assimilation, for example.
  3. I suggest keeping run and parse. Calling "ravenpy.raven" seems odd to me.
  4. I agree on making it uniformly run_name.

Trying to set up my env to help evaluate, debug and contribute to this!

@Mayetea (Collaborator) commented Mar 10, 2023

I agree with Trevor's and Richard's comments on how we should structure the code. Also, shouldn't we remove the Ostrich part of base.py? I never installed its binary since we were supposed to rip it out. Right now, if I don't comment out the Ostrich code, 60 tests fail; if I do comment it out, I still get failures in the Ostrich unit tests (16 failed tests) plus 2 others that could be related but don't raise Ostrich errors.

So what do we do with this? Do we include the removal in this PR or in another PR?

@Zeitsperre (Member):

Also, shouldn't we remove the Ostrich part of base.py?
...
So what do we do with this? Do we include the removal in this PR or in another PR?

If we can get those changes underway, that removes one of the blockers for #264 (the other is being worked on currently).

@huard (Collaborator, Author) commented Mar 13, 2023

This PR is only about the new config, so yes, Ostrich is still there. I'll open a new PR to plug SPOTPY into the new config once this one is merged, and then we'll be able to remove Ostrich.

The "build" step is shown in the PR description: configure the model with m = GR4JCN(...), then write it to disk with m.write(path).

We can return True if the simulation has completed, or the path of the output directory.

@Mayetea (Collaborator) commented Mar 13, 2023

I think what Richard meant is that we should have a dedicated class to do the Raven job, something like this:

model = GR4JCN(...)
raven_runner = raven.build(model, path)
raven_runner.run()
raven_runner.write()  # Writes the results to a file.

It might be impossible to refactor this way now, but I think it's the approach he was talking about.

@huard (Collaborator, Author) commented Mar 13, 2023

That's very easy; I can whip something up for discussion.

@huard (Collaborator, Author) commented Mar 13, 2023

See the latest commit for a new Emulator class with methods:

  • build
  • run
  • parse
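
For illustration, a minimal usage sketch; the method names come from the list above, while the constructor arguments and import location are assumptions:

from ravenpy.new_config.emulators.gr4jcn import GR4JCN
from ravenpy.ravenpy import Emulator  # import path is an assumption

config = GR4JCN(...)           # configured as in the PR description
e = Emulator(config, workdir)  # constructor takes the config and a working path (assumed)
e.build()                      # write the RV files to disk
e.run()                        # execute Raven on the written configuration
outputs = e.parse()            # read the outputs back into Python objects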

@Mayetea (Collaborator) commented Mar 14, 2023

I've tested yesterday's push, and the only failures I get (unrelated to Ostrich) are in test_read_from_netcdf and test_open_dataset_false_cache.

In test_read_from_netcdf, we assert that the message is not None, but we receive None.

In test_open_dataset_false_cache, I get a "Permission denied" error when accessing a file.

@huard (Collaborator, Author) commented Mar 15, 2023

Weird; no clear idea where this is coming from. Bugs notwithstanding, what I'm interested in is feedback on the API: the naming of functions and arguments, the workflow, etc.

@Mayetea (Collaborator) commented Mar 16, 2023

I'm currently trying to use the new_config in the spotpy test. I have two comments so far.

First, I think the model import should be as clear as before: from ravenpy.models import GR4JCN, not from ravenpy.new_config.emulators.gr4jcn import GR4JCN. We could then import all models from .emulators, or import a specific one by name (as for GR4JCN).

Second, I don't understand why Emulator.build() has overwrite defaulting to True while Config.build() has overwrite defaulting to False.

I'll let you know if I have more comments as soon as I make it work with spotpy.

@huard (Collaborator, Author) commented Mar 16, 2023

Agreed on your first point, but this poses the problem of how to deal with the old and new versions side by side. Maybe once this PR is completed we wipe out the old versions and new_config becomes config.

Good catch; changed overwrite to False in Emulator.build.

@huard (Collaborator, Author) commented Mar 16, 2023

test_spotpy_calibration should now run the calibration for gr4j. To run the other models that have a new config, we only need to add the low and high parameter bounds (see the sketch below).
The calibration test for the new config should probably be moved to tests/new_config.
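
For context, a rough sketch of what such a calibration run could look like; the SpotSetup arguments (config plus low/high bounds) are assumptions based on this thread, and the bound values are placeholders, not recommended ranges:

import spotpy

from ravenpy.utilities.calibration import SpotSetup

spot_setup = SpotSetup(
    config=model,                             # a GR4JCN instance, as in the PR description
    low=(0.01, -15.0, 10.0, 0.0, 1.0, 0.0),   # hypothetical lower bounds for the 6 params
    high=(2.5, 10.0, 700.0, 7.0, 30.0, 1.0),  # hypothetical upper bounds
)
sampler = spotpy.algorithms.dds(spot_setup, dbname="gr4j_calib", dbformat="ram")
sampler.sample(50)  # 50 model evaluations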

@Mayetea (Collaborator) commented Mar 16, 2023

Ah yes, I understand the issue. I'll try spotpy with your new push and let you know how it goes!

@Mayetea (Collaborator) commented Mar 16, 2023

I don't see any forcing file in the new config. Is it the "path" param in the Emulator constructor?

@huard (Collaborator, Author) commented Mar 16, 2023

Check tests/new_config/emulators.

There is a pytest fixture called emulator that instantiates all the emulators listed in names. It is then used in tests/new_config/test_emulators to test every emulator with a single parametrized test, instead of writing one test per emulator. A sketch of the pattern follows.
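
A minimal sketch of that fixture pattern; the emulator names and the lookup helper are hypothetical stand-ins for what the test suite actually does:

import pytest

names = ["GR4JCN", "HMETS"]  # hypothetical list; the real one lives in the test suite

@pytest.fixture(params=names)
def emulator(request):
    # Instantiate each configured emulator in turn.
    return get_emulator_config(request.param)  # hypothetical lookup helper

def test_emulator(emulator):
    # A single parametrized test exercises every emulator.
    ...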

@huard (Collaborator, Author) commented Mar 16, 2023

Maybe I misunderstood the question, but calibration.py has a SpotSetup class using the new config.
It's exercised in test_spotpy_calibration, using the same emulator fixture.

@Mayetea (Collaborator) commented Mar 16, 2023

I was talking about the netCDF file we feed in for the training, but I think I found how you pass it through the fixtures. I'll try to make it work without fixtures, but it looks good. It was a misinterpretation on my end!

@Mayetea (Collaborator) commented Mar 17, 2023

I've managed to make spotpy work without fixtures, but it doesn't rerun Raven for each evaluation. It keeps returning the same Nash value, -0.117301, so it isn't adjusting to the params. I'm looking into the issue right now.

@Mayetea (Collaborator) commented Mar 17, 2023

While debugging, I found that the params are changing, but the Nash is always the same (-0.117301), as stated in the other comment. I'll let you know if I find other info that could help you track down the source of the issue!

@huard (Collaborator, Author) commented Mar 17, 2023

I think I know what's wrong. We need to instantiate the model at each iteration, not update the parameters.

@Mayetea (Collaborator) commented Mar 17, 2023

Shouldn't this be the job of the build function? We do:

self.config.params = list(x)
self.build(self.path / f"c{self._iteration:03}")
self.run()

While before we were doing this:

self.model.config.update("params", np.array(x))
self.model._execute(self.ts)

The only difference I see is in the execute function, where self.setup() is called before run().

I'll try adding it to the simulation too and see what we get!

@huard (Collaborator, Author) commented Mar 17, 2023

But the config object underneath is completely different.
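
To make the suggestion concrete, a sketch of the spotpy simulation step with re-instantiation; make_config is a hypothetical stand-in for whatever rebuilds the emulator configuration from scratch, and the Emulator arguments are assumed:

def simulation(self, x):
    # Re-instantiate the configuration with the new parameter set instead of
    # mutating the existing one, so the symbolic config is resolved afresh.
    config = make_config(params=list(x))  # hypothetical rebuild helper
    e = Emulator(config, self.path / f"c{self._iteration:03}")  # assumed constructor
    e.build()
    e.run()
    ...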

@richardarsenault (Collaborator):

I've been testing and trying to run this new_config, and I have hit a few snags. I've been able to overcome some of them, but others are blocking.

1- Gauge.from_nc: It seems it can only handle a single file with all variables merged. However, users relying on the ERA5 extraction tool (or who have different files for different variables) would have them separated. Is it possible to pass a list of files, as we could before with the ts forcing? Or maybe I just failed and it's already possible? I got errors saying it expects a str or path object, not a set or tuple, etc.

2- Gauge.from_nc (again): When I take the spatial mean of the ERA5 weather data, I average over latitude and longitude, so those dimensions disappear. The code then fails because it expects a gauge lat/lon. Is this necessary? I think it was not required before. Could we have a system where, if there is only one station, it just takes an arbitrary value (since it will not impact the results)? I can force it in the extras, but that seems like a redundant step if it's not used in the actual code.

3- GR4JCN has a parameter defined as AVG_ANNUAL_RUNOFF, but I don't think this is what GR4JCN actually needs. The parameter is G50 (unless it is something else entirely, in which case I am at a loss as to what it is) and represents the median annual snowpack depth.

4- We need to expose more clearly how to run a model from rv / nc files in a folder that a user would upload. Try as I might, I was unsuccessful; I was only able to run models that I had previously built.

5- One notebook (02_Extract_geographical_watershed_properties.ipynb) fails for obscure reasons; to investigate.

I'll add more as I progress!

@huard (Collaborator, Author) commented Mar 26, 2023

  1. You can call Gauge.from_nc multiple times and concatenate the results (see the sketch below):
    gauges = Gauge.from_nc(fn1, ...) + Gauge.from_nc(fn2, ...)

  2. Lon and lat are necessary in Gauge commands (docs page 195). They were not required before because we took the lats and lons from the HRU, but that was a hack. You can pass extra Gauge arguments using the extra keyword: Gauge.from_nc(fn, ..., extra={"latitude": 56})

  3. ? You mean there's a bug in the template?

  4. ravenpy.run
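
Putting points 1 and 2 together, a hedged sketch of building gauges from per-variable ERA5 files; the file names and alt_names mappings are hypothetical:

from ravenpy.new_config import commands as rc  # module path assumed for this PR

extra = {"latitude": 56, "longitude": -72, "elevation": 100}  # explicit lat/lon, per point 2

gauges = rc.Gauge.from_nc(
    "era5_precip.nc",  # hypothetical per-variable file
    data_type=["PRECIP"],
    alt_names={"PRECIP": "tp"},  # hypothetical variable-name mapping
    extra=extra,
) + rc.Gauge.from_nc(
    "era5_temp.nc",
    data_type=["TEMP_MIN", "TEMP_MAX"],
    alt_names={"TEMP_MIN": "tmin", "TEMP_MAX": "tmax"},
    extra=extra,
)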

@Zeitsperre (Member):

@richardarsenault If you want to maintain a specific set of sections in the imports of a notebook: https://pycqa.github.io/isort/docs/configuration/action_comments.html#isort-split

@huard merged commit 64a607d into master on Apr 14, 2023
@huard deleted the clean_new_config branch on Apr 14, 2023 at 20:23
Merging this pull request closes: Missing packages to run notebooks using fresh RavenPy install