
Add module for computing river flood footprints from GloFAS river discharge data #64

Merged: 132 commits into develop from feature/glofas-river-flood, Mar 5, 2024

Conversation

@peanutfun (Member) commented Jan 5, 2023

The module includes a data pipeline which automatically downloads GloFAS river discharge data and transforms it into flood footprints, which in turn can be transferred to Hazard or RiverFlood objects.
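For orientation, a minimal usage sketch of the intended workflow. The class name RiverFloodComputation is taken from the discussion below; the import path and all argument names are assumptions and may differ from the merged API:

```python
# Minimal sketch, assuming hypothetical call signatures.
from climada_petals.hazard.rf_glofas import RiverFloodComputation  # assumed import path

rf_comp = RiverFloodComputation()

# Download GloFAS discharge, translate it into return periods via GEV fits,
# and interpolate onto flood hazard maps to obtain a flood footprint.
footprint = rf_comp.compute(
    countries=["CHE"],   # hypothetical parameter
    date="2023-01-05",   # hypothetical parameter
)
# The result can then be turned into Hazard/RiverFlood objects.
```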

The module introduces a few new dependencies. They have to be installed into the Jenkins test environment for the builds to succeed:

  • dantro for the data pipeline
  • cdsapi for downloading data from the Copernicus Data Store (see the sketch after this list)
  • ruamel.yaml for reading YAML files. This one is already a dependency of dantro.
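
For reference, the raw download that climada_petals.util.cds_glofas_downloader wraps looks roughly like this with cdsapi. The dataset name and request keys below are assumptions based on the public CDS catalogue, not copied from this PR:

```python
import cdsapi

client = cdsapi.Client()  # reads the API key from ~/.cdsapirc

# Assumed dataset name and request keys for GloFAS historical discharge on the CDS.
client.retrieve(
    "cems-glofas-historical",
    {
        "variable": "river_discharge_in_the_last_24_hours",
        "hyear": "2022",
        "hmonth": "08",
        "hday": ["01", "02", "03"],
        "format": "grib",
    },
    "glofas_discharge.grib",  # target file
)
```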

Major changes in CLIMADA Petals:

  • Add the climada_petals.hazard.rf_glofas subpackage
  • Add climada_petals.util.cds_glofas_downloader utility functions
  • Add module documentation

To do:

  • Tutorial
  • FLOPROS database integration
  • Some more tests
  • Fix linter issues

peanutfun and others added 22 commits October 18, 2022 15:22
Use cdsapi to download GloFAS data from Copernicus Data Store. So far,
the actual download is not tested.

* Add util functions for downloading data.
* Add unit tests for request handling.
* Update operations to fix issues found when writing the tests.
* Add unit test case for dantro operations.
* Tweak CDS GloFAS downloader.
* Add option to set countries instead of lat/lon limits when downloading
  GloFAS data.
* Return pandas Series of Hazards with multi index.
* Use discharge dataset for lat/lon slicing of all other datasets.
* Add unit tests.
Downloads will be skipped if the target file exists with the same
request dict.

* Place the request as YAML file next to the target file for request
  comparison.
* Add option to control using the "cached" results or always downloading
  the data.
* Update unit tests.
* Explicitly list ruamel.yaml as requirement (already required by
  dantro).
NOTE: The commented-out code would be an alternative way to define the select
dimension based on values instead of indices.

* Add operation
* Add test case for operation
* Update unit tests accordingly.
* Add core dimension checks to flood depth unit tests.
* Add operations and config for computing the GEV fits and merging flood
  maps, which are both used for computing a flood footprint.
* Update affected operations and configs.
* Remove GloFASRiverFlood class in favor of two functions.
* Update tests
* Move respective files into their own subdirectory.
* Adapt configuration files to latest dantro version.
* Add 'transform_ops.py' containing only dantro transformations.
* Expose user functions via dedicated __init__.py
* Add option to run tasks in parallel
Used for reading GeoTIFF with xarray.
@peanutfun marked this pull request as draft January 5, 2023 16:07
@peanutfun linked an issue Jan 9, 2023 that may be closed by this pull request
@peanutfun (Member, Author) commented:

@emanuel-schmid Could you have a look at why the new dependencies are not found in the checks?

@peanutfun (Member, Author) commented Dec 8, 2023

@tovogt Thanks again for the thorough review! To comment on your overall thoughts:

> It's really unfortunate that data needs to be written uncompressed first due to performance issues.

Yes, but this is what I came up with after months of using the module. I don't think I can do much better without further help. In my experience, using zlib is indeed horribly slow with dask, and it also does not properly support multi-process writing to a single file. You suggest calling compute first and then saving the data. However, this requires the entire data to fit into memory in the first place. Depending on the country and time frame you want to compute flood footprints for, this is not feasible given the usual memory space of a personal computer.

However, my module actually gives users the option to optimize this themselves. By setting store_intermediates to False and executing each step of RiverFloodComputation.compute themselves, users are free to call compute and store data as they see fit. I tried to find default settings that work no matter what. The main restriction now is drive space: you might end up writing hundreds of GB, but it is actually quite performant (with a modern SSD).
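A sketch of that pattern, assuming hypothetical names for the individual pipeline steps (only store_intermediates and compute are taken from this thread; the import path and step methods are illustrative):

```python
from climada_petals.hazard.rf_glofas import RiverFloodComputation  # assumed import path

# Skip writing intermediate results to disk; the user handles storage.
rf_comp = RiverFloodComputation(store_intermediates=False)

# Hypothetical step methods standing in for the stages of compute().
discharge = rf_comp.download_forecast(countries=["CHE"], date="2023-01-05")
return_period = rf_comp.return_period(discharge)

# Decide explicitly when to materialize the dask-backed data and where to put it.
return_period = return_period.compute()
return_period.to_netcdf("return_period.nc")
```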

> In the tutorial notebook, there are some occurrences of py:func which are not translated properly by Sphinx.

The MyST parser should support these types of references; see https://myst-parser.readthedocs.io/en/latest/syntax/cross-referencing.html#reference-roles

Now that I look at the doc/conf.py, we might actually still use the old nbsphinx parser for reading the notebooks. I'll try to fix that.

@tovogt (Collaborator) commented Dec 11, 2023

> Yes, but this is what I came up with after months of using the module. I don't think I can do much better without further help. In my experience, using zlib is indeed horribly slow with dask, and it also does not properly support multi-process writing to a single file. You suggest calling compute first and then saving the data. However, this requires the entire data to fit into memory in the first place. Depending on the country and time frame you want to compute flood footprints for, this is not feasible given the usual memory space of a personal computer.

I think it's desirable to require each individual NetCDF file to contain at most as much data as could potentially fit into (a reasonable amount of) memory. Ideally, I would even propose to have at most 4 GB of (uncompressed) data per NetCDF file. With NetCDFs, it is very easy to split up data into several files and then load the dataset as a multi-file dataset, e.g. using xr.open_mfdataset. If you adhere to that, you can very well call compute for each chunk of data that's supposed to end up in an individual file. It's much faster, the data is very easy to handle, and there are practically no disadvantages. Monolithic, long-running processes that produce monolithic chunks of data are extremely inconvenient in almost every environment. For all projects I work with, I try to split everything up into a high number of slim processes with short run times that each produce comparably small chunks of data, and that is much more convenient under almost all circumstances I can think of.
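A sketch of that pattern with plain xarray (the input file, the yearly split, and the file layout are illustrative):

```python
import xarray as xr

# Open the (large) dask-backed dataset lazily.
ds = xr.open_dataset("flood_footprints_all.nc", chunks={"time": 100})  # illustrative input

# Write one reasonably small NetCDF file per year, computing each piece
# right before it is written so only one year needs to fit into memory.
for year, ds_year in ds.groupby("time.year"):
    ds_year.compute().to_netcdf(f"flood_footprint_{year}.nc")

# Later, treat the pieces as a single lazy multi-file dataset again.
combined = xr.open_mfdataset("flood_footprint_*.nc", combine="by_coords")
```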

> However, my module actually gives users the option to optimize this themselves. By setting store_intermediates to False and executing each step of RiverFloodComputation.compute themselves, users are free to call compute and store data as they see fit. I tried to find default settings that work no matter what. The main restriction now is drive space: you might end up writing hundreds of GB, but it is actually quite performant (with a modern SSD).

As I said, this is not a merge-blocker from my side, and I'm happy to go with this solution.

Installing xesmf would require reloading the environment,
which does not happen online.
@tovogt (Collaborator) commented Dec 12, 2023

You are using xesmf to regrid a raster with bilinear interpolation. Why don't you use rasterio.warp.reproject for that? It would avoid introducing a new dependency, it's really very powerful, and it's already used in several other places in CLIMADA.
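
For comparison, bilinear regridding with rasterio would look roughly like this (the source and target grids below are illustrative):

```python
import numpy as np
from rasterio.enums import Resampling
from rasterio.transform import from_origin
from rasterio.warp import reproject

# Illustrative regular lat/lon grids: 1 degree source, 0.5 degree target.
src = np.random.rand(180, 360).astype("float32")
src_transform = from_origin(-180.0, 90.0, 1.0, 1.0)
dst = np.zeros((360, 720), dtype="float32")
dst_transform = from_origin(-180.0, 90.0, 0.5, 0.5)

# Bilinear resampling from the source grid onto the target grid.
reproject(
    source=src,
    destination=dst,
    src_transform=src_transform,
    src_crs="EPSG:4326",
    dst_transform=dst_transform,
    dst_crs="EPSG:4326",
    resampling=Resampling.bilinear,
)
```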

@peanutfun (Member, Author) commented:

> Why don't you use rasterio.warp.reproject for that?

Simply because I am not familiar with it. I first used the xarray-internal interpolation, which is horribly slow and does not take geospatial information into account. So I switched to xesmf because it was recommended to me and is simple to use with xarray data structures. But it is also difficult to install and poses an issue on Euler. So I would be very happy about a suggestion for how to drop it and switch to another implementation, provided that implementation is not much slower.
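
For context, the xesmf pattern in question is roughly the following (grids and variable names are illustrative); the nearest_s2d extrapolation mentioned below fills target cells that lie outside the source grid:

```python
import numpy as np
import xarray as xr
import xesmf as xe

# Illustrative coarse source grid and finer target grid.
ds_in = xr.Dataset(
    {"flood_depth": (("lat", "lon"), np.random.rand(18, 36))},
    coords={"lat": np.arange(-85.0, 90.0, 10.0), "lon": np.arange(-175.0, 180.0, 10.0)},
)
ds_out = xr.Dataset(
    coords={"lat": np.arange(-89.5, 90.0, 1.0), "lon": np.arange(-179.5, 180.0, 1.0)}
)

# Bilinear regridding; nearest_s2d extrapolates to cells not covered by the source grid.
regridder = xe.Regridder(ds_in, ds_out, method="bilinear", extrap_method="nearest_s2d")
ds_regridded = regridder(ds_in)
```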

@tovogt (Collaborator) commented Dec 13, 2023

Okay, after a closer look, I think you won't be able to have anything similar to nearest_s2d extrapolation in rasterio or similarly basic packages.

peanutfun and others added 4 commits January 16, 2024 12:21
This avoids overwriting data downloaded for the same day (forecast)
or year (reanalysis/historical).
@tovogt (Collaborator) left a comment:

Thanks @ThomasRoosli for the final cleanup. This is ready to be merged from my side.

@ThomasRoosli merged commit c118b6c into develop Mar 5, 2024
4 checks passed
@emanuel-schmid deleted the feature/glofas-river-flood branch March 6, 2024 08:25

Successfully merging this pull request may close these issues: Add module for computing river flood hazards from GloFAS discharge data
5 participants