Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor TropCyclone for external use in other modules #749

Merged
merged 42 commits into from
Jul 17, 2023

Conversation

tovogt
Copy link
Collaborator

@tovogt tovogt commented Jul 5, 2023

Changes proposed in this PR:

  • The code for the TC wind computation is refactored with three main points in mind:
    1. The wind computation shouldn't use excessive amounts of memory even if the temporal or spatial resolution is increased.
    2. The code for the wind profiles should be reusable. I will use this in the forthcoming implementation of TCRain in climada_petals (see Implement a new tropical cyclone rain model climada_petals#85).
    3. The wind computation should use SI units everywhere to make the physical equations more transparent. This is in accordance with the new TCRain code (see Implement a new tropical cyclone rain model climada_petals#85).
  • This also replaces all usage of sparse.csr.csr_matrix with sparse.csr_matrix to avoid warning messages.

PR Author Checklist

PR Reviewer Checklist

@bguillod
Copy link
Collaborator

bguillod commented Jul 6, 2023

I very much like the changes. In particular tackling the memory and the physical units points is great and very much needed. Thanks a lot for the hard work @tovogt! From my first quick look it looks good to me, but I've not done a proper review. If you want me to, let me know and I'll see what I can do. If you have enough other people to review then I'm also ok not to dig deeper.

@tovogt tovogt requested a review from ThomasRoosli July 6, 2023 08:13
@tovogt
Copy link
Collaborator Author

tovogt commented Jul 6, 2023

@bguillod Thanks a lot for your feedback, Benoit! @ThomasRoosli will probably have a look, but your review would be very much appreciated if you can afford the time.

Copy link
Collaborator

@ThomasRoosli ThomasRoosli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review part 1, I hope I have more time early next week to go into the details of the compute_windfields-family of functions...

  • Most files just contain the same change of a sparse command. The main changes in the PR are in trop_cyclone.py and test_trop_cyclone.py.
  • In test_trop_cyclone.py a lot of the previous test (especially test_windfield_models remain unchanged, which ensures that the refactoring did not introduce a different behaviour.
  • I advise to work on the docstrings of the functions that now accept a single track instead of several items). (see specific comment in the code).
  • The main refactoring happens within the method from_single_track and its substructure
  • I find the names compute_windfield, _compute_windfields_sparse, _compute_windfields_sparse_chunked and _compute_windfields_chunked confusingly similar. Especially since the they seems to call each other in a circular fashion. Maybe I have a suggestion after a more thorough review.
  • As a general comment: I wonder if it would make sense to introduce a config entry for max_memory_gb so that the memory usage could be configured without changing code.
  • As an even more general comment: I wonder if we should introduce a "peak_max_memory_gb" configuration value in the climada config, so that max_memory_gb could be derived from that and also other memory intensive functions could be configured the same way. This would of course only trigger an additional issue for now, and does not need to be implemented.

climada/hazard/trop_cyclone.py Show resolved Hide resolved
climada/hazard/test/test_trop_cyclone.py Show resolved Hide resolved
@tovogt
Copy link
Collaborator Author

tovogt commented Jul 10, 2023

Thanks a lot for your feedback, @ThomasRoosli !

  • I advise to work on the docstrings of the functions that now accept a single track instead of several items). (see specific comment in the code).

Good point! I modified the docstrings accordingly.

  • I find the names compute_windfield, _compute_windfields_sparse, _compute_windfields_sparse_chunked and _compute_windfields_chunked confusingly similar. Especially since the they seems to call each other in a circular fashion. Maybe I have a suggestion after a more thorough review.

This comment made me reconsider the structure. Now I'm back to only three functions: _compute_windfields, _compute_windfields_sparse, and _compute_windfields_sparse_chunked. This should be considerably less complicated. Still, here is a little explanation:

  • _compute_windfields_sparse is a wrapper for _compute_windfields that does the memory management. It pre-filters the centroids for a single track to those centroids that are potentially affected by the wind field. It also converts the result of _compute_windfields to sparse arrays.
  • _compute_windfields_sparse_chunked is a helper function that is called if _compute_windfields_sparse finds that the available memory is not large enough to compute everything in one call to _compute_windfields. In that case, _compute_windfields_sparse_chunked decides where each chunk starts and ends, calls _compute_windfields_sparse for each chunk. and recombines the results.

I think that the names are fine, but if you have a different suggestion, I'm happy to discuss this.

  • As a general comment: I wonder if it would make sense to introduce a config entry for max_memory_gb so that the memory usage could be configured without changing code.

In my case, max_memory_gb shouldn't be a global config parameter. Since I run my stuff on a high-performance cluster with job management, I do not always allocate the same number of threads and/or memory to my code. There might be projects where I think that many threads with less memory are fine, or there might be projects where I only use a single thread with a lot of memory. So, if anything, the default value of max_memory_gb should be a config parameter, in my opinion.

Copy link
Collaborator

@ThomasRoosli ThomasRoosli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tovogt for the adapted changes.

  • The restructuring of the _compute_windfields... family (and your explanation) helped me alot. Also with the new changes the loop into _compute_windfields_sparse_chunked is only done once (and it is more directly visible that it does so), which helps the overall understandability alot.
  • The doc-strings of functions changeing the track-Dataset now contain very useful information about the elements used and added.
  • I am happy with leaving the max_memory_gb as a parameter to the function without linking it to a global configuration variable
  • In terms of code quality, understandability and tests and pylint I have nothing else to add.
    Thanks for this contribution.

climada/hazard/trop_cyclone.py Show resolved Hide resolved
@tovogt
Copy link
Collaborator Author

tovogt commented Jul 12, 2023

Thanks for your review, @ThomasRoosli !

@emanuel-schmid The failing Jenkins "Post Actions" are nothing to worry about, right? It seems to me like they are not related to this PR.

@tovogt
Copy link
Collaborator Author

tovogt commented Jul 14, 2023

For testing, I computed all IBTrACS tracks for 1950-2021, at 300 arc-seconds and 0.5-hourly resolution, and with limited memory. I didn't truncate for high latitudes (the default setting for the wind field computations is truncation at 60° latitude). The chunking worked fine, except from Hurricane FAITH (1966):
image
This hurricane is quite a beast, traveling to more than 80 degrees latitude, so that I had to adapt the centroid pre-filtering step again in a3d0f0e.

Note, however, that most people won't do the TC wind field computation north of 60° latitude because that is usually after extratropical transition anyway. And most "reachable" centroids come from that part of the track since the km-distance between centroids becomes very small close to the pole when using lat/lon grids.

@tovogt tovogt merged commit 4efcb66 into develop Jul 17, 2023
@tovogt tovogt deleted the feature/tc_winds_chunked branch July 20, 2023 06:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants