Feature/fraction is none #541
When fraction is empty, it is ignored in the impact calculation. The shape must still be n_events x n_exp
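The shape requirement can be illustrated with a minimal sketch. It assumes nothing about the hazard class beyond the two sparse matrices; the variable names and values below are made up for illustration. An "empty" fraction is a `scipy.sparse.csr_matrix` with no stored values but the same shape as the intensity, so that row selection behaves the same on both.

```python
import numpy as np
from scipy import sparse

# Hypothetical intensity matrix: one row per event, one column per location.
intensity = sparse.csr_matrix(
    np.array([[0.0, 1.5, 0.0, 2.0],
              [0.0, 0.0, 0.0, 0.0],
              [3.0, 0.0, 0.5, 0.0]])
)

# "Empty" fraction: no stored values, but the same shape as the intensity.
fraction = sparse.csr_matrix(intensity.shape)

# Selecting a subset of events (rows) works identically on both matrices.
event_idx = [0, 2]
intensity_sel = intensity[event_idx, :]
fraction_sel = fraction[event_idx, :]
```

Had the fraction been given a different shape (or set to `None`), any code that slices every sparse attribute in the same way would need a special case.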
Do I get this right: after the PR, a hazard with a fraction of |
It would not really be a discontinuity, but a convention. A But yes, one could also make it |
Not from a user's perspective, right, but the maths make perfect sense. Yet.
True. But the minimalistic approach is also a bit hacky in this case, right? Apart from the mathematical discontinuity it introduces, two things bother me: the proposed convention is not intuitive (I cannot possibly guess that all zeros means all ones in a sparse matrix; I must know it), and, more importantly, it mixes control elements into the raw data. I do expect future development in the impact calculation and the hazard implementation, and this change is quite likely to cause headaches then. Even though a cleaner approach means more work, it bears the hope that it may eventually pay off.
To reduce the file size, I'd rather suggest changing the hazard's read and write methods and leaving the impact calculation as it is.
Or perhaps try to change |
I must say I do not understand what this means...
Also here I do not understand what is suggested...
It might indeed. It will however also lead to bugs on the way. But I can try to change it to |
File size & read write methods: Hazard.get_fraction() |
Thanks for clarifying!!
The point is not only to reduce file size, but to remove redundant information from an object that does not need it. In addition, it is not only file size but also RAM usage and computation time that should be reduced. Thus, it should be handled at the source, and not hacked into the file read/write.
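To make the RAM argument concrete, here is a rough sketch that compares the memory held by a redundant all-ones fraction with that of an empty one. The matrix size, the density, and the helper `csr_nbytes` are assumptions chosen for illustration, not values from CLIMADA.

```python
from scipy import sparse


def csr_nbytes(mat):
    """Bytes held by the three arrays backing a CSR matrix."""
    return mat.data.nbytes + mat.indices.nbytes + mat.indptr.nbytes


# Hypothetical intensity: 1000 events x 500 locations, 5 % stored values.
intensity = sparse.random(1000, 500, density=0.05, format="csr", random_state=0)

# Redundant fraction: a one wherever the intensity has a stored value.
full_fraction = intensity.copy()
full_fraction.data[:] = 1.0

# Empty fraction: same shape, no stored values.
empty_fraction = sparse.csr_matrix(intensity.shape)

print("full fraction :", csr_nbytes(full_fraction), "bytes")
print("empty fraction:", csr_nbytes(empty_fraction), "bytes")
```

The all-ones fraction costs as much memory as the intensity itself, while the empty matrix only keeps its row-pointer array.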
This would not solve the RAM nor the computation time issues. Besides, it would make the object rather inconsistent, since the attribute On the other hand, if |
climada/engine/impact_calc.py
    if self.hazard.fraction.nonzero()[0].size == 0:
        return mdr.multiply(exp_values_csr)

    fract = self.hazard.get_fraction(cent_idx)
Suggested change:
-    if self.hazard.fraction.nonzero()[0].size == 0:
-        return mdr.multiply(exp_values_csr)
-    fract = self.hazard.get_fraction(cent_idx)
+    fract = self.hazard.get_fraction(cent_idx)
+    if fract is None:
+        return mdr.multiply(exp_values_csr)
This, plus adding this line at the top of get_fraction:
    if self.fraction.nnz == 0: return None
I'd still prefer to leave as much responsibility as possible to the hazard.
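The division of responsibility proposed here can be sketched as follows. This is a simplified illustration, not the CLIMADA implementation: `HazardSketch` and `impact_matrix` are hypothetical names, and the expression used when a fraction is present (`fract.multiply(mdr).multiply(exp_values)`) is an assumption, since the thread only shows the branch without fraction.

```python
import numpy as np
from scipy import sparse


class HazardSketch:
    """Hypothetical stand-in for the hazard; not the CLIMADA class."""

    def __init__(self, fraction):
        self.fraction = fraction

    def get_fraction(self, cent_idx):
        # An empty matrix signals "fraction is one everywhere".
        if self.fraction.nnz == 0:
            return None
        return self.fraction[:, cent_idx]


def impact_matrix(hazard, mdr, exp_values, cent_idx):
    """Element-wise impact, skipping the fraction when it is not set."""
    fract = hazard.get_fraction(cent_idx)
    if fract is None:
        return mdr.multiply(exp_values)
    # Assumed form of the calculation when a fraction is present.
    return fract.multiply(mdr).multiply(exp_values)


# Made-up inputs: 2 events x 2 exposure points.
mdr = sparse.csr_matrix(np.array([[0.5, 0.0], [0.0, 0.2]]))
exp_values = np.array([[100.0, 200.0]])
cent_idx = [0, 1]

no_fraction = HazardSketch(sparse.csr_matrix(mdr.shape))
with_fraction = HazardSketch(sparse.csr_matrix(np.array([[0.5, 0.0], [0.0, 1.0]])))

imp_without = impact_matrix(no_fraction, mdr, exp_values, cent_idx)
imp_with = impact_matrix(with_fraction, mdr, exp_values, cent_idx)
```

With this split, the impact calculation only checks for `None`; what counts as "no fraction" is decided inside the hazard.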
Sure, that works well!
It is just that hazard.get_fraction is not the standard getter for the attribute hazard.fraction. It is meant to get the fraction for a subset of centroids. Maybe this should be slightly updated too, so that get_fraction() simply returns fraction (or None)?
👍 Good idea. Should be fairly easy too, since the fraction is hardly ever read. I've been looking for occurrences of fraction-readings throughout the code base and apart from this line here I found them only in tests and in hazards for rewriting or plotting.
But that's fine, even to be expected in OO programming, no? See suggested change in impact_calc, line 417 ff.
If I could add my bits to this discussion...
I very much like the idea by @emanuel-schmid of a "get" method for the fraction, because this can handle the fraction logic internally without exposing it to the code using the fraction values. However, a "get" method is a very "C++" thing. Python can actually make a getter/setter function look like a regular property with the @property decorator. Here's how that works:

    import numpy as np
    import scipy.sparse

    class Hazard:
        _fraction = None  # NOTE: Now marked private

        @property
        def fraction(self):
            """Return the fraction or construct one from the intensity if it is not set"""
            if self._fraction is None:
                # Ones wherever the intensity is non-zero
                row_idx, col_idx, _ = scipy.sparse.find(self.intensity)
                return scipy.sparse.csr_matrix(
                    (np.ones(len(row_idx)), (row_idx, col_idx)), shape=self.intensity.shape
                )
            return self._fraction

    # Logic is completely hidden:
    hazard = Hazard(...)
    fraction = hazard.fraction  # NOTE: `self.fraction` can also be called from within the Hazard object

Edit: For reference: https://www.freecodecamp.org/news/python-property-decorator/
I just discussed my proposal with @chahank and we concluded that it does not solve the issue of reducing RAM usage and memory required to store the hazard object. The solution provided here should be as non-intrusive as possible. We will later update the |
All tests are passing. Should we proceed with this? I.e., accept the changes, update the docstrings and the tutorials, check climada petals.
Looks good! I agree that this is the least intrusive solution, as only very few lines of production code change. Well done! I found two small issues, see the comments.
Co-authored-by: Lukas Riedel <[email protected]>
Thanks! Agree with both comments and implemented the changes.
I think this PR is mostly ready now, with all tests updated and the tutorials updated too. @emanuel-schmid : what is missing is a clear handling of the test files. I created a new test file based on the previous one. The changes are that the test file is in .hdf5 format instead of .mat, and the fraction is |
Just pitching in: I think it's safest to load test files in test fixtures. This can be done by adding a |
Thanks! This would be one way, but I think requires quite a lot of rewriting. Let me make my question more precise. Should the test data be loaded:
|
Maybe it is best to first close this PR, and then make the change to the Hazard |
Why close? Depending on which PR is merged first, the other one has to be adapted. I think this one can go first, but the other way round would not be an issue either.
Note, the pylint message is then disabled for the whole function. Co-authored-by: Lukas Riedel <[email protected]>
@emanuel-schmid: can you please have a final look, in particular at the test file handling?
👍 checking...
🎉
* Assume fraction is one if None
* Allow fraction to be empty
  When fraction is empty, it is ignored in the impact calculation. The shape must still be n_events x n_exp
* Correct shape to argument
* Remove unecessary else
* Return None in get_fraction
* Set default return full fraction
* Update test TC for fraction None
* Remove test nozero fraction elements
* Make get_fraction private to _get_fraction
* Correct typo get_fraction -> _get_fraction in test
* Add note that fraction is optional in hazard tutorial
* Add note that probabilistic does not mean no storyline
* Fix equation rendering
* Fix linter issues: Single statement per line
  Co-authored-by: Lukas Riedel <[email protected]>
* Set test to assert_array_equal for get_fraction
* Update hazard test file to hdf5 without fraction
* Update climada/engine/impact_calc.py
  Note, the pylint message is then disabled for the whole function.
  Co-authored-by: Lukas Riedel <[email protected]>
* Make pylint ignore warning for one line only
* Update loading of tc hazard file
* Update loading of test hazard file
* Fix test file path
* test file handling consolidation
* hazard.base: get_fraction has become private

Co-authored-by: Chahan Kropf <[email protected]>
Co-authored-by: Lukas Riedel <[email protected]>
Co-authored-by: emanuel-schmid <[email protected]>
Changes proposed in this PR:
* The hazard.fraction attribute is empty by default. For the impact calculation, fraction is ignored if empty (in some rare edge cases this might lead to inaccurate impacts).
* hazard.fraction should, however, have the same shape as hazard.intensity, even when empty. This prevents code that automatically slices all sparse.csr_matrix attributes (for instance, hazard.select or hazard.check) from failing, while still providing accurate results.

The changes are also implemented in the climada_petals hazard modules on the branch https://github.com/CLIMADA-project/climada_petals/tree/feature/fraction_is_none

PR Author Checklist
PR Reviewer Checklist