Feature/fraction is none #541

chahank · 2022-09-30T15:30:44Z

Changes proposed in this PR:

The hazard.fraction attribute is by default empty. For the impact calculation, fraction is ignored if empty (this might in some very edge cases lead to inaccurate impacts)
The hazard.fraction should however be of the same shape as hazard.intensity, even when empty. This prevents code using automatic slicing of all sparse.csr_matrix attributes to fail, while still providing accurate results. For instance, hazard.select or hazard.check.
Ignoring fraction when it is anyway not needed allows to divide by a factor 2 the size of files, and speeds-up the impact computations.
Not all tests have been updated yet, waiting for confirmation that this is a wanted change. Thus, some unit tests might fail currently.
The docstrings and the documentation have not yet been updated.
If this PR is eventually accepted, the data on the API should be adapted accordingly in order to significantly reduce the dataset sizes.

The changes are also implemented in the climada_petals hazard modules on the branch https://github.com/CLIMADA-project/climada_petals/tree/feature/fraction_is_none

PR Author Checklist

Read the Contribution Guide
Correct target branch selected (if unsure, select develop)
Source branch up-to-date with target branch
Documentation updated
Tests updated
Tests passing
No new linter issues

PR Reviewer Checklist

When fraction is empty, it is ignored in the impact calculation. The shape must still be n_events x n_exp

emanuel-schmid · 2022-10-03T19:06:55Z

Do I get this right: after the PR, a hazard with a fraction of [0] would yield the exact same impact as one with a fraction of [1]?
In other words we would introduce a discontinuity?
Hm... Somehow this doesn't feel quite right. Couldn't we make the whole fraction None instead?

chahank · 2022-10-04T07:33:24Z

Do I get this right: after the PR, a hazard with a fraction of [0] would yield the exact same impact as one with a fraction of [1]?
In other words we would introduce a discontinuity?
Hm... Somehow this doesn't feel quite right. Couldn't we make the whole fraction None instead?

It would not really be a discontinuity, but a convention. A Hazard with fraction zero everywhere does not make much sense, as it would mean that basically, it is affecting 0% of the area...

But yes, one could also make it None. This would however require a lot more rewriting of code. This was just a minimalistic proposal.

emanuel-schmid · 2022-10-05T06:17:49Z

A Hazard with fraction zero everywhere does not make much sense,

Not from a user's perspective, right, but the maths make perfect sense. Yet.

This would however require a lot more rewriting of code. This was just a minimalistic proposal.

True. But the minimalistic approach is also a bit hacky in this case, right?

Apart from the introduced mathematical discontinuity, the fact that the proposed convention is not completely intuitive (I cannot possibly guess that all zero means all one in a sparse matrix I must know.) bothers me and - more than that - the mashing of control elements within the raw data.

I do expect future development in the impact calculation and the hazard implementation. This change will not unlikely cause headaches then. Even though a cleaner approach means more work it bears the hope that it may eventually pay off.

emanuel-schmid · 2022-10-05T06:25:56Z

For reducing the file size, I'd rather suggest to change the hazard's read and write methods and leave the impact calculation as it is.

emanuel-schmid · 2022-10-05T06:30:07Z

Or perhaps try to change Hazard.get_fraction() to achieve the goal... ?

chahank · 2022-10-05T09:05:25Z

For reducing the file size, I'd rather suggest to change the hazard's read and write methods and leave the impact calculation as it is.

I must say I do not understand what this means...

chahank · 2022-10-05T09:07:09Z

Or perhaps try to change Hazard.get_fraction() to achieve the goal... ?

Also here I do not understand what is suggested...

chahank · 2022-10-05T09:09:52Z

A Hazard with fraction zero everywhere does not make much sense,

Not from a user's perspective, right, but the maths make perfect sense. Yet.

This would however require a lot more rewriting of code. This was just a minimalistic proposal.

True. But the minimalistic approach is also a bit hacky in this case, right?

Apart from the introduced mathematical discontinuity, the fact that the proposed convention is not completely intuitive (I cannot possibly guess that all zero means all one in a sparse matrix I must know.) bothers me and - more than that - the mashing of control elements within the raw data.

I do expect future development in the impact calculation and the hazard implementation. This change will not unlikely cause headaches then. Even though a cleaner approach means more work it bears the hope that it may eventually pay off.

It might indeed. It will however also lead to bugs on the way. But I can try to change it to None, although it will take quite some time and some review ;).

emanuel-schmid · 2022-10-06T09:37:53Z

File size & read write methods:
What I wanted to say is that for reducing file size alone, the impact calculation should not be changed. The Hazard <-> File interactions should not influence the interactions between a Hazard object and any other object.
It's maybe tedious but certainly possible to change write_raster, write_hdf5, read_raster and read_hdf5 while keeping all Hazard objects as they are and only change their respective files.

Hazard.get_fraction()
That is an alternative I was thinking about: instead of treating 0 as 1 during impact calculation, one could transform undefined into 1 within get_fraction, by creating a sparse matrix on the fly that contains 1s for all intensity values where fraction is undefined. And even None in case no fraction is defined at all. (If it's None - skip the multiplication step.)

chahank · 2022-10-06T10:38:01Z

Thanks for claryfing!!

File size & read write methods: What I wanted to say is that for reducing file size alone, the impact calculation should not be changed. The Hazard <-> File interactions should not influence the interactions between a Hazard object and any other object. It's maybe tedious but certainly possible to change write_raster, write_hdf5, read_raster and read_hdf5 while keeping all Hazard objects as they are and only change their respective files.

The point is not only to reduce file size, but to remove redundant information from an object that does not need it. In addition, it is not only file size, but also very much RAM usage that should be reduced, as well as computation time reduced. Thus, it should be handled at the source, and not hacked into the file read/write.

Hazard.get_fraction() That is an alternative I was thinking about: instead of treating 0 as 1 during impact calculation, one could transform undefined into 1 within get_fraction, by creating a sparse matrix on the fly that contains 1s for all intensity values where fraction is undefined. And even None in case no fraction is defined at all. (If it's None - skip the multiplication step.)

This would not solve the RAM nor the computation time issues. Besides, it would make the object rather inconsistent, since the attribute fraction would be different from what is obtained from get_fraction. The fraction cannot have undefined values as it is a scipy.sparse.csr_matrix object.

On the other hand, if fraction is None, it can be skipped in the multiplication step and it does resolve the memory issues. The idea with using an empty scipy matrix instead was just that it is much much easier to implement. It does not treat '0' as '1'. It would just a convention for the None identifier. It makes no sense to define a Hazard object with fraction being zero everywhere. Thus, having fraction 0 as a surrogate for None is not a big problem in principle. But, if you prefer, we can go for the None implementation. It will just require a lot of if None checks in the read/write methods, the plotting methods, the checking methods, the concatenation/splitting methods. I do not mind, as I said, the idea with using fraction 0 instead makes things much easier, but is not a must.

climada/engine/impact_calc.py

emanuel-schmid · 2022-10-06T12:21:53Z

climada/engine/impact_calc.py

+        if self.hazard.fraction.nonzero()[0].size == 0:
+            return mdr.multiply(exp_values_csr)
+
+        fract = self.hazard.get_fraction(cent_idx)


Suggested change

if self.hazard.fraction.nonzero()[0].size == 0:

return mdr.multiply(exp_values_csr)

fract = self.hazard.get_fraction(cent_idx)

fract = self.hazard.get_fraction(cent_idx)

if fract is None:

return mdr.multiply(exp_values_csr)

This plus adding this line at the top of get_fraction:

if self.fraction.nnz == 0: return None

I'd still prefer to leave as much responsibility as possible to the hazard.

Sure, that works well!

It is just that the hazard.get_fraction is not the standard getter for the attribute hazard.fraction. It is to get the fraction for a subset of centroids. Maybe this should be slightly updated too, so that get_fraction() returns simply fraction (or None)?

👍 Good idea. Should be fairly easy too, since the fraction is hardly ever read. I've been looking for occurrences of fraction-readings throughout the code base and apart from this line here I found them only in tests and in hazards for rewriting or plotting.

emanuel-schmid · 2022-10-06T12:50:28Z

... since the attribute fraction would be different from what is obtained from get_fraction.

But that's fine, even to be expected in OO programming, no? See suggested change in impact_calc, line 417ff .

peanutfun · 2022-10-13T10:39:49Z

If I could add my bits to this discussion...

Hazard.get_fraction()

I very much like the idea by @emanuel-schmid of a "get" method for the fraction, because this can handle the fraction logic internally without exposing it to the code using the fraction values. However, a "get" method is a very "C++" thing. Python can actually make a getter/setter function look like a regular property with the @property decorator.

Here's how that works:

class Hazard:
    _fraction = None  # NOTE: Now marked private

    @property
    def fraction(self):
        """Return the fraction or construct one from the intensity if it is not set"""
        if _fraction is None:
            row_idx, col_idx, _ = scipy.sparse.find(self.intensity)
            return scipy.sparse.csr_matrix((np.ones(len(row_idx)), (row_idx, col_idx)), shape=self.intensity.shape)
        return self._fraction


# Logic is completely hidden:
hazard = Hazard(...)
fraction = hazard.fraction  # NOTE: `self.fraction` can also be called from within the Hazard object

Edit: For reference: https://www.freecodecamp.org/news/python-property-decorator/

peanutfun · 2022-10-13T11:45:46Z

I just discussed my proposal with @chahank and we concluded that it does not solve the issue of reducing RAM usage and memory required to store the hazard object. The solution provided here should be as non-intrusive as possible. We will later update the Hazard.__init__ in the same way as Impact.__init__, such that all attributes have to be set during initialization. This gives the opportunity to build a consistent Hazard object when the intensity is not set.

chahank · 2022-10-13T13:47:00Z

In 9497976 I set the default for get_fraction to return the whole fraction. Furthermore, in b6021fb I set get_fraction to return None if the fraction is empty, and the impact_calc to then ignore fraction in the computation of the impact.

chahank · 2022-10-24T16:45:11Z

All tests are passing. Should we proceed with this? I.e., accept the changes, update the docstrings and the tutorials, check climada petals.

…s_none

peanutfun

Looks good! I agree that this is the least intrusive solution, as only very few lines of production code change. Well done! I found two small issues, see the comments.

climada/hazard/base.py

climada/hazard/test/test_base.py

Co-authored-by: Lukas Riedel <[email protected]>

chahank · 2022-10-28T14:05:07Z

Thanks! Agree with both comments and implemented the changes.

chahank · 2022-10-31T10:01:16Z

I think this PR is mostly ready now, with all tests updated and the tutorials updated too.

@emanuel-schmid : what is missing is a clear handling of the test files. I created a new test file based on the previous one. The changes are that the test file is in .hdf5 format instead of .mat, and the fraction is None. However, the handling of the loading of the test datafiles was not homogeneous through the unit test files. I would like to harmonize this before merging. What is the best practice here?

peanutfun · 2022-10-31T11:18:46Z

Just pitching in: I think it's safest to load test files in test fixtures. This can be done by adding a setUp (and tearDown) method to a UnitTest class. See my changes to climada/hazard/test/test_base.py for an example

chahank · 2022-10-31T11:24:13Z

Thanks! This would be one way, but I think requires quite a lot of rewriting. Let me make my question more precise. Should the test data be loaded:

from the API?
from the filename defined in the constants?

chahank · 2022-10-31T11:27:39Z

Just pitching in: I think it's safest to load test files in test fixtures. This can be done by adding a setUp (and tearDown) method to a UnitTest class. See my changes to climada/hazard/test/test_base.py for an example

Maybe it is best to first close this PR, and then make the change to the Hazard __init__, or vice-versa?

peanutfun · 2022-11-01T09:50:15Z

Maybe it is best to first close this PR, and then make the change to the Hazard init, or vice-versa?

Why close? Depending on which PR will be merged first, the other one has to be adapted. I think this one can go first, but the other way round also would not be an issue.

climada/engine/impact_calc.py

Note, the pylint message is then disabled for the whole function. Co-authored-by: Lukas Riedel <[email protected]>

chahank · 2022-11-03T20:51:30Z

@emanuel-schmid : can you please have a final look, in particular for the test file handling?

emanuel-schmid · 2022-11-04T08:56:34Z

👍 checking...

emanuel-schmid · 2022-11-04T13:31:40Z

🎉

* Assume fraction is one if None * Allow fraction to be empty When fraction is empty, it is ignored in the impact calculation. The shape must still be n_events x n_exp * Correct shape to argument * Remove unecessary else * Return None in get_fraction * Set default return full fraction * Update test TC for fraction None * Remove test nozero fraction elements * Make get_fraction private to _get_fraction * Correct typo get_fraction -> _get_fraction in test * Add note that fraction is optional in hazard tutorial * Add note that probabilistic does not mean no storyline * Fix equation rendering * Fix linter issues: Single statement per line Co-authored-by: Lukas Riedel <[email protected]> * Set test to assert_array_equal for get_fraction * Update hazard test file to hdf5 without fraction * Update climada/engine/impact_calc.py Note, the pylint message is then disabled for the whole function. Co-authored-by: Lukas Riedel <[email protected]> * Make pylint ignore warning for one line only * Update loading of tc hazard file * Update loading of test hazard file * Fix test file path * test file handling consolidation * hazard.base: get_fraction has become private Co-authored-by: Chahan Kropf <[email protected]> Co-authored-by: Lukas Riedel <[email protected]> Co-authored-by: emanuel-schmid <[email protected]>

Chahan Kropf added 3 commits September 26, 2022 18:51

Assume fraction is one if None

f483862

Allow fraction to be empty

705fdbf

When fraction is empty, it is ignored in the impact calculation. The shape must still be n_events x n_exp

Correct shape to argument

bd87baa

chahank requested a review from emanuel-schmid September 30, 2022 15:30

Remove unecessary else

b3923e1

emanuel-schmid reviewed Oct 6, 2022

View reviewed changes

climada/engine/impact_calc.py Outdated Show resolved Hide resolved

emanuel-schmid reviewed Oct 6, 2022

View reviewed changes

Chahan Kropf added 2 commits October 13, 2022 15:37

Return None in get_fraction

b6021fb

Set default return full fraction

9497976

Chahan Kropf added 2 commits October 13, 2022 16:50

Update test TC for fraction None

b1dfafb

Remove test nozero fraction elements

8557ad6

Chahan Kropf added 3 commits October 27, 2022 18:57

Merge remote-tracking branch 'origin/develop' into feature/fraction_i…

889c7c7

…s_none

Make get_fraction private to _get_fraction

78161ea

Correct typo get_fraction -> _get_fraction in test

2512a71

chahank requested a review from peanutfun October 28, 2022 11:13

Chahan Kropf added 3 commits October 28, 2022 14:24

Add note that probabilistic does not mean no storyline

3d662a7

Fix equation rendering

3d1cd57

Merge remote-tracking branch 'origin/develop' into feature/fraction_i…

9df0f50

…s_none

chahank mentioned this pull request Oct 28, 2022

Set all Hazard attributes in __init__ #549

Merged

11 tasks

peanutfun requested changes Oct 28, 2022

View reviewed changes

climada/hazard/base.py Outdated Show resolved Hide resolved

climada/hazard/test/test_base.py Outdated Show resolved Hide resolved

chahank and others added 2 commits October 28, 2022 16:00

Fix linter issues: Single statement per line

5692859

Co-authored-by: Lukas Riedel <[email protected]>

Set test to assert_array_equal for get_fraction

f2bfc94

Update hazard test file to hdf5 without fraction

697f347

peanutfun reviewed Nov 1, 2022

View reviewed changes

climada/engine/impact_calc.py Outdated Show resolved Hide resolved

chahank and others added 5 commits November 3, 2022 09:57

Update climada/engine/impact_calc.py

85b8d02

Note, the pylint message is then disabled for the whole function. Co-authored-by: Lukas Riedel <[email protected]>

Make pylint ignore warning for one line only

91f7408

Update loading of tc hazard file

eed36b6

Update loading of test hazard file

2a27484

Fix test file path

0a903ba

emanuel-schmid added 3 commits November 4, 2022 09:59

Merge branch 'develop' into feature/fraction_is_none

48aaf02

test file handling consolidation

121ffc2

hazard.base: get_fraction has become private

05dcd3b

emanuel-schmid merged commit eaefbef into develop Nov 4, 2022

emanuel-schmid deleted the feature/fraction_is_none branch November 4, 2022 13:31

peanutfun mentioned this pull request Jan 13, 2023

Load hazard raster data lazily #578

Merged

11 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/fraction is none #541

Feature/fraction is none #541

chahank commented Sep 30, 2022 •

edited

Loading

emanuel-schmid commented Oct 3, 2022

chahank commented Oct 4, 2022 •

edited

Loading

emanuel-schmid commented Oct 5, 2022

emanuel-schmid commented Oct 5, 2022

emanuel-schmid commented Oct 5, 2022

chahank commented Oct 5, 2022

chahank commented Oct 5, 2022

chahank commented Oct 5, 2022

emanuel-schmid commented Oct 6, 2022

chahank commented Oct 6, 2022 •

edited

Loading

emanuel-schmid Oct 6, 2022 •

edited

Loading

chahank Oct 7, 2022

chahank Oct 7, 2022

emanuel-schmid Oct 7, 2022

emanuel-schmid commented Oct 6, 2022 •

edited

Loading

peanutfun commented Oct 13, 2022 •

edited

Loading

peanutfun commented Oct 13, 2022

chahank commented Oct 13, 2022

chahank commented Oct 24, 2022 •

edited

Loading

peanutfun left a comment

chahank commented Oct 28, 2022

chahank commented Oct 31, 2022

peanutfun commented Oct 31, 2022

chahank commented Oct 31, 2022 •

edited

Loading

chahank commented Oct 31, 2022

peanutfun commented Nov 1, 2022

chahank commented Nov 3, 2022

emanuel-schmid commented Nov 4, 2022

emanuel-schmid commented Nov 4, 2022

Feature/fraction is none #541

Feature/fraction is none #541

Conversation

chahank commented Sep 30, 2022 • edited Loading

PR Author Checklist

PR Reviewer Checklist

emanuel-schmid commented Oct 3, 2022

chahank commented Oct 4, 2022 • edited Loading

emanuel-schmid commented Oct 5, 2022

emanuel-schmid commented Oct 5, 2022

emanuel-schmid commented Oct 5, 2022

chahank commented Oct 5, 2022

chahank commented Oct 5, 2022

chahank commented Oct 5, 2022

emanuel-schmid commented Oct 6, 2022

chahank commented Oct 6, 2022 • edited Loading

emanuel-schmid Oct 6, 2022 • edited Loading

Choose a reason for hiding this comment

chahank Oct 7, 2022

Choose a reason for hiding this comment

chahank Oct 7, 2022

Choose a reason for hiding this comment

emanuel-schmid Oct 7, 2022

Choose a reason for hiding this comment

emanuel-schmid commented Oct 6, 2022 • edited Loading

peanutfun commented Oct 13, 2022 • edited Loading

peanutfun commented Oct 13, 2022

chahank commented Oct 13, 2022

chahank commented Oct 24, 2022 • edited Loading

peanutfun left a comment

Choose a reason for hiding this comment

chahank commented Oct 28, 2022

chahank commented Oct 31, 2022

peanutfun commented Oct 31, 2022

chahank commented Oct 31, 2022 • edited Loading

chahank commented Oct 31, 2022

peanutfun commented Nov 1, 2022

chahank commented Nov 3, 2022

emanuel-schmid commented Nov 4, 2022

emanuel-schmid commented Nov 4, 2022

chahank commented Sep 30, 2022 •

edited

Loading

chahank commented Oct 4, 2022 •

edited

Loading

chahank commented Oct 6, 2022 •

edited

Loading

emanuel-schmid Oct 6, 2022 •

edited

Loading

emanuel-schmid commented Oct 6, 2022 •

edited

Loading

peanutfun commented Oct 13, 2022 •

edited

Loading

chahank commented Oct 24, 2022 •

edited

Loading

chahank commented Oct 31, 2022 •

edited

Loading