Ensure csr matrices are in "canonical format" before impact calculation #893

peanutfun · 2024-06-14T11:58:55Z

Changes proposed in this PR:

* Add Hazard.check_matrices method for bringing matrices into "canonical format". This ensures that the csr_matrix.data attribute reflects the exact values present in the matrix.
* Call check_matrices from the ImpactCalc constructor.
* Update tests.

This PR fixes inconsistencies in the csr matrix reported in #891, but does not fix the problem of unintentionally summed values (it just ensures this problem is visible when looking at the matrix data).
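To illustrate what "canonical format" means here, the following is a minimal sketch using plain SciPy, not code from this PR. It builds a matrix with two stored entries at the same position by passing the raw `(data, indices, indptr)` arrays; the numbers are made up for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A 1x3 CSR matrix with two stored entries at position (0, 1).
# Passing (data, indices, indptr) directly does not sum duplicates.
data = np.array([1.0, 2.0, 3.0])
indices = np.array([1, 1, 2])
indptr = np.array([0, 3])
mat = csr_matrix((data, indices, indptr), shape=(1, 3))

dense_before = mat.toarray()   # duplicates are summed during conversion
data_before = mat.data.copy()  # .data still holds the separate values

mat.sum_duplicates()           # in place: merge entries at the same position
data_after = mat.data.copy()   # .data now matches the dense values
```

Before the call, the dense view and the `data` attribute disagree about the value at position (0, 1); afterwards they agree, which is the visibility this PR aims for.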


* Add getter/setter attributes for csr_matrices in Hazard.
* Check format and sum duplicates when assigning matrices.
* Update unit tests.

peanutfun · 2024-06-14T12:18:23Z

Petals compatibility tests are failing because we sometimes assign LIL matrices there instead of CSR matrices. This can be fixed before or after merging this PR.
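For reference, converting a LIL matrix before assignment is a one-liner in SciPy. This is a generic sketch with made-up values, not the change applied in Petals.

```python
from scipy.sparse import csr_matrix, lil_matrix

# LIL is convenient for filling a matrix entry by entry
lil = lil_matrix((2, 3))
lil[0, 1] = 0.5
lil[1, 2] = 1.5

# Convert once filling is done; lil.tocsr() is equivalent
csr = csr_matrix(lil)
```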

peanutfun · 2024-06-14T13:07:39Z

Petals compatibility is addressed here: CLIMADA-project/climada_petals#129

chahank

Good idea to add more consistency to the hazard data! One suggestion / comment.

climada/hazard/base.py

Update hazard docstring

peanutfun · 2024-06-17T13:44:36Z

@chahank What do you think about this updated version? I still have to adapt the test.

climada/hazard/base.py

climada/hazard/test/test_base.py

Co-authored-by: Chahan M. Kropf <[email protected]>

chahank · 2024-06-21T14:37:36Z

climada/engine/test/test_impact_calc.py

@@ -70,6 +70,16 @@ def test_init(self):
        np.testing.assert_array_equal(HAZ.event_id, icalc.hazard.event_id)
        np.testing.assert_array_equal(HAZ.event_name, icalc.hazard.event_name)

+        # Test check matrices


Suggested change

# Test check matrices

# Test check matrices

# Check fraction and intensity have the same shape

# Check that explicit zeroes are pruned

I am a bit confused why the pruning is tested here...

I can also check via a mock if check_matrices is called, but I recall strong sentiments against such tests 😄

I think my confusion comes from the fact that I am uneasy with having the impact class changing the hazard objects. See comment below.
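For context, the mock-based alternative mentioned in this thread could look roughly like the sketch below. `ToyHazard` and `ToyImpactCalc` are hypothetical stand-ins, not the CLIMADA classes; only the `unittest.mock` calls are real.

```python
from unittest.mock import patch


class ToyHazard:
    """Hypothetical stand-in for a hazard object."""

    def check_matrices(self):
        pass


class ToyImpactCalc:
    """Hypothetical stand-in that checks the hazard on construction."""

    def __init__(self, hazard):
        hazard.check_matrices()
        self.hazard = hazard


hazard = ToyHazard()
# Replace the method with a mock and record how often the constructor calls it
with patch.object(ToyHazard, "check_matrices") as mocked:
    ToyImpactCalc(hazard)
    call_count = mocked.call_count
```

Such a test only verifies that the call happens, not that the matrices end up pruned, which is the trade-off discussed above.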

chahank · 2024-06-21T14:39:41Z

climada/hazard/base.py

+    to sum up these values. To avoid any inconsistencies, call :py:meth:`check_matrices`
+    once you inserted all data. This will explicitly sum all values at the same matrix


Do we want to recommend calling this if the class checks it anyway? I am a bit sceptical of these check_object methods that we have in climada. Imho this should be handled in the init and that is it.

If people modify the matrix after the initialization or assignment, we have no way of ensuring consistency within Hazard alone. (We later call check_matrices from ImpactCalc for that exact reason)

hmmm... not yet fully convinced. If the user messes up an object, which the user always can in Python, then it is their problem. I would rather have fewer checks at random places in the code that hide bad habits and save people from doing things they should not do, and instead force good habits by having checks in the inits.

I generally agree that we should only check if necessary. However, the MDR calculation operates on the data attribute. It is therefore essential that canonical format is ensured before starting the impact calculation.
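A small sketch of why operating on the data attribute needs canonical format. The step function below is a toy stand-in for an impact function, and the helper name is hypothetical; none of this is taken from ImpactCalc.

```python
import numpy as np
from scipy.sparse import csr_matrix


def toy_impact_function(intensity):
    """Toy damage ratio: 0 up to an intensity of 2.5, 1 above it."""
    return (intensity > 2.5).astype(float)


def ratio_from_data(matrix):
    """Apply the toy function to the stored values only (hypothetical helper)."""
    result = matrix.copy()
    result.data = toy_impact_function(result.data)
    return result.toarray()


# Two stored entries (1.0 and 2.0) at position (0, 0): the actual value is 3.0
intensity = csr_matrix(
    (np.array([1.0, 2.0]), np.array([0, 0]), np.array([0, 2])), shape=(1, 1)
)

wrong = ratio_from_data(intensity)  # evaluates f(1.0) + f(2.0)
intensity.sum_duplicates()
right = ratio_from_data(intensity)  # evaluates f(3.0)
```

Because the function is not linear, applying it to the separate stored values and summing afterwards gives a different result than applying it to the summed value.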

If you really want to boil it down to a minimum, I would suggest removing the pruning from the Hazard setters and only calling check_matrices from ImpactCalc. But this might increase confusion again, because consistency is never ensured before starting ImpactCalc.

Another thing to keep in mind is that sum_duplicates is roughly O(N) for a non-canonical matrix (plus the cost of sorting the column indices within each row), and a no-op O(1) for a canonical matrix, and eliminate_zeros is O(N), where N is the number of stored entries.
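The two SciPy calls discussed here can be combined in a small helper. The function name `to_canonical` is an assumption for this sketch and is not the utility added in this PR.

```python
import numpy as np
from scipy.sparse import csr_matrix


def to_canonical(matrix):
    """Hypothetical helper: prune explicit zeros and sum duplicates in place."""
    matrix.eliminate_zeros()  # drop stored entries whose value is zero
    matrix.sum_duplicates()   # returns early if the matrix is already canonical


# One explicit zero at (0, 0) and one regular entry at (0, 2)
mat = csr_matrix(
    (np.array([0.0, 4.0]), np.array([0, 2]), np.array([0, 2])), shape=(1, 3)
)
nnz_before = mat.nnz  # counts stored entries, including the explicit zero

to_canonical(mat)
nnz_after = mat.nnz
```

Calling the helper a second time on the same matrix leaves it unchanged, so repeated checks mainly cost the pass over the stored entries in `eliminate_zeros`.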

Good points, and thanks for looking up the time cost of the methods.

Ok, let's try it like this. Maybe we can keep an eye on this change in the separate efforts to optimize computation times for ImpactCalc.

In this case, I'll revert the properties and make intensity and fraction proper attributes again.

climada/hazard/base.py

climada/util/checker.py

climada/util/test/test_checker.py

Co-authored-by: Chahan M. Kropf <[email protected]>

peanutfun · 2024-07-02T14:00:21Z

@chahank Waiting for your responses to move forward 😇 👉 👈

* Only call `Hazard.check_matrices` from `ImpactCalc.__init__`.
* Update tests and docstrings accordingly.

CHANGELOG.md

chahank · 2024-07-15T14:20:19Z

Great fix! Ready to merge. Just one possible addition to the changelog.

peanutfun added 2 commits June 14, 2024 12:10

Ensure that csr_matrices are in canonical format

4f6952e

* Add getter/setter attributes for csr_matrices in Hazard.
* Check format and sum duplicates when assigning matrices.
* Update unit tests.

Ensure canonical format in init and streamline checks

91c52e9

peanutfun marked this pull request as ready for review June 14, 2024 12:20

Explicitly remove zeros from csr matrices when assigning

3e27ac8

peanutfun mentioned this pull request Jun 14, 2024

Only assign csr matrices to Hazard objects CLIMADA-project/climada_petals#129

Merged

13 tasks

peanutfun requested a review from chahank June 14, 2024 13:08

Update CHANGELOG.md

f539dec

chahank reviewed Jun 14, 2024

climada/hazard/base.py (outdated)

peanutfun added 3 commits June 17, 2024 11:28

Add explicit check for updating matrices

4ced09c

Add util function for pruning csr_matrices

de06f2e

Update hazard docstring

Check matrices when instantiating ImpactCalc

ebb2cc8

peanutfun marked this pull request as draft June 17, 2024 13:44

chahank reviewed Jun 17, 2024

climada/hazard/base.py (outdated)

chahank reviewed Jun 17, 2024

climada/hazard/test/test_base.py (outdated)

peanutfun and others added 3 commits June 21, 2024 12:24

Update climada/hazard/base.py

c58b48d

Co-authored-by: Chahan M. Kropf <[email protected]>

Format docstring suggestion from review

e0ebdef

Update unit tests for matrix pruning and checks

71ce00e

peanutfun marked this pull request as ready for review June 21, 2024 13:32

peanutfun requested a review from chahank June 21, 2024 13:32

chahank reviewed Jun 21, 2024

climada/hazard/base.py

chahank reviewed Jun 21, 2024

climada/util/checker.py (outdated)

chahank reviewed Jun 21, 2024

climada/util/test/test_checker.py (outdated)

Update climada/util/test/test_checker.py

fa0821e

Co-authored-by: Chahan M. Kropf <[email protected]>

Apply suggestions from code review

6ad7ab4

Revert changes to attribute structure of Hazard

79ab66f

* Only call `Hazard.check_matrices` from `ImpactCalc.__init__`.
* Update tests and docstrings accordingly.

peanutfun changed the title ~~Ensure csr matrices are in "canonical format" in Hazard objects~~ Ensure csr matrices are in "canonical format" before impact calculation Jul 15, 2024

peanutfun requested a review from chahank July 15, 2024 09:13

chahank reviewed Jul 15, 2024

CHANGELOG.md (outdated)

peanutfun added 2 commits July 17, 2024 11:11

Update CHANGELOG.md

9e0a58c

Merge branch 'develop' into hazard-consistent-csr-matrix

cd21903

peanutfun merged commit 700862c into develop Jul 17, 2024
18 checks passed

emanuel-schmid deleted the hazard-consistent-csr-matrix branch July 18, 2024 08:22
