Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure csr matrices are in "canonical format" before impact calculation #893

Merged
merged 15 commits into from
Jul 17, 2024

Conversation

peanutfun
Copy link
Member

@peanutfun peanutfun commented Jun 14, 2024

Changes proposed in this PR:

  • Add Hazard.check_matrices method for bringing matrices into "canonical format". This ensures that the csr_matrix.data attribute reflects the exact values present in the matrix.
  • Call check_matrices from the ImpactCalc constructor.
  • Update tests

This PR fixes inconsistencies in the csr matrix reported in #891, but does not fix the problem of unintentionally summed values (it just ensures this problem is visible when looking at the matrix data).

PR Author Checklist

PR Reviewer Checklist

* Add getter/setter attributes for csr_matrices in Hazard.
* Check format and sum duplicates when assigning matrices.
* Update unit tests.
@peanutfun
Copy link
Member Author

Petals compatibility tests are failing because we sometimes assign LIL matrices there instead of CSR matrices. This can be fixed before or after merging this PR.

@peanutfun peanutfun marked this pull request as ready for review June 14, 2024 12:20
@peanutfun
Copy link
Member Author

Petals compatibility is addressed here: CLIMADA-project/climada_petals#129

@peanutfun peanutfun requested a review from chahank June 14, 2024 13:08
Copy link
Member

@chahank chahank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to add more consistency to the hazard data! One suggestion / comment.

climada/hazard/base.py Outdated Show resolved Hide resolved
@peanutfun
Copy link
Member Author

@chahank What do you think about this updated version? I still have to adapt the test.

@peanutfun peanutfun marked this pull request as draft June 17, 2024 13:44
climada/hazard/base.py Outdated Show resolved Hide resolved
@peanutfun peanutfun marked this pull request as ready for review June 21, 2024 13:32
@peanutfun peanutfun requested a review from chahank June 21, 2024 13:32
@@ -70,6 +70,16 @@ def test_init(self):
np.testing.assert_array_equal(HAZ.event_id, icalc.hazard.event_id)
np.testing.assert_array_equal(HAZ.event_name, icalc.hazard.event_name)

# Test check matrices
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Test check matrices
# Test check matrices
# Check fraction and intensity have the same shape
# Check that explicit zeroes are pruned

I am a bit confused why the pruning is tested here...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can also check via a mock if check_matrices is called, but I recall strong sentiments against such tests 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my confusion comes from the fact that I am uneasy with having the impact class changing the hazard objects. See comment below.

Comment on lines 63 to 64
to sum up these values. To avoid any inconsistencies, call :py:meth:`check_matrices`
once you inserted all data. This will explicitly sum all values at the same matrix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to recommend checking that if the class does check it anyway? I am a bit sceptical with these check_object methods that we have in climada. Imho these should be handled in the init and that is it.

Copy link
Member Author

@peanutfun peanutfun Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If people modify the matrix after the initialization or assignment, we have no way of ensuring consistency within Hazard alone. (We later call check_matrices from ImpactCalc for that exact reason)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm... not yet fully convinced. If the user messes up an object, which the user always can in Python, then it is their problem. I would rather have fewer checks at random places in the code that hide bad habits and save people from doing things they should not do, and instead force good habits by having checks in the inits.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally agree that we should only check if necessary. However, the MDR calculation operates on the data attribute. It is therefore essential that canonical format is ensured before starting the impact calculation.

If you really want to boil it down to a minimum, I would suggest to remove the pruning from the Hazard setters and instead only call check_matrices from ImpactCalc. But this might increase confusion again, because consistency is never ensured before starting ImpactCalc.

Another thing to keep in mind is that sum_duplicates is O(N^2) for a non-canonical matrix, and a no-op O(1) for a canonical matrix, and eliminate_zeros is O(N), where N is the number of stored non-zeros entries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points, and thanks for looking up the time cost of the methods.

Ok, let's try it like this. Maybe we can keep an eye on this change in the separate efforts to optimize computation times for ImpactCalc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I'll revert the properties and make intensity and fraction proper attributes again.

climada/util/checker.py Outdated Show resolved Hide resolved
@peanutfun
Copy link
Member Author

@chahank Waiting for your responses to move forward 😇 👉 👈

* Only call `Hazard.check_matrices` from `ImpactCalc.__init__`.
* Update tests and docstrings accordingly.
@peanutfun peanutfun changed the title Ensure csr matrices are in "canonical format" in Hazard objects Ensure csr matrices are in "canonical format" before impact calculation Jul 15, 2024
@peanutfun peanutfun requested a review from chahank July 15, 2024 09:13
CHANGELOG.md Outdated Show resolved Hide resolved
@chahank
Copy link
Member

chahank commented Jul 15, 2024

Great fix! Ready to merge. Just one possible addition to the changelog.

@peanutfun peanutfun merged commit 700862c into develop Jul 17, 2024
18 checks passed
@emanuel-schmid emanuel-schmid deleted the hazard-consistent-csr-matrix branch July 18, 2024 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants