Ensure csr matrices are in "canonical format" before impact calculation #893
Conversation
* Add getter/setter attributes for csr_matrices in Hazard.
* Check format and sum duplicates when assigning matrices.
* Update unit tests.
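For illustration, a hedged sketch of the setter approach described in these commit notes (names and structure assumed, not the actual CLIMADA code; this property-based design is reverted later in the conversation in favour of plain attributes):

```python
from scipy import sparse

class Hazard:
    """Minimal stand-in class, only to illustrate the setter idea (not CLIMADA code)."""

    @property
    def intensity(self):
        return self._intensity

    @intensity.setter
    def intensity(self, value):
        # Coerce to CSR and sum duplicate entries on assignment, so that the
        # stored .data array reflects the actual matrix values.
        matrix = sparse.csr_matrix(value)
        matrix.sum_duplicates()
        self._intensity = matrix
```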
Petals compatibility tests are failing because we sometimes assign LIL matrices there instead of CSR matrices. This can be fixed before or after merging this PR.
Petals compatibility is addressed here: CLIMADA-project/climada_petals#129
Good idea to add more consistency to the hazard data! One suggestion / comment.
@chahank What do you think about this updated version? I still have to adapt the test.
Co-authored-by: Chahan M. Kropf <[email protected]>
@@ -70,6 +70,16 @@ def test_init(self):
        np.testing.assert_array_equal(HAZ.event_id, icalc.hazard.event_id)
        np.testing.assert_array_equal(HAZ.event_name, icalc.hazard.event_name)

        # Test check matrices
Suggested change:
        # Test check matrices
        # Check fraction and intensity have the same shape
        # Check that explicit zeroes are pruned
I am a bit confused why the pruning is tested here...
I can also check via a mock if `check_matrices` is called, but I recall strong sentiments against such tests 😄
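For what it's worth, a hypothetical sketch of such a mock-based test (not part of this PR; the `exposures`, `impf_set`, and `hazard` fixtures are assumed to be set up as in the existing test module):

```python
import unittest
from unittest import mock

from climada.engine import ImpactCalc
from climada.hazard import Hazard

class TestImpactCalcInit(unittest.TestCase):
    """Hypothetical test class; the fixtures below are assumed, not defined here."""

    def test_check_matrices_called(self):
        # Patch Hazard.check_matrices and assert that the ImpactCalc constructor calls it.
        with mock.patch.object(Hazard, "check_matrices") as check_mock:
            ImpactCalc(self.exposures, self.impf_set, self.hazard)
        check_mock.assert_called_once()
```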
I think my confusion comes from the fact that I am uneasy with having the impact class change the hazard objects. See comment below.
climada/hazard/base.py (outdated)
to sum up these values. To avoid any inconsistencies, call :py:meth:`check_matrices`
once you inserted all data. This will explicitly sum all values at the same matrix
Do we want to recommend checking that if the class does check it anyway? I am a bit sceptical about these `check_object` methods that we have in climada. Imho these should be handled in the `init` and that is it.
If people modify the matrix after the initialization or assignment, we have no way of ensuring consistency within Hazard alone. (We later call `check_matrices` from `ImpactCalc` for that exact reason.)
hmmm... not yet fully convinced. If the user messes up an object, which the user always can in Python, then it is their problem. I would rather have fewer checks at random places in the code that hide bad habits and save people from doing things they should not do, and instead force good habits by having checks in the inits.
I generally agree that we should only check if necessary. However, the MDR calculation operates on the `data` attribute. It is therefore essential that canonical format is ensured before starting the impact calculation.
If you really want to boil it down to a minimum, I would suggest removing the pruning from the Hazard setters and instead only calling `check_matrices` from `ImpactCalc`. But this might increase confusion again, because consistency is never ensured before starting `ImpactCalc`.
Another thing to keep in mind is that `sum_duplicates` is O(N^2) for a non-canonical matrix and a no-op O(1) for a canonical matrix, while `eliminate_zeros` is O(N), where N is the number of stored non-zero entries.
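For context, a small standalone SciPy example (not from this PR) illustrating what "canonical format" means and what the two calls do:

```python
import numpy as np
from scipy import sparse

# Build a 1x3 CSR matrix directly from (data, indices, indptr): column 0 is stored
# twice (a duplicate entry) and column 2 stores an explicit zero.
data = np.array([1.0, 2.0, 0.0])
indices = np.array([0, 0, 2])
indptr = np.array([0, 3])
mat = sparse.csr_matrix((data, indices, indptr), shape=(1, 3))

print(mat.has_canonical_format)  # False: duplicate entries within a row
print(mat.data)                  # [1. 2. 0.] -- does not match the matrix values
print(mat.toarray())             # [[3. 0. 0.]] -- duplicates are summed on densification

mat.sum_duplicates()             # merge duplicates and sort indices
mat.eliminate_zeros()            # drop the explicitly stored zero
print(mat.has_canonical_format)  # True
print(mat.data)                  # [3.] -- now reflects the actual matrix values
```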
Good points, and thanks for looking up the time cost of the methods.
Ok, let's try it like this. Maybe we can keep an eye on this change in the separate efforts to optimize computation times for ImpactCalc.
In this case, I'll revert the properties and make `intensity` and `fraction` proper attributes again.
Co-authored-by: Chahan M. Kropf <[email protected]>
@chahank Waiting for your responses to move forward 😇 👉 👈
* Only call `Hazard.check_matrices` from `ImpactCalc.__init__`.
* Update tests and docstrings accordingly.
Great fix! Ready to merge. Just one possible addition to the changelog.
Changes proposed in this PR:
* `Hazard.check_matrices` method for bringing matrices into "canonical format". This ensures that the `csr_matrix.data` attribute reflects the exact values present in the matrix.
* Call `check_matrices` from the `ImpactCalc` constructor.

This PR fixes inconsistencies in the csr matrix reported in #891, but does not fix the problem of unintentionally summed values (it just ensures this problem is visible when looking at the matrix data).
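For illustration, a hedged sketch of the proposed structure (simplified stand-in classes with assumed names and signatures, not the actual CLIMADA implementation):

```python
from scipy import sparse

class Hazard:
    """Minimal stand-in, for illustration only (the real class has many more attributes)."""

    def __init__(self, intensity, fraction):
        self.intensity = sparse.csr_matrix(intensity)
        self.fraction = sparse.csr_matrix(fraction)

    def check_matrices(self):
        """Bring intensity and fraction into canonical format (sketch, not the actual code)."""
        for matrix in (self.intensity, self.fraction):
            matrix.sum_duplicates()   # afterwards, .data reflects the true matrix values
            matrix.eliminate_zeros()  # prune explicitly stored zeros
        # Shape consistency check, as suggested in the review comments above.
        if self.fraction.shape != self.intensity.shape:
            raise ValueError("Intensity and fraction matrices must have the same shape")

class ImpactCalc:
    """Sketch of the constructor calling the check, as proposed in this PR."""

    def __init__(self, exposures, impfset, hazard):
        self.exposures = exposures
        self.impfset = impfset
        self.hazard = hazard
        self.hazard.check_matrices()  # ensure canonical format before the impact calculation
```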
PR Author Checklist
PR Reviewer Checklist