-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change default Hazard.event_name to string of event_id #795
Conversation
Change event_name default for Hazard.from_xarray_raster to string of event id. Change variable description in class method.
adapted event_name in test_assert_default_values and deleted event_name tests from test_missing_dims since the time dimension does not affect event_name anymore
climada/hazard/base.py
Outdated
@@ -840,7 +840,7 @@ def maybe_repeat(values: np.ndarray, times: int) -> np.ndarray: | |||
None, | |||
np.ones(num_events), | |||
np.array(range(num_events), dtype=int) + 1, | |||
list(data[coords["event"]].values), | |||
[str(x) for x in np.array(range(num_events), dtype=int) + 1], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does actually not behave as expected: If the dataset supplies custom event IDs (like 100...100+X), this will still set the event name to values 1...X. However, I don't see a way of inserting the event ID data as default value here. @chahank, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, but I think it does not matter that much. I was actually expecting that this PR would change the Hazard.init
to make a good default instead of just the read xarray datasets method. But, this might be a larger endeavor.
Also, you do not need to cast the generator into an array (this is actually very sub-optimal, both memory-wise, speed-wise and style-wise). Also, range already provides int
type. Keeping the original values, this would be it:
[str(x) for x in np.array(range(num_events), dtype=int) + 1], | |
[str(x) for x in range(1, num_events+ 1)], |
Another side-note to @peanutfun : why do you use a list here instead of a dictionary? How do you keep track of the correct ordering? Is there a particular reason to use a list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chahank We could indeed adjust the Hazard.__init__
to handle that. I think this would be the cleanest solution. However, changing the default initialization of this class might have some repercussions elsewhere.
why do you use a list here instead of a dictionary?
Not sure I understand. The whole thing is a dictionary which is inserted into a dataframe. This transforms the dictionary keys into column names and dictionary values into the column series. Order is preserved. I agree that it might be a bit less confusing if we used DataFrame.from_records
, where the dataframe can be initialized row-wise. But I recall you insisted that I create a separate variable DEF_DATA_VARS
instead of hardcoding the data variables here, so I thought this was the cleanest approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chahank We could indeed adjust the Hazard.init to handle that. I think this would be the cleanest solution. However, changing the default initialization of this class might have some repercussions elsewhere.
Agreed. This will come later when we change the hazard attributes to a better data format I suppose.
Not sure I understand.
Sorry that I was unclear! What I meant is why is the default_var
variable a list (because then the ordering matters in that list, which is not directly obvious)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. This will come later when we change the hazard attributes to a better data format I suppose.
Then we need an alternative. I would not like to have event_id
set to 100..100+X and event_name
to 1..1+X. My proposal is then to simply leave the event name empty, and let the Hazard initializer handle these cases in the future.
What I meant is why is the default_var variable a list (because then the ordering matters in that list, which is not directly obvious)?
Still unclear. Do you mean default_value
? Yes, of course ordering matters, because the list relates to the DEF_DATA_VARS
. Again, I can change initialization to row-wise, that would probably make it less confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we need an alternative. I would not like to have event_id set to 100..100+X and event_name to 1..1+X. My proposal is then to simply leave the event name empty, and let the Hazard initializer handle these cases in the future.
I think having an empty hazard event is not an option, this would break some functionalities. At the moment, the initialization does NOT handle empty event_names properly, so we would have to wait until we fix that (or do it now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still unclear. Do you mean default_value? Yes, of course ordering matters, because the list relates to the DEF_DATA_VARS. Again, I can change initialization to row-wise, that would probably make it less confusing.
Exactly, the ordering of the values for default_value
at that point in the code is not obvious. But it is also not a big deal. Maybe just a comment referring to DEF_DATA_VARS
for the ordering/content?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having an empty hazard event is not an option, this would break some functionalities. At the moment, the initialization does NOT handle empty event_names properly, so we would have to wait until we fix that (or do it now).
Okay, then I suggest the following: We create a follow-up issue to update the default initialization of Hazard. For now, we keep the functionality of from_xarray_raster
as is, i.e. event_name
stores the date. We just make sure that the event_name
is indeed a string (because we should always be able to write those). We can do that by calling dt.strftime
on the array and storing the result.
Maybe just a comment referring to DEF_DATA_VARS for the ordering/content?
Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, then I suggest the following: We create a follow-up issue to update the default initialization of Hazard. For now, we keep the functionality of from_xarray_raster as is, i.e. event_name stores the date. We just make sure that the event_name is indeed a string (because we should always be able to write those). We can do that by calling dt.strftime on the array and storing the result.
Good idea! Let us do this then. @sarah-hlsn can you make the event_name to be the string version of the date as suggested by @peanutfun ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will do!
@sarah-hlsn This is coming along very nicely! Could you add an explicit check for |
@sarah-hlsn : could you then please update the changelog too? Then we can soon merge this bugfix. |
@chahank Yes will do, I am currently working on the additional test suggested by @peanutfun but I may need some more time because it is not working as I expected |
@sarah-hlsn Let me know if you need any help |
CHANGELOG.md
Outdated
@@ -14,6 +14,7 @@ Code freeze date: YYYY-MM-DD | |||
|
|||
### Changed | |||
|
|||
- Change default event name to string of event date when importing Hazard data from xarray [#795](https://github.com/CLIMADA-project/climada_python/pull/795) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please also add under fixed
the issue that was fixed with this pull request please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just mentioning this under Fixed
is enough
Looks good to me. @peanutfun merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good @sarah-hlsn! 🙌 I only have a few stylistic suggestions.
CHANGELOG.md
Outdated
@@ -14,6 +14,7 @@ Code freeze date: YYYY-MM-DD | |||
|
|||
### Changed | |||
|
|||
- Change default event name to string of event date when importing Hazard data from xarray [#795](https://github.com/CLIMADA-project/climada_python/pull/795) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just mentioning this under Fixed
is enough
@chahank, would you go trough my suggestions and merge afterwards? |
Co-authored-by: Lukas Riedel <[email protected]>
Thanks @sarah-hlsn and @peanutfun ! |
Changes proposed in this PR:
This PR fixes #789
PR Author Checklist
develop
)PR Reviewer Checklist