New implementation of Knutson scaling incl. fix to negative freqs and double counting #734

aleeciu · 2023-06-06T09:21:23Z

In this PR I implement new scaling factors to simulate TC under future climate. The new implementation follows the same logic as the previous one but here new scaling factors are used. The new scaling factors are based on Knutson et al.(2020) and Jewson et al. (2021).

Important to note is that:

following Jewson (2022), only frequencies are scaled in order to avoid inconsistencies and possible double-counting as also raised in Knutson's TC climate change effect is double-counted #453. Scaling only frequencies anyway changes mean intensities too.
in order to avoid any overlap in frequency changes across categories, frequencies for categories 0-3 are derived from those of categories 0-5 and 4-5, and all TCs in categories 0-3 or 4-5 are then corrected with their respective factors.
this PR also fixes the occurrence of negative frequencies reported in Negative frequencies in TropCyclone.set_climate_scenario_knu() #317 when using the old scaling factors.

I created a jupyter notebook where I show how the results of such implementation lead to outputs which are consistent with the estimates from Knutson 2020 and never lead to negative frequencies. That said, since the test is done under a given baseline period, I cannot rule out the possibility that negative frequencies will never occurr, therefore I am anyway throwing an error when this occur.

Two remaining todos:

final tests, as I first want to make sure that the followed approached is accepted
the current TC tutorial shows an increase in absolute intensity of Irma under a given RCP. However, as the new implementation does not apply any increment in absolute intensity but, rather, to mean intensities and any other risk metric, I was wondering how to best rewrite that part.
possibly, the notebook I posted above could potentially be simplfied and become and TC_CC tutorial, which is currently missing

PR Author Checklist

Read the Contribution Guide
Correct target branch selected (if unsure, select develop)
Descriptive pull request title added
[] Source branch up-to-date with target branch
[] Documentation updated
[] Tests updated
[] Tests passing
No new linter issues
Changelog updated

PR Reviewer Checklist

…change

…w ones

chahank · 2023-06-06T16:55:07Z

Thank you for the proposal. Note that we have already had more than one trial to improve on this. So, here are four questions (which you might have solved):

How do you deal with the significance of the Knutson parameters?
How do you incorporate the fact that the Knutson analysis was done on completely different datasets?
If you just remove the intensity scaling, you do not replicate the distribution shift describe in the paper by Knutson. How/why is it clearly better to have only the frequency shifting?
How do you fix the negative frequencies without messing up the distribution shift? Because just cutting at 0 does not really solve the problem.
The Knutson parameters are shifts for the cumulative distributions. How is this reflected here?

I think overall the question is whether it is useful to do small changes as proposed here, that in appearance fix some issues, but does not clearly improve the model, and even potentially make it worse.

bguillod · 2023-06-13T15:03:57Z

Thanks for tackling the issues related to future TC projections.

I personally think this implementation represents a substantial improvements compared to the current code. It is not perfect, but this is to a wide extent due to the nature of the complex problem and I think the proposed solution is a good pragmatic way to solve the main issues. The implementation as you did here is in my view very much in line with the literature.

One concern I would have, however, is related to how Knutson presents changes and then how you implemented them. Knutson gives changes for "intensity above cat X", so we have changes for categories (ordered from highest to lowest category range):

4 to 5
3 to 5
1 to 5
0 (i.e. tropical storms) to 5

So for example storms of category 3 fall under both criteria (1) and (2) above. Which change should one apply?
My suggestion would be:

Calculate the frequency of events by category and basin in the input dataset (i.e. how many storms in your event set by category and basin).
For events where only one frequency change is available from Knutson (i.e., categories 4 and 5 in the example above as only point 1 applies), apply that change.
Then go through critera 2,3,4 and each time, look at how frequency has already changed and how to adjust remaining categories such as to satisfy Knutson's aggregated change. That is, for point 2, I would look at cat 3 storms and see: by how much do these need to change in frequency after we've already changed cat 4 and 5 storms, such as to satisfy Knutson's frequency change (1) and (2) above. Once this is done, do the same for cat 1&2 (Knutson's change 3 above), etc.

That way you would be able to conserve the changes for all category ranges that Knutson is providing.

If you can solve that then I think this will bring the implementation to a new level.

chahank · 2023-06-13T16:03:40Z

Then go through critera 2,3,4 and each time, look at how frequency has already changed and how to adjust remaining categories such as to satisfy Knutson's aggregated change. That is, for point 2, I would look at cat 3 storms and see: by how much do these need to change in frequency after we've already changed cat 4 and 5 storms, such as to satisfy Knutson's frequency change (1) and (2) above. Once this is done, do the same for cat 1&2 (Knutson's change 3 above), etc.

Exactly this method was implemented in the past. But, it did lead to strong inconsistencies with the frequency distributions. A new implementation should be aware of this.

This was the proposal of the time. Maybe only changing the frequencies would resolve the problem.

 def _apply_knutson_criterion(self, chg_int_freq, scaling_rcp_year):
        """
        Apply changes to intensities and cumulative frequencies.

        Parameters
        ----------
        criterion : list(dict))
            list of criteria from climada.hazard.tc_clim_change
        scale : float
            scale parameter because of chosen year and RCP
        Returns
        -------
        tc_cc : climada.hazard.TropCyclone
            Tropical cyclone with frequency and intensity scaled according
            to the Knutson criterion. Returns a new instance of TropCyclone.
        """

        tc_cc = copy.deepcopy(self)

        # Criterion per basin
        for basin in np.unique(tc_cc.basin):

            bas_sel = (np.array(tc_cc.basin) == basin)

            # Apply intensity change
            inten_chg = [chg
                         for chg in chg_int_freq
                         if (chg['variable'] == 'intensity' and
                             chg['basin'] == basin)
                         ]
            for chg in inten_chg:
                sel_cat_chg = np.isin(tc_cc.category, chg['category']) & bas_sel
                inten_scaling = 1 + (chg['change'] - 1) * scaling_rcp_year
                tc_cc.intensity = sparse.diags(
                    np.where(sel_cat_chg, inten_scaling, 1)
                    ).dot(tc_cc.intensity)

            # Apply frequency change
            freq_chg = [chg
                        for chg in chg_int_freq
                        if (chg['variable'] == 'frequency' and
                            chg['basin'] == basin)
                        ]
            freq_chg.sort(reverse=False, key=lambda x: len(x['category']))

            # Iteratively scale frequencies for each category such that
            # cumulative frequencies are scaled according to Knutson criterion.

            cat_larger_list = []
            for chg in freq_chg:
                cat_chg_list = [cat
                                for cat in chg['category']
                                if cat not in cat_larger_list
                                ]
                sel_cat_chg = np.isin(tc_cc.category, cat_chg_list) & bas_sel
                if sel_cat_chg.any():
                    freq_scaling = 1 + (chg['change'] - 1) * scaling_rcp_year
                    sel_cat_all = (np.isin(tc_cc.category, chg['category'])
                                   & bas_sel)
                    sel_cat_larger = (np.isin(tc_cc.category, cat_larger_list)
                                      & bas_sel)
                    freq_scaling_cor = (
                        (np.sum(self.frequency[sel_cat_all]) * freq_scaling
                         - np.sum(tc_cc.frequency[sel_cat_larger]))
                        / np.sum(self.frequency[sel_cat_chg])
                    )
                    tc_cc.frequency[sel_cat_chg] *= freq_scaling_cor
                cat_larger_list += cat_chg_list

        if (tc_cc.frequency < 0).any():
            raise ValueError("The application of the given climate scenario"
                             "resulted in at least one negative frequenciy."
                             "This is likely due to the use of a"
                             "non-representative event set (too small, "
                             "incorrect reference period, ...)")

        return tc_cc

bguillod · 2023-06-16T08:46:16Z

Exactly this method was implemented in the past. But, it did lead to strong inconsistencies with the frequency distributions. A new implementation should be aware of this.

I agree with you @chahank that this back-then-proposed implementation you've copy-pasted above would do what I meant for frequencies. Can you please elaborate on what the inconsistencies were in the distribution? Is this discussed somewhere in an issue or an open PR or something?

This was the proposal of the time. Maybe only changing the frequencies would resolve the problem.

It might be worth a try. Surely, applying both changing in frequency and intensity must mess up the distribution, but this does not mean that doing only frequency wouldn't. So first I'd like to understand better the inconsistencies you mentioned.

Also, applying only statistically significant changes will mess up the distribution - which is why I generally strongly support the proposed addition in this PR to include also non-significant changes.

aleeciu · 2023-06-16T09:09:57Z

Hi both,

to me both the proposal by @bguillod and the code posted by @chahank make sense, but happy to hear what inconsistencies you encountered Chahan.

I am just not fully sure about this @bguillod :

Calculate the frequency of events by category and basin in the input dataset (i.e. how many storms in your event set by category and basin).

are you suggesting to calc yet other - cat-specific - frequencies other than those carried by the TropCyclone object which account for all events - regardless their cat - in a basin ?

bguillod · 2023-06-16T09:15:27Z

I am just not fully sure about this @bguillod :

Calculate the frequency of events by category and basin in the input dataset (i.e. how many storms in your event set by category and basin).

are you suggesting to calc yet other - cat-specific - frequencies other than those carried by the TropCyclone object which account for all events - regardless their cat - in a basin ?

No, what I meant is essentially what is implement in the code shared by @chahank, I.e. summing up all frequencies of events by basin and category ranges.

On a side-note: find the code extremely hard to understand though. We could definitely improve readability.

chahank · 2023-06-16T13:22:53Z

The main inconsistency is that you get a lot of negative frequencies. And just putting them to zero is not satisfying.

But, it was quite some time ago. Maybe it is worth trying it out again and maybe it can be made consistent.

bguillod · 2023-06-19T07:01:09Z

Do you remember for which basin and scenario & year this was the case?
Perhaps @aleeciu could use historical data, apply changes in frequencies only using this methodology and see if and for which cases negative frequencies occur?

chahank · 2023-06-19T13:47:35Z

Unfortunately no, it was quite some time ago. The cleanest would be imho to redo the analysis with the latest version of the random walk algorithm.

climada/hazard/trop_cyclone.py

chahank · 2023-08-16T15:05:28Z

Is it still planed to finish this pull request?

aleeciu · 2023-08-17T07:24:41Z

Is it still planed to finish this pull request?

Yes, but I need to figure out how to address the points you raised above.

aleeciu · 2024-08-07T12:04:44Z

Hi, I updated all tests following the guidelines as well as the advise of testing "meaningfull" data and not just prodiving the hardcoded value. I also used np.testing as much as I could following the advide of @peanutfun. From my side, I think I covered all raised points, please let me know.

aleeciu · 2024-08-07T12:14:25Z

The only instances where I still check for hardcoded values occur when testing get_knutson_data and get_gmst_info whose function is to load data. Based on the previous discussion, having proper tests for these functions was not considered as needed, but I still decided to keep them. They can be removed if needed.

chahank

Thanks for the improvements!

Some small typos were found.

Also, the tutorial file TropCyclone.ipynb shows 5301 lines of code changed. This is very suboptimal. Please make sure that only those lines that you changed are changed in this file.

chahank · 2024-08-14T16:11:38Z

climada/hazard/test/test_tc_cc.py

+            ind2 = target_year_ind + smoothing + 1
+
+            prediction = np.mean(tc_properties[:, ind1:ind2], 1)
+            predicted_change = ((prediction - baseline) / baseline) * 100


While not critical, using a variable name predicted_change and one predicted_changes is very confusing and quite impossible to read. For the future, maybe try using clear variable names.

updated with (hopefully) more informative names: target_predicted_changes and calculated_predicted_change

chahank · 2024-08-14T16:14:00Z

climada/hazard/trop_cyclone/trop_cyclone.py

+        Note: Only frequency changes are applied as suggested by Jewson 2022
+        (https://doi.org/10.1007/s00477-021-02142-6). Applying only frequency anyway
+        changes mean intensities and most importantly avoids possible inconsistencies
+        (including possible counting) that may arise from the application of both


Do you mean 'double-counting' ?

yes - adjusted

chahank · 2024-08-14T16:14:14Z

climada/hazard/trop_cyclone/trop_cyclone.py

+        (https://doi.org/10.1007/s00477-021-02142-6). Applying only frequency anyway
+        changes mean intensities and most importantly avoids possible inconsistencies
+        (including possible counting) that may arise from the application of both
+        frequency and intensity changes, as the relatioship between these two is non


relationship

chahank · 2024-08-16T08:41:35Z

Oh, and please update the CHANGELOG, don't hide your great work!

https://github.com/CLIMADA-project/climada_python/blob/develop/CHANGELOG.md

spjuhel · 2024-09-04T09:37:53Z

climada/hazard/trop_cyclone/trop_cyclone.py

+        sel_cat05 = np.isin(tc_cc.category, [0, 1, 2, 3, 4, 5])
+        sel_cat03 = np.isin(tc_cc.category, [0, 1, 2, 3])
+        sel_cat45 = np.isin(tc_cc.category, [4, 5])
+


Hello, I started using this improvement/fix for the generation of future TCs and discovered that if TropCyclone.category == [], this method does not complain at all and just return the exact same object.

This happened to me when I wrote/read hdf5 files using a mix of Hazard.write|from_hdf5 and TropCyclone.write|from_hdf5 instead of only TropCyclone.write|from_hdf5, but this could probably happen in other case and is easily addressed by raising an Error if categories are not set, or at the very least a warning, to inform the user.

Here a suggestion (which can probably be improved):

Suggested change

sel_cat05 = np.isin(tc_cc.category, [0, 1, 2, 3, 4, 5])

sel_cat03 = np.isin(tc_cc.category, [0, 1, 2, 3])

sel_cat45 = np.isin(tc_cc.category, [4, 5])

sel_cat05 = np.isin(tc_cc.category, [0, 1, 2, 3, 4, 5])

sel_cat03 = np.isin(tc_cc.category, [0, 1, 2, 3])

sel_cat45 = np.isin(tc_cc.category, [4, 5])

if sel_cat05.size + sel_cat03.size + sel_cat45.size == 0:

raise ValueError("Cyclone categories are missing, cannot apply climate change scenario")

Thanks for spotting and also for having started using it. I decided to go for the warning.

aleeciu · 2024-09-16T16:14:15Z

Thanks for the improvements!

Some small typos were found.

Also, the tutorial file TropCyclone.ipynb shows 5301 lines of code changed. This is very suboptimal. Please make sure that only those lines that you changed are changed in this file.

Thanks a lot @chahank . I fixed the typos and the point raised by @spjuhel . Regarding the notebook, I don't know what's going on. I literally changed only one line. So even though it seems huge, it's literally almost the same as before.

I would think we can now merge this one?

aleeciu · 2024-09-17T10:47:12Z

It seems that changes to the changelog file were already pushed to develop. It was probably my fault. I can revert it if needed.

emanuel-schmid · 2024-09-17T12:19:56Z

It seems that changes to the changelog file were already pushed to develop. It was probably my fault. I can revert it if needed.

Done.

aleeciu · 2024-09-18T12:08:38Z

Everything used to work fine but now I suddenly have a failing test. @emanuel-schmid

emanuel-schmid · 2024-09-18T13:19:02Z

Everything used to work fine but now I suddenly have a failing test. @emanuel-schmid

👀 Yes, that worries me too. It has nothing to do with the PR but seems to be a glitch of the climada data server, most likely because of high load. We have been spared from such failures at large in the past, but lately they accumulated.

chahank · 2024-09-20T12:54:21Z

Regarding the notebook, I don't know what's going on. I literally changed only one line. So even though it seems huge, it's literally almost the same as before.

Thanks for all the updates. As you can see in the files changes, the notebook has 5,056 additions, 245 deletions. Please make sure that only the lines that need to be changed are changed and make a new commit with the correct changes in the notebook.

spjuhel · 2024-09-20T15:29:48Z

A notebook is basically a json file with the content of the cells (Both input and output). They are potentially subject to a LOT of changes when simply opened with jupyter (as solely opening it changes its content already), even more so if they are run. This is a common hassle with notebook files, for which there is very limited practical solution.

If you changed only one line, I suggest you go back to the previous version and change the line directly in the file on GitHub for instance.

You can also commit only specific parts of a file: https://stackoverflow.com/questions/1085162/commit-only-part-of-a-files-changes-in-git

aleeciu · 2024-09-27T09:37:57Z

I have now updated the tutorial withouth re-running it, which seems to have done the trick.

bguillod · 2024-10-10T11:05:10Z

I have solved the merge conflicts. For me it's ready to merge, and I think the changes requested by @chahank and @peanutfun have been addressed - can you please confirm by approving and/or by directly merging so we can close this one?

aleeciu added 10 commits June 5, 2023 12:29

avoid negative frequencies

4d2d2f3

remove frequency correction

226f3e0

remove intensity correction factors

047b589

remove use of "variable" key

2da114f

add non-signficant changes plus correct error in increase percentage …

95e6c6e

…change

fix freq correction

1b17477

fix bug in test: assert true should be assert equal

29fb1a1

update test due to application of only freq changes - and sometime ne…

288c2ed

…w ones

adapt tests in TropCyclone

fdac719

fix name and adapt climate scenario test

f1934f9

aleeciu changed the title ~~Fix negative freqs, double counting and two small bugs in Knutson scaling~~ Fix negative freqs, double counting and three small bugs in Knutson scaling Jun 6, 2023

reinsert check that no neg freq are modeled

50a6fd7

emanuel-schmid and others added 6 commits June 26, 2023 17:22

Merge branch 'develop' into feature/fix_knutson_scaling

2e8152b

test_trop_cyclone: adaptation to tag refactoring

cd0f286

pylint

c04b7dc

Merge branch 'develop' into feature/fix_knutson_scaling

0718f2c

throw warning when negative freq

e12b58f

handle neg freq_scaling; refactor code

3f5e4a7

chahank reviewed Jul 11, 2023

View reviewed changes

climada/hazard/trop_cyclone.py Outdated Show resolved Hide resolved

aleeciu added 6 commits August 6, 2024 16:27

bring back new functions lost after merging develop

7e572bf

update tutorial

4c35fca

update unit tests tc_cc

f9fe6fb

simply var name and code

018e400

improve unit tests for cc scaling factors

1bce71f

update tests for rescaled future tropical cyclones

9b32e93

chahank requested changes Aug 15, 2024

View reviewed changes

spjuhel reviewed Sep 4, 2024

View reviewed changes

aleeciu added 4 commits September 16, 2024 17:52

update var names

93eba07

update docstring

2ba59d0

update docstrng

030611d

add warning when no categories

d38a6ec

Update CHANGELOG.md

9b84ef7

aleeciu added 2 commits September 27, 2024 11:34

bring back original notebook

81434ce

update notebook

a10b613

Merge branch 'develop' into feature/fix_knutson_scaling

ea471ae

chahank merged commit 487a51b into develop Oct 18, 2024
18 checks passed

emanuel-schmid deleted the feature/fix_knutson_scaling branch October 20, 2024 09:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New implementation of Knutson scaling incl. fix to negative freqs and double counting #734

New implementation of Knutson scaling incl. fix to negative freqs and double counting #734

aleeciu commented Jun 6, 2023 •

edited

Loading

chahank commented Jun 6, 2023 •

edited

Loading

bguillod commented Jun 13, 2023

chahank commented Jun 13, 2023

bguillod commented Jun 16, 2023 •

edited

Loading

aleeciu commented Jun 16, 2023

bguillod commented Jun 16, 2023

chahank commented Jun 16, 2023 •

edited

Loading

bguillod commented Jun 19, 2023

chahank commented Jun 19, 2023

chahank commented Aug 16, 2023

aleeciu commented Aug 17, 2023

aleeciu commented Aug 7, 2024

aleeciu commented Aug 7, 2024 •

edited

Loading

chahank left a comment

chahank Aug 14, 2024

aleeciu Sep 16, 2024

chahank Aug 14, 2024

aleeciu Sep 16, 2024

chahank Aug 14, 2024

aleeciu Sep 16, 2024

chahank commented Aug 16, 2024

spjuhel Sep 4, 2024

aleeciu Sep 16, 2024

aleeciu commented Sep 16, 2024

aleeciu commented Sep 17, 2024

emanuel-schmid commented Sep 17, 2024

aleeciu commented Sep 18, 2024

emanuel-schmid commented Sep 18, 2024

chahank commented Sep 20, 2024

spjuhel commented Sep 20, 2024 •

edited

Loading

aleeciu commented Sep 27, 2024

bguillod commented Oct 10, 2024

New implementation of Knutson scaling incl. fix to negative freqs and double counting #734

New implementation of Knutson scaling incl. fix to negative freqs and double counting #734

Conversation

aleeciu commented Jun 6, 2023 • edited Loading

PR Author Checklist

PR Reviewer Checklist

chahank commented Jun 6, 2023 • edited Loading

bguillod commented Jun 13, 2023

chahank commented Jun 13, 2023

bguillod commented Jun 16, 2023 • edited Loading

aleeciu commented Jun 16, 2023

bguillod commented Jun 16, 2023

chahank commented Jun 16, 2023 • edited Loading

bguillod commented Jun 19, 2023

chahank commented Jun 19, 2023

chahank commented Aug 16, 2023

aleeciu commented Aug 17, 2023

aleeciu commented Aug 7, 2024

aleeciu commented Aug 7, 2024 • edited Loading

chahank left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chahank commented Aug 16, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aleeciu commented Sep 16, 2024

aleeciu commented Sep 17, 2024

emanuel-schmid commented Sep 17, 2024

aleeciu commented Sep 18, 2024

emanuel-schmid commented Sep 18, 2024

chahank commented Sep 20, 2024

spjuhel commented Sep 20, 2024 • edited Loading

aleeciu commented Sep 27, 2024

bguillod commented Oct 10, 2024

aleeciu commented Jun 6, 2023 •

edited

Loading

chahank commented Jun 6, 2023 •

edited

Loading

bguillod commented Jun 16, 2023 •

edited

Loading

chahank commented Jun 16, 2023 •

edited

Loading

aleeciu commented Aug 7, 2024 •

edited

Loading

spjuhel commented Sep 20, 2024 •

edited

Loading