Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New implementation of Knutson scaling incl. fix to negative freqs and double counting #734

Merged
merged 76 commits into from
Oct 18, 2024

Conversation

aleeciu
Copy link
Collaborator

@aleeciu aleeciu commented Jun 6, 2023

In this PR I implement new scaling factors to simulate TC under future climate. The new implementation follows the same logic as the previous one but here new scaling factors are used. The new scaling factors are based on Knutson et al.(2020) and Jewson et al. (2021).

Important to note is that:

I created a jupyter notebook where I show how the results of such implementation lead to outputs which are consistent with the estimates from Knutson 2020 and never lead to negative frequencies. That said, since the test is done under a given baseline period, I cannot rule out the possibility that negative frequencies will never occurr, therefore I am anyway throwing an error when this occur.

Two remaining todos:

  • final tests, as I first want to make sure that the followed approached is accepted
  • the current TC tutorial shows an increase in absolute intensity of Irma under a given RCP. However, as the new implementation does not apply any increment in absolute intensity but, rather, to mean intensities and any other risk metric, I was wondering how to best rewrite that part.
  • possibly, the notebook I posted above could potentially be simplfied and become and TC_CC tutorial, which is currently missing

PR Author Checklist

PR Reviewer Checklist

@aleeciu aleeciu changed the title Fix negative freqs, double counting and two small bugs in Knutson scaling Fix negative freqs, double counting and three small bugs in Knutson scaling Jun 6, 2023
@chahank
Copy link
Member

chahank commented Jun 6, 2023

Thank you for the proposal. Note that we have already had more than one trial to improve on this. So, here are four questions (which you might have solved):

  • How do you deal with the significance of the Knutson parameters?
  • How do you incorporate the fact that the Knutson analysis was done on completely different datasets?
  • If you just remove the intensity scaling, you do not replicate the distribution shift describe in the paper by Knutson. How/why is it clearly better to have only the frequency shifting?
  • How do you fix the negative frequencies without messing up the distribution shift? Because just cutting at 0 does not really solve the problem.
  • The Knutson parameters are shifts for the cumulative distributions. How is this reflected here?

I think overall the question is whether it is useful to do small changes as proposed here, that in appearance fix some issues, but does not clearly improve the model, and even potentially make it worse.

@bguillod
Copy link
Collaborator

Thanks for tackling the issues related to future TC projections.

I personally think this implementation represents a substantial improvements compared to the current code. It is not perfect, but this is to a wide extent due to the nature of the complex problem and I think the proposed solution is a good pragmatic way to solve the main issues. The implementation as you did here is in my view very much in line with the literature.

One concern I would have, however, is related to how Knutson presents changes and then how you implemented them. Knutson gives changes for "intensity above cat X", so we have changes for categories (ordered from highest to lowest category range):

  1. 4 to 5
  2. 3 to 5
  3. 1 to 5
  4. 0 (i.e. tropical storms) to 5

So for example storms of category 3 fall under both criteria (1) and (2) above. Which change should one apply?
My suggestion would be:

  • Calculate the frequency of events by category and basin in the input dataset (i.e. how many storms in your event set by category and basin).
  • For events where only one frequency change is available from Knutson (i.e., categories 4 and 5 in the example above as only point 1 applies), apply that change.
  • Then go through critera 2,3,4 and each time, look at how frequency has already changed and how to adjust remaining categories such as to satisfy Knutson's aggregated change. That is, for point 2, I would look at cat 3 storms and see: by how much do these need to change in frequency after we've already changed cat 4 and 5 storms, such as to satisfy Knutson's frequency change (1) and (2) above. Once this is done, do the same for cat 1&2 (Knutson's change 3 above), etc.

That way you would be able to conserve the changes for all category ranges that Knutson is providing.

If you can solve that then I think this will bring the implementation to a new level.

@chahank
Copy link
Member

chahank commented Jun 13, 2023

Then go through critera 2,3,4 and each time, look at how frequency has already changed and how to adjust remaining categories such as to satisfy Knutson's aggregated change. That is, for point 2, I would look at cat 3 storms and see: by how much do these need to change in frequency after we've already changed cat 4 and 5 storms, such as to satisfy Knutson's frequency change (1) and (2) above. Once this is done, do the same for cat 1&2 (Knutson's change 3 above), etc.

Exactly this method was implemented in the past. But, it did lead to strong inconsistencies with the frequency distributions. A new implementation should be aware of this.

This was the proposal of the time. Maybe only changing the frequencies would resolve the problem.

 def _apply_knutson_criterion(self, chg_int_freq, scaling_rcp_year):
        """
        Apply changes to intensities and cumulative frequencies.

        Parameters
        ----------
        criterion : list(dict))
            list of criteria from climada.hazard.tc_clim_change
        scale : float
            scale parameter because of chosen year and RCP
        Returns
        -------
        tc_cc : climada.hazard.TropCyclone
            Tropical cyclone with frequency and intensity scaled according
            to the Knutson criterion. Returns a new instance of TropCyclone.
        """

        tc_cc = copy.deepcopy(self)

        # Criterion per basin
        for basin in np.unique(tc_cc.basin):

            bas_sel = (np.array(tc_cc.basin) == basin)

            # Apply intensity change
            inten_chg = [chg
                         for chg in chg_int_freq
                         if (chg['variable'] == 'intensity' and
                             chg['basin'] == basin)
                         ]
            for chg in inten_chg:
                sel_cat_chg = np.isin(tc_cc.category, chg['category']) & bas_sel
                inten_scaling = 1 + (chg['change'] - 1) * scaling_rcp_year
                tc_cc.intensity = sparse.diags(
                    np.where(sel_cat_chg, inten_scaling, 1)
                    ).dot(tc_cc.intensity)

            # Apply frequency change
            freq_chg = [chg
                        for chg in chg_int_freq
                        if (chg['variable'] == 'frequency' and
                            chg['basin'] == basin)
                        ]
            freq_chg.sort(reverse=False, key=lambda x: len(x['category']))

            # Iteratively scale frequencies for each category such that
            # cumulative frequencies are scaled according to Knutson criterion.

            cat_larger_list = []
            for chg in freq_chg:
                cat_chg_list = [cat
                                for cat in chg['category']
                                if cat not in cat_larger_list
                                ]
                sel_cat_chg = np.isin(tc_cc.category, cat_chg_list) & bas_sel
                if sel_cat_chg.any():
                    freq_scaling = 1 + (chg['change'] - 1) * scaling_rcp_year
                    sel_cat_all = (np.isin(tc_cc.category, chg['category'])
                                   & bas_sel)
                    sel_cat_larger = (np.isin(tc_cc.category, cat_larger_list)
                                      & bas_sel)
                    freq_scaling_cor = (
                        (np.sum(self.frequency[sel_cat_all]) * freq_scaling
                         - np.sum(tc_cc.frequency[sel_cat_larger]))
                        / np.sum(self.frequency[sel_cat_chg])
                    )
                    tc_cc.frequency[sel_cat_chg] *= freq_scaling_cor
                cat_larger_list += cat_chg_list

        if (tc_cc.frequency < 0).any():
            raise ValueError("The application of the given climate scenario"
                             "resulted in at least one negative frequenciy."
                             "This is likely due to the use of a"
                             "non-representative event set (too small, "
                             "incorrect reference period, ...)")

        return tc_cc

@bguillod
Copy link
Collaborator

bguillod commented Jun 16, 2023

Exactly this method was implemented in the past. But, it did lead to strong inconsistencies with the frequency distributions. A new implementation should be aware of this.

I agree with you @chahank that this back-then-proposed implementation you've copy-pasted above would do what I meant for frequencies. Can you please elaborate on what the inconsistencies were in the distribution? Is this discussed somewhere in an issue or an open PR or something?

This was the proposal of the time. Maybe only changing the frequencies would resolve the problem.

It might be worth a try. Surely, applying both changing in frequency and intensity must mess up the distribution, but this does not mean that doing only frequency wouldn't. So first I'd like to understand better the inconsistencies you mentioned.

Also, applying only statistically significant changes will mess up the distribution - which is why I generally strongly support the proposed addition in this PR to include also non-significant changes.

@aleeciu
Copy link
Collaborator Author

aleeciu commented Jun 16, 2023

Hi both,

to me both the proposal by @bguillod and the code posted by @chahank make sense, but happy to hear what inconsistencies you encountered Chahan.

I am just not fully sure about this @bguillod :

Calculate the frequency of events by category and basin in the input dataset (i.e. how many storms in your event set by category and basin).

are you suggesting to calc yet other - cat-specific - frequencies other than those carried by the TropCyclone object which account for all events - regardless their cat - in a basin ?

@bguillod
Copy link
Collaborator

I am just not fully sure about this @bguillod :

Calculate the frequency of events by category and basin in the input dataset (i.e. how many storms in your event set by category and basin).

are you suggesting to calc yet other - cat-specific - frequencies other than those carried by the TropCyclone object which account for all events - regardless their cat - in a basin ?

No, what I meant is essentially what is implement in the code shared by @chahank, I.e. summing up all frequencies of events by basin and category ranges.

On a side-note: find the code extremely hard to understand though. We could definitely improve readability.

@chahank
Copy link
Member

chahank commented Jun 16, 2023

The main inconsistency is that you get a lot of negative frequencies. And just putting them to zero is not satisfying.

But, it was quite some time ago. Maybe it is worth trying it out again and maybe it can be made consistent.

@bguillod
Copy link
Collaborator

Do you remember for which basin and scenario & year this was the case?
Perhaps @aleeciu could use historical data, apply changes in frequencies only using this methodology and see if and for which cases negative frequencies occur?

@chahank
Copy link
Member

chahank commented Jun 19, 2023

Unfortunately no, it was quite some time ago. The cleanest would be imho to redo the analysis with the latest version of the random walk algorithm.

@chahank
Copy link
Member

chahank commented Aug 16, 2023

Is it still planed to finish this pull request?

@aleeciu
Copy link
Collaborator Author

aleeciu commented Aug 17, 2023

Is it still planed to finish this pull request?

Yes, but I need to figure out how to address the points you raised above.

@aleeciu
Copy link
Collaborator Author

aleeciu commented Aug 7, 2024

Hi, I updated all tests following the guidelines as well as the advise of testing "meaningfull" data and not just prodiving the hardcoded value. I also used np.testing as much as I could following the advide of @peanutfun. From my side, I think I covered all raised points, please let me know.

@aleeciu
Copy link
Collaborator Author

aleeciu commented Aug 7, 2024

The only instances where I still check for hardcoded values occur when testing get_knutson_data and get_gmst_info whose function is to load data. Based on the previous discussion, having proper tests for these functions was not considered as needed, but I still decided to keep them. They can be removed if needed.

Copy link
Member

@chahank chahank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvements!

Some small typos were found.

Also, the tutorial file TropCyclone.ipynb shows 5301 lines of code changed. This is very suboptimal. Please make sure that only those lines that you changed are changed in this file.

ind2 = target_year_ind + smoothing + 1

prediction = np.mean(tc_properties[:, ind1:ind2], 1)
predicted_change = ((prediction - baseline) / baseline) * 100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While not critical, using a variable name predicted_change and one predicted_changes is very confusing and quite impossible to read. For the future, maybe try using clear variable names.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated with (hopefully) more informative names: target_predicted_changes and calculated_predicted_change

Note: Only frequency changes are applied as suggested by Jewson 2022
(https://doi.org/10.1007/s00477-021-02142-6). Applying only frequency anyway
changes mean intensities and most importantly avoids possible inconsistencies
(including possible counting) that may arise from the application of both
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean 'double-counting' ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - adjusted

(https://doi.org/10.1007/s00477-021-02142-6). Applying only frequency anyway
changes mean intensities and most importantly avoids possible inconsistencies
(including possible counting) that may arise from the application of both
frequency and intensity changes, as the relatioship between these two is non
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

relationship

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@chahank
Copy link
Member

chahank commented Aug 16, 2024

Oh, and please update the CHANGELOG, don't hide your great work!

https://github.com/CLIMADA-project/climada_python/blob/develop/CHANGELOG.md

Comment on lines +425 to +428
sel_cat05 = np.isin(tc_cc.category, [0, 1, 2, 3, 4, 5])
sel_cat03 = np.isin(tc_cc.category, [0, 1, 2, 3])
sel_cat45 = np.isin(tc_cc.category, [4, 5])

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, I started using this improvement/fix for the generation of future TCs and discovered that if TropCyclone.category == [], this method does not complain at all and just return the exact same object.

This happened to me when I wrote/read hdf5 files using a mix of Hazard.write|from_hdf5 and TropCyclone.write|from_hdf5 instead of only TropCyclone.write|from_hdf5, but this could probably happen in other case and is easily addressed by raising an Error if categories are not set, or at the very least a warning, to inform the user.

Here a suggestion (which can probably be improved):

Suggested change
sel_cat05 = np.isin(tc_cc.category, [0, 1, 2, 3, 4, 5])
sel_cat03 = np.isin(tc_cc.category, [0, 1, 2, 3])
sel_cat45 = np.isin(tc_cc.category, [4, 5])
sel_cat05 = np.isin(tc_cc.category, [0, 1, 2, 3, 4, 5])
sel_cat03 = np.isin(tc_cc.category, [0, 1, 2, 3])
sel_cat45 = np.isin(tc_cc.category, [4, 5])
if sel_cat05.size + sel_cat03.size + sel_cat45.size == 0:
raise ValueError("Cyclone categories are missing, cannot apply climate change scenario")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting and also for having started using it. I decided to go for the warning.

@aleeciu
Copy link
Collaborator Author

aleeciu commented Sep 16, 2024

Thanks for the improvements!

Some small typos were found.

Also, the tutorial file TropCyclone.ipynb shows 5301 lines of code changed. This is very suboptimal. Please make sure that only those lines that you changed are changed in this file.

Thanks a lot @chahank . I fixed the typos and the point raised by @spjuhel . Regarding the notebook, I don't know what's going on. I literally changed only one line. So even though it seems huge, it's literally almost the same as before.

I would think we can now merge this one?

@aleeciu
Copy link
Collaborator Author

aleeciu commented Sep 17, 2024

It seems that changes to the changelog file were already pushed to develop. It was probably my fault. I can revert it if needed.

@emanuel-schmid
Copy link
Collaborator

It seems that changes to the changelog file were already pushed to develop. It was probably my fault. I can revert it if needed.

Done.

@aleeciu
Copy link
Collaborator Author

aleeciu commented Sep 18, 2024

Everything used to work fine but now I suddenly have a failing test. @emanuel-schmid

@emanuel-schmid
Copy link
Collaborator

Everything used to work fine but now I suddenly have a failing test. @emanuel-schmid

👀 Yes, that worries me too. It has nothing to do with the PR but seems to be a glitch of the climada data server, most likely because of high load. We have been spared from such failures at large in the past, but lately they accumulated.

@chahank
Copy link
Member

chahank commented Sep 20, 2024

Regarding the notebook, I don't know what's going on. I literally changed only one line. So even though it seems huge, it's literally almost the same as before.

Thanks for all the updates. As you can see in the files changes, the notebook has 5,056 additions, 245 deletions. Please make sure that only the lines that need to be changed are changed and make a new commit with the correct changes in the notebook.

@spjuhel
Copy link
Collaborator

spjuhel commented Sep 20, 2024

A notebook is basically a json file with the content of the cells (Both input and output). They are potentially subject to a LOT of changes when simply opened with jupyter (as solely opening it changes its content already), even more so if they are run. This is a common hassle with notebook files, for which there is very limited practical solution.

If you changed only one line, I suggest you go back to the previous version and change the line directly in the file on GitHub for instance.

You can also commit only specific parts of a file: https://stackoverflow.com/questions/1085162/commit-only-part-of-a-files-changes-in-git

@aleeciu
Copy link
Collaborator Author

aleeciu commented Sep 27, 2024

I have now updated the tutorial withouth re-running it, which seems to have done the trick.

@bguillod
Copy link
Collaborator

I have solved the merge conflicts. For me it's ready to merge, and I think the changes requested by @chahank and @peanutfun have been addressed - can you please confirm by approving and/or by directly merging so we can close this one?

@chahank chahank merged commit 487a51b into develop Oct 18, 2024
18 checks passed
@emanuel-schmid emanuel-schmid deleted the feature/fix_knutson_scaling branch October 20, 2024 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants