Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several redundant functions for fitting exceedance freq to return periods #904

Open
ValentinGebhart opened this issue Jun 27, 2024 · 4 comments
Assignees

Comments

@ValentinGebhart
Copy link
Collaborator

ValentinGebhart commented Jun 27, 2024

Is your feature request related to a problem? Please describe.
There are several functions that do the same computation (fitting the relation between exceedance frequency and return periods) but in different interpolation ways. They might be worth to combine. The ones I can find now are:

  1. Impact.calc_freq_curve
    impact exceedance frequency curve (aggregated over centroids)
    method: np.interp(freq_cum, imp)
  2. Impact.local_exceedance_imp using loc_return_impusing _cen_return_imp
    impact exceedance frequency per centroid for several return periods
    method: np.polyfit( np.log(freq_cum), imp, deg = 1)
  3. Hazard.local_exceedance_inten using _loc_return_inten using _cen_return_inten
    hazard exceedance frequency per centroid for several return periods
    method: np.polyfit( np.log(freq_cum), haz, deg = 1)
  4. Hazard.local_return_period using _loc_return_period
    return period per centroid for several threshold intensities
    method: np.searchsorted() (i.e. fitting a step function between haz and freq_cum)

Describe the solution you'd like
We could write one or two flexible functions that do the computation for all above cases, and maybe some wrapper functions.

Describe alternatives you've considered
None
Additional context
related to issue #209

@peanutfun
Copy link
Member

We can adapt the Impact.calc_freq_curve to become a "kernel" for exceedance frequency curves that can be applied to Impact.at_event, an impact time series of a specific exposure point, or a hazard intensity time series of a specific centroid. This function can then be applied to the appropriate data in suitable wrapper functions. It can even be adapted to "flip" the axes, and return the return period for specific values of the time series (i.e., the inverse problem that it currently solves)

@chahank
Copy link
Member

chahank commented Jun 30, 2024

There is also a reason for having two methods. One is just an ordering of the values with a linear interpolation between these values Impact.calc_freq_curve. The other is a fit of a curve due to the potentially very small number of values Impact.local_exceedance_imp. Thus, I think that there remains a good reason to have two methods. I would make maybe one single method as a util function that can either interpolate or fit. Then, as @peanutfun suggested, one can use it in the classes and flip the axes if needed.

@bguillod
Copy link
Collaborator

bguillod commented Jul 2, 2024

I personally find the return period calculations rather obscure to the user, so I would welcome improvements.

Personally, in addition to the above points, I would consider three options and leave it to the user to specify which one (with a good default of course):

  1. The interpolated case, albeit with a clear definition in which space (e.g. log-log) it is performed (or the space being an additional optional argument).
  2. The non-interpolated case.
  3. The fitting of an extreme value distribution.

In my view the default should probably be (1) or (3).

In addition, I would suggest the following:

  • If the user asks for a return period beyond the maximum one available in cases 1 and 2 above, a warning is raised and np.nan is returned, rather than returning the maximum value silently. Perhaps also in case (3) if the return period is way beyond the max available empirically, although extreme value distributions are designed for extrapolation...
  • The uncertainty of the estimate should ideally be optionally provided.
  • In case of multiple identical values, these should be grouped and their frequencies summed first.

Some of these points we've implemented on our end already, so if you undertake a PR, happy to contribute.

@bguillod
Copy link
Collaborator

bguillod commented Jul 2, 2024

(Just to be clear: when I write three options I mean as an input parameter to the function(s) rather than separate functions at the top-level code API - which doesn't prevent the options to be split into individual private functions)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants