Pierre-Elouan Réthoré edited this page Jun 22, 2015 · 9 revisions

Comments

Comments on this FWEP are welcomed on Issue #76

Motivation

We want to represent the inputs and outputs of the models in FUSED-Wind together with information about their probability distribution and/or their uncertainty. This extension proposal addresses how the I/O definition dictionary could be complemented with this additional information. How a component then takes this information into account is outside the scope of this discussion.

Proposal

Basic Idea

The idea is to extend the definition of a scalar value, e.g.

wind_speed:
    name:   wind_speed
    desc:   wind speed value
    type:   float
    value:  5.0
    min:    4.0
    max:    25.0
    units:  m/s

to a definition of a variable that represents a distribution instead, e.g.

wind_speed:
    name:   wind_speed
    desc:   wind speed distribution (truncated Weibull)
    type:   weibull
    A:      4.0
    k:      2.0
    min:    4.0
    max:    25.0
    units:  m/s
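
As a rough illustration of what a consumer could do with such a definition, the sketch below evaluates the cumulative distribution function of the truncated Weibull. It assumes that `A` is the Weibull scale parameter, `k` the shape parameter, and that `min` and `max` are the truncation bounds on which the distribution is renormalised. None of this is prescribed by the proposal, and the function names are hypothetical.

```python
import math

# Python dict mirroring the distribution-style definition above.
wind_speed = {
    "name": "wind_speed",
    "desc": "wind speed distribution (truncated Weibull)",
    "type": "weibull",
    "A": 4.0, "k": 2.0, "min": 4.0, "max": 25.0, "units": "m/s",
}

def weibull_cdf(x, A, k):
    """CDF of an untruncated Weibull (assumption: A is the scale, k the shape)."""
    return 1.0 - math.exp(-((x / A) ** k)) if x > 0 else 0.0

def truncated_weibull_cdf(x, defn):
    """CDF of the Weibull renormalised on [min, max] (assumed meaning of the bounds)."""
    lo, hi = defn["min"], defn["max"]
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    f_lo = weibull_cdf(lo, defn["A"], defn["k"])
    f_hi = weibull_cdf(hi, defn["A"], defn["k"])
    return (weibull_cdf(x, defn["A"], defn["k"]) - f_lo) / (f_hi - f_lo)
```

Calling `truncated_weibull_cdf(6.0, wind_speed)` then gives the probability that the wind speed lies between 4 m/s and 6 m/s under these assumptions.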

Concepts

Different representations of distributions

  • Gaussian
    type:   gaussian
    mean:   10.0
    sigma:  2.0
  • Weibull
    type:   weibull
    A:      4.0
    k:      2.0
  • Beta...

  • Kernel Density Estimator

    type: KDE
    value: array([])  # size: n*3 array, with n the number of normal (Gaussian) kernel centers
  • Gaussian process ?
    type:       GP
    value:      array([])    # size:n*2 array, with n the number of points of the GP
    covariance: cubic   # type of covariance matrix used
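
One possible way to consume these representations is to dispatch on the `type` key. The sketch below handles only the Gaussian and Weibull cases using the Python standard library. The `sample` function is hypothetical, and `A` and `k` are assumed to be the Weibull scale and shape.

```python
import random

def sample(defn, rng):
    """Draw one value from a definition dict; only two types are handled here."""
    kind = defn["type"]
    if kind == "gaussian":
        return rng.gauss(defn["mean"], defn["sigma"])
    if kind == "weibull":
        # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
        return rng.weibullvariate(defn["A"], defn["k"])
    raise NotImplementedError(f"no sampler for type {kind!r}")

rng = random.Random(0)  # seeded for reproducibility
gaussian = {"type": "gaussian", "mean": 10.0, "sigma": 2.0}
weibull = {"type": "weibull", "A": 4.0, "k": 2.0}
draws = [sample(weibull, rng) for _ in range(5)]
```

Types such as KDE or GP would need their own branch once the layout of their `value` array is agreed on.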

Additive uncertainty

Additive uncertainty can be represented as an overall distribution. In this example we have a time series of wind direction measurements whose estimated overall uncertainty is assumed to be normally distributed.

wind_directions:
    name:   wind_directions
    desc:   wind directions with measurement uncertainty
    type:   array
    value:  array([])
    uncertainty:
            type:   gaussian
            mean:   0.0
            sigma:  10.0
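
A minimal sketch of how such a definition could be turned into one realisation of the time series: each measurement gets an independent Gaussian error added to it. The measurement values are hypothetical, and both the independence of the errors and the use of degrees (with wrap-around at 360) are assumptions, not part of the proposal.

```python
import random

wind_directions = {
    "name": "wind_directions",
    "desc": "wind directions with measurement uncertainty",
    "type": "array",
    "value": [10.0, 95.0, 350.0],  # hypothetical measurements, assumed in degrees
    "uncertainty": {"type": "gaussian", "mean": 0.0, "sigma": 10.0},
}

def perturb(defn, rng):
    """Return one realisation: each value plus an independent Gaussian error."""
    unc = defn["uncertainty"]
    return [(v + rng.gauss(unc["mean"], unc["sigma"])) % 360.0
            for v in defn["value"]]

realisation = perturb(wind_directions, random.Random(1))
```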

Distribution hyper-parameter uncertainty

What if the parameters of the distribution are themselves uncertain? This information could come from fitting the distribution to a dataset.

wind_speed:
    name:   wind_speed
    desc:   wind speed truncated weibull distribution fit with parameter uncertainty
    type:   weibull
    A:
            type:   gaussian
            mean:   4.0
            sigma:  1.0
    k:
            type:   gaussian
            mean:   2.0
            sigma:  0.2
    min:    4.0
    max:    25.0
    units:  m/s
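
Read literally, this definition describes a two-level sampling scheme: first draw `A` and `k` from their Gaussian distributions, then draw a wind speed from the resulting Weibull. The sketch below follows that reading. The function names are hypothetical, truncation is handled by simple rejection, and non-positive parameter draws are discarded, which is one of several possible choices.

```python
import random

wind_speed = {
    "type": "weibull",
    "A": {"type": "gaussian", "mean": 4.0, "sigma": 1.0},
    "k": {"type": "gaussian", "mean": 2.0, "sigma": 0.2},
    "min": 4.0, "max": 25.0,
}

def resolve(param, rng):
    """Return a float, sampling when the parameter is itself a Gaussian definition."""
    if isinstance(param, dict):
        return rng.gauss(param["mean"], param["sigma"])
    return float(param)

def sample_wind_speed(defn, rng, max_tries=10_000):
    """Sample A and k, then a Weibull value; reject draws outside [min, max]."""
    for _ in range(max_tries):
        A, k = resolve(defn["A"], rng), resolve(defn["k"], rng)
        if A <= 0.0 or k <= 0.0:
            continue  # a Gaussian draw can yield invalid Weibull parameters
        u = rng.weibullvariate(A, k)  # A is the scale, k the shape (assumption)
        if defn["min"] <= u <= defn["max"]:
            return u
    raise RuntimeError("no accepted sample; check the truncation bounds")

rng = random.Random(42)
speeds = [sample_wind_speed(wind_speed, rng) for _ in range(100)]
```

Because `resolve` passes plain numbers through unchanged, the same function also works for definitions where `A` and `k` are fixed scalars.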

Distribution additive uncertainty

Another way of describing the uncertainty of the fit would be to estimate the overall error of the fit:

wind_speed:
    name:   wind_speed
    desc:   wind speed weibull distribution with overall fit uncertainty
    type:   weibull
    A:      4.0
    k:      2.0
    min:    4.0
    max:    25.0
    uncertainty:
            type:   gaussian
            mean:   0.0
            sigma:  2.0

If for some reason we know that the uncertainty of the fit is higher in some regions of the distribution (e.g. because of too few data points in the measurements, model uncertainty, or propagated uncertainty), we can represent that uncertainty using another distribution (here a Kernel Density Estimator, KDE).

wind_speed:
    name:   wind_speed
    desc:   wind speed weibull distribution with distribution dependent fit uncertainty
    type:   weibull
    A:      4.0
    k:      2.0
    min:    4.0
    max:    25.0
    uncertainty:
            type:   KDE           # kernel density estimator
            value:  array([...])  # size: n*3 array, with n the number of normal (Gaussian) kernel centers

Joint distributions

Joint distributions combine several inputs. For instance, a wind rose combines wind speed and wind direction:

wind_rose:
    name:   wind_rose
    desc:   local wind resource
    type:   binned_weibull
    dimensions: ['wind_speed', 'wind_direction']
    wind_speed: weibull
    wind_direction: cdf
    values: array([])
    columns: ['wind_direction', 'frequency', 'A', 'k']
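
To make the binned layout concrete, the sketch below samples a (direction, speed) pair from a small table in the column order given above. The table values are purely hypothetical. The sketch also assumes that `frequency` is the probability of each direction sector (not a cumulative value) and that `A` and `k` are the Weibull scale and shape of that sector.

```python
import random

# Hypothetical table; one row per direction sector, columns as listed above.
columns = ["wind_direction", "frequency", "A", "k"]
values = [
    [0.0,   0.2, 8.0,  2.0],
    [90.0,  0.1, 6.5,  1.8],
    [180.0, 0.3, 9.0,  2.2],
    [270.0, 0.4, 10.0, 2.1],
]

def sample_wind_rose(values, rng):
    """Pick a sector with probability proportional to its frequency, then draw a speed."""
    weights = [row[columns.index("frequency")] for row in values]
    direction, _, A, k = rng.choices(values, weights=weights, k=1)[0]
    return direction, rng.weibullvariate(A, k)

rng = random.Random(7)
pairs = [sample_wind_rose(values, rng) for _ in range(50)]
```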

More complicated examples:

A 4-dimensional joint distribution built from the marginal distribution of each dimension and a copula that correlates them:

wind_atlas:
    name:   wind_atlas
    desc:   joint distribution built from a copula for U,D,TI,S.
            U is a truncated Weibull, TI is a lognormal and D&S are KDEs.
    dimensions: ['wind_speed', 'wind_direction','TI','stability']
    type:   copula
    copula:
            type:           gaussian  # or gumbel, ...
            correlation:    array([])  # 4x4 correlation on the cdf^-1 of the variables (it might be more useful to store its inverse and determinant)
    marginals:
            wind_speed:
                type:   weibull
                A:      12.0
                k:      2.0
                min:    4.0
                max:    25.0
            wind_direction:
                type:   KDE
                value:  array([...])  # n*3 array for 1 dimension, with n the number of normal kernel centers
            TI:
                type:   lognormal
                mean:   5.0
                sigma:  1.0
            stability:
                type:   KDE
                value:  array([...])  # n*3 array for 1 dimension, with n the number of normal kernel centers
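
The copula construction can be sketched in a reduced form. The example below couples only two of the four dimensions (wind speed and TI) with a Gaussian copula: correlated standard normals are mapped to uniforms, then pushed through the inverse CDF of each marginal. The correlation `rho`, the lognormal parameters `mu` and `sigma` (taken here as the mean and standard deviation of log(TI)), and all function names are hypothetical. The proposal does not state whether the lognormal `mean` and `sigma` are given in log space.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def truncated_weibull_ppf(u, A, k, lo, hi):
    """Inverse CDF of a Weibull(scale A, shape k) renormalised on [lo, hi]."""
    f_lo = 1.0 - math.exp(-((lo / A) ** k))
    f_hi = 1.0 - math.exp(-((hi / A) ** k))
    p = f_lo + u * (f_hi - f_lo)
    return A * (-math.log(1.0 - p)) ** (1.0 / k)

def lognormal_ppf(u, mu, sigma):
    """Inverse CDF of a lognormal whose logarithm is N(mu, sigma) (assumption)."""
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(u))

def clamp(u, eps=1e-12):
    """Keep u strictly inside (0, 1) so the inverse CDFs stay defined."""
    return min(max(u, eps), 1.0 - eps)

def sample_pair(rho, rng):
    """Draw (wind_speed, TI) coupled by a 2-D Gaussian copula with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    u1, u2 = clamp(STD_NORMAL.cdf(z1)), clamp(STD_NORMAL.cdf(z2))
    speed = truncated_weibull_ppf(u1, A=12.0, k=2.0, lo=4.0, hi=25.0)
    ti = lognormal_ppf(u2, mu=math.log(0.1), sigma=0.3)  # hypothetical values
    return speed, ti

rng = random.Random(3)
samples = [sample_pair(-0.5, rng) for _ in range(200)]
```

Extending this to four dimensions would replace the two-variable construction with a Cholesky factor of the 4x4 correlation matrix, and the KDE marginals would need a numerical inverse CDF.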

A joint 4-dimensional distribution built from a multi-dimensional KDE

wind_atlas:
    name:   wind_atlas
    desc:   joint KDE distribution for U,D,TI,S with fixed gaussian uncertainty
    type:   KDE
    value:  array([...])  # d*n*3 array, with d the number of dimensions and n the number of normal kernel centers
    dimensions: ['wind_speed', 'wind_direction','TI','stability']
    uncertainty:
            type:       multivariate_gaussian
            mean:       [0.0,...]  # list of 4 values
            covariance: array([])  # 4x4 array

A joint 4-dimensional distribution built from a multi-dimensional KDE with an additional KDE to model its uncertainty

wind_atlas:
    name:   wind_atlas
    desc:   joint distribution for U,D,TI,S with an uncertainty distribution that depends on the inputs
    type:   KDE
    value:  array([...])  # d*n*3 array, with d the number of dimensions and n the number of normal kernel centers
    dimensions:     ['wind_speed', 'wind_direction','TI','stability']
    uncertainty:
            type:   KDE
            value:  array([...])  # d*n*3 array, with d the number of dimensions and n the number of normal kernel centers
            dimensions: ['wind_speed', 'wind_direction','TI','stability']