Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add minimum, maximum, extrema for AbstractMvNormal and Product #1319

Merged
merged 6 commits into from
Oct 9, 2021

Conversation

agerlach
Copy link
Contributor

@agerlach agerlach commented May 5, 2021

This PR adds minimum, maximum, extrema for AbstractMvNormal and Product. The definition extends that currently used for univariate distributions, i.e. minimum and maximum returns the min/max of the support for each dimension.

This extends a common interface between Univariate distributions and as subset of MV distributions for querying the bounds of the support for upstream applications such as DiffEqUncertainty, where quadrature is solved over the support.

@devmotion
Copy link
Member

It seems weird to me to define extrema etc. for spaces without at least a partial order. Isn't it sufficient if you define custom methods (without overloading minimum, maximum, and extrema), e.g., using mapreduce, for this purpose wherever you want to use this information?

@agerlach
Copy link
Contributor Author

agerlach commented May 5, 2021

I believe the implementation provided satisfies the lexicographic order of isless for vector spaces. With that said, I don't disagree that a separate function may be more appropriate. Ultimately, what I desire is a common interface for getting the corners of the bounding box of the support of a distribution.

@agerlach
Copy link
Contributor Author

agerlach commented May 5, 2021

Would making something like the following be something you are willing to have merged into Distributions.jl?

boundingbox(d::UnivariateDistribution) = extrema(d)
boundingbox(d::AbstractMvNormal) = ...
boundingbox(d::Product) = ...

@devmotion
Copy link
Member

The docstrings of maximum, minimum, and extrema in Julia (https://docs.julialang.org/en/v1/base/collections/#Base.minimum etc.) let me think that the definitions here would be a misuse of these functions. The definitions for iterators and arrays always perform reductions of all elements (or along the given dimensions).

@mschauer
Copy link
Member

mschauer commented May 5, 2021

I first thought the same, but I convinced myself this is correct:

extrema(D) gives the numbers a, b such that isless(a, X) && isless(X <= b) for all X = rand(D), because it Base's extrema is for min and max and those are isless-extrema.
So we still have

extrema(D) ≈ extrema([rand(D) for _ in 1:N]) # limit N to Inf

After figuring it out I added a clarification to the doc string too:
I updated JuliaLang/julia#40713

@devmotion
Copy link
Member

I think a better approach would be to define support more generally for multi- and matrixvariate distributions. Instead of the current approach for univariate distributions with ranges and custom types (which is broken e.g. for Poisson: #1254) probably one should make use of packages such as IntervalSets or DomainSets which would also support multivariate distributions (#1254 (comment)).

However, this would involve more changes and probably take some time until it would be available, so maybe the easiest short-term solution for you would be to define a custom interface in your package instead of in Distributions.

@devmotion
Copy link
Member

It would also be more consistent with the current univariate implementation since there maximum etc. are coupled with support.

@mschauer
Copy link
Member

mschauer commented May 5, 2021

But it's the right thing to define, the proposed extrema does the reduction of all possible samples with respect to isless, just like our other definitions of extrema in Distributions.jl do. The 1-d extrema(::Distribution) are part of the interface and this PR is just the consequence. That it has a use case outside the package is a nice extra.

@devmotion
Copy link
Member

I didn't know that isless is actually defined for vectors and that it is used by minimum, maximum, etc. Taking this into account, the PR seems reasonable.

It would be good to add support for non-univariate distributions in another PR, the disconnect seems a bit unfortunate (I just checked, DomainSets defines minimum and maximum as well). I guess also it should be implemented for more distributions, it seems to cover only these two special types of multivariate distributions now.

@agerlach
Copy link
Contributor Author

agerlach commented May 5, 2021

I guess also it should be implemented for more distributions, it seems to cover only these two special types of multivariate distributions now.

I am happy to add it for other distributions, however, I am not familiar with many of them and it isn't obvious what this should be in all cases. For example, should Dirichlet be

 extrema(d:: Dirichlet) = zeros(eltype(d), length(d)), ones(eltype(d), length(d))

Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some suggestions.

src/multivariate/mvnormal.jl Outdated Show resolved Hide resolved
src/multivariate/mvnormal.jl Outdated Show resolved Hide resolved
src/multivariate/product.jl Outdated Show resolved Hide resolved
src/multivariate/product.jl Outdated Show resolved Hide resolved
src/multivariate/product.jl Outdated Show resolved Hide resolved
@devmotion
Copy link
Member

For example, should Dirichlet be

Ah, that's a nice example for why it would be useful to define support with something more advanced such as DomainSets. According to the definition of isless it should be

function Base.minimum(d::Dirichlet)
    n = length(d)
	return [i < n ? zero(eltype(d)) : one(eltype(d)) for i in 1:n]
end

function Base.maximum(d::Dirichlet)
	return [i == 1 ? one(eltype(d)) : zero(eltype(d)) for i in 1:length(d)]
end

But probably support would be much more helpful for your use case.

@mschauer
Copy link
Member

mschauer commented May 5, 2021

Yes, on slack I recommended to go for

Moritz Schauer 18 hours ago: I think you want something which gives a “eps”-bounding box [of the support] similar to

function getEndpoints(distr::UnivariateDistribution, epsilon::Real)

@devmotion
Copy link
Member

Since the support is an open set? The nice thing about DomainSets and similar packages is that they provide support (pun intended) for open and closed sets and half-open intervals.

@agerlach
Copy link
Contributor Author

agerlach commented May 5, 2021

According to the definition of isless it should be

This doesn't seem correct to me. Maybe I'm trivializing this, but the interpretation I am using for satisfying isless is the extrema of the support of the marginals. Here that is always [0..1] -> 0,1 for each marginal

@agerlach
Copy link
Contributor Author

agerlach commented May 5, 2021

I agree that support would be useful, this implementation essentially defining the smallest interval set containing the support

@devmotion
Copy link
Member

the interpretation I am using for satisfying isless is the extrema of the support of the marginals. Here that is always 0..1 -> 0,1

No, isless sorts with respect to lexicographical order, i.e.

julia> isless([0, 1], [0.5, 0.5])
true

julia> isless([0.5, 0.5], [1, 0])
true

julia> isless([0.1, 0.9], [0.11, 0.89])
true

julia> isless([0.1, 0.9], [0.1, 0.91])
true

@devmotion
Copy link
Member

I agree that support would be useful, this implementation essentially defining the smallest interval set containing the support

No, we would not want a bounding box, support should really return the support of the distribution.

@agerlach
Copy link
Contributor Author

agerlach commented May 5, 2021

No, we would not want a bounding box, support should really return the support of the distribution.

Agreed completely. Let me clarify, I meant my proposal for extrema is the corners of the smallest interval set containing the support

@devmotion
Copy link
Member

E.g., with DomainSets one would define

support(d::Dirichlet) = VectorUnitSimplex{eltype(d)}(length(d))

@devmotion
Copy link
Member

Let me clarify, I meant my proposal for extrema is the corners of the smallest interval set containing the support

But that is not how extrema (and minimum and maximum) are supposed to be defined. They should return the extrema etc. with respect to isless which uses lexicographical order.

@agerlach
Copy link
Contributor Author

agerlach commented May 5, 2021

They should return the extrema etc. with respect to isless which uses lexicographical order.

Thats exactly what the corners of the smallest interval set containing the support is

@mschauer
Copy link
Member

mschauer commented May 5, 2021

No, we would not want a bounding box, support should really return the support of the distribution.

That means that the epsilon-bounding box needs a different name than support, not that it is a bad idea! I mean we define it already getEndpoints, thought the name can be improved

@devmotion
Copy link
Member

No, in the case of Dirichlet it is not. If we sort all elements in its support, i.e., the probability simplex, in lexicographical order, then [0, 0, ...., 0, 1] is the minimum and [1, 0, ...., 0] is the maximum and they are unique. We just sort with respect to the first element, and then break ties with the second etc. I tried to illustrate this with the example above.

@devmotion
Copy link
Member

devmotion commented May 5, 2021

That means that the epsilon-bounding box

DomainSets uses boundingbox 🙂

E.g., the boundingbox for the unit simplex is the unit cube:
https://github.com/JuliaApproximation/DomainSets.jl/blob/e9f8c5ee3c5d6a2457b2e1172d7234aff4aaac54/src/domains/simplex.jl#L45

@mschauer
Copy link
Member

mschauer commented May 5, 2021

Except that Lebesgue on the unit simplex and the unit cube are mutually singular, so you cannot just define the density =0 on the rest of the cube not covered by the simplex and integrate... :-D it's tricky, we know.

@agerlach
Copy link
Contributor Author

agerlach commented May 5, 2021

@devmotion I follow you now!

So, how would you like to proceed. It seems that this type of definition for extrema does not extend in general. Having some general support function would be awesome long term, but sounds like will be a significant effort. Adding boundingbox would satisfy my immediate needs and seems readily doable. However, I do respect that fact that I won't be the one tasked w/ maintaining it

I could do something like the following, or use IntervalSets/DomainSets for the output

boundingbox(d::UnivariateDistribution) = extrema(d)
boundingbox(d::AbstractMvNormal) = fill(-Inf,length(d)), fill(Inf, length(d))
boundingbox(d::Product) = map(minimum, d.v), map(maximum, d.v)
boundingbox(d::Dirichlet) = zeros(length(d)), ones(length(d))

@agerlach
Copy link
Contributor Author

agerlach commented May 5, 2021

where boundingbox always returns the smallest interval set containing the support

@mschauer
Copy link
Member

mschauer commented May 5, 2021

Let's add the more general bounding box in a separate PR.

@agerlach
Copy link
Contributor Author

agerlach commented May 5, 2021

Ok, do you want me to finalize this PR, or close it in favor of a separate boundingbox PR?

@mschauer
Copy link
Member

mschauer commented May 5, 2021

Finalize!

src/univariates.jl Outdated Show resolved Hide resolved
@agerlach
Copy link
Contributor Author

agerlach commented May 5, 2021

Tests pass on my end and docs look clean now.


Return the maximum of the support of `d`.
"""
maximum(d::Distribution)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with this idiom. What does it mean?

Copy link
Contributor Author

@agerlach agerlach May 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't either, its from #1319 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agerlach agerlach requested a review from devmotion May 6, 2021 17:32
Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@st--
Copy link
Contributor

st-- commented Oct 9, 2021

@devmotion you approved it & the CI checks pass - want to merge it, too? : )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants