
Proposal to remove "syntactic sugar" that overloads * and / for univariate distributions #1438

Closed
ablaom opened this issue Nov 24, 2021 · 10 comments


@ablaom
Contributor

ablaom commented Nov 24, 2021

I propose that the "syntactic sugar" added here be removed in the next breaking release.

A probability distribution is just a special case of a measure. The transformations implied by the current implementations of *, +, -, /, and so forth do not generalise to arbitrary measures, which, moreover, already have well-established meanings for these operations. E.g., for the product of a scalar $x$ with a measure $\mu$ we have $(x\mu)(A) = x \mu(A)$. These definitions are also useful within probability; for example:

  • When constructing a mixture of probability distributions, such as a finite average: the average is a probability measure even though the partial sums are merely measures. That is, it is frequently convenient to leave the affine subspace of probability measures using the standard definitions of + and *.

  • When one wants to avoid normalising the measure representing a probability measure, because normalisation is not needed (e.g. for generating samples).
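For concreteness, the mixture construction in the first bullet reads (my notation):

```latex
% Mixture of probability measures \mu_1, \dots, \mu_n with weights \lambda_i:
\mu = \sum_{i=1}^{n} \lambda_i \mu_i,
\qquad \lambda_i \ge 0, \qquad \sum_{i=1}^{n} \lambda_i = 1 .
% Each partial sum \sum_{i=1}^{k} \lambda_i \mu_i with k < n is a measure of
% total mass at most 1; only the full sum is again a probability measure.
% Both + and scalar * here are the standard measure operations:
% (\lambda \mu)(A) = \lambda\, \mu(A), \qquad (\mu + \nu)(A) = \mu(A) + \nu(A).
```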

My own use case is in ensemble models, where I am averaging the probabilistic predictions of multiple classifiers. Computing averages naively does not work smoothly because the aforementioned syntactic sugar conflicts with the "usual" operations, preventing me from simply calling mean(...).
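To make the use case concrete, here is a minimal sketch (MixtureModel and the operator sugar are existing Distributions.jl APIs; the specific distributions are illustrative):

```julia
using Distributions, Statistics

# Two "predictions" from an ensemble:
d1 = Normal(0, 1)
d2 = Normal(1, 2)

# What I would like mean([d1, d2]) to produce: the equal-weights mixture,
# i.e. the average of the two measures. Distributions spells this as:
m = MixtureModel([d1, d2], [0.5, 0.5])
mean(m)  # 0.5

# But mean([d1, d2]) would need d1 + d2 and division by 2 to act
# measure-wise, whereas d1 / 2 already means "the distribution of
# X / 2 for X ~ d1":
std(d1 / 2)  # 0.5
```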

Thoughts anyone?

@cscherrer

@devmotion
Member

This was added in #1217. Affine transformations were already defined for MvNormal, and the LocationScale definitions could be improved for different univariate distributions as well (#1407).

At first glance, these transformations seem completely fine to me: they return the distribution of the transformed random variable. I guess the main concern here is that the use of +, *, etc. might indicate that the measures or densities are added, multiplied, etc.?

@cscherrer

It might help to consider: when we write 2 + Normal() or 3 * Normal(), what kind of objects are 2 and 3? In the current Distributions notation, these seem to be implicitly converted to Dirac measures, so the operations can be interpreted as convolution. I'm not saying this is good or bad, but it does seem helpful to pin down the semantics.

In MeasureTheory, we've had some discussion of changing to ⊙ for weighted measures. See
JuliaMath/MeasureTheory.jl#170

This is a little more awkward, but it has the advantage of not getting in the way of the current Distributions syntactic sugar. Our usual use for ⊙ is for a likelihood "acting on" a measure through a pointwise product (hence the pointy notation). So this is really yet another syntactic sugar, interpreting 3 ⊙ Normal() as something like Returns(3) ⊙ Normal().

@cscherrer

Addition is a little trickier, since + is used for superposition. But that always takes two measures, never a measure and a likelihood.

@devmotion
Member

devmotion commented Nov 24, 2021

It might help to consider: when we write 2 + Normal() or 3 * Normal(), what kind of objects are 2 and 3? In the current Distributions notation, these seem to be implicitly converted to Dirac measures, so the operations can be interpreted as convolution.

No, it's not; it's much simpler. It is an affine transformation of a random variable X with distribution Normal(). So 2 + Normal() and 3 * Normal() just mean "give me the distribution of 2 + X" and "give me the distribution of 3 * X", respectively. I.e., 2 and 3 really are just two real numbers.

BTW convolutions can be computed with Distributions.convolve.
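Concretely, the behaviour being described looks like this (a sketch; exact return types may vary across Distributions versions, but the moments below follow from the stated semantics):

```julia
using Distributions, Statistics

X = Normal(0, 1)

# Affine transformations act on the random variable, not the measure:
mean(2 + X)  # 2.0 — the distribution of 2 + x for x ~ X
std(3 * X)   # 3.0 — the distribution of 3x

# Convolution (the distribution of a sum of independent variables)
# is spelled explicitly via Distributions.convolve:
S = convolve(Normal(0, 1), Normal(0, 2))
std(S)       # sqrt(1 + 4) ≈ 2.236
```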

@ablaom
Contributor Author

ablaom commented Nov 24, 2021

I guess the main concern here is that the use of +, *, etc. might indicate that the measures or densities are added, multiplied, and so forth?

Yes, that's my only concern. I have no objections to the transformations that were added, just the usurping of +, * to represent them.

@devmotion
Member

Affine transformations are very useful and I think we should definitely support them, make them easy to use, and define more optimized versions whenever possible (e.g. as in #1407).

They were added initially for MvNormals, and this fixed a very old issue: #307

I am not completely convinced yet that the current behaviour of +, *, etc. is surprising, since Distributions does not perform any computations with measures and was not designed from a measure-theoretic viewpoint. This different interpretation as a transformation of measures was also discussed in #307. There seemed to be general agreement in the end that the syntax was fine: @mschauer wrote

But +(A::Vector, B::MvNormal) and *(A::Matrix, B::MvNormal) can only mean one thing imho.

and (a bit similar to my comment) @simonbyrne said

I mean, the only other interpretation it could mean would be transforming it as a measure, but (a) that isn't very useful (since it would no longer be a probability measure), and (b) we don't treat it as a measure in other contexts, e.g. defining (d::Distribution)(x::Interval) to get the probability of an interval.

even though it seems he was a bit reluctant initially:

It would be good to have some way to do this, at least for constants. Mathematically, I'm a bit reluctant to overload +/* directly: the objects are intended to be distributions, not random variables. But maybe that isn't such a big deal?

@ablaom
Contributor Author

ablaom commented Nov 24, 2021

Okay, I understood one thing wrong: + is only overloaded for distr + scalar and not for distr + distr. Sorry, I should have checked that more carefully. So it is only the scalar product scalar * distr and the quotient distr / scalar that I am wondering about.

After these further clarifications I better understand the arguments for the status quo in those cases. I guess this is a tricky one.

@ablaom
Contributor Author

ablaom commented Nov 25, 2021

It seems @mschauer did express essentially the same misgiving that I have in #307:

One thing, for scalar λ the expression λ*D1 + (1-λ)*D2 could denote a mixture distribution.

@ablaom ablaom changed the title Proposal to remove "syntactic sugar" that overloads + and * for univariate distributions Proposal to remove "syntactic sugar" that overloads * and / for univariate distributions Nov 25, 2021
@devmotion
Member

I think it's not quite the same concern, though: the comment only mentions one very specific expression. I can see that one could expect a mixture distribution for an expression of this form, but that requires

  • that the coefficients are non-negative and sum to 1,
  • that operations are based on measures (even though generally we don't use this view in Distributions),
  • that one can sum scaled Distribution objects (which is not possible, we don't define +(::Distribution, ::Distribution); e.g. for convolutions one has to use convolve).

Therefore I'm not sure the example is actually an argument against the current use of a * D; it seems to indicate mainly that for such an expression one might expect a mixture distribution. However, the current implementation does not support this syntax, and hence this confusion can't arise.
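A quick check of the last point (a sketch; it assumes, as stated above, that no +(::Distribution, ::Distribution) method exists, so the "mixture-looking" expression fails rather than silently doing something unexpected):

```julia
using Distributions

λ = 0.3
D1, D2 = Normal(0, 1), Normal(2, 1)

# λ * D1 is the distribution of λ * X for X ~ D1 — a valid object:
scaled = λ * D1

# ...but summing two Distribution objects is not defined, so the
# mixture-looking expression raises a MethodError:
try
    λ * D1 + (1 - λ) * D2
catch err
    @show typeof(err)  # MethodError
end
```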

@ablaom
Contributor Author

ablaom commented Nov 25, 2021

@devmotion Thanks for taking my suggestion seriously and for the helpful explanations. I stand by my objection but can see that this boat has sailed. Even MeasureTheory.jl seems resigned to avoiding Base.* so as not to confuse users of Distributions.jl. These calls are always difficult.
