Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CHANGE: The name distributions::Standard is misleading #1297

Open
Enter-tainer opened this issue Mar 5, 2023 · 15 comments · May be fixed by #1526
Open

CHANGE: The name distributions::Standard is misleading #1297

Enter-tainer opened this issue Mar 5, 2023 · 15 comments · May be fixed by #1526

Comments

@Enter-tainer
Copy link

Enter-tainer commented Mar 5, 2023

Summary

To address this issue, either api-breaking changes or docs update may happen.

Details & Motivation

I would like to provide some feedback regarding the distributions::Standard. I believe that the name of this feature is misleading and can be confusing for users.

I came across this feature when I use rng.gen(), and Standard is the default distribution. I'm not sure what it is, so I search it online...

image

When searching for the term "Standard Distribution" online, users are very likely to come across information about the standard normal distribution, which can lead to misunderstandings. Furthermore, this name is not commonly used in other contexts, making it difficult for users to understand what it refers to.

image

  • What changes does this require internally?

I would like to suggest that consider changing the name of this feature or provide clear warnings to users that it should not be confused with the standard normal distribution. This would greatly improve the usability and clarity of this library.

@dhardy
Copy link
Member

dhardy commented Mar 5, 2023

Shame the doc link pictured above didn't work, because the docs do explain.

The "standard normal" is just a common (default) parameterisation of the Normal distribution so it shouldn't be confusing. But search engines being what they are, I can see why "standard distribution" finds the "standard normal distribution".

I guess it could be renamed to Default... while we're at it, we could shorten distributions to distr... these are quite significant breaking changes so I'm not sure how acceptable they would be with users.

@dhardy
Copy link
Member

dhardy commented Mar 5, 2023

Another possibility is to replace it with more specific distributions.

Maybe a UniformAll distribution for these:

  • Integers should be the same (including performance) as Uniform::new_inclusive($ty::MIN, $ty::MAX).
  • Wrapping<u32> etc.: same
  • bool: we have rng.gen_bool() but could keep this (same distr as above)

FP (f32, f64): just let people write Uniform::new(0.0, 1.0). Also makes it clear that x * 2.0 - 1.0 is not the best way to sample from -1.0..1.0.

char: we already have Alphanumeric; we could add a new distribution like Codepoint (any valid Unicode code-point).

Option<T>: I don't see real use-cases (beyond testing) for this; maybe we should just make sure it's easy to write an adapter that does it.

Tuples, arrays: again, these could be generated via an adapter.

Adapters

This is where things get problematic...

We could view a closure as an adapter of sorts (play):

let unif11 = Uniform::new(-1.0, 1.0).unwrap();
let x = rng.sample_closure(move |rng| if rng.gen_bool(0.5) {
    rng.sample(&unif11)
} else {
    f64::NAN
});

Nice, but:

  1. If the closure is named with a let binding, you hit this error: type annotations needed. Why? Because sample_closure requires F: Fn(&mut Self). Fn cannot be generic over R: Rng.
  2. For the same reason, we cannot automatically construct a Distribution implementing object from a closure.

We could cheat and use dyn Rng instead, or we could do something horrible complicated with function (or rather trait-impl) currying, but it's probably easier to have users write their own Distribution impls.

@dhardy
Copy link
Member

dhardy commented Oct 31, 2023

@vks any thoughts on this? I'm tempted not to make any changes: it's significant breakage and most users appear to get on fine with the current design.

@vks
Copy link
Collaborator

vks commented Nov 2, 2023

I'm also not very happy with the unspecific name Standard, and I agree that distributions is too long. StandardUniform would be better (i.e. the unparametrized uniform distribution, similar to StandardNormal and StandardGeometric). However, this would be more restrictive than the current definition:

A generic random value distribution, implemented for many primitive types. Usually generates values with a numerically uniform distribution, and with a range appropriate to the type.

I think it's always a uniform distribution in practice (at least in Rand, and probably 3rd party impls as well).

I'm also not sure it's worth the churn, but maybe lets look at how much breakage this would cause:

  • Standard is mostly used by "convenience" wrappers like random() and Rng::gen(). I'm also not the biggest fan of them (I prefer sample()), but they are popular, and I think there would be a lot of protest against removing them (even if it's only for floats). I understand the point of discouraging random() * max, but this is a pattern that was spread by libraries for decades. Adding docs (or maybe even a lint?) seems like a better investment.
  • If we rename it, wouldn't it only affect people implementing Distribution<T> for Standard to extend random() / Rng::gen()? That would include people implementing this for enums, and bignum libraries, which does not seem so bad. Are there any other common use cases?
  • Renaming distribution to distr would probably cause a lot more breakage.

@dhardy
Copy link
Member

dhardy commented Nov 3, 2023

I think it's always a uniform distribution in practice

Is an Option having 50% chance of being None and 50% chance of being some Standard-sampled sub-value uniform? Debatable maybe but probably not. Maybe we shouldn't even support Option, but... it's probably useful in test-cases...

I'm also not the biggest fan of them

Same, and I might only consider them useful for something like fuzz-testing, where a uniform distribution may not even be the best choice. Especially for floats. But "garbage in, garbage out" kind of applies here: without contracts on input/output validity (e.g. ranges), fuzz-testing can't really tell you anything useful anyway.

It appears that quickcheck does still use Standard though a more specific distribution would have an advantage (but also the disadvantage of less third-party support).

The real argument for keeping Standard remains rand::random and Rng::gen. Is there a more appropriate name here? Maybe the Basic or Default distribution? Yes, but neither is significantly better IMO.

Renaming distribution to distr would probably cause a lot more breakage.

And you even forgot that the module name is plural, like std::collections. Yes, we could rename or alias (but not deprecate an alias). Renaming might be worthwhile.

@dhardy
Copy link
Member

dhardy commented May 24, 2024

Closing: no change.

There remains the option of renaming to StandardUniform, but that brings up more questions like:

  • Should we remove support for Option?
  • Should we add another distribution appropriate for fuzzing?
    ... so the easiest acceptable option is no change.

@Enter-tainer
Copy link
Author

I'm ok with this because I already learn the lesson. But I still think standard is not a good name. It it more like default than standard

@dhardy
Copy link
Member

dhardy commented May 24, 2024

Default would conflict with the std trait and likely be confusing.

@Enter-tainer
Copy link
Author

my concern is that this distribution is actually not "standard". i dont think there is a standard for random distributions.

i agree with your concern. it can be hard to find a suitable name for this.

@newpavlov
Copy link
Member

How about DiscreteUniform? I agree that Standard is not the best name, but unless we find a better alternative, I am personally fine with keeping it.

@vks
Copy link
Collaborator

vks commented May 30, 2024

I think DiscreteUniform is misleading, because it's not uniform for Option<T>, and for floats it's not really discrete either.

To justify the churn of renaming Standard, we would need a significantly better name.

I could live with StandardUniform, and possibly removing the Option<T> impl, but I'm not sure this is significantly better.

@purplesyringa
Copy link

I'd like to add that "Standard" is misleading not just because of search engines. "Standard deviation" is a common term, and it's often considered in relation to the normal distribution.

@dhardy
Copy link
Member

dhardy commented Oct 24, 2024

All in favour of one of the following, vote with an emoji:

  • Switch to StandardUniform - 🚀
  • Keep Standard - 👎
  • Switch to Default - 😕
  • Another suggestion: post below; people can vote 👍

@dhardy dhardy reopened this Oct 24, 2024
@vks
Copy link
Collaborator

vks commented Oct 25, 2024

@purplesyringa "Standard deviation" is not unique to the normal distribution. In fact, most (all?) distributions in Rand have one.

@benjamin-lieser
Copy link
Collaborator

Cauchy does not have one ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants