Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a module with helper functions #989

Open
newpavlov opened this issue Jun 23, 2020 · 22 comments
Open

Add a module with helper functions #989

newpavlov opened this issue Jun 23, 2020 · 22 comments

Comments

@newpavlov
Copy link
Member

newpavlov commented Jun 23, 2020

One of the major rand critiques, which I've recently seen, argues that it makes hard to write "quick and dirty" code. This motivates creation of crates like fastrand, which could lead to an unfortunate ecosystem split and overall code duplication.

We already have the random helper function and I think it could be worth to add other helper functions as well. To make docs less cluttered we probably should keep them in a separate top-level module (something like helpers?). With the potential migration to ChaCha8/12, those functions can be simple wrappers around thread_rng and be fast enough for all, but most demanding use-cases. One of side-effects of using helpers is that trait imports are no longer needed for simple use-cases, so it could be seen as a work-around until something like rust-lang/rfcs#2375 gets added to the language.

A preliminary list of helper functions:

We could tweak naming a bit to make it a bit more natural.

cc @stjepang

@dhardy
Copy link
Member

dhardy commented Jun 25, 2020

Is this primarily because rand::thread_rng().shuffle(&mut v) is too long to write or because this is too hard to learn?
Or is it (for whatever reason) because the rand crate is too big with too many dependencies?

We should be careful about adding more convenience functions (complicating the API for ease-of-use), and also realise that we can't be all-things-to-everyone (some people really care about minimising dependencies/size for whatever reason; rand is limited here by the requirement for a cryptographically-secure default generator).

@vks
Copy link
Collaborator

vks commented Jun 25, 2020

One of the major rand critiques, which I've recently seen, argues that it makes hard to write "quick and dirty" code. This motivates creation of crates like fastrand, which could lead to an unfortunate ecosystem split and overall code duplication.

There are a lot of crates like this, implementing some subset of Rand's functionality. There does not seem to be any convergence on a "quick and dirty" randomness crate, with every author creating their own variant. I don't see what we can do about this code duplication.

How popular are these crates? I'm not sure that there is a risk of an ecosystem split.

Looking at fastrand, it seems the main API simplification is getting rid of thread_rng() by making it implicit (like rand::random) and replacing Rng::gen<T> with rand::T for primitive types. I agree with @dhardy that this overlaps with Rng, which is already a wrapper for convenience. Having both seems too much? If most use cases don't require an RNG different from thread_rng(), we could replace Rng with the simplified API.

@vks vks closed this as completed Jun 25, 2020
@vks vks reopened this Jun 25, 2020
@newpavlov
Copy link
Member Author

Don't forget that helper functions not only shorter (due to the implicitness of thread_rng()), but also don't require traits import. I think it makes easier for novice Rust programmers to start using rand and reduces a certain amount of paper-cuts for experienced ones. Also it's a bit strange to have only a single helper function, which arguably is not overwhelmingly more popular than other ones.

Since those functions are trivial and will be kept in a separate module, I don't think they will be perceived as an API complication and additional maintenance burden will be minuscule.

Judging by a cursory look at these search results, I would say that the proposed helper functions will cover more than half of explciit thread_rng() uses. (It would be nice to have a more thorough analysis, but it would take a significant amount of time...)

overlaps with Rng, which is already a wrapper for convenience.

I wouldn't call it a convenience wrapper. It's an extension trait which implements additional functionality.

@vks
Copy link
Collaborator

vks commented Jun 25, 2020

I wouldn't call it a convenience wrapper. It's an extension trait which implements additional functionality.

Not really, it only wraps Distribution and RngCore methods.

@dhardy
Copy link
Member

dhardy commented Jun 25, 2020

There is also good reason to keep Rng even if we do add these "helper functions", since Rng is also useful on other RNGs. Often we just tell people to use rand::prelude::*; in which case the trait import isn't important.

Since those functions are trivial and will be kept in a separate module

The main drawback IMO is that in the docs there will be a long-ish section at the bottom of all these functions. But random already fits in with this list perfectly leaving only thread_rng as the slightly-obscured "vitcim", so it's not a big drawback.

I wouldn't want to remove thread_rng since it has other uses (also from_rng):

let mut rng = thread_rng();
let dist = Uniform::from(1..11);
let sample = dist.sample(&mut rng);

Conclusion: I see only very minor advantages and drawbacks — so either option sounds acceptable.

@vks
Copy link
Collaborator

vks commented Jul 1, 2020

So to resolve this, we would have to implement the following?

Via thread_rng():

  • rand::T for primitive integers T
  • rand::shuffle
  • rand::digit
  • rand::{alphanumeric, alphabetic, lowercase, uppercase}

New methods for Rng:

  • Rng::T for primitive integers T (note the subtle difference between Rng::bool and Rng::gen_bool)
  • Rng::shuffle (equivalent to SliceRandom::shuffle)
  • Rng::digit
  • Rng::{alphanumeric, alphabetic, lowercase, uppercase}

@vks vks mentioned this issue Aug 5, 2020
19 tasks
@dhardy
Copy link
Member

dhardy commented Aug 28, 2020

My opinion is that we should not attempt to "resolve" this issue without more feedback on what users want and why. Of the above, I don't see anything wrong with adding Rng::shuffle. We could also add rand::rng as an alias for thread_rng.

Adding methods like rand::i16 — the question is really where do you stop? IMO it would mostly just be clutter in the docs.

One thing we can't compete with fastrand on is being very small, simple and zero-dependency. For some people/cases that is a valid feature (probably mainly because people like using something they can understand).

Ironically the selling point of fastrand is not its speed so much as its simplicity.

@dhardy
Copy link
Member

dhardy commented Nov 19, 2020

I was hoping more people might pitch in with what they would / would not like to see. So, here's my opinion (again):

  1. Adding Rng::shuffle (for slices) might be okay, but if we did, then what about Rng::choose and Rng::choose_multiple?
  2. Other methods seem mostly fillers
  3. We don't have some of the above generators such as digit, alphabetic, lowercase and uppercase, but do we want them?

Regarding (1): The choose methods are implemented on both slices and iterators under other traits (SliceRandom and IteratorRandom), but supporting both on Rng would require a new trait (impl'd for I: Iterator and &[T]). More convenience for significantly more complexity; is that worth it?

Regarding (3): Should we impl SampleUniform on char? If we did, this would enable things like rng.gen_range('A'..='Z'), but there is a "hole" in the char range (0xD800..=0xDFFF) which should never be generated. Maybe the hole isn't a big deal though (just an awkward impl).

@sicking
Copy link
Contributor

sicking commented Dec 1, 2020

I've always felt that Rust needs a crate for "simple" random number generation. It's definitely hard to draw the line for exactly what to include in such a crate though.

To answer the question in #989 (comment), I think the goal of such a crate would primarily be "easy to learn" rather than "few characters to type". To accomplish this I would heavily bias towards having few functions in the API.

My proposal would be to include just

  • Generate u32 in range [0, N)
  • Generate f64 in range [0, 1)
  • Generate integer type (maybe including bool, but not including char)
  • Shuffle array
  • Create explicitly seeded generator
  • Create randomly seeded generator (default behavior)

And that these functions would use a cryptographically-secure algorithm and optimized for precision to the extent rand is right now (i.e. use exclusion zones for integer generation, but just 53 bits of precision for floats)

Most other commonly needed things can be easily and intuitively built on top of those functions.

Possibly more functions than the above should have convenience APIs. But it's always easy to add more over time.

However I don't think the rand crate will ever be the crate that matches this goal. Mainly because I believe the "easy to learn" goal implies "small API" which rand decidedly is not. And adding more helpers only moves it in the other direction. But possibly a separate crate built on rand_core could be such a crate.

Honestly I think fastrand gets the API close enough. But I think it has some unfortunate quirks like not using cryptographically secure generation, and the fact that you can seed the TLS generator feels pretty scary to me.

But I think that the rand crate also fulfills an important but different role of being a one-stop-shop for all randomness needs, including more complex algorithms and a more rich API

@dhardy
Copy link
Member

dhardy commented Dec 1, 2020

But possibly a separate crate built on rand_core could be such a crate.

Interesting direction: adding rand_easy or some such built over rand. I'm okay with that.

@dhardy
Copy link
Member

dhardy commented May 21, 2024

This issue is stale.

Personally I think we should just point new users at the Getting started guide.

Close?

@dhardy dhardy closed this as completed Jul 20, 2024
@oconnor663
Copy link
Contributor

oconnor663 commented Aug 14, 2024

I've been thinking about this for years, but I only just now decided to go digging for it in the issues list :) I occasionally make use of rand::random() to flip a coin in an if statement, and it feels really nice to do that. But more commonly I want to do something random with a Vec, and then I need to do the use rand::prelude::* and ThreadRng dance, and it's a shame that that's 3x more steps. My intuition is that it makes sense for shuffle, choose, and gen_range to be top-level functions like random. (Maybe even call the last one "range"?)

If I wanted to make a biased correctness/security argument for this, I'd point out that it's very convenient but of course wrong to generate a small range integer like this:

let x = rand::random::<u32>() % 10;

It would be quite nice if instead we could tell people to do this:

let x = rand::range(0..10);

@dhardy
Copy link
Member

dhardy commented Aug 14, 2024

You could import like this?

use rand::{random, prelude::*};

In general, using trait functions without importing is a pain (but possible with fully-qualified paths). I suppose we could add functions like rand::shuffle(&mut my_list) to avoid needing extra imports.

@oconnor663
Copy link
Contributor

I just opened #1488 as a proof-of-concept. Would love to get other people's thoughts.

@dhardy
Copy link
Member

dhardy commented Aug 21, 2024

My thoughts (from #1488) are that we should instead make Rng methods inherent methods on ThreadRng (either via the RFC or via some macro we can also use on SmallRng etc.). This avoids needing to import rand::Rng for common usages.

Potentially also add these inherent methods (selected subsets of SliceRandom functionality):

  • fn choose<IR: IndexedRandom>(&mut self, slice: IR) -> Option<&Self::Output>
  • fn choose_mut<IR: IndexedMutRandom>(&mut self, slice: IR) -> Option<&mut Self::Output>
  • fn shuffle<S: SliceRandom>(&mut self, slice: S)

Potentially, we could also remove rand::random (preferring rand::thread_rng().random()) which is only free function behaving as a shim over an Rng method.


The above additions to the API are mostly hidden under ThreadRng (potentially also StdRng, SmallRng and other RNGs) and possibly Rng (if we add the above choose/slice methods there too). #1488 instead proposes adding more free functions which makes the top-level documentation busier (while only supporting ThreadRng).

@oconnor663
Copy link
Contributor

Hmm, thinking some more about what I'm really going for here... Forgive me for a wall of text :)

The short version is that I think there are two separate papercuts. The first is that you have to import the trait, but the second is that you have to understand what thread_rng means and that it's the right thing to use. I love removing the first one, and I think there's ~equal value in removing the second.


The long version of that same thought: As you know much better than I do, rand holds the high honor of being one of only three library crates that appear in The Book. (And in Chapter 2! The others don't show up until Chapter 19.) Of course it's common to use randomness in beginner examples in any language. It's fun to play with, rank beginners instantly understand what it means, and a lot of them even ask for it unprompted without knowing the technical name. So the marginal value of fixing papercuts in entry-level randomness APIs is exceptionally high by papercut standards.

Here's how it reads when Chapter 2 gets to rand, when the only other Rust program most readers have ever seen is the 3-line "hello world" example from Chapter 1:

Let’s start using rand to generate a number to guess...

use rand::Rng;
...
let secret_number = rand::thread_rng().gen_range(1..=100);

First we add the line use rand::Rng;. The Rng trait defines methods that random number generators implement, and this trait must be in scope for us to use those methods. Chapter 10 will cover traits in detail.

Next, we’re adding two lines in the middle. In the first line, we call the rand::thread_rng function that gives us the particular random number generator we’re going to use: one that is local to the current thread of execution and is seeded by the operating system. Then we call the gen_range method on the random number generator. This method is defined by the Rng trait that we brought into scope with the use rand::Rng; statement. The gen_range method takes a range expression as an argument and generates a random number in the range.

If we made the gen_range method inherent one way or another, that section would get a good bit shorter:

Let’s start using rand to generate a number to guess...

let secret_number = rand::thread_rng().gen_range(1..=100);

In the first line, we call the rand::thread_rng function that gives us the particular random number generator we’re going to use: one that is local to the current thread of execution and is seeded by the operating system. Then we call the gen_range method on the random number generator. The gen_range method takes a range expression as an argument and generates a random number in the range.

But imagine if it could be this short:

Let’s start using rand to generate a number to guess...

let secret_number = rand::range(1..=100);

The range function takes a range expression as an argument and generates a random number in the range.

One question I'd be particularly happy to shield Day 1 beginners from is "What is a thread?" I could imagine some compromise proposals along that line. Maybe we could define default_rng or even just rng as an alias of thread_rng. That plus the inherent methods could get us down to rand::rng().gen_range(1..=100). That's an improvement I could get behind. But if we got started in that direction, I don't know why we'd want to stop :)

Another option we could consider (with "we" here maybe being The Book more than rand itself) is defining some sort of lightweight_rand crate. It seems like many have tried, but rand just does a damn good job, and it's hard to compete with. Another compromise option might be some sort of wrapper crate that delegated all the real work to rand but got creative with the public API. That's viable, but of course something important would be lost: it's good when we can steer beginners towards the tools that experienced folks actually use.

I'm positively biased here by my own experiences with the blake3 API. We expose some freestanding functions for the most common use cases (like hash), and we bury the more advanced stuff (like finalize_xof) in the inherent methods of different structs. The docs for the freestanding functions become a natural jumping-off point for learning about the other things you can do. As far as I know this has worked well for users, and I'm happy with it.

@dhardy
Copy link
Member

dhardy commented Aug 22, 2024

rand::range

This hides the source of randomness. For some people & usages that's completely fine, but it can also be confusing, especially if someone can't work out why they have to do several other steps (import a trait and explicitly provide the RNG) just to change the RNG.

thread_rng vs default_rng vs rng: you make a good point that people shouldn't need to know about thread-locality (especially since results are never reproducible). We could change this to just rng; my only real concern is how many lines of code will need to change as a result.

gen_range vs range: it was the case that Rng::gen was the base method name, and gen_range, gen_bool etc. were variants of this. Then gen became a reserved keyword so we renamed to Rng::random (matching rand::random). So we could now drop all those gen_ prefixes (but with gen_iter becoming random_iter).

We expose some freestanding functions for the most common use cases (like hash)

The documentation for hash should probably point out the equivalent (results-wise) implementation using Hasher::new, update, finalize. But given that there is only a single hasher type and a very small API in total, there's not a lot of need.

oconnor663 added a commit to BLAKE3-team/BLAKE3 that referenced this issue Aug 22, 2024
@oconnor663
Copy link
Contributor

The documentation for hash should probably point out the equivalent (results-wise) implementation using Hasher::new, update, finalize.

Good point, done! BLAKE3-team/BLAKE3@08cb01c

@dhardy dhardy mentioned this issue Sep 10, 2024
1 task
@dhardy
Copy link
Member

dhardy commented Oct 3, 2024

To update my thoughts on this:

Making RNG methods inherent using macros as in #1492 is a little messy and not extensible to anything outside of rand, therefore probably not the way to go. (I updated the PR anyway for comparative purposes.)

I approve of the idea of adding rand::range instead, however I'd like the name to match that in Rng::gen_range. Since gen is to become a keyword for generators, I think we should go with just range, though it is significant breakage (being a widely used method).

I dislike gen_bool / gen_ratio nominatively, especially in that the latter does not generate a ratio (just like gen_range does not yield a range), but also in that gen_bool does not describe what type of parameter it accepts. Maybe we should rename to bool_p / bool_ratio or bernoulli_p / ...? Or just p / ratio? I'm not convinced by any of these. Depending on the name(s) chosen, we may also add a top-level function for booleans.

We could implement rand::choose as a multiplexer over choose for &[T] and choose_mut for &mut [T], but unfortunately not for I: Iterator (demo; would require Specialization or a specific negative trait bound in std to support iterators too, so this might be a viable extension in the future).

I think perhaps we should rename rand::thread_rng to just rand::rng without renaming ThreadRng. The justification is that the RNG being thread-local is an implementation detail, and not one people need to know about (or choose between, given the lack of alternatives). Arguments against are that thread_rng is much more searchable, avoids a breaking change, and better matches the name ThreadRng (which cannot be renamed to Rng due to ambiguity with the trait). Another possible argument against is that rand::rng may make it less obvious that user-defined alternatives such as let mut rng = SmallRng::seed_from_u64(0); are possible.

@oconnor663
Copy link
Contributor

We could implement rand::choose as a multiplexer over choose for &[T] and choose_mut for &mut [T]

Ooo I love that idea.

@dhardy
Copy link
Member

dhardy commented Oct 9, 2024

Name changes have been pushed off to #1503. Pending this, we can add rand::gen_range (or whatever that gets named); potentially also gen_bool.

Lets not add rand::choose or shuffle. It would be a partial duplication of iterator/slice choosing functionality (which support choosing multiple elements, weighted choices and partial shuffles, with distinct differences in API as is appropriate for slices vs iterators). Sorry if this is disappointing, but importing use rand::prelude::*; to enable method call on iterators/slices is not hard and already has a proven API.

I'm still on-the-fence regarding rename thread_rngrng:

  • For direct usage without import, rand::rng() is shorter than rand::thread_rng() while remaining easy to understand
  • If used after import, thread_rng() is fairly clear while rng() is just confusing. We can recommend not importing rand::rng, but not everyone will read (or follow) that advice.
  • Regarding the above: note that thread_rng is currently in rand::prelude (along with random). This should maybe change anyway.
  • It's easier to search a project for usages of thread_rng than it is rng; searching rand::rng is insufficient due to multi-item imports.

On the whole, I like the idea of this rename, after removing from the prelude.

@oconnor663
Copy link
Contributor

Sorry if this is disappointing, but importing use rand::prelude::*; to enable method call on iterators/slices is not hard and already has a proven API.

No worries :) range is the only one I feel particularly strongly about, because of the combination of how often it's used and how convenient it is to use % incorrectly instead.

If we end up sending most beginners to range, then I could go either way on the rng vs thread_rng name for the generator. I guess the upside of adding beginner-friendly entrypoints is that it frees us up a bit to use more descriptive/technical names for the parts of the API that aren't right in your face.

dhardy added a commit that referenced this issue Oct 31, 2024
Adds random_iter, random_range, random_bool, random_ratio, fill.
See also #989, #1503.

Co-authored-by: Diggory Hardy <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants