Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output of gen_range is platform dependent for usize #1399

Closed
ttencate opened this issue Feb 20, 2024 · 11 comments
Closed

Output of gen_range is platform dependent for usize #1399

ttencate opened this issue Feb 20, 2024 · 11 comments

Comments

@ttencate
Copy link

The output of Rng::gen_range is different depending on whether usize is 32 bits or 64 bits on the current platform, even if the requested range is small (0..12 in my case). For example:

    use rand::Rng;
    use rand_core::SeedableRng;

    let mut rng = rand_pcg::Pcg32::seed_from_u64(0);
    for _ in 0..16 {
        let x: usize = rng.gen_range(0..12);
        print!("{} ", x);
    }
    println!();

On a 64-bits system, the output is: 4 8 6 0 5 1 5 8 9 10 11 7 6 11 1 8
On a 32-bits system, the output is: 4 1 8 0 6 1 0 8 5 11 1 9 5 5 8 6

This does not affect SliceRandom::choose, thanks to some special handling in gen_index:

rand/src/seq/mod.rs

Lines 699 to 710 in 7ff0fc9

// Sample a number uniformly between 0 and `ubound`. Uses 32-bit sampling where
// possible, primarily in order to produce the same output on 32-bit and 64-bit
// platforms.
#[inline]
fn gen_index<R: Rng + ?Sized>(rng: &mut R, ubound: usize) -> usize {
if ubound <= (core::u32::MAX as usize) {
rng.gen_range(0..ubound as u32) as usize
} else {
rng.gen_range(0..ubound)
}
}

This means that, contrary to my expectations, my_slice.choose(&mut rng) and my_slice[rng.gen_range(0..my_slice.len())] are not equivalent (even ignoring the case of an empty slice).

I'm not sure if this qualifies as a bug, but it definitely surprised me, and took me some time to debug. I'm also not sure if it can be fixed, and if fixing it would constitute a breaking change.

In any case it could be documented, here in the book and here on Rng::gen_range.

@dhardy
Copy link
Member

dhardy commented Feb 20, 2024

We could add gen_index to the Rng trait and improve docs (by mentioning in gen_range).

Should we, in this case, make Rng::gen_index accept a range like Rng::gen_range?

We could go further and remove support for usize from Standard and gen_range, requiring usage of a special method like gen_index or casting... this would be quite disruptive.

@vks
Copy link
Collaborator

vks commented Mar 21, 2024

For this use case, we could also change gen_range to generate a u32 the range is small enough (for any integer type). This would make generating usize portable without adding more API surface.

@newpavlov
Copy link
Member

What about 16 bit targets? Would they generate u32 as well?

@vks
Copy link
Collaborator

vks commented Mar 22, 2024

Yes, we only have RngCore::next_u64 and RngCore::next_u32. The portability issue arises, because 64-bit platforms use next_u64 to generate usize, while 32-bit platforms use next_u32. If both used next_u32, the problem would be fixed. (For large ranges we would still have to use next_u64 on 64-bit platforms, but they would not fit into a 32-bit usize anyway.)

@dhardy
Copy link
Member

dhardy commented Mar 22, 2024

Similar, recent issues: #1373 #1415. Also, discussions by rand developers: comment in #1196, #809, #805. We should consider this again.

To be precise, there are two issues:

  1. In some cases, especially rng.gen_range(0..my_slice.len()), there is a hidden portability hazard.
  2. As mentioned above, my_slice.choose(&mut rng) and my_slice[rng.gen_range(0..my_slice.len())] are not equivalent.

Solutions

Document the portability hazard

In the book: Portability of usize

We could document this hazard in more places, especially on Rng::gen_range.

Test the range

We use this internal method for choose, added in #809:

fn gen_index<R: Rng + ?Sized>(rng: &mut R, ubound: usize) -> usize {
    if ubound <= (core::u32::MAX as usize) {
        rng.gen_range(0..ubound as u32) as usize
    } else {
        rng.gen_range(0..ubound)
    }
}

This is an extra conditional-jump on 64-bit platforms, but one which almost always resolves to the first branch in practice, and thus likely has a little cost in most cases.

We could use this approach also for the impl of Uniform (gen_range) for usize and isize.

Above + remove non-portable Standard impls

The leaves non-portable behaviour, which could be resolved by additionally removing usize and isize impls from Standard.

Note: sample_single_inclusive(0, usize::MAX, rng) should probably optimise down suitably as an alternative. This is still not great for anyone wanting a generic solution, but it's really hard to imagine why anyone would actually want that given that usize is used for sizes which are never uniformly distributed over all usize values (this even applies to offsets, unless your memory size is exactly usize::MAX + 1 bytes).

Remove all non-portable impls from Standard and Uniform

We implement equivalent code to the above, but not under the UniformSampler trait. Instead of Rng::gen_range, people must use Rng::gen_index.

This is more disruptive and less generic friendly than the above, for no obvious benefit.

@dhardy
Copy link
Member

dhardy commented Jul 12, 2024

Locations where non-portable generation of usize/isize values occurs:

  • impl rand::Fill for [usize] and for [isize]
  • rand::seq::index::sample_array (new code)
  • rand::distributions::Slice::sample
  • impl rand::distributions::Distribution<usize> for Standard, also for isize
  • impl Distribution<Simd<usize, LANES>> for Standard, also for isize
  • rand::distributions::UniformSampler
  • (WideningMultiply for usize; non-random and used by Uniform)

As a lesser form of non-portability, the following support usize values larger than u32::MAX but remain portable with 32-bit usize for smaller values:

  • rand::seq::index::sample
  • choose and shuffle methods

From the above, sample_array and Slice::sample should probably be made portable (like index::sample already is).


Non-portable wrapper

We could add a wrapper type, rand::Nonportable (or Unportable?), and implement Fill, Distribution and UniformSampler over this wrapper for all supported un-wrapped types and support wrapped usize, isize values.

We can then remove support for generating unwrapped isize, usize values, and add Rng::gen_index (repackaging of the private function seq::gen_index) for convenience.

@dhardy
Copy link
Member

dhardy commented Jul 26, 2024

FYI I'm strongly considering removing all support for random usize / isize values in #1471, except:

  • Rng::gen_index helper
  • Additional helpers? UniformUsize which uses range-specific u32 or u64 sampling?

We could even release this, then add sampling back later if there is a significant demand — though I'd prefer not to, since this is non-portable behaviour, and any case where random usize or isize values are required probably requires special attention (the obvious exception being gen_index).

@vks
Copy link
Collaborator

vks commented Jul 26, 2024

Couldn't we make it portable (by using gen_index) instead of removing it? This would cause less breakage and seems more user-friendly.

@dhardy
Copy link
Member

dhardy commented Jul 26, 2024

We could do. There's already UniformUsize in rand::distr::slice (private); I started making that public and more flexible, but then wasn't sure whether to go that route.

@dhardy
Copy link
Member

dhardy commented Aug 10, 2024

Small update: #1471 and alpha.2 have portable Rng::gen_range for usize and portable Rng::gen_index for usize; we probably won't be keeping both.

@dhardy
Copy link
Member

dhardy commented Sep 13, 2024

Changed in #1487.

@dhardy dhardy closed this as completed Sep 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants