
Implement HighPrecision01 distribution #320

Closed

Conversation

@pitdicker (Contributor)

Separated out from #308.

dhardy and others added 30 commits (from January 10, 2018), including:

  • Implement Rng for Box with alloc feature
  • CloudABI has a special system call, random_get(), that can be used to obtain random data from the environment (the kernel, emulator, etc.). We can invoke this system call by making use of the cloudabi crate.
  • Copied from dhardy/master, minus single-read functions, with extra tests
  • The planned switch to 256-bit byte array seeds does not allow such long seeds.
  • It's not being used anyway unless the std feature is enabled
@dhardy (Member) commented Mar 22, 2018

@tspiteri can I interest you in reviewing?

@dhardy mentioned this pull request Mar 22, 2018

@dhardy (Member) commented Mar 22, 2018

I had a thought: we could do extra testing with biased generators. E.g. if an RNG generated four numbers and combined them with AND, each output bit would be set with probability 2^-4 = 1/16, and I think the mean output of this algorithm should then also be 1/16. This might let us better test extremely unlikely outputs?
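
A test-only generator along those lines might look like the following sketch (hypothetical, not part of this PR). It ANDs four outputs of an inner RNG so that each bit is set with probability 1/16, assuming the rand_core RngCore trait:

use rand_core::{impls, Error, RngCore};

// Hypothetical biased test generator: each output bit is set with
// probability (1/2)^4 = 1/16.
struct Biased<R: RngCore>(R);

impl<R: RngCore> RngCore for Biased<R> {
    fn next_u32(&mut self) -> u32 {
        self.0.next_u32() & self.0.next_u32() & self.0.next_u32() & self.0.next_u32()
    }
    fn next_u64(&mut self) -> u64 {
        self.0.next_u64() & self.0.next_u64() & self.0.next_u64() & self.0.next_u64()
    }
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        impls::fill_bytes_via_next(self, dest)
    }
    fn try_fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), Error> {
        self.fill_bytes(dest);
        Ok(())
    }
}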

@dhardy (Member) commented Mar 23, 2018

Continuing our discussion about using a wrapper type...

Range is currently parameterised only over the return type; it then uses Uniform internally to sample values before mapping them to the target range. We could in theory parametrise Range over a distribution as well, but while tempting, that distribution can't simply replace Uniform (the widening-multiply method would make no sense for other distributions), so in practice this would just be Range<T, Uniform> most of the time and not very useful.

So using unique types for high-precision stuff is probably the best approach. In which case we might have to choose between something like one of the following:

// generic wrapper type:
let range = Range::new(HighPrecision(-1.0), HighPrecision(1.0));
let range = Range::new(HP(-1.0), HP(1.0));
// specific wrapper types:
let range = Range::new(f64(-1.0), f64(1.0));

I think possibly f32 / f64 is the best option?

Also @burdges perhaps you have a better idea of how to parametrise Range to use a different, high-precision, method internally?

@pitdicker (Contributor, Author)

Continuing from #326 (comment):

I am slowly coming around to the idea of using a wrapper type. Would then something like this work?

let value = rng.gen::<HighPrecision<f32>>().0;
let value_in_range = rng.gen_range::<HighPrecision<f32>>(-10.0, 10.0).0;

Still slightly ugly, but no extra distributions.
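
For illustration, that wrapper approach might look something like this sketch (the Distribution impl and paths are assumptions, not the PR's code; HighPrecision01 is the distribution this PR adds):

use rand::Rng;
use rand::distributions::{Distribution, HighPrecision01, Standard};

pub struct HighPrecision<T>(pub T);

impl Distribution<HighPrecision<f32>> for Standard {
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> HighPrecision<f32> {
        // delegate to the high-precision [0, 1) sampler, keeping the wrapper
        HighPrecision(rng.sample(HighPrecision01))
    }
}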

@pitdicker (Contributor, Author)

let range = Range::new(HighPrecision(-1.0), HighPrecision(1.0));

Ah, that is how it would look. Again seems ugly to me. That wrapper type has no meaning for the input values.

@pitdicker (Contributor, Author)

I had a thought: we could do extra testing with biased generators.

I don't really follow what you are suggesting.
But I don't think testing using averages etc. is going to be all that useful. I did it before submitting this PR, and it did help catch a typo. It seems better to me to just test against a couple of known-good values, and leave it at that.

@pitdicker (Contributor, Author)

One other thing I am thinking about (directly copied from the old open_closed01 method in the experimental branch):

This method is similar to closed_open01, with one extra step: If the fractional part ends up as zero, add 1 to the exponent in 50% of the cases. This changes 0.5 to 1.0, 0.25 to 0.5, etc.

The chance of selecting these numbers that straddle the boundary between exponents is 33% too high in closed_open01 (e.g. 33%*2^-23 for 0.5f32). The reason is that, with 0.5 as the example, the distance to the next number is 2^-23, but to the previous number 2^-24. With the adjustment they have exactly the right chance of being sampled.

So our HighPrecision01 is not 'perfect' yet; it would have to sample from [0, 1] to be able to represent every value with exactly the right chance.

But I am not sure this is something we want to fix. It would require an extra branch and taking one bit from the exponent, which seems like it would complicate the function a lot. The worst case is 0.5f32, which will turn up too often by roughly one extra occurrence in 2^24 samples.
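
For reference, the asymmetric spacing is easy to observe by stepping through the raw bit representation; this standalone snippet (not from the PR) prints the gaps around 1.0, and the same effect occurs at every power of two:

fn main() {
    let x = 1.0f32;
    let up = f32::from_bits(x.to_bits() + 1);   // next representable value above
    let down = f32::from_bits(x.to_bits() - 1); // next representable value below
    println!("gap above 1.0: {:e}", up - x);    // 2^-23
    println!("gap below 1.0: {:e}", x - down);  // 2^-24
}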

@dhardy mentioned this pull request Mar 26, 2018
@dhardy added the F-new-int (Functionality: new, within Rand) and P-high (Priority: high) labels Mar 26, 2018
@dhardy (Member) commented Mar 26, 2018

So "0.5" represents the space between 0.5 and 0.5+ε/2, whereas the next smallest number represents the space between 0.5-ε/4 and 0.5; indeed we always round down instead of rounding up 50% of the time. I guess a "fix" would be to have a 50% chance of adding 1 to the fractional part, and dealing with the case it wraps to zero properly (increase exponent).

But should we?

  1. The significance is tiny — 1 in 2^{fraction bits} — so arguably if precision matters you should already be using a higher-precision floating-point type.
  2. For certain operations, such as the sample < p test, the current behaviour is correct anyway.

So I'd say no.
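
For concreteness, the fix described above could look roughly like this hypothetical helper (not proposed for inclusion), operating on the raw fraction bits before the float is assembled:

// With probability 1/2, round the 23-bit f32 fraction up by one ulp;
// a carry out of the fraction bits bumps the exponent instead.
fn round_up_half_the_time(fraction: u32, exp: i32, coin_flip: bool) -> (u32, i32) {
    const FRACTION_BITS: u32 = 23;
    if !coin_flip {
        return (fraction, exp);
    }
    let rounded = fraction + 1;
    if rounded >> FRACTION_BITS != 0 {
        (0, exp + 1) // wrapped to zero: the value rounds up to the next power of two
    } else {
        (rounded, exp)
    }
}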

@dhardy (Member) commented Mar 26, 2018

Ah, that is how it would look. Again seems ugly to me. That wrapper type has no meaning for the input values.

You are right that the wrapper type solution is ugly, but the alternatives are:

  • whole new version of Range for high-precision float stuff (but identical on other types, if implemented) — more ugly
  • a new RangeImpl (e.g. RangeFloatHP) accessed via a different constructor, Range::new_hp; I think this needs a separate helper trait SampleRangeHP

The latter actually sounds reasonable, don't you think? Obviously this means it would never be used by Rng::gen_range, but that's not a big deal.

I am wondering though if we should offer two versions of gen_bool, or at least document that rng.sample(HighPrecision01) < p offers the same performance when p is variable, and higher precision when p is not a multiple of 2^-32.
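
As a usage sketch of that documentation note (assuming HighPrecision01 ends up under rand::distributions):

use rand::Rng;
use rand::distributions::HighPrecision01; // as added by this PR

// Bernoulli trial with full f64 precision in p; a 32-bit gen_bool
// effectively rounds p to a multiple of 2^-32.
fn coin_flip<R: Rng>(rng: &mut R, p: f64) -> bool {
    rng.sample(HighPrecision01) < p
}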

@pitdicker (Contributor, Author)

In my mind having two distributions, HighPrecision01 and HighPrecisionRange, is the simplest solution.

I wonder if we could even make do with a distribution HighPrecisionRange. If we write things right, the compiler should see that HighPrecisionRange(0.0, 1.0) doesn't do much more than val * 1.0 + 0.0, and optimize things out.

We don't need to support all methods from Range: range_inclusive and sample_single would not change anything here or make things faster. So this can be a simple distribution, a bit like how Binomial takes two values.
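
A minimal sketch of that shape (names and internals illustrative only; a real implementation would need the careful rounding discussed later in this thread):

use rand::Rng;
use rand::distributions::{Distribution, HighPrecision01};

pub struct HighPrecisionRange {
    offset: f64,
    scale: f64,
}

impl HighPrecisionRange {
    pub fn new(low: f64, high: f64) -> Self {
        HighPrecisionRange { offset: low, scale: high - low }
    }
}

impl Distribution<f64> for HighPrecisionRange {
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> f64 {
        // for new(0.0, 1.0) this is val * 1.0 + 0.0, which the
        // optimizer can remove entirely
        rng.sample(HighPrecision01) * self.scale + self.offset
    }
}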

@dhardy (Member) commented Mar 26, 2018

It's not going to be HighPrecisionRange(0.0, 1.0) because IIRC there are four cases to handle differently, so a constructor is needed; thus I see no reason not to use Range<HighPrecisionFloatRange> or similar.

exp += bits.trailing_zeros() as i32;
// If the RNG were guaranteed unbiased we could skip the
// check against exp; unfortunately it may be biased.
// Worst case ("zeros" RNG) has recursion depth 16.
Review comment from a Contributor:

The recursion can be removed easily. The contents of fallback can look like the code below, which I also find a little easier to understand.

// Worst case ("zeros" RNG) will have 16 iterations (f64 exp bias / 64).
loop {
    let bits = rng.gen::<$uty>();
    exp += bits.trailing_zeros() as i32;
    
    // if exp reaches bias, return a subnormal that can be zero
    if exp >= $exponent_bias {
        return fraction.into_float_with_exponent(-$exponent_bias);
    }
    if bits != 0 {
        return fraction.into_float_with_exponent(-exp);
    }
}

Reply from a Member:

Actually I just thought the recursion was neater since @pitdicker recommended this bit be in a separate function anyway (not inlined so that the common code is small). Since it will almost never recurse I don't see a problem with that strategy. But using a loop also works (was used in an older commit).

@pitdicker (Contributor, Author)

I have been thinking about how to do a range implementation that can have precision like HighPrecision01.

Take for example a range like [0.0 .. 3.5]. If we generate a value in [0.0 .. 1.0] and multiply by 3.5, we get a (small) rounding mess. And a range like [-3.0 .. 7.0] is going to need some trickery to bring part of the asymmetrical range over to the other sign.

When we used only 32/64 bits to generate the float this seemed acceptable. But if we call this 'as much precision as the floating-point format can represent', it gets messy. Is it instead possible to generate values directly inside the target range? I think it is, with only a very small portion of values having to be rejected.

What is needed (for f32):

  • 1 bit for the sign.
  • 23 bits for the fraction.
  • an unknown number of bits for the exponent.

Let's assume the [-3.0 .. 7.0] range again.

  • This would need an exponent between 3 and -9, where each smaller exponent has 50% of the chance of the next larger one.
  • We have two zones to reject from: [-8.0 .. -3.0] and [7.0 .. 8.0].
  • We can improve on the last zone: if we have the largest exponent (3), we can zero out as many most-significant bits of the fraction as are zero in 7.0. The same is true in the zone [-4.0 .. -3.0].
  • We can also improve a bit on the first zone. Basically we generate the largest (asymmetrical) exponent, 3 in this case, twice too often. We use only some of the exponent bits to pick this exponent: just the first bit for the highest exponent, two bits for the second highest, etc. If we fall in this zone, we pick another bit, and based on it we reject the exponent in 50% of the cases. So we are only rejecting a couple of bits, not the whole value. And as we already have a mechanism in place to generate more bits just for the exponent, this is relatively cheap.

I think this would give us full precision, without the rounding problems.
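
To make the shape of this concrete, here is a much-simplified standalone sketch (illustrative only, not the PR's code). It drops the fraction-masking and partial-bit-rejection refinements above and simply rejects whole candidates, and it ignores zero and subnormals:

use rand::Rng;

// Build an f32 candidate from a random sign, a geometrically distributed
// exponent and 23 uniform fraction bits, then reject values outside the
// target range. max_exp is hard-coded for the example range [-3.0, 7.0]:
// all magnitudes there are below 2^3, so the largest IEEE exponent is 2.
fn sample_in_range<R: Rng>(rng: &mut R, low: f32, high: f32) -> f32 {
    let max_exp: i32 = 2;
    loop {
        let sign_bit = (rng.gen::<u32>() & 1) << 31;
        // each smaller exponent is half as likely as the one above it
        let mut exp = max_exp;
        while exp > -126 && rng.gen::<bool>() {
            exp -= 1;
        }
        let bits = sign_bit
            | (((exp + 127) as u32) << 23) // IEEE 754 biased exponent
            | (rng.gen::<u32>() >> 9);     // 23 fraction bits
        let candidate = f32::from_bits(bits);
        if low <= candidate && candidate <= high {
            return candidate;
        }
    }
}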

@dhardy (Member) commented Apr 3, 2018

Sounds very good!

Can we also eliminate the sign bit if not needed? If so I suspect this can be as fast as the [0, 1) implementation with those bounds (when statically known); even without eliminating the sign it may be close enough that we don't need both.

@dhardy (Member) commented Apr 4, 2018

On second thoughts, it sounds complicated, and it doesn't avoid large rejection zones (if you consider a range like [-1 .. 70] you effectively need to sample from (-128, 128), and with rejection you then need to throw out approx 5/7ths of all possible samples).

I don't think your zeroing trick can work without bias. Numbers from [7, 8) should be rejected; to get there you need a positive sign, exponent 3, and the top two bits of the fraction set, but you can't just zero all the other fraction bits in this case because that introduces bias, and you need those bits in most cases. You could possibly sample only enough bits of the fraction to know you are in unusable territory (i.e. sign + exponent + top two bits, where the next 1 bit pushes you over 7), but then you have to sample the rest of the fraction from however many bits are left, and it gets complicated (and likely slow).

What's your second idea — sample the sign bit and the first exponent bit, and if both are 1 (i.e. the range (-8, -4], which is unusable) then skip those two bits? That is unbiased, but I suspect it will be a bit slow because there's extra looping. Note that you do need to resample the sign bit too to avoid bias, and you can't just shift in 1 bit, because you know the next bit is 1 in this case, which introduces bias.

This gives some improvement to the above range [-1, 70], but still rejects values from (70, 128) (assuming we don't do complex partial fraction rejection) and rejects bits from (-128, -1).

So it sounds to me like any perfect generation method would be slow.

There are also closed ranges to consider: ideally we would sample from [0, 1] by sampling from [0, 2) and then rejecting anything in (1, 2), but this is (just a bit) ridiculous.

All FP arithmetic is designed to be lossy anyway once the maximum representable precision is exceeded. Perfectly producing random values seems like a lot of work to save one extra bit of precision — and if users care about precision they will already be using f64 which very likely has more precision than actually needed. So I think the multiplication method is the better option.

@tspiteri (Contributor) commented Apr 4, 2018

This seems to be settled, but just a small note: I agree that one extra bit of precision is not worth the extra complexity for ranges that are not from 0 to 1; using mul_add(range_width, range_low) will result in only one rounding error after all.

@tspiteri (Contributor) commented Apr 4, 2018

Replying to my own comment, mul_add does result in one rounding error, but it can result in more than one bit of lost precision. If for example the range 0 to 1 is to be converted to the range -1 to 1, the result of rand01.mul_add(2.0, -1.0) will not have high precision around zero, as resulting numbers close to zero will come from something like close_to_half.mul_add(2.0, -1.0), which will have errors of the order of f32::EPSILON, not of the order of f32::MIN_POSITIVE.

But handling such cases should probably be left to the user of the library. In that particular example, if the library user really wants high precision in the range -1 to 1, they can just generate a number from 0.0 to 1.0 and then generate the sign using a random bool. The most that can be expected from the rand crate, I think, is to mention something in the docs.
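
That user-side workaround might look like this sketch (assuming HighPrecision01 under rand::distributions):

use rand::Rng;
use rand::distributions::HighPrecision01; // as added by this PR

// Full precision around zero on [-1.0, 1.0]: sample magnitude and sign
// separately instead of rescaling a [0, 1) sample.
fn symmetric_high_precision<R: Rng>(rng: &mut R) -> f64 {
    let magnitude: f64 = rng.sample(HighPrecision01);
    if rng.gen::<bool>() { magnitude } else { -magnitude }
}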

@dhardy (Member) commented Apr 5, 2018

No, this isn't settled, just recreated (#372) due to history cleanup.

@pitdicker already implemented a method to maintain (some) precision close to 0; see https://github.com/dhardy/rand/blob/experimental/src/distributions/range.rs#L432. I guess we will use that or similar code.
