Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poisson: Fix undefined behavior and support f64 output #795

Merged
merged 9 commits into from
May 16, 2019

Conversation

vks
Copy link
Collaborator

@vks vks commented May 15, 2019

No description provided.

vks added 2 commits May 15, 2019 14:27
Before, we might trigger undefined behaviour if the sample gets too
large.
Internally, we generate `f64`, so it makes sense to let the user access
that result directly, instead of forcing a conversion from `u64`.
@dhardy dhardy mentioned this pull request May 15, 2019
22 tasks
@vks vks added this to the 0.7 release milestone May 15, 2019
Copy link
Member

@dhardy dhardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Now the only question is whether to bother with an f32 version. I don't see a lot of point, but it would still be a breaking change to introduce later (because Poission becomes Poisson<f64>).

impl Distribution<u64> for Poisson {
fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> u64 {
let result: f64 = self.sample(rng);
assert!(result >= 0.);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already guaranteed by the algorithm (see loop break condition), though the check is cheap enough relative to the algorithm that it doesn't matter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I should add #[inline], so the check can be optimized away?

I think it makes sense to have the check to avoid undefined behavior, in case the the implementation somehow is wrong and returns negative numbers or nan.

@vks
Copy link
Collaborator Author

vks commented May 15, 2019

I added f32 sampling. We could probably have a specialized version of log_gamma for f32 that uses less precision.

rand_distr/src/poisson.rs Outdated Show resolved Hide resolved
rand_distr/src/utils.rs Outdated Show resolved Hide resolved
@@ -91,33 +111,33 @@ impl Float for f64 {
/// `Ag(z)` is an infinite series with coefficients that can be calculated
/// ahead of time - we use just the first 6 terms, which is good enough
/// for most purposes.
pub(crate) fn log_gamma(x: f64) -> f64 {
pub(crate) fn log_gamma<N: Float>(x: N) -> N {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an optimal f32 implementation and I'm not sure it's even an acceptable one (e.g. the a parameter below rounds to 1). I would suggest just using f64 arithmetic and converting at function start/end, though there's very little point to supporting Poisson<f32> in this case (only really for the type deduction).

Additionally, this function prototype is okay for internal use but not if we wished to expose log_gamma since we could not add correct implementations for other float types without specialization. A trait would be better, though clunky since we only have one method.

Statrs's Gamma function implementations are likely better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It essentially evaluates a polynomial, so I think it should be fine, even if the coefficients have less precision. I thinks it is more likely that too many coefficients are used for the target precision.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though there's very little point to supporting Poisson in this case (only really for the type deduction).

It might help for sampling from the uniform distribution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this help with the uniform distribution?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are doing some rejection sampling for the Poisson distribution, sampling f32 instead of f64 in the loop might help.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you mean speed up usage of uniform, not help implement uniform. Yes, and when we get a specialisation of log_gamma it may speed things up for any(?) situation where f32 is faster than f64.

@@ -137,29 +137,41 @@ mod test {
fn test_poisson_10() {
let poisson = Poisson::new(10.0).unwrap();
let mut rng = crate::test::rng(123);
let mut sum = 0;
let mut sum_u64 = 0;
let mut sum_f64 = 0.;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the goal of this test? We know both values should be the same since (a) both should be non-negative integers and (b) we should not be going anywhere close to the limits/accuracy of either type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to test that the code path works as expected, not the precision.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the same code path, aside from the extra cast.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to trigger the path with the cast, making sure it works. I think it is important because of the potential undefined behavior.

@vks
Copy link
Collaborator Author

vks commented May 15, 2019

I changed log_gamma to always use f64. I think the previous implementation was okay though.

@@ -67,6 +67,7 @@ where Standard: Distribution<N>
impl<N: Float> Distribution<N> for Poisson<N>
where Standard: Distribution<N>
{
#[inline]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a large function — I don't think #[inline] makes any sense here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not #[inline(always)], so the compiler is still free to make that choice. It might make sense to inline it into the u64 sampling, so that some bound checks can be eliminated.

@dhardy dhardy merged commit c0b8722 into rust-random:master May 16, 2019
@vks vks deleted the fp-poisson branch June 3, 2019 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants