Add binomial and Poisson distributions #96

fizyk20 · 2016-01-22T20:40:41Z

This pull request adds the binomial and Poisson distributions using "step-by-step" generation for low expected values and the rejection method for higher ones.

The code is heavily based on the algorithms presented in "Numerical Recipes in C" - it's obviously not just copy-pasted and the algorithms are rather widely known, but I'm still not sure if that doesn't cause any licensing problems.
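
The "step-by-step" generation mentioned in the description can be sketched roughly as below: run `n` independent trials and count the successes. This is an illustration only, not the PR's code. `SplitMix64` is a hypothetical stand-in for a real `Rng`, and `binomial_by_trials` is a made-up name.

```rust
/// Tiny SplitMix64 generator, a hypothetical stand-in for a real `Rng`.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1), built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Step-by-step binomial sampling: `n` Bernoulli trials, counting successes.
/// The cost grows linearly with `n`, which is why it only suits small cases.
fn binomial_by_trials(n: u64, p: f64, rng: &mut SplitMix64) -> u64 {
    let mut successes = 0;
    for _ in 0..n {
        if rng.next_f64() < p {
            successes += 1;
        }
    }
    successes
}
```

Averaging many such samples should land close to `n * p`, which gives a cheap sanity check.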

mkindahl · 2017-09-12T05:11:51Z

Not sure what dependencies that are allowed by the library, but POSIX.1-2001 defines both the tgamma and lgamma function for computing the gamma function and ln-Gamma function, respectively. Hence the log gamma function might only be necessary to define if the intention is to not be dependent on POSIX.

fizyk20 · 2017-10-24T12:20:55Z

I don't think the crate should depend on POSIX, it's also supposed to work on Windows, after all. That being said, we could conditionally compile to use the POSIX lgamma on systems supporting it, but I'm not sure if it's worth the effort.

Also, it looks like this PR might be made obsolete by the future changes in the API anyway, so... ;)

dhardy · 2017-10-24T13:30:12Z

I don't know about obsolete. There's been no discussion yet of whether distributions should be moved to a separate crate; having these distributions would still be useful.

dhardy · 2018-03-04T09:08:44Z

@fizyk20 are you happy to update this now? I think we could merge now.

- https in headers please
- impl Distribution, not *Sample

fizyk20 · 2018-03-04T13:46:44Z

Awesome! Rebased, fixed and squashed. If any other changes are necessary, please let me know.

dhardy · 2018-03-04T13:57:11Z

Thanks! Are you force-pushing? (I was told the page was out of date twice.) Just let me know when this is ready for review.

fizyk20 · 2018-03-04T13:58:58Z

Yes, I was - I realised that my rustfmt ran automatically and modified pretty much the whole crate, so I amended the commit so that the changes are minimal. Should be ready now 👍

dhardy

Ah, you remembered my aversions to rustfmt 🤣

Just had a quick look at the API and doc; looks good! I'd still like to go over the maths, though I don't expect any problems.

dhardy · 2018-03-04T13:59:24Z

src/distributions/log_gamma.rs

+
+/// Calculates ln(gamma(x)) (natural logarithm of the gamma
+/// function) using the Lanczos approximation with g=5
+pub fn log_gamma(x: f64) -> f64 {


Would be good to see a little more doc on this.

Edit: sorry, it's not exported so this is fine.

Haha, I added more before I noticed the edit :P Well, it won't hurt, and may help in reading the code ;)

Looks good anyway 👍
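
For readers following along, a Lanczos approximation with g=5 can be sketched as below. The coefficients are the widely published ones for g = 5 with six terms. This is not necessarily identical to the function in the PR, and it is only meant for `x > 0`.

```rust
/// ln(gamma(x)) for x > 0 via the Lanczos approximation with g = 5.
fn log_gamma(x: f64) -> f64 {
    // Widely published coefficients for g = 5, n = 6.
    const COEFFICIENTS: [f64; 6] = [
        76.18009172947146,
        -86.50532032941677,
        24.01409824083091,
        -1.231739572450155,
        0.1208650973866179e-2,
        -0.5395239384953e-5,
    ];

    // (x + 0.5) * ln(x + g + 0.5) - (x + g + 0.5), with g = 5.
    let shifted = x + 5.5;
    let log_term = (x + 0.5) * shifted.ln() - shifted;

    // Series part: c0 + sum of c_i / (x + i) for i = 1..=6.
    let mut series = 1.000000000190015;
    let mut denominator = x;
    for coefficient in COEFFICIENTS.iter() {
        denominator += 1.0;
        series += coefficient / denominator;
    }

    // 2.5066282746310005 is sqrt(2 * pi).
    log_term + (2.5066282746310005 * series / x).ln()
}
```

A quick check is that `log_gamma(n)` should agree with `ln((n - 1)!)` for small integers, e.g. `log_gamma(5.0)` against `24f64.ln()`.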

dhardy · 2018-03-04T14:00:19Z

src/distributions/binomial.rs

+}
+
+impl Distribution<u64> for Binomial {
+    fn sample<R: Rng>(&self, rng: &mut R) -> u64 {


We're using R: Rng + ?Sized for now. (I'm not sure whether to change to R: RngCore + ?Sized; I'm actually surprised this compiles since it differs from the trait.)

I guess it wouldn't compile if some code tried to use it with an unsized R - but I'd expect it to notice the difference, too. Maybe it's something that could be reported to the compiler team?
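
A minimal sketch of what the `?Sized` bound buys, using hypothetical `ToyRng` and `ToyDistribution` traits rather than the crate's real ones: with the relaxed bound, `R` may be the unsized type `dyn ToyRng`, so the sampler can be driven through a trait object as well as through a concrete generator.

```rust
/// Hypothetical stand-in for the crate's RNG trait.
trait ToyRng {
    fn next_u64(&mut self) -> u64;
}

/// Deterministic "generator" that just counts upwards.
struct Counter(u64);

impl ToyRng for Counter {
    fn next_u64(&mut self) -> u64 {
        self.0 += 1;
        self.0
    }
}

/// Hypothetical stand-in for `Distribution<T>`; note the `?Sized` on `R`.
trait ToyDistribution<T> {
    fn sample<R: ToyRng + ?Sized>(&self, rng: &mut R) -> T;
}

struct Doubler;

impl ToyDistribution<u64> for Doubler {
    // Same bound as the trait declaration, including `?Sized`.
    fn sample<R: ToyRng + ?Sized>(&self, rng: &mut R) -> u64 {
        2 * rng.next_u64()
    }
}

/// Here `R` is inferred as `dyn ToyRng`, which is only allowed
/// because the bound opts out of the implicit `Sized` requirement.
fn sample_dyn(rng: &mut dyn ToyRng) -> u64 {
    Doubler.sample(rng)
}
```

Dropping `?Sized` from the trait method would make `sample_dyn` fail to compile, since `dyn ToyRng` has no statically known size.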

dhardy · 2018-03-04T14:05:19Z

src/distributions/binomial.rs

+    /// `n`, `p`. Panics if `p <= 0` or `p >= 1`.
+    pub fn new(n: u64, p: f64) -> Binomial {
+        assert!(p > 0.0, "Binomial::new called with `p` <= 0");
+        assert!(p < 1.0, "Binomial::new called with `p` >= 1");


Why quote p in the assert messages? I think it's better not to (the same goes for λ later).

dhardy · 2018-03-04T14:06:30Z

src/distributions/poisson.rs

+}
+
+impl Distribution<u64> for Poisson {
+    fn sample<R: Rng>(&self, rng: &mut R) -> u64 {


R: Rng + ?Sized again

dhardy · 2018-03-04T14:07:38Z

src/distributions/poisson.rs

+/// The Poisson distribution `Poisson(lambda)`.
+///
+/// This distribution has density function: `f(k) = lambda^k *
+/// exp(-lambda) / k!` for `k >= 0`.


Better not to split the code block over multiple lines IMO

dhardy

Some comments; still needs more review of the maths (which are not trivial).

dhardy · 2018-03-05T15:57:11Z

src/distributions/binomial.rs

+            if expected < 25.0 {
+                let mut lresult = 0.0;
+                for _ in 0 .. self.n {
+                    if rng.gen::<f64>() < p {


@pitdicker what do you think about this probability test? I've been wondering if we should add a dedicated Bernoulli distribution for more accurate sampling.

I should first learn a lot before I can make any meaningful comments w.r.t. the distributions...

This single line is pretty much the Bernoulli distribution? It might be more generally useful than gen_weighted_bool.

Yes, I think we should add a Bernoulli distribution, and it would be nice to have it reasonably accurate for small p.

If we can do better than just rng.gen::<f64>() < p, then sure, this is a good idea; otherwise I'm not really convinced that it makes sense to make it a separate distribution.
And as for doing better, that's a bit over my head, so unfortunately I won't be able to help...

To fill you in, @fizyk20: @pitdicker already did quite a bit of work implementing higher-precision floating-point sampling, since the default method uses the same precision over the whole 0-1 range even though the format can represent many more values close to 0. However, we seem to have decided not to use this sampling method by default. We also use a small offset, which normally isn't an issue but might matter for correct sampling of small probabilities.

Almost available with Rng::gen_bool(p) from #308.
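
To make the precision concern concrete, here is one possible way to test a probability against raw integer output instead of a float. It is a sketch under assumptions, not necessarily what #308 implements; `SplitMix64`, `bernoulli_float` and `bernoulli_int` are hypothetical names.

```rust
/// Tiny SplitMix64 generator, a hypothetical stand-in for a real `Rng`.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Float comparison, as in the reviewed line: the uniform value has
/// 53 bits, so probabilities are resolved in steps of 2^-53.
fn bernoulli_float(p: f64, rng: &mut SplitMix64) -> bool {
    let uniform = (rng.next_u64() >> 11) as f64 / (1u64 << 53) as f64;
    uniform < p
}

/// Integer comparison: scale `p` to the u64 range once and compare
/// against all 64 random bits, giving steps of 2^-64 instead.
fn bernoulli_int(p: f64, rng: &mut SplitMix64) -> bool {
    if p >= 1.0 {
        return true; // p * 2^64 would not fit in a u64
    }
    // Float-to-int casts saturate, so negative or NaN `p` maps to 0.
    let threshold = (p * 2.0f64.powi(64)) as u64;
    rng.next_u64() < threshold
}
```

Both variants agree for ordinary probabilities; the difference only shows up for very small `p`, where the float version runs out of resolution first.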

dhardy · 2018-03-06T09:37:46Z

src/distributions/poisson.rs

+            lambda: lambda,
+            exp_lambda: (-lambda).exp(),
+            log_lambda: lambda.ln(),
+            magic_val: lambda * lambda.ln() - log_gamma(1.0 + lambda),


Why call lambda.ln() twice? Also, the choice of method could be made here and stored in an enum; this reduces the number of parameters since exp_lambda and the last two parameters are not used simultaneously. Maybe also cache (2.0 * self.lambda).sqrt() since SQRT is slow?

Just an oversight, I wouldn't be surprised if the compiler optimised that, though.
And sure, caching the sqrt sounds reasonable.
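
The enum suggestion could look roughly like the sketch below. The names `PoissonMethod`, `Knuth`, `Rejection` and `sqrt_2lambda` are hypothetical, the threshold of 12 is an assumption (the value used in the PR is not shown in this thread), and `log_gamma` is passed in as a parameter only to keep the example self-contained.

```rust
/// Hypothetical layout: each variant holds only the values its method needs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PoissonMethod {
    /// Low expected values: only exp(-lambda) is needed.
    Knuth { exp_lambda: f64 },
    /// Higher expected values: cached terms for the rejection method.
    Rejection { log_lambda: f64, sqrt_2lambda: f64, magic_val: f64 },
}

struct Poisson {
    lambda: f64,
    method: PoissonMethod,
}

impl Poisson {
    /// The `log_gamma` argument is a stand-in for the crate-internal function.
    fn new(lambda: f64, log_gamma: fn(f64) -> f64) -> Poisson {
        assert!(lambda > 0.0, "Poisson::new called with lambda <= 0");
        // Assumed cut-over point between the two methods.
        let method = if lambda < 12.0 {
            PoissonMethod::Knuth { exp_lambda: (-lambda).exp() }
        } else {
            // ln(lambda) is computed once and reused for magic_val.
            let log_lambda = lambda.ln();
            PoissonMethod::Rejection {
                log_lambda,
                sqrt_2lambda: (2.0 * lambda).sqrt(),
                magic_val: lambda * log_lambda - log_gamma(1.0 + lambda),
            }
        };
        Poisson { lambda, method }
    }
}
```

Sampling would then `match` on `self.method`, so the choice of algorithm is made once at construction rather than on every call.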

dhardy · 2018-03-06T09:54:54Z

src/distributions/poisson.rs

+    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> u64 {
+        // using the algorithm from Numerical Recipes in C
+
+        // for low expected values use the Knuth method


Would it be better to use the inverse transform method for small samples, since it only requires 1 random sample? I don't know a lot about this topic unfortunately.
https://en.wikipedia.org/wiki/Poisson_distribution#Generating_Poisson-distributed_random_variables

To be honest, I don't know, my knowledge is also limited. Maybe it is a good idea, sampling just once sounds attractive. I mostly just ported the algorithm from "Numerical Recipes", but it is possible that something else could be better.

Hmm; unless an expert in this area turns up (unlikely), perhaps the best we can do is implement some tests (e.g. plot a high-resolution histogram), then say this is ~~good enough for now~~ the best we can do.
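
For comparison, the two small-lambda approaches discussed here can be sketched side by side. This is an illustration, not the PR's code: `SplitMix64` is a hypothetical stand-in for a real `Rng`, and the inverse transform shown is the plain sequential-search variant described on the linked Wikipedia page.

```rust
/// Tiny SplitMix64 generator, a hypothetical stand-in for a real `Rng`.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_f64(&mut self) -> f64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^= z >> 31;
        // Uniform in [0, 1) from the top 53 bits.
        (z >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Knuth's multiplication method: multiply uniforms until the product
/// drops to exp(-lambda) or below. Uses about lambda + 1 draws per sample.
fn poisson_knuth(lambda: f64, rng: &mut SplitMix64) -> u64 {
    let limit = (-lambda).exp();
    let mut k = 0;
    let mut product = rng.next_f64();
    while product > limit {
        k += 1;
        product *= rng.next_f64();
    }
    k
}

/// Inverse transform by sequential search: one uniform draw, then walk
/// the cumulative distribution until it passes that draw.
fn poisson_inverse(lambda: f64, rng: &mut SplitMix64) -> u64 {
    let u = rng.next_f64();
    let mut k = 0u64;
    let mut pmf = (-lambda).exp(); // P(X = 0)
    let mut cdf = pmf;
    // The `pmf > 0.0` guard stops the walk if rounding keeps cdf below u.
    while u > cdf && pmf > 0.0 {
        k += 1;
        pmf *= lambda / k as f64; // P(X = k) from P(X = k - 1)
        cdf += pmf;
    }
    k
}
```

Both loops take a number of steps proportional to lambda on average; the inverse transform trades the repeated random draws for floating-point accumulation.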

pitdicker · 2018-03-06T12:05:22Z

One small thing still to add: benchmarks.

dhardy · 2018-03-15T12:21:58Z

src/distributions/mod.rs


 mod float;
 mod integer;
+mod log_gamma;


This module also needs the cfg gate for std-only

pitdicker · 2018-03-18T14:18:45Z

I have just gone over the code here and in "Numerical Recipes in C" side by side. It is a little different and uses much clearer function names. But it seems like a clean translation to me.

We should build up more expertise in the various distribution sampling methods, and may want to pick different algorithms in the future. But that will take time; I think this PR is good to have in the meantime.

My result from plotting 100,000 samples of the binomial distribution (using a simple spreadsheet):

    #[test]
    fn test_binomial_distr() {
        // One bucket per possible outcome 0..=40.
        let mut results = [0u64; 41];
        // Alternative parameters:
        // let binomial = Binomial::new(20, 0.5);
        // let binomial = Binomial::new(20, 0.7);
        let binomial = Binomial::new(40, 0.5);
        let mut rng = ::test::rng(123);
        for _ in 0..100_000 {
            let sample = binomial.sample(&mut rng);
            if sample <= 40 { results[sample as usize] += 1 }
        }
        let sum = results.iter().sum::<u64>() as f64;
        // Print relative frequencies, one per line, for the spreadsheet.
        for sample in results.iter() {
            println!("{}", *sample as f64 / sum);
        }
        // Fail deliberately so the test harness shows the printed output.
        panic!();
    }

And the same for Poisson:

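    #[test]
    fn test_poisson_distr() {
        // One bucket per outcome 0..=20; larger samples are ignored.
        let mut results = [0u64; 21];
        // Alternative parameters:
        // let poisson = Poisson::new(1.0);
        // let poisson = Poisson::new(4.0);
        let poisson = Poisson::new(10.0);
        let mut rng = ::test::rng(123);
        for _ in 0..100_000 {
            let sample = poisson.sample(&mut rng);
            if sample <= 20 { results[sample as usize] += 1 }
        }
        let sum = results.iter().sum::<u64>() as f64;
        // Print relative frequencies, one per line, for the spreadsheet.
        for sample in results.iter() {
            println!("{}", *sample as f64 / sum);
        }
        // Fail deliberately so the test harness shows the printed output.
        panic!();
    }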

It is very primitive, but both look very plausible to me when compared to Wikipedia (Binomial, Poisson) 😄.
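
Rather than comparing against plots by eye, the printed frequencies could be checked against exact probabilities. The sketch below computes the binomial probabilities by recurrence and reports the largest gap to an observed histogram; `binomial_pmf` and `max_abs_deviation` are hypothetical helpers, not part of the crate, and `p` is assumed to lie strictly between 0 and 1.

```rust
/// Exact binomial probabilities P(X = k) for k = 0..=n, by recurrence.
/// Assumes 0 < p < 1, matching the asserts in `Binomial::new`.
fn binomial_pmf(n: u64, p: f64) -> Vec<f64> {
    let q = 1.0 - p;
    let mut pmf = Vec::with_capacity(n as usize + 1);
    let mut current = q.powi(n as i32); // P(X = 0) = (1 - p)^n
    pmf.push(current);
    for k in 0..n {
        // P(X = k + 1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
        current *= (n - k) as f64 / (k + 1) as f64 * p / q;
        pmf.push(current);
    }
    pmf
}

/// Largest absolute gap between observed frequencies and expected ones.
fn max_abs_deviation(observed: &[f64], expected: &[f64]) -> f64 {
    observed
        .iter()
        .zip(expected)
        .map(|(o, e)| (o - e).abs())
        .fold(0.0, f64::max)
}
```

With 100,000 samples, gaps of a few thousandths per bucket are ordinary sampling noise; a much larger gap would point at a problem in the sampler.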

What is left to finally get this PR over the finish line (after 2+ years)?

- std-only feature gate (also fixed CI error)
- benchmarks
- maybe use Rng::gen_bool(p)
- I see some tests that are disabled for msvc, I suppose two years ago it was not completely reliable yet?

@dhardy What do you think of merging this PR, and then I make a follow-up PR with those tiny fix-ups?

pitdicker · 2018-03-18T14:28:39Z

The distributions are slow at the moment though:

test distr_uniform_f64          ... bench:       2,769 ns/iter (+/- 7) = 2889 MB/s (baseline)

test distr_binomial             ... bench:      87,178 ns/iter (+/- 373) = 91 MB/s
test distr_exp                  ... bench:       6,445 ns/iter (+/- 38) = 1241 MB/s
test distr_gamma_large_shape    ... bench:      17,451 ns/iter (+/- 152) = 458 MB/s
test distr_gamma_small_shape    ... bench:      76,848 ns/iter (+/- 2,983) = 104 MB/s
test distr_log_normal           ... bench:      23,258 ns/iter (+/- 431) = 343 MB/s
test distr_normal               ... bench:       6,196 ns/iter (+/- 46) = 1291 MB/s
test distr_poisson              ... bench:      31,369 ns/iter (+/- 330) = 255 MB/s

dhardy · 2018-03-18T16:38:36Z

Ok. I created a tracker: #310

pitdicker · 2018-03-18T16:39:12Z

Actually I have a branch ready.

pitdicker · 2018-03-18T16:39:24Z

🎉

alexcrichton added the F-new-int Functionality: new, within Rand label Jun 14, 2017

dhardy mentioned this pull request Sep 11, 2017

Add Poisson distribution #170

Closed

fizyk20 force-pushed the discrete branch from de6af81 to 010349d Compare March 4, 2018 13:45

fizyk20 changed the title ~~Added binomial and Poisson distributions~~ Add binomial and Poisson distributions Mar 4, 2018

fizyk20 force-pushed the discrete branch 2 times, most recently from 2ced48c to b7c59c6 Compare March 4, 2018 13:54

Add Poisson and binomial distributions

629aa94

fizyk20 force-pushed the discrete branch from b7c59c6 to 629aa94 Compare March 4, 2018 13:55

dhardy reviewed Mar 4, 2018

Address review comments

3a3fa47

dhardy reviewed Mar 6, 2018

Address more comments

8e9cec0

dhardy mentioned this pull request Mar 11, 2018

Tracker: planned changes for 0.5 #232

Closed

33 tasks

dhardy added P-medium D-review Do: needs review labels Mar 12, 2018

dhardy reviewed Mar 15, 2018

dhardy mentioned this pull request Mar 18, 2018

Fixes for Binomial, Poisson distributions #310

Closed

5 tasks

dhardy merged commit 2e3f2bf into rust-random:master Mar 18, 2018

fizyk20 deleted the discrete branch March 19, 2018 15:16

pitdicker pushed a commit that referenced this pull request Apr 4, 2018

Merge pull request #96 from fizyk20/discrete

8558b22

Add binomial and Poisson distributions

ndebuhr mentioned this pull request Jan 27, 2021

CHANGE: Poisson distribution return type, change from float to integer #1093

Closed
