first draft of Binomial.icdf issue #6612 - PyData 2024 Hackathon #7362

niknow · 2024-06-15T16:06:18Z

Description

Related Issue

Closes #
Related to Add icdf functions for distributions #6612

Checklist

Checked that the pre-commit linting/style checks pass
Included tests that prove the fix is effective or that the new feature works
Added necessary documentation (docstrings and/or example notebooks)
If you are a pro: each commit corresponds to a relevant logical change

Type of change

📚 Documentation preview 📚: https://pymc--7362.org.readthedocs.build/en/7362/

welcome · 2024-06-15T16:06:21Z

💖 Thanks for opening this pull request! 💖 The PyMC community really appreciates your time and effort to contribute to the project. Please make sure you have read our Contributing Guidelines and filled in our pull request template to the best of your ability.

ricardoV94 · 2024-06-16T09:03:23Z

pymc/distributions/discrete.py

+        cdf_vals = Binomial.logcdf(pt.arange(0, n), n, p)
+        return pt.argmax(cdf_vals > pt.log(value))


This is clever but wildly inefficient for large n?

Also would probably need to pass axis=-1 to argmax
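For readers following along, the idea in the diff can be sketched in plain Python rather than PyTensor: accumulate the pmf and return the first k whose CDF reaches the requested probability. The function name `icdf_linear` is hypothetical and not part of the PR. The sketch uses the usual quantile convention (smallest k with CDF(k) >= q) instead of the strict `>` in the diff, and it scans k = 0..n inclusive, whereas `pt.arange(0, n)`, like NumPy's `arange`, stops at n - 1.

```python
from math import comb


def icdf_linear(q: float, n: int, p: float) -> int:
    """Smallest k in 0..n with P(X <= k) >= q for X ~ Binomial(n, p).

    Hypothetical helper for illustration: a linear scan over the support.
    """
    cdf = 0.0
    for k in range(n + 1):  # n + 1 so that k = n is part of the scan
        cdf += comb(n, k) * p**k * (1 - p) ** (n - k)  # add pmf(k)
        if cdf >= q:
            return k
    return n  # only reached through rounding error when q is very close to 1
```

The loop touches every support point up to the answer, which is where the linear cost in n comes from.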

niknow · 2024-06-19

You are totally right, it is slow for large n because it runs in O(n). I did flag this at the hackathon and agree that, for that reason, this is an ad-hoc implementation, but I was encouraged to submit it anyway. ;) If you feel it is too slow for production, I'm totally fine if you reject this PR.

I would note though that for very large n, one is usually better off using a normal approximation instead anyway.
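As a rough illustration of the normal-approximation remark, here is a minimal sketch using only the Python standard library. The name `icdf_normal_approx` and the continuity correction are assumptions made for this example, not something proposed in the PR, and the result is an approximation that can be off by a few units, especially for small n or p near 0 or 1.

```python
from math import ceil, sqrt
from statistics import NormalDist


def icdf_normal_approx(q: float, n: int, p: float) -> int:
    """Approximate Binomial(n, p) quantile via a normal approximation.

    Hypothetical helper; requires 0 < q < 1 and 0 < p < 1.
    """
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    x = NormalDist(mu, sigma).inv_cdf(q)  # quantile of the matching normal
    # Continuity correction: P(X <= k) is approximated by Phi((k + 0.5 - mu) / sigma),
    # so the smallest suitable integer is ceil(x - 0.5).
    k = ceil(x - 0.5)
    return min(max(k, 0), n)  # clamp to the support 0..n
```

The cost is independent of n, which is the attraction for very large n.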

And on the upside: this trick can be used to quickly get an icdf implementation for any discrete distribution, so one idea we had at the event was to use it as a first draft and make it faster later if needed. One might be able to get the complexity down to O(log(n)) if one replaces the linear scan with some sort of bisection.
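A minimal plain-Python sketch of the bisection idea, assuming only that the CDF is non-decreasing on an integer range lo..hi and that cdf(hi) >= q. The names `icdf_bisect` and `binom_cdf` are hypothetical. The direct-sum CDF here is itself O(k) per call, so the O(log(n)) figure refers to the number of CDF evaluations and presumes a CDF that is cheap to evaluate.

```python
from math import comb
from typing import Callable


def icdf_bisect(q: float, cdf: Callable[[int], float], lo: int, hi: int) -> int:
    """Smallest k in lo..hi with cdf(k) >= q.

    Assumes cdf is non-decreasing and cdf(hi) >= q (hypothetical helper).
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if cdf(mid) >= q:
            hi = mid  # mid satisfies the condition, the answer is mid or to its left
        else:
            lo = mid + 1  # mid is too small, the answer is strictly to its right
    return lo


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed directly from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))
```

For example, `icdf_bisect(0.5, lambda k: binom_cdf(k, 10, 0.5), 0, 10)` evaluates the CDF at k = 5, 2 and 4 before settling on 5.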

Having a fast icdf just for Binomial might require quite a bit more effort, e.g. some numerical root finder of the CDF (which for Binomial is the regularized incomplete beta function). So I think the options are:

1. take this idea as a draft and reject for production use (maybe keep in mind using for unit tests)
2. take this idea and accept that it is slow for now
3. try to tweak to O(log(n))
4. try to lift and shift a more tailored approach based on beta function (will probably require a bit higher effort)

Of course open to ideas if you see another option? ;-)

ricardoV94 · 2024-06-19

Thanks for the explanation!

I think we should try 3/4, and we can use this approach to obtain an independent reference for testing?
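One way to read the testing suggestion, sketched with the standard library only: keep the brute-force scan as an oracle and assert that a faster implementation agrees with it over a grid of probabilities. All names are hypothetical, and `bisect_left` on a precomputed CDF table merely stands in for whatever faster icdf is eventually written.

```python
from bisect import bisect_left
from itertools import accumulate
from math import comb


def binom_cdf_table(n: int, p: float) -> list[float]:
    """CDF values P(X <= k) for k = 0..n, as running sums of the pmf."""
    return list(accumulate(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)))


def check_against_reference(n: int, p: float, qs: list[float]) -> None:
    """Assert that a fast lookup matches the brute-force scan for every q."""
    table = binom_cdf_table(n, p)
    for q in qs:
        # Reference: first k whose CDF reaches q, falling back to n.
        slow = next((k for k, c in enumerate(table) if c >= q), n)
        # Stand-in for the fast implementation under test.
        fast = min(bisect_left(table, q), n)
        assert slow == fast, (n, p, q, slow, fast)


check_against_reference(20, 0.3, [i / 100 for i in range(1, 100)])
```

Because the oracle shares no search logic with the fast path, agreement across the grid gives some independent evidence that the fast path is right.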

Perhaps the binary search wouldn't be too hard to implement with a ScalarLoop (so it is trivial to vectorize as an Elemwise operation)?

https://github.com/pymc-devs/pytensor/blob/main/pytensor/scalar/loop.py

The requirement is that the CDF expression be composed only of Scalar (or Elemwise in the batched version) operations.

The code may be a bit daunting but here are some cases where we use it: https://github.com/pymc-devs/pytensor/blob/efa845a3484915e4e15a928918fa97d081886d50/pytensor/scalar/math.py#L870

Or tests that may be more readable: https://github.com/pymc-devs/pytensor/blob/main/tests/scalar/test_loop.py

first draft of Binomial.icdf issue pymc-devs#6612

0f2cf84
