Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster mp_toradix? #328

Open
MasterDuke17 opened this issue Aug 4, 2019 · 3 comments
Open

Faster mp_toradix? #328

MasterDuke17 opened this issue Aug 4, 2019 · 3 comments
Milestone

Comments

@MasterDuke17
Copy link
Collaborator

MasterDuke17 commented Aug 4, 2019

For background, I starting looking at this when a profile of a Perl 6 program that calculated Ackerman numbers showed that 90% of the time was spent stringifying a big Integer to print the result. I was using Rakudo with the MoarVM backend which uses libtommath under the hood (the stringifying was essentially just a call to mp_toradix, https://github.com/MoarVM/MoarVM/blob/master/src/math/bigintops.c#L1023). I looked at the implementation of mp_toradix and wondered if there was a faster algorithm. I started searching and found this thread http://code.activestate.com/lists/tcl-core/13692/ (link doesn't seem to work anymore, but it's also available here https://sourceforge.net/p/tcl/mailman/message/31972670/). This is my rough proof-of-concept transliteration into C using libtommath functions https://gist.github.com/MasterDuke17/956999b54093258f4a7e616a00d95ac2 (it wasn't until after I finished that I noticed some Barrett reduction related functions in libtommath). This is significantly faster. On my machine mp_toradix of 2**500000 takes ~13s, but Barrett_todigits takes ~0.3s. I haven't extensively benchmarked small and large values to see if using Barrett only makes sense after a certain size, but it definitely seems worthwhile at large values. Would there be any interest in using this for libtommath? Either as the implementation of mp_toradix, or maybe an alternative function?

@czurnieden
Copy link
Contributor

Yes, the mp_toradix function is really slow, that's for sure! ;-)

We are doing a lot of refactoring now to bring LibTomMath up-to-date (and to get rid of a lot of legacy stuff that is definitely gone obsolete by now). Speeding things up is next (some things are already done, e.g.: faster "fast multiplication", a usable nth-root and others). I don't know how the rest thinks about it but a simple divide&conquer algorithm for to_radix would do (Schoenhage was published it first, I think?) but that needs fast division. All that is already written (I have it in my own version for a couple of years now, i.e.: tried and tested) it just needs a "port" to the current LTM version.
Advantage of that D&C algorithm: very simple and starts to be faster at low values (I have the cut-off at 600 bits).

Another problem: mp_radix_size is O(n^2) instead of O(1). Look at the current code and you know why ;-)
Replacement has been written (table based, i.e.: O(1) ) and fully tested, it just needs to be "ported".
It has a difference, though, it can overestimate up to one digit (three digits with MP_8BIT) just like the GMP equivalent, the original on the other side was always exact. But there is mp_ilogb now, which can be used to give an exact result in roughly O(log n).

So why for Knuth's sake isn't it already done you rightfully ask?

Because the head-maintainer and a lot of other people here, myself included, are currently drowning in work. The kind of work that pays the rent which understandably comes first, sorry.
I don't know when it will get better and cannot speak for all the others but for me it is the whole of august at least.

But after a closer look: it is interesting, the single problem seems to be the high value of the cutoff-point.
Mmh…
Why don't you fill in your sketch and make an official pull request? It would make it much easier to test and discuss it.

@MasterDuke17
Copy link
Collaborator Author

Cool, glad to see work on libtommath is still going on. Let me caveat this upfront by saying I'm not a number theorist, not even any kind of mathematician, and only just barely a C programmer. I've been looking at the *_reduce_* functions and still haven't figured out how to incorporate them into my proof-of-concept. But...I'll clean up my existing code a little bit and do some benchmarks of different sizes of values and submit a PR.

@minad minad added this to the v2.0.0 milestone Oct 3, 2019
@mdehoogh
Copy link

mdehoogh commented Oct 21, 2023

I'm using libtommath 1.1.0 and also noticed that mo_toradix is slow. I took the existing implementation and managed to speed it up 3 to 4 times when the radix is 10, but it is still considerably slower (up to 50 times) than Python (indeed on representing 2<<500000 and larger shifts). I'm going to look into the other solution presented here. I looked at the convert to BCD algorithms that are popular, but don't see how fast it could be. Any help would be welcome, because such a large difference in speed suggests a totally different approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants