
bpo-43420: Simple optimizations for Fraction's arithmetics #24779

Merged: 13 commits, merged into python:master on Mar 22, 2021
Conversation

@skirpichev (Member) commented Mar 7, 2021:

This makes the Fraction class more usable for large arguments (larger than about 10**6), at a cost of 10-20% for the small-components case.

References:
https://www.eecis.udel.edu/~saunders/courses/822/98f/collins-notes/rnarith.ps
https://gmplib.org/, e.g. https://gmplib.org/repo/gmp/file/tip/mpq/aors.c

https://bugs.python.org/issue43420

@skirpichev (Member, Author):

There was a test failure in the tkinter tests; it seems to be random. Not sure if it's related to this PR.

@mdickinson (Member) left a review comment:

The code looks correct to me (apart from the unnecessary negativity check in _mul that @rhettinger pointed to). I'm less convinced that this is a desirable change, but I'll comment on the issue for that.

@rhettinger (Contributor):

After looking at Tim's comments, I think this should be limited to just _mul and _div.

@tim-one (Member) commented Mar 11, 2021:

Actually, I think the optimization is generally more valuable for + -. For example, consider using Fraction instances derived from floats. All denominators are powers of 2 then, and the gcd of the incoming denominators is never 1 (unless one happens to be an exact integer). So the product of the denominators is (almost) never in lowest terms for + - (although naive * is always in lowest terms then: odd/power_of_2 * odd/power_of_2 = odd/power_of_2).
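
A quick way to see that point (a sketch of mine, not code from the PR): Fractions built from floats carry power-of-two denominators, so the gcd of two such denominators is almost never 1, which is precisely the case where the + - optimization pays off.

    from fractions import Fraction
    from math import gcd

    a = Fraction(0.1)   # a float converts to an exact binary fraction; the denominator is a power of 2
    b = Fraction(0.3)   # likewise
    g = gcd(a.denominator, b.denominator)
    assert g > 1        # the denominators share a large power-of-two factor
    print(a.denominator, b.denominator, g)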

The optimization is really in two parts, and the first is taught to kids without major trouble: when adding fractions, first convert both to use the LCM of the operands' denominators (for example, in 1/6 + 1/3, LCM(6, 3) = 6 so convert to 1/6 + 2/6 - nobody in real life converts to 3/18 + 6/18 instead!).

The second part is what requires some subtle reasoning: realizing that any common factor between the top and bottom of the sum must be a factor of the original gcd of the incoming denominators. There's no need at all to look at the LCM (6). In fact, there was really no need to compute the LCM to begin with (& the optimized code does not).

The stuff in that paragraph generally isn't taught to kids, but it's not really deep, just unfamiliar to most and non-obvious.
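
A tiny worked example of both parts (numbers chosen here for illustration, not taken from the discussion), checked against the stdlib:

    from fractions import Fraction
    from math import gcd

    na, da, nb, db = 1, 6, 1, 10           # 1/6 + 1/10
    g = gcd(da, db)                        # 2: gcd of the incoming denominators
    n = na * (db // g) + nb * (da // g)    # 1*5 + 1*3 = 8
    g2 = gcd(n, g)                         # 2: only a factor of g can still cancel
    result = Fraction(n // g2, (da // g) * (db // g2))
    assert result == Fraction(1, 6) + Fraction(1, 10) == Fraction(4, 15)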

How about if I wrote a coherent proof using the PR's variable names?

@skirpichev (Member, Author):

I think this should be limited to just _mul and _div.

@rhettinger, that makes the change useless for its potential consumers, e.g. SymPy. (Correct me, please, @asmeurer.) One situation that triggers the change is a sum (or difference) of unrelated fractions, which may already occur in the statistics module of the stdlib (the only other practical use case for Fraction presented so far). An example of the benefit was right at the beginning of the issue.

Ok, you are -1 so far, correct?

Also, I doubt that "while the optimizations for * and / are close to self-evident, those for + and - are much subtler" is a good argument. "Code obscurity" covers not just _add/_sub but the other modified methods as well. I think it's pretty obvious what is going on (i.e. it doesn't make sense to describe the algorithms in comments, right?). Why it works is a question of the algorithms' correctness, and that is where the reference comes into play.

I doubt that Python can restrict itself to self-evident (middle-school?) algorithms and remain useful. The issue thread has examples where it doesn't. Why is this case special?

How about if I wrote a coherent proof using the PR's variable names?

@tim-one, maybe the naming convention should be adapted to follow Knuth more closely (u and u' for numerators, v and v' for denominators, and so on), instead of being consistent with the rest of the module?

LCM

(s/LCM/GCD/?)

@tim-one (Member) commented Mar 11, 2021:

Sergey, it doesn't work to tell people they can't be concerned about what they are concerned about 😉. People will be maintaining this code probably long after you go on to something else, and clarity is a legitimate concern. The current code in this module is almost all not just simple, but dead obvious to anyone who knows what rationals are. In cases where it's not, comments are currently used to try to help readers along (e.g., in __hash__() and limit_denominator()).

This is especially acute for modules written in Python (rather than in C): we expect curious, but "ordinary", Python users to be able to learn from studying std Python modules.

Toward that end, it's best if we don't expect them to have a stack of reference books handy either. Rewriting code here to use more Knuth-like notation would only make some sense if we expected the user to have Knuth handy - but they overwhelmingly won't, and it's much more harmful in context to use naming conventions that have nothing in common with what the module's other methods use.

There's nothing wrong about writing a short, self-contained explanation for non-obvious code. Your "I think, it's pretty obvious what is going on" was demolished long ago: it wasn't obvious to Mark, Raymond, or to me at first, and combined we have at least 50 years' experience in numeric software. It doesn't matter whether it's pretty obvious to you unless you can promise to personally maintain the code forever 😉.

If this required deep theory, sure, writing a small book would be unreasonable. But it doesn't.

About LCM vs GCD, no, I meant LCM everywhere I wrote LCM. LCM(a, b) = a * b / GCD(a, b), their least common multiple.
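
For the record, that identity is easy to check directly (math.lcm is available since Python 3.9; this snippet is only an illustration, not from the PR):

    from math import gcd, lcm
    assert lcm(6, 3) == 6 * 3 // gcd(6, 3) == 6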

@skirpichev (Member, Author):

it doesn't work to tell people they can't be concerned about what they are concerned

My reasoning was rather about how to work around these concerns. Now I see your point: references alone are not enough for Python code, though they might be OK for the C parts. Fine.

There's nothing wrong about writing a short, self-contained explanation for non-obvious code.

So, are you suggesting moving in proofs of correctness from, e.g., Knuth? For all methods or just _add/_sub? Or should I also explain why these methods are better than the middle-school variants? (That might require a small book, e.g. to explain the ~60% probability for the gcd == 1 branches; see theorem 4.5.2D in TAOCP, Volume 2.)

BTW, the _add()/_sub() code follows the same pattern. Do you think it makes sense to unify them as gmplib does, or does that not belong in the current PR?

unless you can promise to personally maintain the code forever

(A Moscow bus could hit me.)

@skirpichev (Member, Author) commented Mar 11, 2021:

(
Here is a version where _add()/_sub() are merged:

    def _add_sub_(a, b, pm=int.__add__):
        na, nb = a.numerator, b.numerator
        da, db = a.denominator, b.denominator
        g = math.gcd(da, db)
        if g == 1:
            return Fraction(pm(na * db, da * nb), da * db, _normalize=False)
        else:
            q1, q2 = da // g, db // g
            t = pm(na * q2, nb * q1)
            g2 = math.gcd(t, g)
            return Fraction(t // g2, q1 * (db // g2), _normalize=False)

    _add = functools.partial(_add_sub_)
    _add.__doc__ = 'a + b'
    __add__, __radd__ = _operator_fallbacks(_add, operator.add)

    _sub = functools.partial(_add_sub_, pm=int.__sub__)
    _sub.__doc__ = 'a - b'
    __sub__, __rsub__ = _operator_fallbacks(_sub, operator.sub)

Also, the module is inconsistent in accessing numerators/denominators. Sometimes the public properties are used, sometimes the private attributes (which is slightly faster). This holds for the arithmetic methods too (e.g. __mod__ vs __pow__). Should I switch the changed methods to the second variant?

UPD: using attributes does affect simple benchmarks by an amount comparable to the new algorithms. E.g. (patched2 is the version with _add_sub_ AND using attributes):

$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a + b'
50000 loops, best of 11: 9.64 usec per loop
$ ./python -m timeit -r11 -s 'from patched import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a + b'
20000 loops, best of 11: 12.1 usec per loop
$ ./python -m timeit -r11 -s 'from patched2 import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a + b'
20000 loops, best of 11: 10.7 usec per loop
$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
50000 loops, best of 11: 9.25 usec per loop
$ ./python -m timeit -r11 -s 'from patched import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
20000 loops, best of 11: 11.6 usec per loop
$ ./python -m timeit -r11 -s 'from patched2 import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
50000 loops, best of 11: 9.85 usec per loop

All this may be considered off-topic with respect to the major issue, i.e. documenting the algorithms.
)

@skirpichev (Member, Author):

Ok, I've managed to add some remarks around the code. Let me know how bad it is.

@tim-one (Member) commented Mar 13, 2021:

Thanks, Sergey! I think that helps a lot. It's more of a memory jog for someone who already worked through a detailed proof, and I'd be wordier, but it's plenty to get a motivated reader on the right track.

This is - and always has been - the simplest way to implement _sub:

def _sub(a, b):
    """a - b"""
    return a + -b

Trivial to code, trivial to understand, and self-evidently correct. But the pain we've always been willing to endure to avoid the implied __neg__() and __add__() calls shows that we don't really value triviality and obviousness over speed 😉.

@sweeneyde (Member):

I think it might be nice to write down the sort of thing that someone would write on their scratch paper when verifying this (hopefully eliminating the need for scratch paper). Maybe even some ascii art, like:

    # Let g = gcd(da, db). Then
    # 
    #     na   nb    na*(db//g)   nb*(da//g)
    #     -- + -- == ---------- + ----------
    #     da   db    da*(db//g)   db*(da//g)
    #             
    #                na*(db//g) + nb*(da//g)
    #             == -----------------------
    #                       da*db//g
    # 
    # When g is large, this can help decrease the size of the
    # multiplications required.
    # 
    # We'll special-case g == 1, since 60.8% of randomly-chosen
    # integers are coprime.
    # 
    # We can also optimize the normalization of the resulting
    # fraction. Note that the denominator da*db//g can be re-written
    # as (da//g)*(db//g)*g. Helpfully,
    # 
    #     gcd(da//g, numerator) == gcd(da//g, na*(db//g) + nb*(da//g))
    #                           == gcd(da//g, na*(db//g))
    #                           == 1.
    # 
    # The last equality holds because:
    # 
    #     (da//g, na) share no factors: the input was normalized.
    #     (da//g, db//g) share no factors: common ones are removed.
    # 
    # For the same reason, gcd(db//g, numerator) == 1. Applying these,
    # 
    #     gcd(numerator, denominator)
    #         == gcd(numerator, (da//g)*(db//g)*g)
    #         == gcd(numerator, g).
    # 
    # This is useful because the unnormalized denominator could be
    # much larger than g.

@tim-one (Member) commented Mar 13, 2021:

the module is inconsistent in accessing numerator/denominator's. Sometimes public methods are used, sometimes private attributes (which is slightly faster).

I have no idea why these are inconsistent, and can detect no coherent pattern; e.g., __floor__() uses the public properties, but __trunc__() uses the private attributes.

Perhaps there's some obscure hidden reasoning about consequences for subclasses, but, if so, I can't guess what.

In the absence of someone clarifying, I'd go with the faster way (private attributes) in code you write, and leave all other code alone.

@skirpichev (Member, Author):

This is - and always has been - the simplest way to implement _sub

Sure :) And this is really slow for the small-components case. So, @tim-one, do you think I should keep _add_sub_()?

I think it might be nice to write down the sort of thing that someone would write on their scratch paper [...]

Not mine :) I think the equality of these fractions to the middle-school variants is pretty obvious. What is not trivial (or not too trivial) is why the outputs are normalized (that is what I tried to document). @tim-one, what do you think of this version?

Perhaps there's some obscure hidden reasoning about consequences for subclasses

I suspect this is for handling ints. It could be solved by casting them to Fraction (with a speed penalty for mixed cases).

@tim-one (Member) commented Mar 13, 2021:

Yes, I'd prefer something wordier, probably at module level so it doesn't overwhelm the code in the method. Just FYI, this is what I wrote for myself:

The addends are a = an/ad where an and ad are coprime, and similarly for b = bn/bd. The sum is

(an * bd + bn * ad) / (ad * bd)   [1]

reduced to lowest terms. For speed, some transformations are done first, to allow working with smaller integers. These are related to the elementary textbook approach of using the least common multiple of ad and bd as the initial denominator.

First let

g = gcd(ad, bd)
ad' = ad / g
bd' = bd / g

Note that ad' and bd' are coprime (if not, they would have some common non-unit factor f, and then g * f would be a larger divisor of ad and bd than g is, contradicting that g is their greatest common divisor).

Then [1] becomes

(an * bd' * g + bn * ad' * g) / (ad * bd' * g)

Top and bottom can be divided by g to give the equivalent

(an * bd' + bn * ad') / (ad * bd') [2]

Note: the denominator of [2] is the least common multiple of ad and bd.

Now, if g > 1, we're working with smaller integers. Reducing to lowest terms is also sped up, by the following. Any prime p dividing the top and the bottom of [2] must divide at least one of ad', g, and bd' (the product of all of which is the denominator).

Suppose p divides ad'. To divide the numerator, p must also divide an * bd'. But as noted above, ad' and bd' are coprime, so p cannot divide bd'. And since ad' divides ad, and ad was coprime to an on entry, p cannot divide an either. Since p doesn't divide either factor, it doesn't divide their product, and the numerator is not divisible by p.

By symmetry, p cannot divide bd' either.

Therefore, any common factor must divide g. The largest such possible is

g' = gcd(an * bd' + bn * ad', g)

and then

((an * bd' + bn * ad') / g') / (ad' * (bd / g')) [3]

is the result in lowest terms.

As a special case worth catering to, if g = 1 near the start, it follows that ad' = ad, bd' = bd, and g' = 1, so the lowest-terms [3] is already computed directly by [1] in this case.
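
The chain of reasoning above can also be checked mechanically. A brute-force sketch (mine, not part of the PR) verifying that formula [3] always comes out in lowest terms for normalized inputs:

    from fractions import Fraction
    from math import gcd
    from random import randrange

    for _ in range(10_000):
        a = Fraction(randrange(-50, 50), randrange(1, 50))
        b = Fraction(randrange(-50, 50), randrange(1, 50))
        an, ad = a.numerator, a.denominator
        bn, bd = b.numerator, b.denominator
        g = gcd(ad, bd)
        adp, bdp = ad // g, bd // g            # ad', bd'
        n = an * bdp + bn * adp                # numerator of [2]
        g2 = gcd(n, g)                         # g': the only reduction still needed
        num, den = n // g2, adp * (bd // g2)   # formula [3]
        assert Fraction(num, den) == a + b     # same value ...
        assert gcd(num, den) == 1 and den > 0  # ... and already in lowest terms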

@skirpichev (Member, Author):

The drawback of a big module-level comment is that it may be harder to keep it in sync with the code.

@tim-one (Member) commented Mar 13, 2021:

The drawback of a big module-level comment is that it may be harder to keep it in sync with the code.

Not really here. The code is tiny. Just plant a comment at the top of the code reminding maintainers to keep the block comment in synch with the code.

Which is academic anyway, since the code isn't going to change again 😉 - seriously, there are no major optimizations left to implement then. Just random low-value junk nobody will bother with, like picking one of q1 * (db // g2), q2 * (da // g2), and q1 * q2 * (g // g2) depending on which spelling is expected to run fastest based on the bit lengths of da, db, g, and g2.

@skirpichev (Member, Author):

like picking one of q1 * (db // g2), q2 * (da // g2), and q1 * q2 * (g // g2) depending on which spelling is expected to run fastest

I picked the first because of the referenced book. The first version used the last variant, as SymPy does, which may also be less efficient (one multiplication more).

@skirpichev (Member, Author):

@tim-one, let me know what you think about the wordier versions.

Is the class-level comment OK? If so, I could probably rebase the code. I think there should be two commits: one that introduces the _add_sub_() helper and a second for the rest.

Review thread on Lib/fractions.py (outdated hunk):

    x2 = math.gcd(nb, da)
    # Resulting fraction is normalized, because gcd(na*nb, da*db) == x1*x2.
    # Indeed, any prime power, that divides both products on the left must
    # either (because fractions are normalized, i.e. na and db - coprime):
@skirpichev (Member, Author):

Oops, there was a typo here. I meant da.

@tim-one (Member) commented Mar 13, 2021:

like picking one of q1 * (db // g2), q2 * (da // g2), and q1 * q2 * (g // g2) depending on which spelling is expected to run fastest

I picked the first because of the referenced book. The first version used the last variant, as SymPy does, which may also be less efficient (one multiplication more).

No spelling is always fastest; it depends on the relative sizes. Knuth's is fine. I think SymPy's requires unusual inputs to be fastest.

For example, suppose denominators each have a million digits (in whatever base the underlying implementation uses), g has 1000 digits, and g2 has 999, so that g // g2 is just 1 digit.

Then g // g2 is very cheap, with a 1-digit result. Multiplying by a single digit is also relatively cheap.

In contrast, db // g2 or da // g2 are very expensive, with quotients close to a million digits, each digit of which, in a textbook division algorithm, requires at least a 999-by-1 digit multiplication and shifted subtract of the product.

But g2 being "large" is probably uncommon.
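
A rough way to see the size effect described above (an illustrative sketch with made-up operand sizes, not taken from the PR):

    import timeit

    db = (1 << 1_000_000) + 1   # a denominator with about a million bits
    g  = (1 << 1_000) + 1       # gcd of the incoming denominators, ~1000 bits
    g2 = (1 << 999) + 1         # common factor found at the end, ~999 bits

    print(timeit.timeit('db // g2', globals=globals(), number=100))  # ~million-bit quotient: slow
    print(timeit.timeit('g // g2', globals=globals(), number=100))   # tiny quotient: cheap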

I think, there should be two commits: one which introduces _add_sub_() helper and second for the rest.

I don't see a real point to splitting it, but suit yourself.

making Fraction class more usable for large arguments (>~ 10**6),
with cost 10-20% for small components case.

Before:
$ ./python -m timeit -s 'from fractions import Fraction as F' \
-s 'a=[F(1, _**3) for _ in range(1, 1000)]' 'sum(a)'
5 loops, best of 5: 81.2 msec per loop

After:
$ ./python -m timeit -s 'from fractions import Fraction as F' \
-s 'a=[F(1, _**3) for _ in range(1, 1000)]' 'sum(a)'
10 loops, best of 5: 23 msec per loop

References:
Knuth, TAOCP, Volume 2, 4.5.1,
https://www.eecis.udel.edu/~saunders/courses/822/98f/collins-notes/rnarith.ps,
https://gmplib.org/ (e.g. https://gmplib.org/repo/gmp/file/tip/mpq/aors.c)

That's slightly faster for small components.

Before:
$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' \
-s 'a = F(10, 3)' -s 'b = F(6, 5)' 'a + b'
20000 loops, best of 11: 12.1 usec per loop
$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' \
-s 'a = F(10, 3)' -s 'b = 7' 'a + b'
20000 loops, best of 11: 11.4 usec per loop

After:
$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' \
-s 'a = F(10, 3)' -s 'b = F(6, 5)' 'a + b'
50000 loops, best of 11: 9.74 usec per loop
$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' \
-s 'a = F(10, 3)' -s 'b = 7' 'a + b'
20000 loops, best of 11: 16.5 usec per loop

On the master:
$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' \
-s 'a = F(10, 3)' -s 'b = F(6, 5)' 'a + b'
50000 loops, best of 11: 9.61 usec per loop
$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' \
-s 'a = F(10, 3)' -s 'b = 7' 'a + b'
50000 loops, best of 11: 9.11 usec per loop
@tim-one (Member) commented Mar 17, 2021:

Thanks, Mark! I concur. We do need to cover all branches in the test suite, and I overlooked that.

Do you have an opinion on the other (now reverted) material change to how mixed-type arithmetic is implemented, to wrap ints in new Fraction objects so the four changed methods can access fields via private attributes (like ._numerator) instead of the current properties (like .numerator - no leading underscore - which Fraction and int both support)? Using properties (the status quo) effectively imposes the overhead of 4 method calls on each binary operation (to access the two numerators and two denominators).

I don't much care either way, but private attributes do run faster than properties. Perhaps I'm just hyper-sensitive to Raymond's "spread" aversion 😉. The downside to the change is that ints would need to be wrapped in temporary new Fraction objects.
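
The per-access gap mentioned here is easy to measure on its own (an illustrative micro-benchmark, not from the PR; absolute numbers vary by machine):

    from fractions import Fraction
    import timeit

    f = Fraction(10, 3)
    print(timeit.timeit('f.numerator', globals=globals()))   # goes through the property
    print(timeit.timeit('f._numerator', globals=globals()))  # reads the slot directly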

Another: do

    from math import gcd as _gcd

near the top and use _gcd everywhere instead of math.gcd. Would save one dynamic level of namespace lookup on each use.

Another where I don't much care either way. I want the major speedups for "large" rationals the PR buys, but just don't care much about marginal speed changes for "small" rationals.

It is, though, thoroughly legitimate if you (and/or Raymond) do - I won't fight either way.

@mdickinson (Member):

Do you have an opinion on the other (now reverted) material change [...]

No strong opinion either way.

Importing math.gcd as _gcd seems worth doing.

@skirpichev (Member, Author):

We do need to cover all branches in the test suite, and I overlooked that.

Indeed, @tim-one, one branch was uncovered (at least by test_fractions.py & test_math.py). I had assumed that coverage testing was part of the CPython CI, but it seems there are codecov.io profiles only for the C code...

from math import gcd as _gcd

Sounds good to me, but I'm not sure now whether it belongs in this PR (just like using private attributes)...

@tim-one (Member) commented Mar 18, 2021:

from math import gcd as _gcd

Sounds good to me, but I'm not sure now whether it belongs in this PR (just like using private attributes)...

Still want to hear from Raymond. His objections here have been too telegraphic for me to flesh out for him, and he seems disinclined to clarify them, but his opinion remains valuable to me.

For me, I want the "huge wins" here, so I suggest leaving this be for now. If Raymond doesn't say more by the end of the weekend, I intend to merge this. Micro-optimizations can wait.

Note that, on the BPO report, Mark was the strongest champion of not slowing "small rational" cases, but now, most recently here, he has "no strong opinion either way". Peoples' opinions do change over time, provided they stay engaged and consider more of the issues. I'm another: I was +0 on the BPO report, but am +1 now that there's code to try and I see major speedups in snippets of old actual code I've tried it on.

@skirpichev (Member, Author) commented Mar 18, 2021:

Still want to hear from Raymond. ... his opinion remains valuable to me.

And for me as well...

there's code to try and I see major speedups in snippets of old actual code I've tried it on.

BTW, in case I don't rebase the PR before the merge, I think it's time to update the benchmarks for 430e476.

Perhaps something like the following may be fine for a merge commit:

[bpo-43420](https://bugs.python.org/issue43420): Simple optimizations for Fraction's arithmetics

Make the Fraction class more usable for large arguments, at a cost of
10-20% for the small-components case.

On my system, for the master:
  $ ./python -m timeit -s 'from fractions import Fraction as F' \
  -s 'a = F(3, 4); b = F(5, 7)' 'a + b'
  50000 loops, best of 5: 9.68 usec per loop
  $ ./python -m timeit -s 'from fractions import Fraction as F' \
  -s 'a = F(3236111155551492222222237492, 4911111118982374922222238497)' \
  -s 'b = F(3555555555555555523649237492, 4989823777777777777774938497)' 'a + b'
  20000 loops, best of 5: 15.9 usec per loop
while on the branch:
  $ ./python -m timeit -s 'from fractions import Fraction as F' \
  -s 'a = F(3, 4); b = F(5, 7)' 'a + b'
  20000 loops, best of 5: 11.1 usec per loop
  $ ./python -m timeit -s 'from fractions import Fraction as F' \
  -s 'a = F(3236111155551492222222237492, 4911111118982374922222238497)' \
  -s 'b = F(3555555555555555523649237492, 4989823777777777777774938497)' 'a + b'
  20000 loops, best of 5: 14.8 usec per loop

The crossover point, where speeds are comparable, is when components are
around 10**10.  While this seems big, it isn't.

The main reason for this is that the numerator and denominator of the sum
of two rational numbers are usually of the same order of magnitude as those
of their product.  And real use cases of the fractions module involve
repeated application of arithmetic operations, e.g. summation of a series
(as statistics.mean does).

Let's compute the 1000th harmonic number, for example.

On the master:
  $ ./python -m timeit -s 'from fractions import Fraction as F' \
  -s 'a = [F(1, _) for _ in range(1, 1000)]' 'sum(a)'
  10 loops, best of 5: 28.7 msec per loop
On the branch there is 2x speedup:
  $ ./python -m timeit -s 'from fractions import Fraction as F' \
  -s 'a = [F(1, _) for _ in range(1, 1000)]' 'sum(a)'
  20 loops, best of 5: 15.7 msec per loop

References:
Knuth, TAOCP, Volume 2, 4.5.1,
https://www.eecis.udel.edu/~saunders/courses/822/98f/collins-notes/rnarith.ps,
https://gmplib.org/ (e.g. https://gmplib.org/repo/gmp/file/tip/mpq/aors.c)

Review thread on Lib/fractions.py:

    na, da = a.numerator, a.denominator
    nb, db = b.numerator, b.denominator
    g1 = math.gcd(na, db)
    if g1 > 1:
@rhettinger (Contributor) commented:

I think the "> 1" tests should be removed. The cost of a test is in the same ballpark as the cost of a division. The code is clearer if you just always divide. The performance gain mostly comes from two small gcd's being faster than a single big gcd.

@skirpichev (Member, Author) replied:

I think the "> 1" tests should be removed.

Maybe; I didn't benchmark this closely, but...

The cost of a test is in the same ballpark as the cost of a division.

This is a micro-optimization, yes. But ">" does not have the same cost as "//", even in CPython:

$ ./python -m timeit -s 'a = 1' -s 'b = 1' 'a > b'
5000000 loops, best of 5: 82.3 nsec per loop
$ ./python -m timeit -s 'a = 111111111111111111111111111111111' -s 'b = 1' 'a > b'
5000000 loops, best of 5: 86.1 nsec per loop
$ ./python -m timeit -s 'a = 11111' -s 'b = 1' 'a // b'
2000000 loops, best of 5: 120 nsec per loop
$ ./python -m timeit -s 'a = 111111111111111111111111' -s 'b = 1' 'a // b'
1000000 loops, best of 5: 203 nsec per loop
$ ./python -m timeit -s 'a = 1111111111111111111111111111111111111111111111' -s 'b = 1' 'a // b'
1000000 loops, best of 5: 251 nsec per loop
$ ./python -m timeit -s 'a = 111111111111111111111111111111122222222222222111111111111111' \
 -s 'b = 1' 'a // b'
1000000 loops, best of 5: 294 nsec per loop

Maybe my measurements are primitive, but there is some pattern...

The code is clearer if you just always divide.

I tried to explain the reasons for the branching in the comment above the code. A careful reader might ask: why don't you include the same optimizations in _mul as in _add/_sub?

On the other hand, gmplib doesn't use these optimizations (cf. the same in mpz_add/sub). Neither does SymPy's PythonRational class.

So, I did my best defending those branches, and I'm fine with removing them if @tim-one has no arguments against.

The performance gain mostly comes from two small gcd's being faster than a single big gcd.

This is more important, yes.

@tim-one (Member) replied:

The cost of a test is in the same ballpark as the cost of a division.

? Comparing g to 1 takes, at worst, small constant time, independent of how many digits g has. But n // 1 takes time proportional to the number of internal digits in n - it's a slow, indirect way of making a physical copy of n. Since we expect g to be 1 most of the time, and these are unbounded ints, the case for checking g > 1 seems clear to me.

@rhettinger (Contributor) replied:

I don't mind either way.

If it were just me, I would leave out the '> 1' tests because:

  1. For small inputs, the code is about 10% faster without the guards.
  2. For random 100,000 bit numerators and denominators where the gcd's were equal to 1, my timings show no discernible benefit.
  3. The code and tests are simpler without the guards.

@tim-one (Member) replied Mar 22, 2021:

That just doesn't make much sense to me. On my box just now, with the released 3.9.1:

C:\Windows\System32>py -m timeit -s "g = 1" "g > 1"
10000000 loops, best of 5: 24.2 nsec per loop

C:\Windows\System32>py -m timeit -s "g = 1; t = 1 << 100000" "g > 1; t // g"
10000 loops, best of 5: 32.4 usec per loop

So doing the useless division not only drove the per-iteration measure from 24.2 to 32.4, the unit increased by a factor of 1000(!), from nanoseconds to microseconds. That's using, by construction, an exactly 100,001 bit numerator.

And, no, removing the test barely improves that relative disaster at all:

C:\Windows\System32>py -m timeit -s "g = 1; t = 1 << 100000" "t // g"
10000 loops, best of 5: 32.3 usec per loop

BTW, as expected, boost the number of numerator bits by a factor of 10 (to a million+1), and per-loop expense also becomes 10 times as much; cut it by a factor 10, and similarly the per-loop expense is also cut by a factor of 10.

With respect to the BPO message you linked to, it appeared atypical: both gcds were greater than 1: (10 / 3) * (6 / 5), so there was never any possibility of testing for > 1 paying off (while in real life a gcd of 1 is probably most common, and when multiplying Fractions obtained from floats gcd=1 is certain unless one of the inputs is an even integer (edited: used to say 0, but it's broader than just that)).

@rhettinger (Contributor):

Tim, I don't have a technical objection. I just stepped aside because at an emotional level I don't like the code and wouldn't enjoy showing it to other people. Unlike earlier drafts, it is no longer self-evidently correct and probably does need the lengthy comment. We do that in C code but usually aim for pure Python code to be more forthright.

@skirpichev (Member, Author):

@rhettinger, sorry for the questions. But what exactly doesn't look "self-evidently correct"? Isn't it clear that the "usual" numerator and denominator (e.g. for the sum) were divided by the same quantity? Thus, it's the same fraction. (My opinion is that this should be evident to any careful reader.)

Or do you think this is obvious for the _mul() case but not for _add()?

Or is it not clear that the resulting fraction is normalized? (That's the non-trivial part, IMHO, which could be explained inline.)

@tim-one (Member) commented Mar 20, 2021:

Sergey, you should also add your name to the Misc/ACKS file.

@skirpichev (Member, Author) replied twice via email on Mar 20, 2021.

@tim-one (Member) commented Mar 21, 2021:

Sergey, it seems clear to me that Raymond's objection cannot be overcome. +- is not obvious to him, or, indeed, to anyone - where "obvious" emphatically includes that the computed result is directly in lowest terms. Without that, it's not even code for +- in context.

So he unassigned himself, and doesn't want to give his blessing. That's OK - we don't always agree on everything. To me, the "non-obviousness" is more-than-less shallow, everyone here who thought about it figured out "why" for themselves, and the potential for major compensating speedups is real in non-contrived cases.

In other news, earlier you appeared to suggest a quite long commit comment. We don't really do those anymore. In the early days, sure, but now we have BPO and Github to record discussions. Commit messages correspondingly have shrunk to essentials. In this case, I had more in mind

Implement standard transformations in + - * / that can often reduce the size of intermediate integers needed. For rationals with large components, this can yield dramatic speed improvements, but for small rationals can run 10-20% slower, due to increased fixed overheads in the longer-winded code. If those slowdowns turn out to be a problem, see the PR discussion for low-level implementation tricks that could cut other fixed overheads.

@skirpichev (Member, Author):

Sergey, it seems clear to me that Raymond's objection cannot be overcome.

I'll make one last attempt to address his critique.

+- is not obvious to him

Maybe I can make it obvious? Here is another version with inline comments:

diff --git a/Lib/fractions.py b/Lib/fractions.py
index de3e23b759..99511c7b37 100644
--- a/Lib/fractions.py
+++ b/Lib/fractions.py
@@ -380,32 +380,96 @@ def reverse(b, a):
 
         return forward, reverse
 
+    # See Knuth, TAOCP, Volume 2, 4.5.1 for rational arithmetic algorithms.
+
     def _add(a, b):
         """a + b"""
-        da, db = a.denominator, b.denominator
-        return Fraction(a.numerator * db + b.numerator * da,
-                        da * db)
+        na, da = a.numerator, a.denominator
+        nb, db = b.numerator, b.denominator
+        g = math.gcd(da, db)
+        if g == 1:
+            # Randomly-chosen integers are coprime with probability
+            # 6/(pi**2) ~ 60.8%.  Here also g2 == 1 and the result is
+            # already normalized (see below).
+            return Fraction(na * db + da * nb, da * db, _normalize=False)
+        # First, divide both numerator and denominator of the sum by g.  Note
+        # that the denominator now is d = (da//g)*db == (da//g)*(db//g)*g.
+        s = da // g
+        n = na * (db // g) + nb * s
+        # To optimize normalization of the fraction n/d, note that
+        # g2 = gcd(n, d) == gcd(n, (da//g)*(db//g)*g) == gcd(n, g),
+        # because n, (da//g) and (db//g) are pairwise coprime.
+        #
+        # Indeed, (da//g) and (db//g) share no common factors (they were
+        # removed) and da is coprime with na (since input fractions are
+        # normalized), hence (da//g) and na are coprime.  By symmetry,
+        # (db//g) and nb are coprime too.  Then,
+        #
+        #     gcd(n, da//g) == gcd(na*(db//g), da//g) == 1
+        #     gcd(n, db//g) == gcd(nb*(da//g), db//g) == 1
+        g2 = math.gcd(n, g)
+        if g2 == 1:
+            return Fraction(n, s * db, _normalize=False)
+        return Fraction(n // g2, s * (db // g2), _normalize=False)
 
     __add__, __radd__ = _operator_fallbacks(_add, operator.add)
 
     def _sub(a, b):
         """a - b"""
-        da, db = a.denominator, b.denominator
-        return Fraction(a.numerator * db - b.numerator * da,
-                        da * db)
+        # Same as _add() with negated nb.
+        na, da = a.numerator, a.denominator
+        nb, db = b.numerator, b.denominator
+        g = math.gcd(da, db)
+        if g == 1:
+            return Fraction(na * db - da * nb, da * db, _normalize=False)
+        s = da // g
+        n = na * (db // g) - nb * s
+        g2 = math.gcd(n, g)
+        if g2 == 1:
+            return Fraction(n, s * db, _normalize=False)
+        return Fraction(n // g2, s * (db // g2), _normalize=False)
 
     __sub__, __rsub__ = _operator_fallbacks(_sub, operator.sub)
 
     def _mul(a, b):
         """a * b"""
-        return Fraction(a.numerator * b.numerator, a.denominator * b.denominator)
+        na, da = a.numerator, a.denominator
+        nb, db = b.numerator, b.denominator
+        g1 = math.gcd(na, db)
+        if g1 > 1:
+            na //= g1
+            db //= g1
+        g2 = math.gcd(nb, da)
+        if g2 > 1:
+            nb //= g2
+            da //= g2
+        # The result is normalized, because each of two factors in the
+        # numerator is coprime to each of the two factors in the denominator.
+        #
+        # Indeed, pick (na//g1).  It's coprime with (da//g2), because input
+        # fractions are normalized.  It's also coprime with (db//g1), because
+        # common factors are removed by g1 == gcd(na, db).
+        return Fraction(na * nb, db * da, _normalize=False)
 
     __mul__, __rmul__ = _operator_fallbacks(_mul, operator.mul)
 
     def _div(a, b):
         """a / b"""
-        return Fraction(a.numerator * b.denominator,
-                        a.denominator * b.numerator)
+        # Same as _mul(), with b inverted.
+        na, da = a.numerator, a.denominator
+        nb, db = b.numerator, b.denominator
+        g1 = math.gcd(na, nb)
+        if g1 > 1:
+            na //= g1
+            nb //= g1
+        g2 = math.gcd(db, da)
+        if g2 > 1:
+            da //= g2
+            db //= g2
+        n, d = na * db, nb * da
+        if d < 0:
+            n, d = -n, -d
+        return Fraction(n, d, _normalize=False)
 
     __truediv__, __rtruediv__ = _operator_fallbacks(_div, operator.truediv)

@tim-one, I hope you will be OK with this version anyway. It's much smaller and, I think, it explains all the non-trivial parts inline (the reasons for singling out the g2 branch in _add and the g1, g2 branches in _mul are less important).

For rationals with large components, this can yield dramatic speed improvements

This doesn't explain why large components are common in computations (which was a non-obvious thing e.g. for Mark in the issue thread).

@tim-one (Member) commented Mar 21, 2021:

Sergey, it seems clear to me that Raymond's objection cannot be overcome.

I'll make one last attempt to address his critique.

+- is not obvious to him

Maybe I can make it obvious? Here is another version with inline comments:

Why? You've already tried several times, and made no progress. It's just plain not obvious. Which Raymond (& I, & everyone else) keep telling you. Believe us 😉. It's not that we're stupid - it's that it's not obvious. It requires thought. Period.

The new code is less readable to me, because it mixes up lines of comments with lines of code, making it harder to follow each. It's a wall-of-text.

Note that Raymond didn't complain about * or / - those were obvious to him (and everyone else). Indeed, he wrote much the same code himself in the BPO report. If you removed the explanations for those entirely, I doubt anyone would object.

For rationals with large components, this can yield dramatic speed improvements

This doesn't explain why large components are common in computations (which was a non-obvious thing e.g. for Mark in the issue thread).

I believe you misread Mark there. He's quite aware that rationals can have arbitrarily large components. What he was pushing back against on the BPO report is that statistics._sum() was a motivating example. Converting from floats leaves exact power-of-2 denominators, and those don't "blow up" under rational summation. Similarly for converting from Decimal.

Yes, we know statistics isn't limited to that. But "motivating examples" need to be things people actually do, not just hypothetical possibilities. It's enough for me, e.g., that factorials are ubiquitous in denominators of power series expansions. Sums of those do "blow up".
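
The factorial-denominator blow-up is easy to reproduce (a small demo, not code from the PR):

    from fractions import Fraction
    from math import factorial

    s = Fraction(0)
    for k in range(30):
        s += Fraction(1, factorial(k))   # partial sums of the series for e
    print(s.denominator.bit_length())    # the reduced denominator is already huge (~100 bits)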

@skirpichev (Member, Author) replied Mar 21, 2021 via email.

@tim-one (Member) commented Mar 21, 2021:

Inline comments work fine when code is extensive and naturally breaks into phases. The code here is trivially small. The code is lost in the comments. It's not at all the code that's "not obvious", it's "the math" justifying the code. Consider that any paper you may care to cite also separates the justification from "the code" for algorithms this tiny.

Nevertheless, I'm worn out by the repetitive argument here. If other people say they like the squashed version better, I won't fight it. But, short of that, I'd prefer to leave it well enough alone.

@tim-one merged commit 690aca7 into python:master on Mar 22, 2021.
@bedevere-bot:

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot aarch64 RHEL7 3.x has failed when building commit 690aca7.

What you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/539/builds/760) and take a look at the build logs.
  4. Check if the failure is related to this commit (690aca7) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/539/builds/760

Summary of the results of the build (if available):

== Tests result: ENV CHANGED ==

413 tests OK.

10 slowest tests:

  • test_concurrent_futures: 4 min 6 sec
  • test_unparse: 3 min 37 sec
  • test_capi: 3 min 1 sec
  • test_tokenize: 2 min 44 sec
  • test_multiprocessing_spawn: 2 min 39 sec
  • test_peg_generator: 2 min 35 sec
  • test_asyncio: 2 min 25 sec
  • test_lib2to3: 2 min 13 sec
  • test_multiprocessing_forkserver: 1 min 43 sec
  • test_gdb: 1 min 26 sec

1 test altered the execution environment:
test_asyncio

13 tests skipped:
test_devpoll test_ioctl test_kqueue test_msilib test_ossaudiodev
test_startfile test_tix test_tk test_ttk_guionly test_winconsoleio
test_winreg test_winsound test_zipfile64

Total duration: 7 min 12 sec

Traceback logs:
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-aarch64/build/Lib/asyncio/sslproto.py", line 321, in __del__
    self.close()
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-aarch64/build/Lib/asyncio/sslproto.py", line 316, in close
    self._ssl_protocol._start_shutdown()
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-aarch64/build/Lib/asyncio/sslproto.py", line 590, in _start_shutdown
    self._abort()
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-aarch64/build/Lib/asyncio/sslproto.py", line 731, in _abort
    self._transport.abort()
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-aarch64/build/Lib/asyncio/selector_events.py", line 680, in abort
    self._force_close(None)
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-aarch64/build/Lib/asyncio/selector_events.py", line 731, in _force_close
    self._loop.call_soon(self._call_connection_lost, exc)
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-aarch64/build/Lib/asyncio/base_events.py", line 745, in call_soon
    self._check_closed()
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-aarch64/build/Lib/asyncio/base_events.py", line 510, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

@skirpichev deleted the opt-fractions branch on March 22, 2021 at 04:18.
@skirpichev (Member, Author) commented Mar 22, 2021:

Test failure seems to be issue 38912.

hauntsaninja pushed a commit that referenced this pull request on Jan 8, 2023:
Adapted from
046c84e8f9

This makes arithmetic between Fractions with small components
just as fast as before #24779, at some expense of
mixed arithmetic (e.g. Fraction + int).