
bpo-43420: Simple optimizations for Fraction's arithmetics #24779

Merged: 13 commits, merged into python:master on Mar 22, 2021
Conversation

@skirpichev (Member) commented Mar 7, 2021:

This makes the Fraction class more usable for large arguments (larger than about 10**6), at a cost of 10-20% for the small-components case.

References:
https://www.eecis.udel.edu/~saunders/courses/822/98f/collins-notes/rnarith.ps
https://gmplib.org/, e.g. https://gmplib.org/repo/gmp/file/tip/mpq/aors.c

https://bugs.python.org/issue43420

@skirpichev (Member, Author):

There was a test failure in the tkinter tests; it seems to be random. Not sure if it's related to this PR.

@mdickinson (Member) left a review comment:

The code looks correct to me (apart from the unnecessary negativity check in _mul that @rhettinger pointed to). I'm less convinced that this is a desirable change, but I'll comment on the issue for that.

@rhettinger (Contributor):

After looking at Tim's comments, I think this should be limited to just _mul and _div.

@tim-one (Member) commented Mar 11, 2021:

Actually, I think the optimization is generally more valuable for + -. For example, consider using Fraction instances derived from floats. All denominators are powers of 2 then, and the gcd of the incoming denominators is never 1 (unless one happens to be an exact integer). So the product of the denominators is (almost) never in lowest terms for + - (although naive * is always in lowest terms then: odd/power_of_2 * odd/power_of_2 = odd/power_of_2).
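
A quick way to see that point (a sketch of mine, not code from the PR): Fractions built from floats carry power-of-two denominators, so the gcd of two such denominators is almost never 1, which is precisely the case where the + - optimization pays off.

    from fractions import Fraction
    from math import gcd

    a = Fraction(0.1)   # a float converts to an exact binary fraction; the denominator is a power of 2
    b = Fraction(0.3)   # likewise
    g = gcd(a.denominator, b.denominator)
    assert g > 1        # the denominators share a large power-of-two factor
    print(a.denominator, b.denominator, g)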

The optimization is really in two parts, and the first is taught to kids without major trouble: when adding fractions, first convert both to use the LCM of the operands' denominators (for example, in 1/6 + 1/3, LCM(6, 3) = 6 so convert to 1/6 + 2/6 - nobody in real life converts to 3/18 + 6/18 instead!).

The second part is what requires some subtle reasoning: realizing that any common factor between the top and bottom of the sum must be a factor of the original gcd of the incoming denominators. There's no need at all to look at the LCM (6). In fact, there was really no need to compute the LCM to begin with (& the optimized code does not).

The stuff in that paragraph generally isn't taught to kids, but it's not really deep, just unfamiliar to most and non-obvious.
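
A tiny worked example of both parts (numbers chosen here for illustration, not taken from the discussion), checked against the stdlib:

    from fractions import Fraction
    from math import gcd

    na, da, nb, db = 1, 6, 1, 10           # 1/6 + 1/10
    g = gcd(da, db)                        # 2: gcd of the incoming denominators
    n = na * (db // g) + nb * (da // g)    # 1*5 + 1*3 = 8
    g2 = gcd(n, g)                         # 2: only a factor of g can still cancel
    result = Fraction(n // g2, (da // g) * (db // g2))
    assert result == Fraction(1, 6) + Fraction(1, 10) == Fraction(4, 15)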

How about if I wrote a coherent proof using the PR's variable names?

@skirpichev (Member, Author):

I think this should be limited to just _mul and _div.

@rhettinger, that makes the change useless for its potential consumers, e.g. SymPy. (Correct me, please, @asmeurer.) One situation that triggers the change is a sum (or difference) of unrelated fractions, which may already occur in the statistics module of the stdlib (the only other practical use case for Fraction presented so far). An example of the benefit was right at the beginning of the issue.

Ok, you are -1 so far, correct?

Also, I doubt that "while the optimizations for * and / are close to self-evident, those for + and - are much subtler" is a good argument. "Code obscurity" covers not just _add/_sub but the other modified methods as well. I think it's pretty obvious what is going on (i.e. it doesn't make sense to describe the algorithms in comments, right?). Why it works is a question of the algorithms' correctness, and that is where the reference comes into play.

I doubt that Python can restrict itself to self-evident (middle-school?) algorithms and remain useful. The issue thread has examples where it doesn't. Why is this case special?

How about if I wrote a coherent proof using the PR's variable names?

@tim-one, maybe the naming convention should be adapted to follow Knuth more closely (u and u' for numerators, v and v' for denominators, and so on), instead of being consistent with the rest of the module?

LCM

(s/LCM/GCD/?)

@tim-one (Member) commented Mar 11, 2021:

Sergey, it doesn't work to tell people they can't be concerned about what they are concerned about 😉. People will be maintaining this code probably long after you go on to something else, and clarity is a legitimate concern. The current code in this module is almost all not just simple, but dead obvious to anyone who knows what rationals are. In cases where it's not, comments are currently used to try to help readers along (e.g., in __hash__() and limit_denominator()).

This is especially acute for modules written in Python (rather than in C): we expect curious, but "ordinary", Python users to be able to learn from studying std Python modules.

Toward that end, it's best if we don't expect them to have a stack of reference books handy either. Rewriting code here to use more Knuth-like notation would only make some sense if we expected the user to have Knuth handy - but they overwhelmingly won't, and it's much more harmful in context to use naming conventions that have nothing in common with what the module's other methods use.

There's nothing wrong about writing a short, self-contained explanation for non-obvious code. Your "I think, it's pretty obvious what is going on" was demolished long ago: it wasn't obvious to Mark, Raymond, or to me at first, and combined we have at least 50 years' experience in numeric software. It doesn't matter whether it's pretty obvious to you unless you can promise to personally maintain the code forever 😉.

If this required deep theory, sure, writing a small book would be unreasonable. But it doesn't.

About LCM vs GCD, no, I meant LCM everywhere I wrote LCM. LCM(a, b) = a * b / GCD(a, b), their least common multiple.
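
For the record, that identity is easy to check directly (math.lcm is available since Python 3.9; this snippet is only an illustration, not from the PR):

    from math import gcd, lcm
    assert lcm(6, 3) == 6 * 3 // gcd(6, 3) == 6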

@skirpichev (Member, Author):

it doesn't work to tell people they can't be concerned about what they are concerned

My reasoning was rather about how to work around these concerns. Now I see your point: references alone are not enough for Python code, though they might be OK for the C parts. Fine.

There's nothing wrong about writing a short, self-contained explanation for non-obvious code.

So, are you suggesting moving in proofs of correctness from, e.g., Knuth? For all methods or just _add/_sub? Or should I also explain why these methods are better than the middle-school variants? (That might require a small book, e.g. to explain the ~60% probability for the gcd == 1 branches; see theorem 4.5.2D in TAOCP, Volume 2.)

BTW, the _add()/_sub() code follows the same pattern. Do you think it makes sense to unify them as gmplib does, or does that not belong in the current PR?

unless you can promise to personally maintain the code forever

(A Moscow bus could hit me.)

@skirpichev (Member, Author) commented Mar 11, 2021:

(
Here is a version where _add()/_sub() are merged:

    def _add_sub_(a, b, pm=int.__add__):
        na, nb = a.numerator, b.numerator
        da, db = a.denominator, b.denominator
        g = math.gcd(da, db)
        if g == 1:
            return Fraction(pm(na * db, da * nb), da * db, _normalize=False)
        else:
            q1, q2 = da // g, db // g
            t = pm(na * q2, nb * q1)
            g2 = math.gcd(t, g)
            return Fraction(t // g2, q1 * (db // g2), _normalize=False)

    _add = functools.partial(_add_sub_)
    _add.__doc__ = 'a + b'
    __add__, __radd__ = _operator_fallbacks(_add, operator.add)

    _sub = functools.partial(_add_sub_, pm=int.__sub__)
    _sub.__doc__ = 'a - b'
    __sub__, __rsub__ = _operator_fallbacks(_sub, operator.sub)

Also, the module is inconsistent in accessing numerators/denominators. Sometimes the public properties are used, sometimes the private attributes (which is slightly faster). This holds for the arithmetic methods too (e.g. __mod__ vs __pow__). Should I switch the changed methods to the second variant?

UPD: using attributes does affect simple benchmarks by an amount comparable to the new algorithms. E.g. (patched2 is the version with _add_sub_ AND using attributes):

$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a + b'
50000 loops, best of 11: 9.64 usec per loop
$ ./python -m timeit -r11 -s 'from patched import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a + b'
20000 loops, best of 11: 12.1 usec per loop
$ ./python -m timeit -r11 -s 'from patched2 import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a + b'
20000 loops, best of 11: 10.7 usec per loop
$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
50000 loops, best of 11: 9.25 usec per loop
$ ./python -m timeit -r11 -s 'from patched import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
20000 loops, best of 11: 11.6 usec per loop
$ ./python -m timeit -r11 -s 'from patched2 import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
50000 loops, best of 11: 9.85 usec per loop

All this may be considered off-topic with respect to the major issue, i.e. documenting the algorithms.
)

@skirpichev (Member, Author):

Ok, I've managed to add some remarks around the code. Let me know how bad it is.

@tim-one (Member) commented Mar 13, 2021:

Thanks, Sergey! I think that helps a lot. It's more of a memory jog for someone who already worked through a detailed proof, and I'd be wordier, but it's plenty to get a motivated reader on the right track.

This is - and always has been - the simplest way to implement _sub:

def _sub(a, b):
    """a - b"""
    return a + -b

Trivial to code, trivial to understand, and self-evidently correct. But the pain we've always been willing to endure to avoid the implied __neg__() and __add__() calls shows that we don't really value triviality and obviousness over speed 😉.

@sweeneyde (Member):

I think it might be nice to write down the sort of thing that someone would write on their scratch paper when verifying this (hopefully eliminating the need for scratch paper). Maybe even some ascii art, like:

    # Let g = gcd(da, db). Then
    # 
    #     na   nb    na*(db//g)   nb*(da//g)
    #     -- + -- == ---------- + ----------
    #     da   db    da*(db//g)   db*(da//g)
    #             
    #                na*(db//g) + nb*(da//g)
    #             == -----------------------
    #                       da*db//g
    # 
    # When g is large, this can help decrease the size of the
    # multiplications required.
    # 
    # We'll special-case g == 1, since 60.8% of randomly-chosen
    # integers are coprime.
    # 
    # We can also optimize the normalization of the resulting
    # fraction. Note that the denominator da*db//g can be re-written
    # as (da//g)*(db//g)*g. Helpfully,
    # 
    #     gcd(da//g, numerator) == gcd(da//g, na*(db//g) + nb*(da//g))
    #                           == gcd(da//g, na*(db//g))
    #                           == 1.
    # 
    # The last equality holds because:
    # 
    #     (da//g, na) share no factors: the input was normalized.
    #     (da//g, db//g) share no factors: common ones are removed.
    # 
    # For the same reason, gcd(db//g, numerator) == 1. Applying these,
    # 
    #     gcd(numerator, denominator)
    #         == gcd(numerator, (da//g)*(db//g)*g)
    #         == gcd(numerator, g).
    # 
    # This is useful because the unnormalized denominator could be
    # much larger than g.

@tim-one (Member) commented Mar 13, 2021:

the module is inconsistent in accessing numerator/denominator's. Sometimes public methods are used, sometimes private attributes (which is slightly faster).

I have no idea why these are inconsistent, and can detect no coherent pattern; e.g., __floor__() uses the public properties, but __trunc__() uses the private attributes.

Perhaps there's some obscure hidden reasoning about consequences for subclasses, but, if so, I can't guess what.

In the absence of someone clarifying, I'd go with the faster way (private attributes) in code you write, and leave all other code alone.

@skirpichev (Member, Author):

This is - and always has been - the simplest way to implement _sub

Sure :) And this is really slow for the small-components case. So, @tim-one, do you think I should keep _add_sub_()?

I think it might be nice to write down the sort of thing that someone would write on their scratch paper [...]

Not mine :) I think the equality of these fractions to the middle-school variants is pretty obvious. What is not trivial (or not too trivial) is why the outputs are normalized (that is what I tried to document). @tim-one, what do you think of this version?

Perhaps there's some obscure hidden reasoning about consequences for subclasses

I suspect this is for handling ints. It could be solved by casting them to Fraction (with a speed penalty for mixed cases).

@tim-one (Member) commented Mar 13, 2021:

Yes, I'd prefer something wordier, probably at module level so it doesn't overwhelm the code in the method. Just FYI, this is what I wrote for myself:

The addends are a = an/ad where an and ad are coprime, and similarly for b = bn/bd. The sum is

(an * bd + bn * ad) / (ad * bd)   [1]

reduced to lowest terms. For speed, some transformations are done first, to allow working with smaller integers. These are related to the elementary textbook approach of using the least common multiple of ad and bd as the initial denominator.

First let

g = gcd(ad, bd)
ad' = ad / g
bd' = bd / g

Note that ad' and bd' are coprime (if not, they would have some common non-unit factor f, and then g * f would be a larger divisor of ad and bd than g is, contradicting that g is their greatest common divisor).

Then [1] becomes

(an * bd' * g + bn * ad' * g) / (ad * bd' * g)

Top and bottom can be divided by g to give the equivalent

(an * bd' + bn * ad') / (ad * bd') [2]

Note: the denominator of [2] is the least common multiple of ad and bd.

Now, if g > 1, we're working with smaller integers. Reducing to lowest terms is also sped up, by the following. Any prime p dividing the top and the bottom of [2] must divide at least one of ad', g, and bd' (the product of all of which is the denominator).

Suppose p divides ad'. To divide the numerator, p must also divide an * bd'. But as noted above, ad' and bd' are coprime, so p cannot divide bd'. And since ad' divides ad, and ad was coprime to an on entry, p cannot divide an either. Since p doesn't divide either factor, it doesn't divide their product, and the numerator is not divisible by p.

By symmetry, p cannot divide bd' either.

Therefore, any common factor must divide g. The largest such possible is

g' = gcd(an * bd' + bn * ad', g)

and then

((an * bd' + bn * ad') / g') / (ad' * (bd / g')) [3]

is the result in lowest terms.

As a special case worth catering to, if g = 1 near the start, it follows that ad' = ad, bd' = bd, and g' = 1, so the lowest-terms [3] is already computed directly by [1] in this case.
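
The chain of reasoning above can also be checked mechanically. A brute-force sketch (mine, not part of the PR) verifying that formula [3] always comes out in lowest terms for normalized inputs:

    from fractions import Fraction
    from math import gcd
    from random import randrange

    for _ in range(10_000):
        a = Fraction(randrange(-50, 50), randrange(1, 50))
        b = Fraction(randrange(-50, 50), randrange(1, 50))
        an, ad = a.numerator, a.denominator
        bn, bd = b.numerator, b.denominator
        g = gcd(ad, bd)
        adp, bdp = ad // g, bd // g            # ad', bd'
        n = an * bdp + bn * adp                # numerator of [2]
        g2 = gcd(n, g)                         # g': the only reduction still needed
        num, den = n // g2, adp * (bd // g2)   # formula [3]
        assert Fraction(num, den) == a + b     # same value ...
        assert gcd(num, den) == 1 and den > 0  # ... and already in lowest terms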

@skirpichev (Member, Author):

The drawback of a big module-level comment is that it may be harder to keep it in sync with the code.

@tim-one (Member) commented Mar 13, 2021:

The drawback of a big module-level comment is that it may be harder to keep it in sync with the code.

Not really here. The code is tiny. Just plant a comment at the top of the code reminding maintainers to keep the block comment in synch with the code.

Which is academic anyway, since the code isn't going to change again 😉 - seriously, there are no major optimizations left to implement then. Just random low-value junk nobody will bother with, like picking one of q1 * (db // g2), q2 * (da // g2), and q1 * q2 * (g // g2) depending on which spelling is expected to run fastest based on the bit lengths of da, db, g, and g2.

@skirpichev (Member, Author):

like picking one of q1 * (db // g2), q2 * (da // g2), and q1 * q2 * (g // g2) depending on which spelling is expected to run fastest

I picked the first because of the referenced book. The first version used the last variant, as SymPy does, which may also be less efficient (one multiplication more).

@skirpichev (Member, Author):

@tim-one, let me know what you think about the wordier versions.

Is the class-level comment OK? If so, I could probably rebase the code. I think there should be two commits: one that introduces the _add_sub_() helper and a second for the rest.

Review thread on Lib/fractions.py (outdated hunk):

    x2 = math.gcd(nb, da)
    # Resulting fraction is normalized, because gcd(na*nb, da*db) == x1*x2.
    # Indeed, any prime power, that divides both products on the left must
    # either (because fractions are normalized, i.e. na and db - coprime):
@skirpichev (Member, Author):

Oops, there was a typo here. I meant da.

@tim-one (Member) commented Mar 13, 2021:

like picking one of q1 * (db // g2), q2 * (da // g2), and q1 * q2 * (g // g2) depending on which spelling is expected to run fastest

I picked the first because of the referenced book. The first version used the last variant, as SymPy does, which may also be less efficient (one multiplication more).

No spelling is always fastest; it depends on the relative sizes. Knuth's is fine. I think SymPy's requires unusual inputs to be fastest.

For example, suppose denominators each have a million digits (in whatever base the underlying implementation uses), g has 1000 digits, and g2 has 999, so that g // g2 is just 1 digit.

Then g // g2 is very cheap, with a 1-digit result. Multiplying by a single digit is also relatively cheap.

In contrast, db // g2 or da // g2 are very expensive, with quotients close to a million digits, each digit of which, in a textbook division algorithm, requires at least a 999-by-1 digit multiplication and shifted subtract of the product.

But g2 being "large" is probably uncommon.
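
A rough way to see the size effect described above (an illustrative sketch with made-up operand sizes, not taken from the PR):

    import timeit

    db = (1 << 1_000_000) + 1   # a denominator with about a million bits
    g  = (1 << 1_000) + 1       # gcd of the incoming denominators, ~1000 bits
    g2 = (1 << 999) + 1         # common factor found at the end, ~999 bits

    print(timeit.timeit('db // g2', globals=globals(), number=100))  # ~million-bit quotient: slow
    print(timeit.timeit('g // g2', globals=globals(), number=100))   # tiny quotient: cheap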

I think, there should be two commits: one which introduces _add_sub_() helper and second for the rest.

I don't see a real point to splitting it, but suit yourself.

making Fraction class more usable for large arguments (>~ 10**6),
with cost 10-20% for small components case.

Before:
$ ./python -m timeit -s 'from fractions import Fraction as F' \
-s 'a=[F(1, _**3) for _ in range(1, 1000)]' 'sum(a)'
5 loops, best of 5: 81.2 msec per loop

After:
$ ./python -m timeit -s 'from fractions import Fraction as F' \
-s 'a=[F(1, _**3) for _ in range(1, 1000)]' 'sum(a)'
10 loops, best of 5: 23 msec per loop

References:
Knuth, TAOCP, Volume 2, 4.5.1,
https://www.eecis.udel.edu/~saunders/courses/822/98f/collins-notes/rnarith.ps,
https://gmplib.org/ (e.g. https://gmplib.org/repo/gmp/file/tip/mpq/aors.c)

That's slightly faster for small components.

Before:
$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' \
-s 'a = F(10, 3)' -s 'b = F(6, 5)' 'a + b'
20000 loops, best of 11: 12.1 usec per loop
$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' \
-s 'a = F(10, 3)' -s 'b = 7' 'a + b'
20000 loops, best of 11: 11.4 usec per loop

After:
$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' \
-s 'a = F(10, 3)' -s 'b = F(6, 5)' 'a + b'
50000 loops, best of 11: 9.74 usec per loop
$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' \
-s 'a = F(10, 3)' -s 'b = 7' 'a + b'
20000 loops, best of 11: 16.5 usec per loop

On the master:
$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' \
-s 'a = F(10, 3)' -s 'b = F(6, 5)' 'a + b'
50000 loops, best of 11: 9.61 usec per loop
$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' \
-s 'a = F(10, 3)' -s 'b = 7' 'a + b'
50000 loops, best of 11: 9.11 usec per loop
@tim-one (Member) commented Mar 17, 2021:

Thanks, Mark! I concur. We do need to cover all branches in the test suite, and I overlooked that.

Do you have an opinion on the other (now reverted) material change to how mixed-type arithmetic is implemented, to wrap ints in new Fraction objects so the four changed methods can access fields via private attributes (like ._numerator) instead of the current properties (like .numerator - no leading underscore - which Fraction and int both support)? Using properties (the status quo) effectively imposes the overhead of 4 method calls on each binary operation (to access the two numerators and two denominators).

I don't much care either way, but private attributes do run faster than properties. Perhaps I'm just hyper-sensitive to Raymond's "spread" aversion 😉. The downside to the change is that ints would need to be wrapped in temporary new Fraction objects.
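
The per-access gap mentioned here is easy to measure on its own (an illustrative micro-benchmark, not from the PR; absolute numbers vary by machine):

    from fractions import Fraction
    import timeit

    f = Fraction(10, 3)
    print(timeit.timeit('f.numerator', globals=globals()))   # goes through the property
    print(timeit.timeit('f._numerator', globals=globals()))  # reads the slot directly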

Another: do

    from math import gcd as _gcd

near the top and use _gcd everywhere instead of math.gcd. Would save one dynamic level of namespace lookup on each use.

Another where I don't much care either way. I want the major speedups for "large" rationals the PR buys, but just don't care much about marginal speed changes for "small" rationals.

It is, though, thoroughly legitimate if you (and/or Raymond) do - I won't fight either way.

@mdickinson (Member):

Do you have an opinion on the other (now reverted) material change [...]

No strong opinion either way.

Importing math.gcd as _gcd seems worth doing.

@skirpichev (Member, Author):

We do need to cover all branches in the test suite, and I overlooked that.

Indeed, @tim-one, one branch was uncovered (at least by test_fractions.py & test_math.py). I had assumed that coverage testing was part of the CPython CI, but it seems there are codecov.io profiles only for the C code...

from math import gcd as _gcd

Sounds good to me, but I'm not sure now whether it belongs in this PR (just like using private attributes)...

@tim-one (Member) commented Mar 18, 2021:

from math import gcd as _gcd

Sounds good to me, but I'm not sure now whether it belongs in this PR (just like using private attributes)...

Still want to hear from Raymond. His objections here have been too telegraphic for me to flesh out for him, and he seems disinclined to clarify them, but his opinion remains valuable to me.

For me, I want the "huge wins" here, so I suggest leaving this be for now. If Raymond doesn't say more by the end of the weekend, I intend to merge this. Micro-optimizations can wait.

Note that, on the BPO report, Mark was the strongest champion of not slowing "small rational" cases, but now, most recently here, he has "no strong opinion either way". Peoples' opinions do change over time, provided they stay engaged and consider more of the issues. I'm another: I was +0 on the BPO report, but am +1 now that there's code to try and I see major speedups in snippets of old actual code I've tried it on.

@skirpichev (Member, Author) commented Mar 18, 2021:

Still want to hear from Raymond. ... his opinion remains valuable to me.

And for me as well...

there's code to try and I see major speedups in snippets of old actual code I've tried it on.

BTW, in case I don't rebase the PR before the merge, I think it's time to update the benchmarks for 430e476.

Perhaps something like the following may be fine for a merge commit:

[bpo-43420](https://bugs.python.org/issue43420): Simple optimizations for Fraction's arithmetics

Make the Fraction class more usable for large arguments, at a cost of
10-20% for the small-components case.

On my system, for the master:
  $ ./python -m timeit -s 'from fractions import Fraction as F' \
  -s 'a = F(3, 4); b = F(5, 7)' 'a + b'
  50000 loops, best of 5: 9.68 usec per loop
  $ ./python -m timeit -s 'from fractions import Fraction as F' \
  -s 'a = F(3236111155551492222222237492, 4911111118982374922222238497)' \
  -s 'b = F(3555555555555555523649237492, 4989823777777777777774938497)' 'a + b'
  20000 loops, best of 5: 15.9 usec per loop
while on the branch:
  $ ./python -m timeit -s 'from fractions import Fraction as F' \
  -s 'a = F(3, 4); b = F(5, 7)' 'a + b'
  20000 loops, best of 5: 11.1 usec per loop
  $ ./python -m timeit -s 'from fractions import Fraction as F' \
  -s 'a = F(3236111155551492222222237492, 4911111118982374922222238497)' \
  -s 'b = F(3555555555555555523649237492, 4989823777777777777774938497)' 'a + b'
  20000 loops, best of 5: 14.8 usec per loop

The crossover point, where speeds are comparable, is when components are
around 10**10.  While this seems big, it isn't.

The main reason for this is that the numerator and denominator of the sum
of two rational numbers are usually of the same order of magnitude as those
of their product.  And real use cases of the fractions module involve
repeated application of arithmetic operations, e.g. summation of a series
(as statistics.mean does).

Let's compute the 1000th harmonic number, for example.

On the master:
  $ ./python -m timeit -s 'from fractions import Fraction as F' \
  -s 'a = [F(1, _) for _ in range(1, 1000)]' 'sum(a)'
  10 loops, best of 5: 28.7 msec per loop
On the branch there is 2x speedup:
  $ ./python -m timeit -s 'from fractions import Fraction as F' \
  -s 'a = [F(1, _) for _ in range(1, 1000)]' 'sum(a)'
  20 loops, best of 5: 15.7 msec per loop

References:
Knuth, TAOCP, Volume 2, 4.5.1,
https://www.eecis.udel.edu/~saunders/courses/822/98f/collins-notes/rnarith.ps,
https://gmplib.org/ (e.g. https://gmplib.org/repo/gmp/file/tip/mpq/aors.c)

Review thread on Lib/fractions.py:

    na, da = a.numerator, a.denominator
    nb, db = b.numerator, b.denominator
    g1 = math.gcd(na, db)
    if g1 > 1:
@rhettinger (Contributor) commented:

I think the "> 1" tests should be removed. The cost of a test is in the same ballpark as the cost of a division. The code is clearer if you just always divide. The performance gain mostly comes from two small gcd's being faster than a single big gcd.

@skirpichev (Member, Author) replied:

I think the "> 1" tests should be removed.

Maybe; I didn't benchmark this closely, but...

The cost of a test is in the same ballpark as the cost of a division.

This is a micro-optimization, yes. But ">" does not have the same cost as "//", even in CPython:

$ ./python -m timeit -s 'a = 1' -s 'b = 1' 'a > b'
5000000 loops, best of 5: 82.3 nsec per loop
$ ./python -m timeit -s 'a = 111111111111111111111111111111111' -s 'b = 1' 'a > b'
5000000 loops, best of 5: 86.1 nsec per loop
$ ./python -m timeit -s 'a = 11111' -s 'b = 1' 'a // b'
2000000 loops, best of 5: 120 nsec per loop
$ ./python -m timeit -s 'a = 111111111111111111111111' -s 'b = 1' 'a // b'
1000000 loops, best of 5: 203 nsec per loop
$ ./python -m timeit -s 'a = 1111111111111111111111111111111111111111111111' -s 'b = 1' 'a // b'
1000000 loops, best of 5: 251 nsec per loop
$ ./python -m timeit -s 'a = 111111111111111111111111111111122222222222222111111111111111' \
 -s 'b = 1' 'a // b'
1000000 loops, best of 5: 294 nsec per loop

Maybe my measurements are primitive, but there is some pattern...

The code is clearer if you just always divide.

I tried to explain the reasons for the branching in the comment above the code. A careful reader might ask: why don't you include the same optimizations in _mul as in _add/_sub?

On the other hand, gmplib doesn't use these optimizations (cf. the same in mpz_add/sub). Neither does SymPy's PythonRational class.

So, I did my best defending those branches, and I'm fine with removing them if @tim-one has no arguments against.

The performance gain mostly comes from two small gcd's being faster than a single big gcd.

This is more important, yes.

@tim-one (Member) replied:

The cost of a test is in the same ballpark as the cost of a division.

? Comparing g to 1 takes, at worst, small constant time, independent of how many digits g has. But n // 1 takes time proportional to the number of internal digits in n - it's a slow, indirect way of making a physical copy of n. Since we expect g to be 1 most of the time, and these are unbounded ints, the case for checking g > 1 seems clear to me.

@rhettinger (Contributor) replied:

I don't mind either way.

If it were just me, I would leave out the '> 1' tests because:

  1. For small inputs, the code is about 10% faster without the guards.
  2. For random 100,000 bit numerators and denominators where the gcd's were equal to 1, my timings show no discernible benefit.
  3. The code and tests are simpler without the guards.

@tim-one (Member) replied Mar 22, 2021:

That just doesn't make much sense to me. On my box just now, with the released 3.9.1:

C:\Windows\System32>py -m timeit -s "g = 1" "g > 1"
10000000 loops, best of 5: 24.2 nsec per loop

C:\Windows\System32>py -m timeit -s "g = 1; t = 1 << 100000" "g > 1; t // g"
10000 loops, best of 5: 32.4 usec per loop

So doing the useless division not only drove the per-iteration measure from 24.2 to 32.4, the unit increased by a factor of 1000(!), from nanoseconds to microseconds. That's using, by construction, an exactly 100,001 bit numerator.

And, no, removing the test barely improves that relative disaster at all:

C:\Windows\System32>py -m timeit -s "g = 1; t = 1 << 100000" "t // g"
10000 loops, best of 5: 32.3 usec per loop

BTW, as expected, boost the number of numerator bits by a factor of 10 (to a million+1), and per-loop expense also becomes 10 times as much; cut it by a factor 10, and similarly the per-loop expense is also cut by a factor of 10.

With respect to the BPO message you linked to, it appeared atypical: both gcds were greater than 1: (10 / 3) * (6 / 5), so there was never any possibility of testing for > 1 paying off (while in real life a gcd of 1 is probably most common, and when multiplying Fractions obtained from floats gcd=1 is certain unless one of the inputs is an even integer (edited: used to say 0, but it's broader than just that)).

@rhettinger (Contributor):

Tim, I don't have a technical objection. I just stepped aside because at an emotional level I don't like the code and wouldn't enjoy showing it to other people. Unlike earlier drafts, it is no longer self-evidently correct and probably does need the lengthy comment. We do that in C code but usually aim for pure Python code to be more forthright.

@skirpichev (Member, Author):

@rhettinger, sorry for the questions. But what exactly doesn't look "self-evidently correct"? Isn't it clear that the "usual" numerator and denominator (e.g. for the sum) were divided by the same quantity? Thus, it's the same fraction. (My opinion is that this should be evident to any careful reader.)

Or do you think this is obvious for the _mul() case but not for _add()?

Or is it not clear that the resulting fraction is normalized? (That's the non-trivial part, IMHO, which could be explained inline.)

@tim-one (Member) commented Mar 20, 2021:

Sergey, you should also add your name to the Misc/ACKS file.

@skirpichev (Member, Author) replied twice via email on Mar 20, 2021.

@tim-one (Member) commented Mar 21, 2021:

Sergey, it seems clear to me that Raymond's objection cannot be overcome. +- is not obvious to him, or, indeed, to anyone - where "obvious" emphatically includes that the computed result is directly in lowest terms. Without that, it's not even code for +- in context.

So he unassigned himself, and doesn't want to give his blessing. That's OK - we don't always agree on everything. To me, the "non-obviousness" is more-than-less shallow, everyone here who thought about it figured out "why" for themselves, and the potential for major compensating speedups is real in non-contrived cases.

In other news, earlier you appeared to suggest a quite long commit comment. We don't really do those anymore. In the early days, sure, but now we have BPO and Github to record discussions. Commit messages correspondingly have shrunk to essentials. In this case, I had more in mind

Implement standard transformations in + - * / that can often reduce the size of intermediate integers needed. For rationals with large components, this can yield dramatic speed improvements, but for small rationals can run 10-20% slower, due to increased fixed overheads in the longer-winded code. If those slowdowns turn out to be a problem, see the PR discussion for low-level implementation tricks that could cut other fixed overheads.

@skirpichev (Member, Author):

Sergey, it seems clear to me that Raymond's objection cannot be overcome.

I'll make one last attempt to address his critique.

+- is not obvious to him

Maybe I can make it obvious? Here is another version with inline comments:

diff --git a/Lib/fractions.py b/Lib/fractions.py
index de3e23b759..99511c7b37 100644
--- a/Lib/fractions.py
+++ b/Lib/fractions.py
@@ -380,32 +380,96 @@ def reverse(b, a):
 
         return forward, reverse
 
+    # See Knuth, TAOCP, Volume 2, 4.5.1 for rational arithmetic algorithms.
+
     def _add(a, b):
         """a + b"""
-        da, db = a.denominator, b.denominator
-        return Fraction(a.numerator * db + b.numerator * da,
-                        da * db)
+        na, da = a.numerator, a.denominator
+        nb, db = b.numerator, b.denominator
+        g = math.gcd(da, db)
+        if g == 1:
+            # Randomly-chosen integers are coprime with probability
+            # 6/(pi**2) ~ 60.8%.  Here also g2 == 1 and the result is
+            # already normalized (see below).
+            return Fraction(na * db + da * nb, da * db, _normalize=False)
+        # First, divide both numerator and denominator of the sum by g.  Note
+        # that the denominator now is d = (da//g)*db == (da//g)*(db//g)*g.
+        s = da // g
+        n = na * (db // g) + nb * s
+        # To optimize normalization of the fraction n/d, note that
+        # g2 = gcd(n, d) == gcd(n, (da//g)*(db//g)*g) == gcd(n, g),
+        # because n, (da//g) and (db//g) are pairwise coprime.
+        #
+        # Indeed, (da//g) and (db//g) share no common factors (they were
+        # removed) and da is coprime with na (since input fractions are
+        # normalized), hence (da//g) and na are coprime.  By symmetry,
+        # (db//g) and nb are coprime too.  Then,
+        #
+        #     gcd(n, da//g) == gcd(na*(db//g), da//g) == 1
+        #     gcd(n, db//g) == gcd(nb*(da//g), db//g) == 1
+        g2 = math.gcd(n, g)
+        if g2 == 1:
+            return Fraction(n, s * db, _normalize=False)
+        return Fraction(n // g2, s * (db // g2), _normalize=False)
 
     __add__, __radd__ = _operator_fallbacks(_add, operator.add)
 
     def _sub(a, b):
         """a - b"""
-        da, db = a.denominator, b.denominator
-        return Fraction(a.numerator * db - b.numerator * da,
-                        da * db)
+        # Same as _add() with negated nb.
+        na, da = a.numerator, a.denominator
+        nb, db = b.numerator, b.denominator
+        g = math.gcd(da, db)
+        if g == 1:
+            return Fraction(na * db - da * nb, da * db, _normalize=False)
+        s = da // g
+        n = na * (db // g) - nb * s
+        g2 = math.gcd(n, g)
+        if g2 == 1:
+            return Fraction(n, s * db, _normalize=False)
+        return Fraction(n // g2, s * (db // g2), _normalize=False)
 
     __sub__, __rsub__ = _operator_fallbacks(_sub, operator.sub)
 
     def _mul(a, b):
         """a * b"""
-        return Fraction(a.numerator * b.numerator, a.denominator * b.denominator)
+        na, da = a.numerator, a.denominator
+        nb, db = b.numerator, b.denominator
+        g1 = math.gcd(na, db)
+        if g1 > 1:
+            na //= g1
+            db //= g1
+        g2 = math.gcd(nb, da)
+        if g2 > 1:
+            nb //= g2
+            da //= g2
+        # The result is normalized, because each of two factors in the
+        # numerator is coprime to each of the two factors in the denominator.
+        #
+        # Indeed, pick (na//g1).  It's coprime with (da//g2), because input
+        # fractions are normalized.  It's also coprime with (db//g1), because
+        # common factors are removed by g1 == gcd(na, db).
+        return Fraction(na * nb, db * da, _normalize=False)
 
     __mul__, __rmul__ = _operator_fallbacks(_mul, operator.mul)
 
     def _div(a, b):
         """a / b"""
-        return Fraction(a.numerator * b.denominator,
-                        a.denominator * b.numerator)
+        # Same as _mul(), with b inverted.
+        na, da = a.numerator, a.denominator
+        nb, db = b.numerator, b.denominator
+        g1 = math.gcd(na, nb)
+        if g1 > 1:
+            na //= g1
+            nb //= g1
+        g2 = math.gcd(db, da)
+        if g2 > 1:
+            da //= g2
+            db //= g2
+        n, d = na * db, nb * da
+        if d < 0:
+            n, d = -n, -d
+        return Fraction(n, d, _normalize=False)
 
     __truediv__, __rtruediv__ = _operator_fallbacks(_div, operator.truediv)

@tim-one, I hope you will be OK with this version anyway. It's much smaller and, I think, it explains all the non-trivial parts inline (the reasons for singling out the g2 branch in _add and the g1, g2 branches in _mul are less important).

For rationals with large components, this can yield dramatic speed improvements

This doesn't explain why large components are common in computations (which was a non-obvious thing e.g. for Mark in the issue thread).

@tim-one (Member) commented Mar 21, 2021:

Sergey, it seems clear to me that Raymond's objection cannot be overcome.

I'll make one last attempt to address his critique.

+- is not obvious to him

Maybe I can make it obvious? Here is another version with inline comments:

Why? You've already tried several times, and made no progress. It's just plain not obvious. Which Raymond (& I, & everyone else) keep telling you. Believe us 😉. It's not that we're stupid - it's that it's not obvious. It requires thought. Period.

The new code is less readable to me, because it mixes up lines of comments with lines of code, making it harder to follow each. It's a wall-of-text.

Note that Raymond didn't complain about * or / - those were obvious to him (and everyone else). Indeed, he wrote much the same code himself in the BPO report. If you removed the explanations for those entirely, I doubt anyone would object.

For rationals with large components, this can yield dramatic speed improvements

This doesn't explain why large components are common in computations (which was a non-obvious thing e.g. for Mark in the issue thread).

I believe you misread Mark there. He's quite aware that rationals can have arbitrarily large components. What he was pushing back against on the BPO report is that statistics._sum() was a motivating example. Converting from floats leaves exact power-of-2 denominators, and those don't "blow up" under rational summation. Similarly for converting from Decimal.

Yes, we know statistics isn't limited to that. But "motivating examples" need to be things people actually do, not just hypothetical possibilities. It's enough for me, e.g., that factorials are ubiquitous in denominators of power series expansions. Sums of those do "blow up".
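
The factorial-denominator blow-up is easy to reproduce (a small demo, not code from the PR):

    from fractions import Fraction
    from math import factorial

    s = Fraction(0)
    for k in range(30):
        s += Fraction(1, factorial(k))   # partial sums of the series for e
    print(s.denominator.bit_length())    # the reduced denominator is already huge (~100 bits)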

@skirpichev (Member, Author) replied Mar 21, 2021 via email.

@tim-one (Member) commented Mar 21, 2021:

Inline comments work fine when code is extensive and naturally breaks into phases. The code here is trivially small. The code is lost in the comments. It's not at all the code that's "not obvious", it's "the math" justifying the code. Consider that any paper you may care to cite also separates the justification from "the code" for algorithms this tiny.

Nevertheless, I'm worn out by the repetitive argument here. If other people say they like the squashed version better, I won't fight it. But, short of that, I'd prefer to leave it well enough alone.

@tim-one merged commit 690aca7 into python:master on Mar 22, 2021.
@bedevere-bot:

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot aarch64 RHEL7 3.x has failed when building commit 690aca7.

What you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/539/builds/760) and take a look at the build logs.
  4. Check if the failure is related to this commit (690aca7) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/539/builds/760

Summary of the results of the build (if available):

== Tests result: ENV CHANGED ==

413 tests OK.

10 slowest tests:

  • test_concurrent_futures: 4 min 6 sec
  • test_unparse: 3 min 37 sec
  • test_capi: 3 min 1 sec
  • test_tokenize: 2 min 44 sec
  • test_multiprocessing_spawn: 2 min 39 sec
  • test_peg_generator: 2 min 35 sec
  • test_asyncio: 2 min 25 sec
  • test_lib2to3: 2 min 13 sec
  • test_multiprocessing_forkserver: 1 min 43 sec
  • test_gdb: 1 min 26 sec

1 test altered the execution environment:
test_asyncio

13 tests skipped:
test_devpoll test_ioctl test_kqueue test_msilib test_ossaudiodev
test_startfile test_tix test_tk test_ttk_guionly test_winconsoleio
test_winreg test_winsound test_zipfile64

Total duration: 7 min 12 sec

Traceback logs:
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-aarch64/build/Lib/asyncio/sslproto.py", line 321, in __del__
    self.close()
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-aarch64/build/Lib/asyncio/sslproto.py", line 316, in close
    self._ssl_protocol._start_shutdown()
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-aarch64/build/Lib/asyncio/sslproto.py", line 590, in _start_shutdown
    self._abort()
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-aarch64/build/Lib/asyncio/sslproto.py", line 731, in _abort
    self._transport.abort()
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-aarch64/build/Lib/asyncio/selector_events.py", line 680, in abort
    self._force_close(None)
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-aarch64/build/Lib/asyncio/selector_events.py", line 731, in _force_close
    self._loop.call_soon(self._call_connection_lost, exc)
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-aarch64/build/Lib/asyncio/base_events.py", line 745, in call_soon
    self._check_closed()
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-aarch64/build/Lib/asyncio/base_events.py", line 510, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

@skirpichev deleted the opt-fractions branch on March 22, 2021 at 04:18.
@skirpichev (Member, Author) commented Mar 22, 2021:

Test failure seems to be issue 38912.

hauntsaninja pushed a commit that referenced this pull request on Jan 8, 2023:
Adapted from
046c84e8f9

This makes arithmetic between Fractions with small components
just as fast as before #24779, at some expense of
mixed arithmetic (e.g. Fraction + int).