Move Demo/classes/Rat.py to Lib/fractions.py and fix it up. #46023

Closed
jyasskin mannequin opened this issue Dec 21, 2007 · 92 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@jyasskin
Mannequin

jyasskin mannequin commented Dec 21, 2007

BPO 1682
Nosy @gvanrossum, @rhettinger, @facundobatista, @mdickinson, @ncoghlan, @wm75
Files
  • rational.patch
  • rational.patch: V2, ported to 2.6
  • rational.patch: V3, still minimal
  • rational_tweaks.patch: Some tweaks on top of the original implementation
  • lehmer_gcd.py: Revised Lehmer gcd algorithm
  • lehmer_gcd_2.py: Benchmark gcd implementations
  • lehmer_gcd.patch
  • fraction_inline_arith.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2008-07-01.20:35:22.398>
    created_at = <Date 2007-12-21.17:41:30.425>
    labels = ['type-bug', 'library']
    title = 'Move Demo/classes/Rat.py to Lib/fractions.py and fix it up.'
    updated_at = <Date 2014-09-25.11:53:54.268>
    user = 'https://bugs.python.org/jyasskin'

    bugs.python.org fields:

    activity = <Date 2014-09-25.11:53:54.268>
    actor = 'wolma'
    assignee = 'jyasskin'
    closed = True
    closed_date = <Date 2008-07-01.20:35:22.398>
    closer = 'mark.dickinson'
    components = ['Library (Lib)']
    creation = <Date 2007-12-21.17:41:30.425>
    creator = 'jyasskin'
    dependencies = []
    files = ['9021', '9114', '9152', '9201', '9464', '9485', '9486', '9570']
    hgrepos = []
    issue_num = 1682
    keywords = ['patch']
    message_count = 92.0
    messages = ['58949', '59424', '59425', '59582', '59583', '59608', '59614', '59615', '59633', '59635', '59636', '59639', '59640', '59645', '59650', '59651', '59656', '59657', '59659', '59665', '59747', '59755', '59772', '59773', '59870', '59884', '59886', '59958', '59982', '60080', '60081', '60131', '60132', '60133', '60137', '61737', '61995', '62001', '62005', '62010', '62011', '62013', '62028', '62029', '62099', '62189', '62192', '62218', '62222', '62227', '62236', '62249', '62251', '62252', '62253', '62254', '62256', '62257', '62258', '62271', '62305', '62311', '62318', '62320', '62324', '62335', '62337', '62340', '62380', '62401', '62403', '62406', '62547', '62550', '62564', '62566', '62567', '62568', '62671', '62672', '63087', '63126', '68725', '68732', '68757', '68822', '68823', '68843', '68851', '69020', '69027', '69069']
    nosy_count = 7.0
    nosy_names = ['gvanrossum', 'rhettinger', 'facundobatista', 'mark.dickinson', 'ncoghlan', 'jyasskin', 'wolma']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1682'
    versions = ['Python 2.6', 'Python 3.0']

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Dec 21, 2007

    This is written against the 3.0 branch, but before submitting it, I need
    to backport numbers.py and submit it against 2.6 instead. I'm posting it
    now to get a head start on comments.

    The numbers.py part is necessary for this to work but will be a separate
    patch.

    @jyasskin jyasskin mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Dec 21, 2007
    @gvanrossum gvanrossum self-assigned this Dec 21, 2007
    @mdickinson
    Member

    Some random comments: take with a large pinch of salt

    (1) In __init__ I think you want something like:

      self._numerator = int(numerator)//g

    instead of

      self._numerator = int(numerator / g)

    and similarly for the denominator. At the moment I get, for example:

    >>> Rational(10**23)
    rational.Rational(99999999999999991611392,1)
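
    (A quick sketch of the difference, with true division as in 3.0 and a hypothetical g = 1 for illustration:)

        n, g = 10**23, 1

        print(int(n / g))    # 99999999999999991611392 -- / detours through float and rounds
        print(n // g)        # 100000000000000000000000 -- exact integer division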

    (2) What's the reason for repr of a Rational being "rational.Rational(...)", while repr
    of a Decimal is just "Decimal(...)"? I'm not suggesting that either is wrong; just
    wondering whether there's some sort of guiding principle that would suggest one or the
    other.

    (3) My first thought on looking at the _gcd function was: "Shouldn't there be an abs()
    somewhere in there"; since the gcd of two (possibly negative) integers is often
    (usually?) defined as the largest *nonnegative* common divisor. But on closer
    examination it's clear that the result of _gcd(a, b) has the same sign as that of b
    (unless b == 0). So _gcd very cleverly chooses exactly the right sign so that the
    denominator after rescaling is positive. I'm not sure whether this is a happy accident
    or clever design, but either way it probably deserves a comment.
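
    (For reference, a minimal sketch of the gcd in question. Python's % takes the sign of its second operand, so the result inherits b's sign:)

        def _gcd(a, b):
            # each a % b carries the sign of b, so the final nonzero
            # value returned has the same sign as the original b
            while b:
                a, b = b, a % b
            return a

        print(_gcd(4, 6))    # 2
        print(_gcd(4, -6))   # -2: dividing 4/-6 through by -2 gives -2/3,
                             # leaving the denominator positive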

    (4) the __ceil__ operation could be spelt more neatly as

    return -(-a.numerator // a.denominator)

    ...but perhaps that just amounts to obfuscation :)
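
    (The identity at work: negating before and after a floor division turns floor into ceiling.)

        # ceil(n/d) == -((-n) // d), because // floors toward -infinity
        print(-(-7 // 2))    # 4 == ceil(3.5)
        print(-(-8 // 2))    # 4 -- exact quotients are unaffected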

    (5) It looks as though two-argument round just truncates when the second argument is
    negative. Presumably this isn't what's wanted here?

    >>> round(Rational(26), -1)  # expecting Rational(30, 1)
    rational.Rational(20,1)

    (6) The behaviour of the equality test is questionable when floats are involved. For
    example:

    >>> 10**23 == float(10**23)  # expect False
    False
    >>> Rational(10**23) == float(10**23)  # I'd also expect False here
    True

    Ideally, if x is a Rational and y is a float then I'd suggest that x == y should return
    True only when the numeric values really are equal. Coding this could be quite tricky,
    though. One option would be to convert the float to an (exactly equal) Rational first--
    -there's code to do this in the version of Rational.py in the sandbox.
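
    (A sketch of that conversion using float.as_integer_ratio, which was being added in the same 2.6/3.0 timeframe; Fraction is the class's eventual name:)

        from fractions import Fraction

        x = float(10**23)
        n, d = x.as_integer_ratio()        # the float's exact binary value
        print(n, d)                        # 99999999999999991611392 1
        print(Fraction(n, d) == 10**23)    # False -- the comparison is now exact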

    (7) For purely selfish reasons, I for one will be very happy if/when this makes it into
    2.6/3.0: I use Python a lot for teaching (geometry, number theory, linear algebra,
    ...); it's natural to work with rational numbers in this context; and it'll be much
    easier to tell an interested student to just download Python than to tell them they also
    need gmp and gmpy, or some other third party package, just to try out the code examples
    I've given them.

    @mdickinson
    Member

    Two more questions:

    (1) should a Rational instance be hashable?
    (2) Should "Rational(5,2) == Decimal('2.5')" evaluate to True or False?

    If Rationals are hashable and 2.5 == Rational(5, 2) == Decimal('2.5') then the
    rule that objects that compare equal should have equal hash is going to make
    life *very* interesting...

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Jan 9, 2008

    Thanks again for the excellent comments.

    __init__: good catch.

    repr(Rational): The rule for repr is "eval(repr(object)) == object".
    Unfortunately, that doesn't decide between the two formats, since both
    assume some particular import statements. I picked the one more likely
    to be unique, and I assume Decimal picked the shorter one. I can go
    either way.

    _gcd's sign: It's a happy accident for me. Possibly Sjoerd Mullender
    designed it that way. I've added a comment and a test.

    __ceil__: I like that implementation better.

    2-argument round: Fixed and tested.

    equality: Very good point. I've stolen the sandbox code and added
    Rational.from_float() using it. I think I also need to make this change
    to the comparisons.

    hashing: oops, yes these should be hashable. Decimal cheats by comparing
    != even to floats that it's equal to, so I'm going to assume that they
    also want Rational(5,2) != Decimal('2.5').

    The new patch is against 2.6.

    @rhettinger
    Contributor

    FWIW, I'm -1 on moving this module to the standard library. There has
    been basically *zero* demand for something like this. Because rational
    implementations seem to be more fun to write than to use, I think there
    are some competing implementations.

    @gvanrossum
    Member

    Raymond, do you *always* have to be such a grinch?

    The mere existence of competing implementations proves that there's a
    need. Hopefully having one in the stdlib will inspire others to
    contribute improvements rather than starting from scratch.

    @rhettinger
    Contributor

    Not sure I'm the grinch on this one. I thought PEPs on this were
    rejected long ago and no one complained afterwards. After years of
    watching newsgroup discussions and feature requests, I do not recall a
    single request or seeing a single problem that was better solved with a
    rational package. On python-help and the tutorial newsgroup, I've
    never seen a question about the existing module in the Tools directory
    or even a question on the topic.

    I think rational modules are ubiquitous for the same reason as Sudoku
    solvers -- they are cute, an easy exercise, and fun to write. There is
    some variation in how each chooses to make compromises so the
    denominators do not get unusably large. Also, there is some variation
    in sophistication of the GCD/LCD algorithm.

    I had thought the standards for inclusion in the standard library had
    risen quite a bit (best-in-class category killers and whatnot). ISTM,
    this is aspiring cruft -- I cannot remember encountering a real
    business or engineering problem that needed rational arithmetic (the
    decimal module seems to meet the rare need for high precision
    arithmetic).

    All that being said, maybe the module will serve some sort of
    educational purpose, or will serve to show off the numeric tower
    abstract base classes.

    @facundobatista
    Member

    PEP 239 is rejected (http://www.python.org/dev/peps/pep-0239/).

    If a Rational data type is to be included in the stdlib, my
    recommendation is that the PEP first be reopened and pushed until it
    gets accepted.

    Also, note the kind of questions Mark is asking here: the analysis and
    answers to those questions belong in a PEP, not in a patch. We risk
    not being very rational (1/2 wink) in these decisions.

    I'd close this patch as Rejected (as with the PEP), at least until the
    PEP gets accepted (if ever).

    @gvanrossum
    Member

    The rejection of PEP-239 was years ago. PEP-3141 was accepted; it
    includes adding a rational type to the stdlib, and Jeffrey is doing this
    with my encouragement.

    The motivation is to have at least one example of each type in our
    modest numeric tower in the stdlib. I'm sure there will be plenty of
    uses for rational.py, e.g. in education. Of course, if someone has a
    better candidate or a proposed improvement, speak up now! It looks like
    Mark's suggestions can be treated as code review; I don't think we need
    a new PEP.

    Note that the mere existence of PEP-239 again points out that the demand
    is not zero.

    @rhettinger
    Contributor

    If this goes in, I have some recommendations:

    • Follow the decimal module's lead in assiduously avoiding
      float->rational conversions. Something like Rat.from_float(1.1) is
      likely to produce unexpected results (like locking in an inexact input
      and having an unexpectedly large denominator).

    • Likewise, follow decimal's lead in avoiding all automatic coercions
      from floats: Rational(3,10) + 0.3 for example. The two don't mix.

    • Consider following decimal's lead on having a context that can limit
      the maximum size of a denominator. There are various strategies for
      handling a limit overflow, including raising an exception, finding the
      nearest rational up to the max denominator (there are some excellent
      though complex published algorithms for this), or rounding to the
      nearest fixed base (very easy). I'll dig out my HP calculator manuals
      at some point -- they had worked out a number of challenges with
      fractional arithmetic (including their own version of an Inexact tag).

    • Consider adding Decimal.from_rational and Rational.from_decimal. I
      believe these are both easy and can be done losslessly (see the
      sketch after this list).

    • Automatic coercions to/from Decimal need to respect the active decimal
      context. For example the result of Rational(3,11) +
      Decimal('3.1415926') is context dependent and may not be commutative.

    • When in doubt, keep the API minimal so we don't lock-in design mistakes.

    • Test the API by taking a few numerical algorithms and seeing how well
      they work with rational inputs (for starters, try
      http://docs.python.org/lib/decimal-recipes.html ).

    • If you do put in a method that accepts floats, make sure that it can
      accept arguments to control the rational approximation. Ideally, you
      would get something like Rational.approximate(math.pi, 6)
      --> 355/113 that could produce the smallest rational approximation to
      a given level of accuracy.
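
    (The from_decimal direction is indeed lossless, since every finite Decimal is an integer scaled by a power of ten. A sketch via Decimal.as_tuple(), using the module's eventual names; a from_decimal classmethod did later land in the fractions module:)

        from decimal import Decimal
        from fractions import Fraction

        def fraction_from_decimal(d):
            # sign is 0 or 1, digits is a tuple of decimal digits; the
            # value is (-1)**sign * digits * 10**exp, so this is exact
            sign, digits, exp = d.as_tuple()
            n = int(''.join(map(str, digits)))
            if sign:
                n = -n
            return Fraction(n, 1) * Fraction(10) ** exp

        print(fraction_from_decimal(Decimal('3.1415926')))   # 15707963/5000000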

    @facundobatista
    Member

    2008/1/9, Raymond Hettinger said:

    • Consider adding Decimal.from_rational and Rational.from_decimal. I
      believe these are both easy and can be done losslessly.

    If it's lossless, why not just allow Decimal(Rational(...)) and
    Rational(Decimal(...))?

    @rhettinger
    Contributor

    If it's lossless, why not just allow
    Decimal(Rational(...)) and Rational(Decimal(...))?

    Those conversions are fine.

    It's the float<-->rational conversions that are problematic. There are
    exact float to rational conversions using math.frexp() but I don't think
    the results tend to match what users expect (since 1.1 isn't exactly
    represented). Also, there may be overflow issues with trying to go from
    rationals to floats when the denominator is very large.

    I think the history of rational modules is that simplistic
    implementations get built and then the writers find that exploding
    denominators limit their usefulness for anything other than trivial
    problems. The solution is to limit denominators but that involves less
    trivial algorithms and a more sophisticated API.

    @gvanrossum
    Member

    On Jan 9, 2008 4:29 PM, Raymond Hettinger <[email protected]> wrote:

    I think the history of rational modules is that simplistic
    implementations get built and then the writers find that exploding
    denominators limit their usefulness for anything other than trivial
    problems. The solution is to limit denominators but that involves less
    trivial algorithms and a more sophisticated API.

    A "rational" type that limits denominators (presumably by rounding)
    isn't needed -- we already have Decimal. The proposed rational type is
    meant for "pure" mathematical uses, just like Python's long ints.

    Regarding float->rational, I propose to refuse Rational(1.1) for the
    same reasons as Decimal(1.1) is refused, but to add a separate
    constructor (a class method?) that converts a float to a rational
    exactly (as long as it isn't nan or inf), large denominators be
    damned. This can help people who are interested in taking floating
    point numbers apart.

    float(Rational()) should be fine.

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Jan 10, 2008

    • Follow the decimal module's lead in assiduously avoiding
      float->rational conversions. Something like Rat.from_float(1.1) is
      likely to produce unexpected results (like locking in an inexact input
      and having an unexpectedly large denominator).

    I agree that it's not usually what people ought to use, and I don't
    think it's an essential part of the API. It might be a useful starting
    point for the approximation methods though. .trim() and .approximate()
    in http://svn.python.org/view/sandbox/trunk/rational/Rational.py?rev=40988&view=auto
    and Haskell's approxRational
    (http://www.haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Ratio.html)
    start from rationals instead of floats. On the other hand, it might
    make sense to provide explicit methods to approximate from floats
    instead of asking people to execute the two-phase process. I'm happy
    to give it a different name or drop it entirely.

    • Likewise, follow decimal's lead in avoiding all automatic coercions
      from floats: Rational(3,10) + 0.3 for example. The two don't mix.

    I'm reluctant to remove the fallback to float, unless the consensus is
    that it's always a bad idea to support mixed-mode arithmetic (which is
    possible: I was surprised by the loss of precision of "10**23/1" while
    writing this). Part of the purpose of this class is to demonstrate how
    to implement cross-type operations. Note that it is an automatic
    coercion _to_ floats, so it doesn't look like you've gotten magic
    extra precision.

    • Consider following decimal's lead on having a context that can limit
      the maximum size of a denominator. There are various strategies for
      handling a limit overflow, including raising an exception, finding the
      nearest rational up to the max denominator (there are some excellent
      though complex published algorithms for this), or rounding to the
      nearest fixed base (very easy). I'll dig out my HP calculator manuals
      at some point -- they had worked out a number of challenges with
      fractional arithmetic (including their own version of an Inexact tag).

    Good idea, but I'd like to punt that to a later revision if you don't
    mind. If we do punt, that'll force the default context to be "infinite
    precision" but won't prevent us from adding explicit contexts. Do you
    see any problems with that?

    • Consider adding Decimal.from_rational and Rational.from_decimal. I
      believe these are both easy and can be done losslessly.

    Decimal.from_rational(Rat(1, 3)) wouldn't be lossless, but
    Rational.from_decimal is easier than from_float. Then
    Decimal.from_rational() could rely on just numbers.Rational, so it
    would be independent of this module. Is that a method you'd want on
    Decimal anyway? The question becomes whether we want the rational
    module to import decimal to implement the typecheck, or just assume that
    .as_tuple() does the right thing. These are just as optional as
    .from_float() though, so we can also leave them for future
    consideration.

    • Automatic coercions to/from Decimal need to respect the active decimal
      context. For example the result of Rational(3,11) +
      Decimal('3.1415926') is context dependent and may not be commutative.

    Since I don't have any tests for that, I don't know whether it works.
    I suspect it currently returns a float! :) What do you want it to do?
    Unfortunately, giving it any special behavior reduces the value of the
    class as an example of falling back to floats, but that shouldn't
    necessarily stop us from making it do the right thing.

    • When in doubt, keep the API minimal so we don't lock-in design mistakes.

    Absolutely!

    • Test the API by taking a few numerical algorithms and seeing how well
      they work with rational inputs (for starters, try
      http://docs.python.org/lib/decimal-recipes.html ).

    Good idea. I'll add some of those to the test suite.

    • If you do put in a method that accepts floats, make sure that it can
      accept arguments to control the rational approximation. Ideally, you
      would get something like Rational.approximate(math.pi, 6)
      --> 355/113 that could produce the smallest rational approximation to
      a given level of accuracy.

    Right. My initial plan was to use
    Rational.from_float(math.pi).simplest_fraction_within(Rational(1,
    10**6)) but I'm not set on that, and, because there are several
    choices for the approximation method, I'm skeptical whether it should
    go into the initial revision at all.

    @rhettinger
    Contributor

    Decimal.from_rational(Rat(1, 3)) wouldn't be lossless

    It should be implemented as Decimal(1)/Decimal(3) and then let the
    context handle any inexactness.

    Rational.from_decimal is easier than from_float.

    Yes. Just use Decimal.as_tuple() to get the integer components.

    Then Decimal.from_rational() could rely on just numbers.
    Rational, so it would be independent of this module.
    Is that a method you'd want on Decimal anyway?

    Instead, just expand the decimal constructor to accept a rational input.

    Regarding float->rational, I propose to refuse Rational(1.1)
    for the same reasons as Decimal(1.1) is refused,

    +1

    but to add a separate constructor (a class method?) that
    converts a float to a rational exactly (as long as it
    isn't nan or inf), large denominators be damned. This
    can help people who are interested in taking floating
    point numbers apart.

    Here's how to pick the integer components out of a float:

    mant, exp = math.frexp(x)
    mant, exp = int(mant * 2.0 ** 53), exp-53
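
    (Checking that this decomposition is exact:)

        import math

        x = 1.1
        mant, exp = math.frexp(x)
        mant, exp = int(mant * 2.0 ** 53), exp - 53
        print(mant * 2.0 ** exp == x)    # True -- no information lost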

    > * Likewise, follow decimal's lead in avoiding all
    > automatic coercions from floats:
    > Rational(3,10) + 0.3 for example. The two don't mix.

    I'm reluctant to remove the fallback to float,

    You will need some plan to scale down long integers that exceed the
    range of floats (right-shifting the numerator and denominator until
    they fit).

    @rhettinger
    Contributor

    One other thought, the Decimal and Rational constructors can be made to
    talk to each other via a magic method so that neither has to import the
    other (somewhat like we do now with __int__ and __float__).
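
    (A hedged sketch of such a protocol; in later Pythons, as_integer_ratio() became exactly this kind of duck-typed bridge between float, Decimal, and Fraction:)

        class Fraction:
            """Sketch: the constructor asks for as_integer_ratio() rather
            than type-checking against decimal.Decimal, so neither module
            has to import the other. Normalization is omitted here."""

            def __init__(self, value=0, denominator=None):
                if denominator is not None:
                    self._numerator, self._denominator = value, denominator
                elif hasattr(value, 'as_integer_ratio'):
                    self._numerator, self._denominator = value.as_integer_ratio()
                else:
                    self._numerator, self._denominator = int(value), 1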

    @mdickinson
    Member

    Allowing automatic coercion to and from Decimal sounds dangerous and
    complicated to me. Mightn't it be safer just to disallow this (at least for
    now)?

    I think something like trim() (find the closest rational approximation with
    denominator bounded by a given integer) would be useful to have as a Rational
    method, but probably only for explicit use.

    I'm still worried about equality tests: is it possible to give a good reason
    why Decimal("2.5") == Rational(5, 2) should return False, while Rational(5, 2)
    == 2.5 returns True. Can someone articulate some workable principle that
    dictates when an int/float/complex/Rational/Decimal instance compares equal to
    some other int/float/complex/Rational/Decimal instance of possibly different
    type but the same numerical value?

    @gvanrossum
    Member

    All in all, Decimal is the odd man out -- it may never become a full
    member of the Real ABC. The built-in types complex, float and int (and
    long in 2.x) all support mixed-mode arithmetic, and this is one of the
    assumptions of the numeric tower -- and of course of mathematics. The
    new Rational type can be designed to fit in this system pretty easily,
    because it has no pre-existing constraints; but the Decimal type
    defies coercion, and is perhaps best left alone. It's already breaking
    the commonly understood properties of equality, e.g. 1.0 == 1 ==
    Decimal("1") != 1.0.

    --Guido



    @rhettinger
    Contributor

    Would it be reasonable then to not have any of the numeric tower stuff
    put in the decimal module -- basically leave it alone (no __ceil__,
    etc)?

    @gvanrossum
    Member

    Would it be reasonable then to not have any of the numeric tower stuff
    put in the decimal module -- basically leave it alone (no __ceil__,
    etc)?

    If that's the preference of the decimal developers, sure.

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Jan 11, 2008

    If the consensus is that Decimal should not implement Real, I'll reply
    to that thread and withdraw the patch.

    Raymond, do you want me to add Decimal.__init__(Rational) in this patch
    or another issue/thread?

    I don't understand the comment about scaling down long integers. It's
    already the case that float(Rational(10**23, 10**24 + 7))==0.1.

    Mark, I agree that .trim() and/or .approximate() (which I think is the
    same as Haskell's approxRational) would be really useful. Do you have
    particular reasons to pick .trim? Are those the best names for the
    concepts? I'd also really like to be able to link from their docstrings
    to a proof that their implementations are correct. Does anyone know of one?

    Finally, Decimal("2.5") != Rational(5, 2) because Decimal("2.5") != 2.5
    (so it'd make equality even more intransitive) and hash(Decimal("2.5"))
    != hash(2.5) so we couldn't follow the rule about equal objects implying
    equal hash codes, which is probably more serious. I don't have a
    principled explanation for Decimal's behavior, but given that it's
    fixed, I think the behavior of non-integral Rationals is determined too.
    On the other hand, I currently have a bug where Rational(3,1) !=
    Decimal("3") where the hash code could be consistent. I'll fix that by
    the next patch.

    @rhettinger
    Contributor

    If the consensus is that Decimal should not implement Real,
    I'll reply to that thread and withdraw the patch.

    Thanks. That would be nice.

    Raymond, do you want me to add Decimal.__init__(Rational)
    in this patch

    How about focusing on the rational module and when you've done, I'll
    adapt the Decimal constructor to accept a rational input.

    I don't understand the comment about scaling down long integers.

    My understanding is that you're going to let numerators and denominators
    grow arbitrarily large. When they get over several hundred digits each,
    you will have to scale them down before converting to a float. For
    example when numerator=long('2'*400+'7') and
    denominator=long('3'*400+'1'), the long-->float conversion will
    overflow, so it is necessary to first scale down the two before
    computing the ratio: scale=325;
    float_ratio=float(numerator>>scale)/float(denominator>>scale)
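
    (The failure mode and the fix, in Python 3 terms; as Jeffrey notes below, int true division performs this scaling internally:)

        n = int('2' * 400 + '7')
        d = int('3' * 400 + '1')

        try:
            float(n) / float(d)        # each operand alone overflows a float
        except OverflowError as e:
            print('naive conversion fails:', e)

        # shifting both operands by the same amount preserves the ratio
        # (to within rounding) and brings them into float range
        scale = max(n.bit_length(), d.bit_length()) - 900
        print((n >> scale) / (d >> scale))   # 0.6666666666666666
        print(n / d)                         # same result, handled internally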

    @mdickinson
    Member

    More comments, questions, and suggestions on the latest patch:

    1. In _binary_float_to_ratio, the indentation needs adjusting. Also 'top = 0L' could be replaced with
      'top = 0', unless you need this code to work with Python 2.3 and earlier, in which case top needs to be
      a long so that << behaves correctly. Otherwise, this looks good.

    2. Rational() should work, and it should be possible to initialize from a string. I'd suggest that
      Rational(' 1729 '), Rational('-3/4') and Rational(' +7/18 \n') should be acceptable: i.e. leading and
      trailing whitespace, and an optional - or + sign should be permitted. But space between the optional
      sign and the numerator, or on either side of the / sign, should be disallowed. Not sure whether the
      numerator and denominator should be allowed to have leading zeros or not---probably yes, for
      consistency with int(). (A sketch of this grammar follows the list.)

    3. I don't think representing +/-inf and nan as Rationals (1/0, -1/0, 0/0) is a good idea---special values
      should be kept out of the Rational type, else it won't be an implementation of the Rationals any more---it'll
      be something else.

    4. hash(n) doesn't agree with hash(Rational(n)) for large integers (I think you already mentioned this
      above).

    5. Equality still doesn't work for complex numbers:

    >>> from rational import *
    >>> Rational(10**23) == complex(10**23)  # expect False here
    True
    >>> Rational(10**23) == float(10**23)
    False
    >>> float(10**23) == complex(10**23)
    True
    6. Why the parentheses around the str() of a Rational?

    7. How about having hash of a Rational (in the case that it's not equal to an integer or a float) be
      given by hash((self.numerator, self.denominator))? That is, let the tuple hash take care of avoiding
      lots of hash collisions.
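
    (A sketch of the string grammar proposed in item 2, with a hypothetical parse_rational helper:)

        import re

        # optional sign glued to the numerator, optional '/denominator',
        # surrounding whitespace allowed, no internal spaces
        _RATIONAL_RE = re.compile(r'\A\s*([-+]?\d+)(?:/(\d+))?\s*\Z')

        def parse_rational(s):
            m = _RATIONAL_RE.match(s)
            if m is None:
                raise ValueError('invalid rational literal: %r' % (s,))
            num, den = m.groups()
            return int(num), int(den or 1)

        print(parse_rational(' 1729 '))      # (1729, 1)
        print(parse_rational('-3/4'))        # (-3, 4)
        print(parse_rational(' +7/18 \n'))   # (7, 18)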

    @mdickinson
    Member

    About .trim and .approximate:

    it sounds like these are different, but quite closely related, methods: one takes a positive
    integer and returns the best approximation with denominator bounded by that integer; the other
    returns the 'smallest' rational in a given interval centered at the original rational. I guess
    we probably don't need both of these, but I can't give any good reason for preferring one over
    the other.

    I don't have anything to offer about names, either.

    I can try to find out whether the algorithms are published anywhere on the web---certainly,
    neither algorithm should be particularly hard to implement and prove the correctness of; they
    both essentially rely on computing the continued fraction development of the given rational.
    Almost any not-too-basic elementary number theory text should contain proofs of the relevant
    results about continued fractions.

    Am willing to help out with implementing either of these, if that's at all useful.
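
    (Both reduce to walking the continued-fraction convergents of the target. A sketch of the bounded-denominator operation, essentially the algorithm that later shipped as Fraction.limit_denominator:)

        from fractions import Fraction

        def trim(x, max_den):
            # closest fraction to x with denominator <= max_den
            if x.denominator <= max_den:
                return Fraction(x)
            p0, q0, p1, q1 = 0, 1, 1, 0      # successive convergents p/q
            n, d = x.numerator, x.denominator
            while True:
                a = n // d                   # next continued-fraction term
                q2 = q0 + a * q1
                if q2 > max_den:
                    break
                p0, q0, p1, q1 = p1, q1, p0 + a * p1, q2
                n, d = d, n - a * d
            # the answer is the last convergent or the best semiconvergent
            k = (max_den - q0) // q1
            c1 = Fraction(p0 + k * p1, q0 + k * q1)
            c2 = Fraction(p1, q1)
            return c2 if abs(c2 - x) <= abs(c1 - x) else c1

        print(trim(Fraction(3141592653589793, 10**15), 1000))   # 355/113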

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Jan 13, 2008

    _binary_float_to_ratio: Oops, fixed.

    Rational() now equals 0, but I'd like to postpone Rational('3/2') until
    there's a demonstrated need. I don't think it's as common a use as
    int('3'), and there's more than one possible format, so some real world
    experience will help (that is, quite possibly not in 2.6/3.0). I'm also
    postponing Rational(instance_of_numbers_Rational).

    +/-inf and nan are gone, and hash is fixed, at least until the next bug.
    :) Good idea about using tuple.

    Parentheses in str() help with reading things like
    "%s**2"%Rational(3,2), which would otherwise format as "3/2**2". I don't
    feel strongly about this.

    Equality and the comparisons now work for complex, but their
    implementations make me uncomfortable. In particular, two instances of
    different Real types may compare unequal to the nearest float, but equal
    to each other and have similar but inconsistent problems with <=. I can
    trade off between false ==s and false !=s, but I don't see a way to make
    everything correct. We could do better by making the intermediate
    representation Rational instead of float, but comparisons are inherently
    doomed to run up against the fact that equality is uncomputable on the
    computable reals, so it's probably not worthwhile to spend too much time
    on this.

    I've added a test that float(Rational(long('2'*400+'7'),
    long('3'*400+'1'))) returns 2.0/3. This works without any explicit
    scaling on my part because long.__truediv__ already handles it. If
    there's something else I'm doing wrong around here, I'd appreciate a
    failing test case.

    The open issues I know of are:

    • Is it a good idea to have both numbers.Rational and
      rational.Rational? Should this class have a different name?
    • trim and approximate: Let's postpone them to a separate patch (I do
      think at least one is worth including in 2.6+3.0). So that you don't
      waste time on them, we already have implementations in the sandbox and
      (probably) a good-enough explanation at
      http://en.wikipedia.org/wiki/Continued_fraction#Best_rational_approximations.
      Thanks for the offer to help out with them. :)
    • Should Rational.from_float() exist and with the current name? If
      there's any disagreement, I propose to rename it to
      Rational._from_float() to discourage use until there's more consensus.
    • Rational.from_decimal(): punted to a future patch. I favor this for
      2.6+3.0.
    • Rational('3/2') (see above)

    I think this is close enough to correct to submit and fix up the
    remaining problems in subsequent patches. If you agree, I'll do so.

    @mdickinson
    Member

    The latest version of rational.py looks good to me---nice work! (I haven't looked properly at
    numbers.py or test_rational.py, yet. I do plan to, eventually.)

    I do think it's important to be able to create Rational instances from strings: e.g., for
    easy reading from and writing to files. But maybe I'm alone in this opinion. You say there's
    more than one possible format---what other formats were you considering?

    And since you pointed it out, I think Rational(Rational(a, b)) should work too.

    There's also the not-entirely-insignificant matter of documentation :)

    Other than that, I don't see why this shouldn't go in.

    Other comments:

    I have a weak preference for no parentheses on the str() of a Rational, but it's no big deal
    either way.

    I agree that equality and comparisons are messy. This seems almost inevitable: one obvious
    cause is that the existing int <-> float comparisons already break the `numeric tower' model
    (push both operands to the highest common type before operating). So I'm not sure there can
    be an easy and elegant solution here :(

    I like the name Rational for this class. Maybe change the name of numbers.Rational instead?

    Postponing trim, approximate, from_decimal sounds fine to me.

    Finally: the very first line of rational.py is, I think, no longer accurate. Please add your
    name so everyone knows who to blame/credit/assign bug reports to :)

    @mdickinson
    Member

    Just had a quick look at numbers.py. Only two comments:

    1. I think there's a typo in the docstring for Inexact: one of those == should be a !=

    2. Not that it really matters now, but note that at least for Decimal, x-y is not the same
      as x+(-y) (I'm looking at Complex.__sub__), and +x is not a no-op (Real.real,
      Real.conjugate). In both cases signs of zeros can be problematic:

    >>> x = Decimal('-0')
    >>> y = Decimal('0')
    >>> x-y
    Decimal("-0")
    >>> x+(-y)
    Decimal("0")
    >>> x
    Decimal("-0")
    >>> +x
    Decimal("0")

    Of course the first point wouldn't matter anyway since Decimal already implements __sub__;
    the second means that if Decimal were to join Real, something would need to be done to
    avoid Decimal("-0").real becoming Decimal("0").

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Feb 14, 2008

    r60785 speeds the benchmark up from 10.536s to 4.910s (on top of
    whatever my __instancecheck__ fix did). Here are the remaining
    interesting-looking calls:

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
    ...
         1    0.207    0.207    4.999    4.999  {sum}
    199996    1.587    0.000    3.419    0.000  fractions.py:58(__new__)
     99997    0.170    0.000    3.236    0.000  fractions.py:295(forward)
     99998    0.641    0.000    2.881    0.000  fractions.py:322(_add)
     99999    0.202    0.000    1.556    0.000  benchmark.py:3(<genexpr>)
    199996    0.829    0.000    0.829    0.000  fractions.py:17(gcd)
    199996    0.477    0.000    0.757    0.000  abc.py:63(__new__)
    399993    0.246    0.000    0.246    0.000  abc.py:164(__instancecheck__)
    199996    0.207    0.000    0.207    0.000  {method 'get' of 'dictproxy' objects}
    100071    0.185    0.000    0.185    0.000  {isinstance}
    399990    0.109    0.000    0.109    0.000  fractions.py:200(denominator)
    200004    0.073    0.000    0.073    0.000  {built-in method __new__ of type object at 0xfeea0}
    199995    0.065    0.000    0.065    0.000  fractions.py:196(numerator)

    @gvanrossum
    Member

    Yay! And my benchmark went from 70 usec to 15 usec. Not bad!

    PS. Never use relative speedup numbers based on profiled code -- the
    profiler skews things tremendously. Not saying you did, just stating
    the obvious. :)

    @gvanrossum
    Member

    PS. I can shave nearly 4 usec off the constructor like this:

    -        self = super(Fraction, cls).__new__(cls)
    +        if cls is Fraction:
    +            self = object.__new__(cls)
    +        else:
    +            self = super().__new__(cls)

    This would seem to give an upper bound for the gain you can make by
    moving the check for instantiating an abstract class to C. Your call.

    I also found that F(2,3)+F(5,7) takes about 22 usecs (or 18 using the
    above hack).

    Inlining the common case for addition (Fraction+Fraction) made that go
    down to 14 usec (combined with the above hack):

    -    __add__, __radd__ = _operator_fallbacks(_add, operator.add)
    +    __xadd__, __radd__ = _operator_fallbacks(_add, operator.add)
     
    +    def __add__(self, other):
    +        if type(other) is Fraction:
    +            na = self._numerator
    +            da = self._denominator
    +            nb = other._numerator
    +            db = other._denominator
    +            return Fraction(na*db + nb*da, da*db)
    +        return self.__xadd__(other)
    +

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Feb 14, 2008

    Thanks for writing the __add__ optimization down. I hadn't realized how
    simple it was. I think both optimizations are worth it. 22% on a rarely
    used class is worth 24 lines of Python, and I think nearly eliminating
    the overhead of ABCs (which will prevent them from getting a bad
    performance reputation) is worth some C.

    @mdickinson
    Member

    Two things:

    (1) Speedup. I haven't been helping much here; apologies. Christian
    suggested that a C implementation of gcd might help. Is this true, or
    are we not yet at the stage where the gcd calls are significant? There
    are some basic tricks that can speed up the usual Euclidean algorithm by
    a constant factor, and I'll try to implement them if that's of interest
    (unless Christian beats me to it). There are also some serious super-fast
    recursive gcd algorithms, with running time O(n**(1+epsilon)) for a pair
    of n-bit integers, but those are complicated and probably best left to
    GMP.

    (2) PEP-3101: Should Fraction get a __format__ method? I'm looking at
    implementing Decimal.__format__, and I think there's quite likely to be
    a fair amount of common code (e.g. for parsing the format specifier)
    between Decimal.__format__ and Fraction.__format__, so it would probably
    make sense for me to do both of these at once.

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Feb 18, 2008

    1. No worries. Even without inlining the common case of __add__, etc.,
      Fraction is now faster than Decimal for smallish fractions
      [sum(Fraction(1, i) for i in range(1, 100))], and for large ones
      [sum(Fraction(1, i) for i in range(1, 1000))] gcd takes so much of the
      time that I can't see the effects of any of the optimizations I've made.
      Since I wasn't thinking of this as a high-performance class, I'm taking
      that as a sign to stop optimizing, but gcd is definitely the function to
      tackle if anyone wants to keep going.

    2. I haven't been following __format__, but adding it to Rational sounds
      fine to me.

    @gvanrossum
    Member

    A C reimplementation of gcd probably makes little sense -- for large
    numbers it is dominated by the cost of the arithmetic, and that will
    remain the same.

    @mdickinson
    Member

    A C reimplementation of gcd probably makes little sense -- for large
    numbers it is dominated by the cost of the arithmetic, and that will
    remain the same.

    That's true for a direct translation of the usual algorithm. But
    there's a variant due to Lehmer which reduces the number of multi-
    precision operations, and dramatically reduces the number of full-
    precision divmod operations---they're mostly replaced by n-limb-by-1-
    limb multiplications and full-precision additions. I'd expect at least
    a factor of 2 speedup from using this; possibly more. Of course it's
    impossible to tell without implementing it.

    I've attached a Python version; note that:

    • the operations to find x and y at the start of the outer while loop
      are simple lookups when written in C
    • the else branch is taken rarely: usually once at the beginning of
      any gcd() calculation, and then less than 1 time in 20000 on average
      during the reduction.
    • all the arithmetic in the inner while loop is done with numbers <=
      2**15 in absolute value.

    Mark
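
    (Not the attached file, but a rough sketch of the idea after Knuth's Algorithm L (TAOCP vol. 2, 4.5.2), with 15-bit leading chunks so the inner loop stays in single precision:)

        def lehmer_gcd(u, v):
            u, v = abs(u), abs(v)
            if u < v:
                u, v = v, u
            while v >> 15:                       # v longer than one 15-bit chunk
                shift = u.bit_length() - 15
                uh, vh = u >> shift, v >> shift  # aligned leading bits of u, v
                A, B, C, D = 1, 0, 0, 1          # cofactors of the reduction
                while vh + C != 0 and vh + D != 0:
                    q = (uh + A) // (vh + C)
                    if q != (uh + B) // (vh + D):
                        break                    # true quotient uncertain
                    A, C = C, A - q * C          # one simulated Euclid step,
                    B, D = D, B - q * D          # entirely in small integers
                    uh, vh = vh, uh - q * vh
                if B == 0:
                    u, v = v, u % v              # rare full-precision step
                else:
                    u, v = A * u + B * v, C * u + D * v
            while v:                             # finish with ordinary Euclid
                u, v = v, u % v
            return u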

    @gvanrossum
    Member

    I think this is definitely premature optimization. :-)

    In any case, before going any further, you should design a benchmark
    and defend it.
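
    (A minimal harness in that spirit; math.gcd did eventually land in C, in Python 3.5, so it can stand in for the proposed implementation:)

        import math
        import random
        import timeit

        def euclid_gcd(a, b):
            while b:
                a, b = b, a % b
            return a

        # random n-digit operands sharing an inflated common factor g,
        # much like the "Large gcd" rows in the timings below
        for digits in (10, 100, 1000):
            g = random.randrange(1, 10 ** digits)
            a = g * random.randrange(1, 10 ** digits)
            b = g * random.randrange(1, 10 ** digits)
            t_py = timeit.timeit(lambda: euclid_gcd(a, b), number=100)
            t_c = timeit.timeit(lambda: math.gcd(a, b), number=100)
            print('%5d digits: Python %.4fs   C %.4fs   factor %.1f'
                  % (digits, t_py, t_c, t_py / t_c))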

    @mdickinson
    Member

    Replacing lehmer_gcd.py with a revised version. Even in Python, this
    version is faster than the current one, on my machine, once both numbers
    are greater than 10**650 or so (your crossover points may vary). It's
    over four times faster for very large inputs (over 10**20000).

    In any case, before going any further, you should design a benchmark
    and defend it.

    Okay. I'll stop now :)

    @mdickinson
    Member

    Okay. I'll stop now :)

    So I lied. In a spirit of enquiry, I implemented math.gcd, and wrote a
    script (lehmer_gcd.py, attached) to produce some relative timings. Here
    are the results, on my MacBook Pro (OS X 10.5):

    An explanation of the table: I'm computing gcd(a, b) for randomly
    chosen a and b with a_digits and b_digits decimal digits, respectively.
    The times in the 5th and 6th columns are in seconds; the last column
    gives the factor by which math.gcd is faster than the direct Euclidean
    Python implementation.

    Even for quite small a and b, math.gcd(a, b) is around 10 times faster
    than its Python counterpart. For large, roughly equal-sized, a and b,
    the C version runs over 30 times faster.

    The smallest difference occurs when the two arguments are very different
    sizes; then both versions of gcd essentially do one time-consuming
    divmod, followed by a handful of tiny operations, so they take
    essentially the same time. For Fraction, the arguments will tend to be
    around equal size for many applications: this assumes that the actual
    real number represented by the Fraction is within a sensible range, even
    when the numerator and denominator have grown huge.

    For random a and b, gcd(a, b) tends to be very small. So for the last
    few entries in the table, the gcds have been artificially inflated (by
    calling gcd(g*a, g*b) for some largish g).

    On to the results.

    a_digits b_digits g_digits ntrials C_Lehmer Py_Euclid Factor
    -------- -------- -------- ------- -------- --------- ------
    1 1 0 100000 0.066856 0.240867 3.602780
    2 2 0 100000 0.070256 0.314639 4.478466
    4 4 0 100000 0.075708 0.466596 6.163087
    7 7 0 100000 0.082714 0.681443 8.238537
    10 10 0 10000 0.022336 0.318501 14.259532
    20 20 0 10000 0.045209 0.752662 16.648436
    50 50 0 10000 0.128269 2.167890 16.901128
    100 100 0 1000 0.028053 0.511769 18.242906
    1000 1000 0 1000 0.709453 18.867336 26.594197
    10000 10000 0 100 4.215795 148.597223 35.247736

    Lopsided gcds
    -------------
    20 100 0 100 0.000889 0.007659 8.616953
    100 20 0 100 0.000887 0.007665 8.642473
    1000 3 0 100 0.000727 0.001387 1.908167
    3 1000 0 100 0.000754 0.001457 1.932027
    10000 1 0 100 0.004689 0.004948 1.055219
    1 10000 0 100 0.004691 0.005038 1.073948
    500 400 0 100 0.020289 0.388080 19.127665
    400 500 0 100 0.020271 0.389301 19.204768

    Large gcd
    ---------
    10 10 10 1000 0.005235 0.043507 8.310880
    20 20 20 1000 0.008505 0.095732 11.256167
    100 100 100 1000 0.041734 0.812656 19.472284

    @mdickinson
    Member

    And here's the diff for math.gcd. It's not production quality---it's just
    there to show what's possible.

    @mdickinson
    Member

    I'm wondering what there is left to do here. Are we close to being able
    to close this issue? Some thoughts:

    (1) I think the docs could use a little work (just expanding, and giving
    a few examples). But it doesn't seem urgent that this gets done before
    the next round of alphas.

    (2) Looking at it, I'm really not sure that implementing
    Rational.__format__ is worthwhile; it's not obvious that we need a way
    to format a Rational as a float. So I'm -0 on this. If others think
    Rational should have a __format__ method I'm still happy to implement
    it.

    Is there much else that needs to be done here?

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Feb 29, 2008

    I agree that we're basically done here. I'm back to -1 on inlining the
    common case for arithmetic (attached here anyway). Simple cases are
    already pretty fast, and bigger fractions are dominated by gcd time, not
    function call overhead. Since duplicating the definitions of arithmetic
    will make it harder to do tricky things that shrink the arguments to
    gcd(), we shouldn't do it.

    I was always +0 to __format__, so not doing it is fine with me.

    Thanks for everyone's help!

    @jyasskin jyasskin mannequin closed this as completed Feb 29, 2008
    @mdickinson
    Member

    I'm reopening this to propose a last-minute minor change:

    Can we remove the trailing 'L' from the numerator and denominator in
    the repr of a Fraction instance? E.g.,

    >>> x = Fraction('9.876543210')
    >>> x
    Fraction(987654321L, 100000000L)
    >>> x * (1/x)
    Fraction(1L, 1L)

    I'd rather see Fraction(987654321, 100000000) and Fraction(1, 1). Does
    the 'L' serve any useful purpose?

    @mdickinson mdickinson reopened this Jun 25, 2008
    @rhettinger
    Contributor

    +1 on removing the L. Also, it would be nice if reduced fractions were
    restored to ints after the gcd step:

    >>> Fraction(1*10**18, 3*10**18)
    Fraction(1L, 3L)

    @gvanrossum
    Member

    +1 on removing the trailing L from the repr.

    +0 on trying to reduce the values to ints; that would be dead-end code
    since in 3.0 it's a non-issue. And it doesn't solve the problem with repr.

    @mdickinson
    Member

    Trailing 'L's removed in r64557.

    A couple more minor issues:

    (1) fractions.gcd is exported but not documented. I'll add some
    documentation for it, unless anyone feels that it shouldn't have been
    exported in the first place. If so, please speak up!

    (2) The Fraction constructor is a little more lenient than it ought to
    be. For example:

    >>> Fraction('1.23', 1)
    Fraction(123, 100)

    I think this should really be a TypeError: a second argument to the
    constructor should only be permitted when the first argument is an
    integer. However, it seems fairly harmless, so I'm inclined to just
    leave this as it is and not mess with it.

    @mdickinson
    Member

    Hmm. I don't much like this, though:

    >>> Fraction('1.23', 1.0)
    Fraction(123, 100)
    >>> Fraction('1.23', 1+0j)
    Fraction(123, 100)

    I'd say these should definitely be TypeErrors.

    @gvanrossum
    Member

    I think it's okay to accept Fraction('1.23', 1) -- if only because
    rejecting it means changing the default for the 2nd arg to None and that
    would slow it down again.

    I agree that if the type of the 2nd arg isn't int or long it should be
    rejected. That should not slow down the common path (two ints).

    @mdickinson
    Member

    I agree that if the type of the 2nd arg isn't int or long it should be
    rejected. That should not slow down the common path (two ints).

    Sounds good. Some care might be needed to avoid invalidating
    Fraction(n, d) where n and d are instances of numbers.Integral but not of
    type int or long.

    @mdickinson
    Member

    I agree that if the type of the 2nd arg isn't int or long it should be
    rejected. That should not slow down the common path (two ints).

    I'm having second thoughts about this; it doesn't seem worth adding an
    extra check that has to be applied every time a Fraction is created from a
    string (or another Rational instance). The condition being checked for (a
    denominator equal to 1, but not of type int or long) is unlikely to occur
    in practice, and fairly harmless even if it does occur. So I'm thinking
    that PBP says leave it alone.

    @gvanrossum
    Member

    That's fine.

    @mdickinson
    Member

    Well, I can't find anything more to fuss about here. :-)

    Reclosing.
