gh-119372: recover inf's and zeros in _Py_c_quot #119457

Merged (8 commits) into python:main on Jun 29, 2024

Conversation

@skirpichev (Member) commented May 23, 2024

In some cases that were previously computed as (nan+nanj), we can now recover meaningful component values in the result; see e.g. the C11 standard, Annex G.5.2, routine _Cdivd():

>>> from cmath import inf, infj, nanj
>>> (1+1j)/(inf+infj)  # was (nan+nanj)
0j
>>> (inf+nanj)/complex(2**1000, 2**-1000)
(inf-infj)

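For context, here is a sketch of that recovery step in the spirit of the C11 Annex G.5.2 _Cdivd() example: a scaled Smith-style division followed by a fix-up when the naive result comes out as (nan+nanj). The helper name cdiv_annex_g and the exact formatting are illustrative (the standard's example reproduced from memory), not the code added by this PR; a, b are the numerator's components and c, d the denominator's.

#include <complex.h>
#include <math.h>

/* Smith-style division with scaling, followed by recovery of infinities
   and zeros when the naive result is (nan+nanj); modeled on the C11
   Annex G.5.2 _Cdivd() example. */
double complex
cdiv_annex_g(double complex z, double complex w)
{
    double a = creal(z), b = cimag(z);
    double c = creal(w), d = cimag(w);
    double denom, x, y;
    int ilogbw = 0;
    double logbw = logb(fmax(fabs(c), fabs(d)));

    if (isfinite(logbw)) {
        ilogbw = (int)logbw;
        c = scalbn(c, -ilogbw);
        d = scalbn(d, -ilogbw);
    }
    denom = c*c + d*d;
    x = scalbn((a*c + b*d) / denom, -ilogbw);
    y = scalbn((b*c - a*d) / denom, -ilogbw);

    /* Recover infinities and zeros that computed as nan+nanj; the only
       cases are nonzero/zero, infinite/finite and finite/infinite. */
    if (isnan(x) && isnan(y)) {
        if (denom == 0.0 && (!isnan(a) || !isnan(b))) {
            x = copysign(INFINITY, c) * a;
            y = copysign(INFINITY, c) * b;
        }
        else if ((isinf(a) || isinf(b)) && isfinite(c) && isfinite(d)) {
            /* infinite numerator: keep only its direction */
            a = copysign(isinf(a) ? 1.0 : 0.0, a);
            b = copysign(isinf(b) ? 1.0 : 0.0, b);
            x = INFINITY * (a*c + b*d);
            y = INFINITY * (b*c - a*d);
        }
        else if (isinf(logbw) && isfinite(a) && isfinite(b)) {
            /* infinite denominator: the quotient is a zero */
            c = copysign(isinf(c) ? 1.0 : 0.0, c);
            d = copysign(isinf(d) ? 1.0 : 0.0, d);
            x = 0.0 * (a*c + b*d);
            y = 0.0 * (b*c - a*d);
        }
    }
    return CMPLX(x, y);
}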
@serhiy-storchaka (Member) left a comment

LGTM.

Lib/test/test_complex.py (outdated diff)
# test recovery of infs if the numerator has infinities and the denominator is finite
self.assertComplexesAreIdentical(complex('(inf-infj)')/(1+0j), complex('(inf-infj)'))
self.assertComplexesAreIdentical(complex('(inf+infj)')/(0.0+1j), complex('(inf-infj)'))
self.assertComplexesAreIdentical(complex('(nan+infj)')/complex(2**1000, 2**-1000), complex('(inf+infj)'))
Member

I wonder how you came up with this test? Its result looks paradoxical at first glance, but it makes sense after thinking about it a bit more.

@skirpichev (Member Author) May 23, 2024

That was just a coverage test that triggers underflow in ratio. The hard part, as you noted, was figuring out why the result is true. I checked the exact answer in Diofant (it will probably work in SymPy too), e.g. for the real part:

In [43]: x = Symbol('x', extended_real=True)
In [44]: y = Symbol('y', positive=True, finite=True)
In [45]: expr = re(expand((x + I*oo)*(y-I/y)))
In [46]: expr.subs({x: Dummy(real=True)})
Out[46]: ∞
In [47]: expr.subs({x: Dummy(positive=True)})
Out[47]: ∞
In [48]: limit(expr, x, -oo)
Out[48]: ∞
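(A sketch of why this holds under the Annex G-style recovery rule, using the $a, b, c, d$ naming of the _Cdivd() example: the infinite numerator is reduced to its direction, $a \mapsto \operatorname{copysign}(0, a)$ for the nan part and $b \mapsto 1$, after which $x = \infty \cdot (a c + b d) = \infty \cdot 2^{-1000} = +\infty$ and $y = \infty \cdot (b c - a d) = \infty \cdot 2^{1000} = +\infty$, i.e. (inf+infj).)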

BTW, Mathematica does braver things, but it's hardly correct:

In[14]:= Hold[(x + I*Infinity)*(2^10-I/2^10)]  (* no assumptions on x ! *)

Out[14]= Hold[(x + I Infinity) (2^10 - I/2^10)]

In[15]:= ReleaseHold[%]

Out[15]= ((1 + 1048576 I) Infinity)/Sqrt[1099511627777]

In[16]:= ReleaseHold[%14/.{x->1-I*Infinity}]

Infinity::indet: Indeterminate expression -I Infinity + I Infinity encountered.

Out[16]= Indeterminate

However, it looks like I should restore the remark about corner cases. Not all of them are handled by this code (it has no chance to, because one component is computed as non-nan). Here is an example:

>>> complex('(1+infj)')/complex(2**1000, 2**-1000)
(nan+infj)

This is handled correctly by the C code for both gcc and clang, but it turns out they use a different algorithm for true division (not Smith's method, but rather something like Priest's algorithm; see the full _Cdivd() code before the handling of the nan+nanj case). Maybe it's worth adopting that, but it's probably out of the scope of this PR.

Edit:
An example where the current algorithm fails (bad imaginary part) is $(2^{1023}+i 2^{-1023})/(2^{677}+i 2^{-677})\approx 2^{346}-i 2^{-1008}$ (taken from M. Baudin, "A Robust Complex Division in Scilab"). gcc v12.2 (and probably clang), which follows the example in the C standard, produces the correct answer.

Example code and a "fixed" Smith's algorithm (which uses extended precision to compute the ratio):
#include <complex.h>
#include <stdio.h>
#include <math.h>

/* CPython's version */
double complex
tdiv_smith(double complex a, double complex b)
{
    const double abs_breal = fabs(creal(b));
    const double abs_bimag = fabs(cimag(b));
    if (abs_breal >= abs_bimag) {
        if (abs_breal != 0.0) {
            const double ratio = cimag(b) / creal(b);
            const double denom = creal(b) + cimag(b) * ratio;
            return CMPLX((creal(a) + cimag(a) * ratio) / denom,
                         (cimag(a) - creal(a) * ratio) / denom);
        }
    }
    else if (abs_bimag >= abs_breal) {
        const double ratio = creal(b) / cimag(b);
        const double denom = creal(b) * ratio + cimag(b);
        return CMPLX((creal(a) * ratio + cimag(a)) / denom,
                     (cimag(a) * ratio - creal(a)) / denom);
    }
    return CMPLX(NAN, NAN);
}

/* Smith's method again, but the ratio is computed in extended (long double) precision */
double complex
tdiv_smith2(double complex a, double complex b)
{
    const double abs_breal = fabs(creal(b));
    const double abs_bimag = fabs(cimag(b));
    if (abs_breal >= abs_bimag) {
        if (abs_breal != 0.0) {
            const long double ratio = (long double) cimag(b) / (long double) creal(b);
            const double denom = creal(b) + cimag(b) * ratio;
            return CMPLX((creal(a) + cimag(a) * ratio) / denom,
                         (cimag(a) - creal(a) * ratio) / denom);
        }
    }
    else if (abs_bimag >= abs_breal) {
        const long double ratio = (long double) creal(b) / (long double) cimag(b);
        const double denom = creal(b) * ratio + cimag(b);
        return CMPLX((creal(a) * ratio + cimag(a)) / denom,
                     (cimag(a) * ratio - creal(a)) / denom);
    }
    return CMPLX(NAN, NAN);
}

int
main()
{
    /* Baudin's example: (2^1023 + i*2^-1023) / (2^677 + i*2^-677) ~= 2^346 - i*2^-1008 */
    double complex a = CMPLX(0x1p+1023, 0x1p-1023);
    double complex b = CMPLX(0x1p677, 0x1p-677);
    double complex z;
    z = a / b;
    printf("%e%+ei\n", creal(z), cimag(z));
    z = tdiv_smith(a, b);
    printf("%e%+ei\n", creal(z), cimag(z));
    z = tdiv_smith2(a, b);
    printf("%e%+ei\n", creal(z), cimag(z));
    return 0;
}
$ cc -std=c11 a.c -lm && ./a.out
1.433437e+104-3.645561e-304i
1.433437e+104+0.000000e+00i
1.433437e+104-3.645561e-304i

Member

It's quite odd to see NaNs disappearing in an arithmetic operation on operands that include NaNs. However, that seems unavoidable if we want to follow the C standard Annex G semantics, since its rules imply that complex(inf, nan) should be regarded as a complex infinity, and then the division of a finite complex number by a complex infinity should be a complex zero (so can't have any NaN components).
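A minimal sketch of that consequence, assuming a compiler and libm that implement Annex G (e.g. recent gcc or clang with the usual complex-division runtime); the operands are chosen to match the case described above, and the expected output (a zero for both components) is an assumption about such an implementation, not something this PR tests:

#include <complex.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* finite / (infinity with a nan component): under Annex G the
       denominator counts as a complex infinity, so the quotient is a zero */
    double complex z = CMPLX(1.0, 1.0);
    double complex w = CMPLX(INFINITY, NAN);
    double complex q = z / w;
    printf("%g%+gi\n", creal(q), cimag(q));
    return 0;
}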

Member Author

It's quite odd to see NaNs disappearing in an arithmetic operation on operands that include NaNs.

I think that could reasonably be interpreted only as shown above, i.e. a nan in a component means "anything from the extended real line".

since its rules imply that complex(inf, nan) should be regarded as a complex infinity

I don't think there is such a thing in the C standard. It has two types of infinities: one where one component is finite and nonzero and the other is infinite, and another (covering a few cases) that is a directed infinity (the phase is finite and the absolute value is infinite), i.e. (1+infj) and (inf+infj), respectively. That brings an interesting clash with CASes like Mathematica, which have no notion of the first type of infinity. E.g. ArcSin[Infinity] is just -I Infinity, c.f. cmath.asin(). See mpmath/mpmath#793

But CPython already adopts Annex G semantics in this respect, via cmath.isinf() and cmath.isfinite().

@mdickinson, what do you think about changing Smith's algorithm to something closer to the C standard?

Member

@skirpichev

I don't think there is such thing in the C standard.

Take a look at Annex G. Quoting from §G.3 "Conventions" in C99, paragraph 1:

A complex or imaginary value with at least one infinite part is regarded as an infinity (even if its other part is a NaN).

The language is slightly changed in the most recent publicly available draft of C23 (n3096.pdf):

A complex or imaginary value with at least one infinite part is regarded as an infinity (even if its other part is a quiet NaN).
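This classification is also visible through cabs(): Annex F specifies hypot(±inf, y) = +inf even when y is a NaN, so a value like complex(inf, nan) already measures as infinite. A small sketch, assuming an IEEE 754/Annex F conforming libm:

#include <complex.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* at least one infinite part => the value is regarded as an infinity,
       even though the other part is a quiet NaN */
    double complex w = CMPLX(INFINITY, NAN);
    printf("%d\n", isinf(cabs(w)));               /* expected: nonzero (true) */
    printf("%d\n", isinf(hypot(INFINITY, NAN)));  /* expected: nonzero (true) */
    return 0;
}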

@mdickinson (Member) left a comment

Changes LGTM, and the new addition looks clean. This brings us in line with C's Annex G and the explicit requirements in §G.5.1p4 (n3096.pdf).

I'm a bit worried about backwards compatibility. This is admittedly a corner case, and I agree that the new semantics are better than the old ones, but it's also changing behaviour that's been there for decades.

I'm not too worried that there's "real" code out there whose mathematical correctness relies on the old division semantics - that seems unlikely. But it does seem likely that this change could break code that's relying on reproducing the Python semantics. The most obvious example would be the behaviour of the NumPy complex types - the NumPy folks probably have tests that ensure that complex128 behaves the same as complex in corner cases.

It may be worth checking the NumPy behaviour here - if complex128 currently has the same behaviour as Python's complex in these corner cases, then we could open an issue on the NumPy tracker alerting them to the upcoming change and giving them the chance to complain.

@picnixz (Contributor) commented May 28, 2024

My 2 cents:

the NumPy folks probably have tests that ensure that complex128 behaves the same as complex in corner cases.

Here are some relevant places where the complex128 tests take place:

https://github.com/numpy/numpy/blob/main/numpy/_core/tests/test_umath.py#L632

(plus the code above and below)

Most of the tests for basic operations are in https://github.com/numpy/numpy/blob/main/numpy/_core/tests/test_umath.py, but the corner cases tested there are not as exhaustive as here.

Technically, if the corner-case handling changes, then NumPy (by default) would just raise errors. However, with different error-handler flags, you indeed might have breaking changes where user code incorrectly checks that something is nan but does not check that something is inf (or vice versa). I think the more plausible breaking case is (1+1j)/(cmath.inf+cmath.infj), which currently gives nan+nanj instead of 0.

@skirpichev (Member Author)

@mdickinson, I'm not sure that the NumPy people are trying to reproduce CPython's complex behaviour; here is the code:
https://github.com/numpy/numpy/blob/012e63d8b04636bc8e0a7b05312e5a8cc985f727/numpy/_core/src/npymath/npy_math_complex.c.src#L92-L120
Yes, they are using Smith's algorithm, but that's all. In some "corner cases" the behaviour is already different:

>>> np.__version__
'1.24.4'
>>> np.complex128(1)/np.complex128(0)
<stdin>:1: RuntimeWarning: divide by zero encountered in scalar divide
<stdin>:1: RuntimeWarning: invalid value encountered in scalar divide
(inf+nanj)
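(For comparison, a sketch of what a C compiler implementing Annex G gives here: nonzero/zero is required to be an infinity, and gcc's runtime typically represents it as (inf+nanj), i.e. the same as NumPy rather than CPython's ZeroDivisionError. The expected output noted in the comment is an assumption about such an implementation:)

#include <complex.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* nonzero / zero: Annex G requires an infinity; with gcc/glibc this
       typically prints inf for the real part and nan for the imaginary one,
       matching NumPy's (inf+nanj) rather than CPython's ZeroDivisionError */
    double complex z = CMPLX(1.0, 0.0);
    double complex w = CMPLX(0.0, 0.0);
    double complex q = z / w;
    printf("%g%+gi\n", creal(q), cimag(q));
    return 0;
}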

we could open an issue on the NumPy tracker alerting them to the upcoming change and giving them the chance to complain.

FYI: numpy/numpy#26560

@skirpichev (Member Author)

@serhiy-storchaka, it seems Mark is OK with this change. We have the NumPy issue for this PR; do we need anything else?

@serhiy-storchaka serhiy-storchaka merged commit 2cb84b1 into python:main Jun 29, 2024
36 checks passed
@skirpichev skirpichev deleted the fix-complex-tdiv-119372 branch June 29, 2024 08:34
mrahtz pushed a commit to mrahtz/cpython that referenced this pull request Jun 30, 2024
noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request Jul 11, 2024
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024