gh-119372: recover inf's and zeros in _Py_c_quot #119457
Conversation
In some cases, previously computed as (nan+nanj), we could recover meaningful component values in the result, see e.g. the C11 Annex G.5.2 routine _Cdivd():
>>> from cmath import inf, infj, nanj
>>> (1+1j)/(inf+infj)  # was (nan+nanj)
0j
>>> (inf+nanj)/complex(2**1000, 2**-1000)
(inf-infj)
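For readers without the standard at hand, here is a minimal, self-contained sketch of the kind of recovery branch the Annex G _Cdivd() example uses. The function name div_recover and the naive textbook division around it are mine for illustration; this is not the PR's actual diff, which applies the same idea to Smith's algorithm in _Py_c_quot.

#include <complex.h>
#include <math.h>
#include <stdio.h>

/* Divide (a+bi)/(c+di) with the textbook formula, then recover infinities
   and zeros when the naive result came out as nan+nanj, in the spirit of
   the C11 Annex G _Cdivd() example. */
static double complex
div_recover(double a, double b, double c, double d)
{
    double denom = c * c + d * d;
    double x = (a * c + b * d) / denom;
    double y = (b * c - a * d) / denom;

    if (isnan(x) && isnan(y)) {
        if ((isinf(a) || isinf(b)) && isfinite(c) && isfinite(d)) {
            /* infinite numerator / finite denominator => infinite result */
            a = copysign(isinf(a) ? 1.0 : 0.0, a);
            b = copysign(isinf(b) ? 1.0 : 0.0, b);
            x = INFINITY * (a * c + b * d);
            y = INFINITY * (b * c - a * d);
        }
        else if ((isinf(c) || isinf(d)) && isfinite(a) && isfinite(b)) {
            /* finite numerator / infinite denominator => zero result */
            c = copysign(isinf(c) ? 1.0 : 0.0, c);
            d = copysign(isinf(d) ? 1.0 : 0.0, d);
            x = 0.0 * (a * c + b * d);
            y = 0.0 * (b * c - a * d);
        }
    }
    return CMPLX(x, y);
}

int
main(void)
{
    /* (1+1j)/(inf+infj): the naive formula gives nan+nanj, the recovery a zero */
    double complex z = div_recover(1.0, 1.0, INFINITY, INFINITY);
    printf("%g%+gi\n", creal(z), cimag(z));
    return 0;
}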
LGTM.
Lib/test/test_complex.py
# test recovery of infs if numerator has infinities and denominator is finite
self.assertComplexesAreIdentical(complex('(inf-infj)')/(1+0j), complex('(inf-infj)'))
self.assertComplexesAreIdentical(complex('(inf+infj)')/(0.0+1j), complex('(inf-infj)'))
self.assertComplexesAreIdentical(complex('(nan+infj)')/complex(2**1000, 2**-1000), complex('(inf+infj)'))
I wonder how you came up with this test? Its result looks paradoxical at first glance, but makes sense after thinking about it more.
That was just a coverage test that triggers underflow in ratio. The hard part, as you noted, was to figure out why it's true. I've just checked the exact answer in Diofant (it will probably work in SymPy too), e.g. for the real part:
In [43]: x = Symbol('x', extended_real=True)
In [44]: y = Symbol('y', positive=True, finite=True)
In [45]: expr = re(expand((x + I*oo)*(y-I/y)))
In [46]: expr.subs({x: Dummy(real=True)})
Out[46]: ∞
In [47]: expr.subs({x: Dummy(positive=True)})
Out[47]: ∞
In [48]: limit(expr, x, -oo)
Out[48]: ∞
BTW, Mathematica does braver things, but it's hardly correct:
In[14]:= Hold[(x + I*Infinity)*(2^10-I/2^10)] (* no assumptions on x ! *)
Out[14]= Hold[(x + I Infinity) (2^10 - I/2^10)]
In[15]:= ReleaseHold[%]
Out[15]= ((1 + 1048576 I) Infinity)/Sqrt[1099511627777]
In[16]:= ReleaseHold[%14/.{x->1-I*Infinity}]
Infinity::indet: Indeterminate expression -I Infinity + I Infinity encountered.
Out[16]= Indeterminate
However, it looks like I should restore the remark about corner cases. Not all of them are handled by this code (it has no chance, because one component is computed as non-nan). Here is an example:
>>> complex('(1+infj)')/complex(2**1000, 2**-1000)
(nan+infj)
This is correctly handled by C code for both gcc and clang, but it turns out they use a different algorithm for tdiv (not Smith's method, rather something like Priest's algorithm; see the full _Cdivd() code before it handles the nan+nanj case). Maybe it's worth adopting this, but that's probably out of the scope of this PR.
Edit:
An example where the current algorithm fails (bad imaginary part) is below: example code and a "fixed" Smith's algorithm that uses extended precision to compute ratio. (With b = 2**677 + 2**-677j, the ratio cimag(b)/creal(b) = 2**-1354 underflows to zero in double precision, so the correction term in the imaginary part is lost; computing the ratio in long double keeps it on platforms where long double has a wider exponent range.)
#include <complex.h>
#include <stdio.h>
#include <math.h>

/* CPython's version: Smith's algorithm, ratio computed in double precision. */
double complex
tdiv_smith(double complex a, double complex b)
{
    const double abs_breal = fabs(creal(b));
    const double abs_bimag = fabs(cimag(b));

    if (abs_breal >= abs_bimag) {
        if (abs_breal != 0.0) {
            const double ratio = cimag(b) / creal(b);
            const double denom = creal(b) + cimag(b) * ratio;
            return CMPLX((creal(a) + cimag(a) * ratio) / denom,
                         (cimag(a) - creal(a) * ratio) / denom);
        }
    }
    else if (abs_bimag >= abs_breal) {
        const double ratio = creal(b) / cimag(b);
        const double denom = creal(b) * ratio + cimag(b);
        return CMPLX((creal(a) * ratio + cimag(a)) / denom,
                     (cimag(a) * ratio - creal(a)) / denom);
    }
    return CMPLX(NAN, NAN);
}

/* "Fixed" variant: only the ratio is computed in extended precision. */
double complex
tdiv_smith2(double complex a, double complex b)
{
    const double abs_breal = fabs(creal(b));
    const double abs_bimag = fabs(cimag(b));

    if (abs_breal >= abs_bimag) {
        if (abs_breal != 0.0) {
            const long double ratio = (long double) cimag(b) / (long double) creal(b);
            const double denom = creal(b) + cimag(b) * ratio;
            return CMPLX((creal(a) + cimag(a) * ratio) / denom,
                         (cimag(a) - creal(a) * ratio) / denom);
        }
    }
    else if (abs_bimag >= abs_breal) {
        const long double ratio = (long double) creal(b) / (long double) cimag(b);
        const double denom = creal(b) * ratio + cimag(b);
        return CMPLX((creal(a) * ratio + cimag(a)) / denom,
                     (cimag(a) * ratio - creal(a)) / denom);
    }
    return CMPLX(NAN, NAN);
}

int
main(void)
{
    double complex a = CMPLX(0x1p+1023, 0x1p-1023);
    double complex b = CMPLX(0x1p677, 0x1p-677);
    double complex z;

    z = a / b;                /* compiler's complex division */
    printf("%e%+ei\n", creal(z), cimag(z));
    z = tdiv_smith(a, b);     /* loses the tiny imaginary part (ratio underflows) */
    printf("%e%+ei\n", creal(z), cimag(z));
    z = tdiv_smith2(a, b);    /* matches the compiler's result */
    printf("%e%+ei\n", creal(z), cimag(z));
    return 0;
}
$ cc -std=c11 a.c -lm && ./a.out
1.433437e+104-3.645561e-304i
1.433437e+104+0.000000e+00i
1.433437e+104-3.645561e-304i
It's quite odd to see NaNs disappearing in an arithmetic operation on operands that include NaNs. However, that seems unavoidable if we want to follow the C standard Annex G semantics, since its rules imply that complex(inf, nan) should be regarded as a complex infinity, and then the division of a finite complex number by a complex infinity should be a complex zero (so can't have any NaN components).
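As a quick illustration (my own snippet, not from the PR): an Annex G-conforming C compiler already produces a complex zero for this case; the signs of the zero components may vary by platform.

#include <complex.h>
#include <math.h>
#include <stdio.h>

int
main(void)
{
    /* finite numerator divided by a "complex infinity" (one part inf, other nan) */
    double complex z = (1.0 + 1.0 * I) / CMPLX(INFINITY, NAN);
    printf("%g%+gi\n", creal(z), cimag(z));   /* a complex zero, not nan+nanj */
    return 0;
}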
It's quite odd to see NaNs disappearing in an arithmetic operation on operands that include NaNs.

I think that could be reasonably interpreted only as shown above, i.e. nan in a component means "anything from the extended real line".

since its rules imply that complex(inf, nan) should be regarded as a complex infinity

I don't think there is such a thing in the C standard. It has two types of infinities: one where one component is finite, nonzero and the other is infinite; the other type (with a few cases) is a directed infinity (the phase is finite and the absolute value is infinite). I.e. (1+infj) and (inf+infj). That brings an interesting clash with CAS like Mathematica, which has no notion like the first type of infinities. E.g. ArcSin[Infinity] is just -I Infinity, c.f. cmath.asin(). See mpmath/mpmath#793.
But CPython already adopts Annex G semantics in this respect, via cmath.isinf() and cmath.isfinite().
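For illustration, a quick interactive check that cmath already classifies such values as infinities:

>>> from cmath import isinf, isfinite, inf, nan
>>> isinf(complex(inf, nan))
True
>>> isfinite(complex(inf, nan))
False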
@mdickinson, what do you think about changing Smith's algorithm to something closer to the C standard?
I don't think there is such a thing in the C standard.
Take a look at Annex G. Quoting from §G.3 "Conventions" in C99, paragraph 1:
A complex or imaginary value with at least one infinite part is regarded as an infinity (even if its other part is a NaN).
The language is slightly changed in the most recent publicly available draft of C23 (n3096.pdf):
A complex or imaginary value with at least one infinite part is regarded as an infinity (even if its other part is a quiet NaN).
Changes LGTM, and the new addition looks clean. This brings us in line with C's Annex G and the explicit requirements in §G.5.1p4 (n3096.pdf).
I'm a bit worried about backwards compatibility. This is admittedly a corner case, and I agree that the new semantics are better than the old ones, but it's also changing behaviour that's been there for decades.
I'm not too worried that there's "real" code out there whose mathematical correctness relies on the old division semantics - that seems unlikely. But it does seem likely that this change could break code that's relying on reproducing the Python semantics. The most obvious example would be the behaviour of the NumPy complex types - the NumPy folks probably have tests that ensure that complex128 behaves the same as complex in corner cases.
It may be worth checking the NumPy behaviour here - if complex128 currently has the same behaviour as Python's complex in these corner cases, then we could open an issue on the NumPy tracker alerting them to the upcoming change and giving them the chance to complain.
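For example, one might spot-check a couple of the corner cases from this PR against NumPy along these lines (a sketch; the choice of cases and the errstate handling are my own, not from the thread):

import numpy as np
from cmath import inf, infj, nanj

# corner cases exercised by this PR, compared side by side
cases = [(1+1j, inf+infj), (inf+nanj, complex(2**1000, 2**-1000))]
with np.errstate(all='ignore'):   # silence divide/invalid warnings
    for num, den in cases:
        print(num / den, np.complex128(num) / np.complex128(den))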
My 2 cents:
Here are some relevant places where the complex128 tests take place: https://github.com/numpy/numpy/blob/main/numpy/_core/tests/test_umath.py#L632 (+ code below and above). Most of the basic operations tests are in https://github.com/numpy/numpy/blob/main/numpy/_core/tests/test_umath.py but the corner cases being tested are not as exhaustive as here. Technically, if the corner-case handling changes, then NumPy (by default) would just raise errors. However, with different error handler flags, you indeed might have breaking changes where user code would incorrectly check that something is nan but not check that something is inf (or vice versa). I think the more plausible breaking case is
@mdickinson, I'm not sure that the NumPy people are trying to reproduce CPython's complex behaviour, here is the code:
>>> np.__version__
'1.24.4'
>>> np.complex128(1)/np.complex128(0)
<stdin>:1: RuntimeWarning: divide by zero encountered in scalar divide
<stdin>:1: RuntimeWarning: invalid value encountered in scalar divide
(inf+nanj)
FYI: numpy/numpy#26560
@serhiy-storchaka, it seems Mark is OK with this change. We have the NumPy issue with a PR; do we need anything else?