Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove lexless and lexcmp (was: behavior of cmp() on NaNs?) #5290

Closed
jiahao opened this issue Jan 3, 2014 · 18 comments
Closed

remove lexless and lexcmp (was: behavior of cmp() on NaNs?) #5290

jiahao opened this issue Jan 3, 2014 · 18 comments
Assignees
Labels
deprecation This change introduces or involves a deprecation maths Mathematical functions
Milestone

Comments

@jiahao
Copy link
Member

jiahao commented Jan 3, 2014

1. Inconsistent logic between lexcmp and nextfloat

julia> nextfloat(Inf)
NaN

julia> lexcmp(Inf,NaN)
ERROR: InexactError()
 in lexcmp at operators.jl:32

julia> lexcmp(Inf,Inf)
ERROR: InexactError()
 in lexcmp at operators.jl:32

Given the definition of lexcmp according to the help text, throwing an error is the right thing to do given that >, == and < are all supposed to be false when either operand is a NaN and so none of the standard return values (1, 0, -1) can be correct. However, it probably shouldn't throw an InexactError, but something else instead (probably DomainError).

In contrast, I don't see a meaningful sense in which NaN is the "next" number after Inf. The standard rule for computing with floating-point Infs is to interpret it as a limit, in which case the correct result of nextfloat(Inf) should be Inf and not NaN (and similarly prevfloat(-Inf) returning -Inf. (Going by this rule, lexcmp(Inf,Inf) does correctly fail since there is a coalescence of x<x+eps(), x==x+0 and x>x-eps() as x->Inf, so that there is no single limiting value.)

2. Inconsistent behavior of nextfloat and prevfloat at different precisions

If we take the argument in the preceding paragraph to be correct, then in the following, only nextfloat(Inf16) is correct:

 julia> nextfloat(Inf) #should be Inf
NaN

julia> nextfloat(Inf32) #should be Inf32
NaN32

julia> nextfloat(Inf16) #should be Inf16
Inf16

julia> prevfloat(-Inf) #should be -Inf
NaN

julia> prevfloat(-Inf32) #should be -Inf32
NaN32

julia> prevfloat(-Inf16) #should be -Inf16
Inf16
JeffBezanson added a commit that referenced this issue Jan 3, 2014
@JeffBezanson
Copy link
Sponsor Member

In my view, lexcmp implements a total order, so ordinary NaN semantics don't apply. And, for example, we sort NaNs after everything else.

@JeffBezanson
Copy link
Sponsor Member

Also, currently cmp and lexcmp are the same for types with a canonical total order, including floating point types. But maybe cmp should require the arguments to be non-NaN, leaving the more permissive behavior to lexcmp.

@jiahao
Copy link
Member Author

jiahao commented Jan 3, 2014

What you just wrote seems to be the opposite of what is implemented, since it sounds like you want lexcmp(Inf,NaN) == -1. In contrast nextfloat and prevfloat have names that suggest standards-compliant floating-point behavior, but don't behave like logical consequences of < and > on floats.

I'm fine with having non-canonical behavior, but at the very least the functions that will do so should be documented.

@JeffBezanson
Copy link
Sponsor Member

IEEE 754 defines a standard total order on floats where NaN is greater than Inf, so isless(Inf,NaN)==true and lexcmp(Inf,NaN)==-1 are correct.

Unfortunately the IEEE 754 order also puts -NaN less than everything, which is not how we prefer to sort things. Normally the sign of NaN is not interpreted, but has specified behavior for copy, negate, abs, copySign, and totalOrder. I would argue that making all NaNs isequal is much more useful though; the only functions that might distinguish them are the comparison functions themselves.

IEEE 754 does specify that nextUp(Inf) == Inf, as you say, so we should probably change that.

@jiahao
Copy link
Member Author

jiahao commented Jan 3, 2014

So is the claim that lexcmp on two floating point operands should implement the totalOrder of IEEE 754-2008, Sec. 5.10? It sounds like the ordering you would get by casting all the bits to a signed integer of the same bitsize and ordering those.

If the answer to my question is 'yes', then there are further behaviors that need fixing:

julia> lexcmp(-0.0,0.0) #c.f. IEEE 754-2008 5.10.c.1: "totalOrder(−0, +0) is true."
0

julia> lexcmp(-0.0,float32(-0.0)) #??? I don't think the standard says anything about comparing different precisions - just let type promotion handle this?
0

And then there is the business of total ordering numerical equivalent bit patterns...

JeffBezanson referenced this issue Jan 3, 2014
for now cmp() uses the total ordering, but we might change it to give a
DomainError for NaN arguments
@StefanKarpinski
Copy link
Sponsor Member

This should be addressed at the same time as we change isless and hash for numbers. Any changes before then are just likely to have to change yet again.

@simonbyrne
Copy link
Contributor

FWIW, future C standards TS 18661-1:2014 (free draft) will offer a totalorder function that matches the IEEE-754 totalOrder predicate.

@tkelman tkelman removed this from the 0.4.x milestone May 12, 2016
@StefanKarpinski StefanKarpinski added maths Mathematical functions design Design of APIs or of the language itself and removed needs decision A decision on this change is needed labels Sep 13, 2016
@StefanKarpinski StefanKarpinski added this to the 0.6.0 milestone Sep 13, 2016
@StefanKarpinski
Copy link
Sponsor Member

The nextfloat and prevfloat behavior seems to have been fixed already.

Here's one option for the comparison functions:

  • cmp and should match isless behavior for floats
  • lexcmp should match lexless which should use isless for floats

@simonbyrne
Copy link
Contributor

simonbyrne commented Nov 10, 2016

Crazy idea

@enum Cmp lessthan=-1 equal=0 greaterthan=1 incomparable=7

cmp(x,y,lt=isless) = lt(x,y) ? lessthan : lt(y,x) ? greaterthan : equal   # assumes total ordering
cmp(x,y,::typeof(<)) = isnan(x) ||  isnan(y) ? incomparable : x < y ? lessthan : y < x ? greaterthan : equal # for floating point

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Nov 10, 2016

My premise is that this is used for sorting things, not numerical algorithms, so the isless behavior is the sane one to use.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Dec 22, 2016

related? #9381

@vtjnash vtjnash removed this from the 0.6.0 milestone Dec 22, 2016
@StefanKarpinski StefanKarpinski added this to the 1.0 milestone Jan 3, 2017
@StefanKarpinski
Copy link
Sponsor Member

I think not much more design is needed here and my previous comment should just be implemented.

@StefanKarpinski StefanKarpinski added help wanted Indicates that a maintainer wants help on an issue or pull request and removed design Design of APIs or of the language itself labels Jul 20, 2017
@simonbyrne
Copy link
Contributor

Can we deprecate cmp in favour of lexcmp? Basically, is cmp ever used when lexcmp wouldn't be suitable?

@JeffBezanson
Copy link
Sponsor Member

JeffBezanson commented Aug 24, 2017

We could also remove lexless and lexcmp completely. lexless has one method that calls lexcmp. lexcmp has:

  • Methods for arrays. These could be isless instead.
  • Method for real that's the same as the default cmp definition.
  • Method for Complex --- this is the only one that we probably wouldn't combine with isless or cmp. We can just deprecated it; I doubt it's used much.

EDIT: Was partly quoting Stefan here; see his comment.

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Aug 24, 2017

Let's go the other direction:

  • Define isless for arrays lexicographically (i.e. what lexless does), similar to tuples – there's no other sensible ordering for arrays.
  • Deprecate lexless and lexcmp for arrays to isless and cmp.
  • Remove lexless and lexcmp for complex numbers. Sorting is the main issue here, which can be easily expressed with sort!(complexes, by=z->(real(z), real(im))).
  • Add a three-arg cmp methods taking a comparison function first, i.e. cmp(<, x, y) = x < y ? -1 : y < x ? +1 : 0.

@ViralBShah
Copy link
Member

Are we doing this by feature freeze? Seems like the design is ready, but there hasn't been much work since.

@ViralBShah ViralBShah added the triage This should be discussed on a triage call label Nov 19, 2017
@StefanKarpinski StefanKarpinski removed the triage This should be discussed on a triage call label Nov 20, 2017
@JeffBezanson JeffBezanson changed the title behavior of cmp() on NaNs? remove lexless and lexcmp (was: behavior of cmp() on NaNs?) Nov 20, 2017
@JeffBezanson JeffBezanson added deprecation This change introduces or involves a deprecation and removed help wanted Indicates that a maintainer wants help on an issue or pull request labels Nov 20, 2017
@JeffBezanson
Copy link
Sponsor Member

Deprecating lexcmp and lexless is straightforward enough; do we also still want to change cmp to use isless for floats?

@JeffBezanson
Copy link
Sponsor Member

JeffBezanson commented Jan 3, 2018

And should we also remove LexicographicOrdering? (I assume yes.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deprecation This change introduces or involves a deprecation maths Mathematical functions
Projects
None yet
Development

No branches or pull requests

7 participants