-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
fix isless and cmp/lexcmp for floating point
for now cmp() uses the total ordering, but we might change it to give a DomainError for NaN arguments
- Loading branch information
1 parent
756a34a
commit 2343ba0
Showing
6 changed files
with
52 additions
and
15 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
2343ba0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ref #5290
2343ba0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really think you need to write up what the logic behind these comparison operators is. I lost track when
lexcmp
was introduced, which I still am not entirely happy with.2343ba0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, as I said in the comments, this very well may not be the final version.
lexcmp
was introduced much later thancmp
, and we might have handled floats differently if they'd been introduced at the same time.We had
cmp
fall back toisless
, meaningcmp
uses the standard total order. I kept this behavior in this commit, for now.There are accurate descriptions of all of these in the help text. We have
<
cmp
,isless
lexcmp
,lexless
Our total order on floats (
isless
) is not IEEE-compliant, because of -NaN. This doesn't bother me too much and I don't think it's crucial to change it.But, we have a few options:
cmp
mean lexicographic total order, and get rid oflexcmp
cmp
in terms of<
, and give aDomainError
on NaNs, but allow lexcmp to work on NaNsisless
as it is now, but havelexcmp
andlexless
implement the IEEE order2343ba0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once a decision is made, the description should go in the manual together with an explanation of the differences between
==
,isequal
andis/===
.2343ba0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2343ba0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gitfoxi I appreciate the thoughts on performance, but correctness has to come first. Obviously being fast and correct is ideal, but being fast and wrong is just pointless.
I believe it's the arithmetic comparison that can compile down to a single instruction, not the lexical one which is being corrected here.
2343ba0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not as bad as all that. Our
<
et. al. are what you typically use in numerical code, and they are of the single-instruction variety. In sorting, you have to deal with NaNs specially anyway, so we have careful custom sorting for float types. In fact even the typical float<
is not optimal, and we use integer comparisons instead. But the way this sorting algorithm behaves, at the end of the day, corresponds toisless
.2343ba0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You cannot actually use the standard IEEE 754 float comparison operations to correctly sort floating-point numbers. If you use IEEE ordering to sort floating-point numbers, most comparison-based algorithms will be incorrect because they depend on the sorted values being totally ordered. The presence of unordered elements like NaN completely screws things up because
!(x < y)
does not implyy <= x
. Moreover, your negative and positive zeros will be arbitrarily mixed together, which is not desirable.To correctly sort floating-point numbers you have to use a total ordering. One such ordering is specified by IEEE 754: negative NaNs, then negative values from -Inf to -0.0, then positive values from 0.0 to Inf, then positive NaNs. Matlab uses a different ordering: negative values from -Inf to -0.0, then positive values from 0.0 to Inf, and then all NaNs, regardless of their sign or payload – in fact, the sign and payload of NaNs will be destroyed by sorting. Neither of these total orderings is implemented in hardware that I'm aware of. Moreover, expressing either of these total orderings is pretty complicated. We implement
isless(x::Float64, y::Float64)
using an LLVM intrinsic like so:This translates approximately to
where
xi
andyi
arex
andy
respectively reinterpreted as integers. Although this is actually faster than one might expect, fortunately to do actual sorting of floating-point numbers, we don't use this operation at all. Instead, we do the following:This is all implemented in base/sort.jl and composes transparently with arbitrary sorting algorithms. The end-result is that floating-point arrays are sorted as if they used the
isless
operation with a stable sorting algorithm. The stability matters because NaNs are all equivalent according toisless
but they are actually programmatically distinguishable. Moreover, we want sorting an untyped array that happens to only hold floats to end up in the same order as a typed array of floats. Since sorting an untyped array would use a stable sort and theisless
operation, the NaNs end up at the end of the untyped array in the same order they appeared in the original array. That's why we do step 1 above that way. We verify that our sorting is stable for NaNs by generating NaNs with random payloads and sorting them.The end result of all of this is that our floating-point sorting is fast, correct, and consistent, regardless of the type of array that's sorted.