implement Base.hash(x::DecFP.DecimalFloatingPoint) #97

Jasha10 · 2019-07-25T04:55:37Z

We could use Base.hash(x::DecFP.DecimalFloatingPoint, h::UInt) = hash(x.x, h). But I have thought of a problem:
If x and y are of type Dec64, and if x == y, then we must be able to guarantee hash(x) == hash(y). Since the definition of Base.:(==)(x::Dec64, y::Dec64) makes a ccall to the Intel Decimal Floating-Point Math Library, I doubt that x == y will imply hash(x.x) == hash(y.x).
Any ideas about how we can provide a good hash function for use in Julia's Dict, Set, etc?

The text was updated successfully, but these errors were encountered:

Jasha10 · 2019-07-28T01:46:20Z

For example, we do have d64"-0.0" == d64"0.0" but we do not have hash(d64"-0.0".x) == hash(d64"0.0".x)

jmkuhn · 2019-07-28T15:57:30Z

For hashing we want isequal(x, y) to imply hash(x) == hash(y). See isequal for the differences vs == for floating point numbers. In particular:

julia> -0.0 == 0.0
true

julia> isequal(-0.0, 0.0)
false

julia> hash(-0.0) == hash(0.0)
false

So for -0.0/0.0 a simple implementation for hash() would already do the right thing. Where we would have problems is with:

julia> x = d64"1"
1.0

julia> y = d64"1.0"
1.0

julia> isequal(x, y)
true

julia> reinterpret(UInt64, x)
0x31c0000000000001

julia> reinterpret(UInt64, y)
0x31a000000000000a

In this case x is stored as 1e0 and y is stored as 10e-1. They compare as equal but would have different hashes. I'll look at the library to see if they offer any function to normalize these to the same representation for hashing.

stevengj · 2019-07-29T16:01:19Z

For hashing we want isequal(x, y) to imply hash(x) == hash(y)

This is tricky because it should ideally work for arguments of different numeric types. For example, Base gives:

julia> hash(3.0) == hash(3.0f0) == hash(3) == hash(3+0im)
true

Accomplishing this in a reasonably efficient way was only possible after a lot of effort by @StefanKarpinski (in JuliaLang/julia#6624).

Doing hash(x.x) for DecFP types, i.e. hashing the raw binary representation, would definitely break this.

stevengj · 2019-07-29T16:15:03Z

Right now the only thing I can think of is something like:

function Base.hash(x::Dec64, h::UInt)
    xf = Float64(x)
    isequal(xf, x) && return(xf, h)
    # ...convert to Rational{BigInt} and call hash(::Real, h) ...
end

Converting to Rational{BigInt} is, of course, crazy inefficient, but in principle I guess we could implement some version of the rational hashing algorithm that does this implicitly (one 64-bit word at a time).

StefanKarpinski · 2019-07-29T22:24:51Z

Here is what you need to implement:

https://github.com/JuliaLang/julia/blob/442d1599e62b1a3a453bd/base/hashing2.jl#L28-L94

If you implement Base.decompose that follows the rules described there, then you will automatically get correct hashing. Namely:

decompose(x): non-canonical decomposition of rational values as num*2^pow/den.

The decompose function is the point where rational-valued numeric types that support
hashing hook into the hashing protocol. decompose(x) should return three integer
values num, pow, den, such that the value of x is mathematically equal to

num*2^pow/den

The decomposition need not be canonical in the sense that it just needs to be some
way to express x in this form, not any particular way – with the restriction that
num and den may not share any odd common factors. They may, however, have powers
of two in common – the generic hashing code will normalize those as necessary.

Special values:

x is zero: num should be zero and den should have the same sign as x

x is infinite: den should be zero and num should have the same sign as x

x is not a number: num and den should both be zero

I don't really understand the details of the DecFP type, but I presume that it represents rational values of the form n*10^p where n and p are both integers. You want the following return value for Base.decompose, assuming that n and p have been extracted:

p ≥ 0 ? (n*5^p, p, 1) : (n, p, 5^-p)

You need to be careful that the return values can't overflow, of course. If n*5^p could overflow Int then you'll need a bigger integer type; similarly for 5^p (if p can have a larger negative size than positive, this could be a concern without the other happening). You'll also want to be careful of type stability between these two branches.

It may be possible to get even greater efficiency by implementing a more specialized version of Base.hash that short-circuits the use of decompose but it would have to behave as if the decompose that I've described has been used in order to be correct.

StefanKarpinski · 2019-07-29T22:26:29Z

Oh, you also may need to move powers of 5 from n in the negative p case.

Jasha10 · 2019-08-02T00:45:08Z

Thanks so much for your input, Stefan. After some thinking, I have come up with the following routine for extracting integer numbers n and p such that x==n*10^p:

using DecFP
precision(::Dec32)=7
precision(::Dec64)=16
precision(::Dec128)=34
function extract_n_and_p(x::DecFP.DecimalFloatingPoint)
    e = exponent10(x)
    p = e + 1 - precision(x)
    n = ldexp10(x, -p)
    return (n,p)
end

The precision function encodes the number of digits in the significand of the decimal floating-point number. These precision numbers are taken from Table 3.6 of the IEEE Standard 754-2008 (which is what the Intel library wrapped by DecFP.jl implements). According to the standard, computing n = ldexp10(x,-p) guarantees that n will be an integer (so long as x is finite and not NaN).

Regarding the possibility of integer overflow: we are guaranteed that n satisfies
-10^precision(x) <= n <= 10^precision(x),
where x is the DecFP number from which n is extracted. So if p is positive, we have to handle integers in the range
-10^precision(x) * 5^p <= n * 5^p <= 10^precision(x) * 5^p.
In general, we have pmin(x) <= p <= pmax(x) where pmax and pmin are given below:

pmax(::Dec32 ) =  90
pmin(::Dec32 ) = -87
pmax(::Dec64 ) =  369
pmin(::Dec64 ) = -366
pmax(::Dec128) =  6111
pmin(::Dec128) = -6108

So, for example, if x is of type Dec64 and the extracted exponent p is positive, we might have to deal with integer numbers in the range
-10^16 * 5^369 <= n*5^p <= 10^16 * 5^360.
Is it ok for the numerator returned from decompose be a BigInt?

stevengj · 2019-08-02T00:51:01Z

You need to be careful that the return values can't overflow, of course. If n*5^p could overflow Int then you'll need a bigger integer type; similarly for 5^p

That's the basic problem here — realmax(Dec64) is 9.999999999999999e384, which means that 5^p could be as big as 5^384 ≈ 2.54e268. Hence we would have to use BigInt to use your decompose function, which will be very slow.

StefanKarpinski · 2019-08-02T16:27:59Z

All may not be lost. This might be avoidable with some additional cleverness. See hash_integer:

https://github.com/JuliaLang/julia/blob/442d1599e62b1a3a453bd/base/hashing2.jl#L5-L26

This implements hashing of the integer values num, pow and den. The first method is a generic one which is generally fairly efficient but not specialized while the second method is a fast method specialized for the in-memory representation of BigInts—it doesn't do any intermediate BigInt computations, which require additional allocations. You can't quite do the same thing here, but you don't need to go to BigInt either. Instead, you could compute 5^p % UInt directly and get blocks of binary digits incrementally. It's a tricky but it seems plausible with a bit of modular arithmetic.

StefanKarpinski · 2019-08-02T16:52:29Z

In particular, UInt(5)^p already computes the last block of binary digits, i.e. big(5)^p % UInt. The hard part is figuring out what the quotient of that division is so that we can repeat this.

There is always the "nuclear option" of changing the way Base hashing works. I doubt many packages are hooking into it at this point and we can easily figure out the ones that do so.

StefanKarpinski · 2020-05-14T21:30:23Z

Did this get fixed?

stevengj · 2020-05-14T21:45:11Z

Yes, although the solution involved BigInt (#114). It would be nice to have faster hashing and rational comparisons in the future.

jmkuhn · 2020-05-14T21:49:08Z

Yes, a working version is currently in DecFP#master. PR #114 implemented Base.decompose() and #125 added tests for hash().

StefanKarpinski · 2020-05-14T21:49:26Z

Ok, glad it could be fixed, that is a shame about the necessity for bigints, but hopefully it can become more optimized in the future.

This comment has been minimized.

Sign in to view

Jasha10 mentioned this issue Aug 12, 2019

Decimal not usable as Dict key: no hash method defined JuliaMath/Decimals.jl#35

Open

jmkuhn mentioned this issue Mar 18, 2020

Comparison with Rational #108

Closed

jmkuhn mentioned this issue May 13, 2020

hash() tests #125

Merged

stevengj closed this as completed in #125 May 14, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

implement Base.hash(x::DecFP.DecimalFloatingPoint) #97

implement Base.hash(x::DecFP.DecimalFloatingPoint) #97

Jasha10 commented Jul 25, 2019 •

edited

Loading

Jasha10 commented Jul 28, 2019 •

edited

Loading

This comment has been minimized.

This comment has been minimized.

jmkuhn commented Jul 28, 2019 •

edited

Loading

stevengj commented Jul 29, 2019 •

edited

Loading

stevengj commented Jul 29, 2019 •

edited

Loading

StefanKarpinski commented Jul 29, 2019 •

edited

Loading

StefanKarpinski commented Jul 29, 2019

Jasha10 commented Aug 2, 2019 •

edited

Loading

stevengj commented Aug 2, 2019 •

edited by StefanKarpinski

Loading

StefanKarpinski commented Aug 2, 2019 •

edited

Loading

StefanKarpinski commented Aug 2, 2019

StefanKarpinski commented May 14, 2020

stevengj commented May 14, 2020 •

edited

Loading

jmkuhn commented May 14, 2020

StefanKarpinski commented May 14, 2020

implement Base.hash(x::DecFP.DecimalFloatingPoint) #97

implement Base.hash(x::DecFP.DecimalFloatingPoint) #97

Comments

Jasha10 commented Jul 25, 2019 • edited Loading

Jasha10 commented Jul 28, 2019 • edited Loading

This comment has been minimized.

This comment has been minimized.

jmkuhn commented Jul 28, 2019 • edited Loading

stevengj commented Jul 29, 2019 • edited Loading

stevengj commented Jul 29, 2019 • edited Loading

StefanKarpinski commented Jul 29, 2019 • edited Loading

StefanKarpinski commented Jul 29, 2019

Jasha10 commented Aug 2, 2019 • edited Loading

stevengj commented Aug 2, 2019 • edited by StefanKarpinski Loading

StefanKarpinski commented Aug 2, 2019 • edited Loading

StefanKarpinski commented Aug 2, 2019

StefanKarpinski commented May 14, 2020

stevengj commented May 14, 2020 • edited Loading

jmkuhn commented May 14, 2020

StefanKarpinski commented May 14, 2020

Jasha10 commented Jul 25, 2019 •

edited

Loading

Jasha10 commented Jul 28, 2019 •

edited

Loading

jmkuhn commented Jul 28, 2019 •

edited

Loading

stevengj commented Jul 29, 2019 •

edited

Loading

stevengj commented Jul 29, 2019 •

edited

Loading

StefanKarpinski commented Jul 29, 2019 •

edited

Loading

Jasha10 commented Aug 2, 2019 •

edited

Loading

stevengj commented Aug 2, 2019 •

edited by StefanKarpinski

Loading

StefanKarpinski commented Aug 2, 2019 •

edited

Loading

stevengj commented May 14, 2020 •

edited

Loading