Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICH: Use 128-bit Blake2b hash instead of 64-bit SipHash for incr. comp. fingerprints #37233

Merged
merged 1 commit into from
Oct 19, 2016

Conversation

michaelwoerister
Copy link
Member

This PR makes incr. comp. hashes 128 bits wide in order to push collision probability below a threshold that we need to worry about. It also replaces SipHash, which has been mentioned multiple times as not being built for fingerprinting, with the BLAKE2b hash function, an improved version of the BLAKE sha-3 finalist.

I was worried that using a cryptographic hash function would make ICH computation noticeably slower, but after doing some performance tests, I'm not any more. Most of the time BLAKE2b is actually faster than using two SipHashes (in order to get 128 bits):

SipHash
libcore: 0.199 seconds
libstd:  0.090 seconds

BLAKE2b
libcore: 0.162 seconds
libstd:  0.078 seconds

If someone can prove that something like MetroHash128 provides a comparably low collision probability as BLAKE2, I'm happy to switch. But for now we are at least not taking a performance hit.

I also suggest that we throw out the sha-256 implementation in the compiler and replace it with BLAKE2, since our sha-256 implementation is two to three times slower than the BLAKE2 implementation in this PR (cc @alexcrichton @eddyb @brson)

r? @nikomatsakis (although there's not much incr. comp. specific in here, so feel free to re-assign)

@eddyb
Copy link
Member

eddyb commented Oct 17, 2016

I agree on removing the sha256 implementation. Not sure what we should do for TypeId.

@nikomatsakis
Copy link
Contributor

The code looks good to me. I'm a bit confused by this sentence:

Most of the time BLAKE2b is actually faster than using two SipHashes (in order to get 128 bits)

What does using "two SipHashes" mean?

@michaelwoerister
Copy link
Member Author

What does using "two SipHashes" mean?

Have two SipHash states initialized with different seeds so they are (hopefully) statistically independent. That way you get two different 64 bit hashes. Whether that would actually be as good as a 128 bit hash, I'm not sure.

@nikomatsakis
Copy link
Contributor

@michaelwoerister

Have two SipHash states initialized with different seeds so they are (hopefully) statistically independent. That way you get two different 64 bit hashes. Whether that would actually be as good as a 128 bit hash, I'm not sure.

OK. This takes as a given that 64 bits is insuficient (which is fine). I'm curious to know how this compares to one 64-bit hash as today, just for completeness. I guess that would be roughly a 2x slowdown? (Hashing overall is not that expensive, though, iirc.)

@michaelwoerister
Copy link
Member Author

michaelwoerister commented Oct 17, 2016

Here are some numbers for the 64 bit case:

SipHash (128 bit)
libcore: 0.199 seconds
libstd:  0.090 seconds

BLAKE2b (128 bit)
libcore: 0.162 seconds
libstd:  0.078 seconds

SipHash (64 bit)
libcore: 0.151 seconds
libstd:  0.075 seconds

@nikomatsakis
Copy link
Contributor

@michaelwoerister I'd say that's a fairly convincing argument that this is low overhead =)

@nikomatsakis
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Oct 18, 2016

📌 Commit d07523c has been approved by nikomatsakis

eddyb added a commit to eddyb/rust that referenced this pull request Oct 19, 2016
…nikomatsakis

ICH: Use 128-bit Blake2b hash instead of 64-bit SipHash for incr. comp. fingerprints

This PR makes incr. comp. hashes 128 bits wide in order to push collision probability below a threshold that we need to worry about. It also replaces SipHash, which has been mentioned multiple times as not being built for fingerprinting, with the [BLAKE2b hash function](https://blake2.net/), an improved version of the BLAKE sha-3 finalist.

I was worried that using a cryptographic hash function would make ICH computation noticeably slower, but after doing some performance tests, I'm not any more. Most of the time BLAKE2b is actually faster than using two SipHashes (in order to get 128 bits):

```
SipHash
libcore: 0.199 seconds
libstd:  0.090 seconds

BLAKE2b
libcore: 0.162 seconds
libstd:  0.078 seconds
```

If someone can prove that something like MetroHash128 provides a comparably low collision probability as BLAKE2, I'm happy to switch. But for now we are at least not taking a performance hit.

I also suggest that we throw out the sha-256 implementation in the compiler and replace it with BLAKE2, since our sha-256 implementation is two to three times slower than the BLAKE2 implementation in this PR (cc @alexcrichton @eddyb @brson)

r? @nikomatsakis (although there's not much incr. comp. specific in here, so feel free to re-assign)
eddyb added a commit to eddyb/rust that referenced this pull request Oct 19, 2016
…nikomatsakis

ICH: Use 128-bit Blake2b hash instead of 64-bit SipHash for incr. comp. fingerprints

This PR makes incr. comp. hashes 128 bits wide in order to push collision probability below a threshold that we need to worry about. It also replaces SipHash, which has been mentioned multiple times as not being built for fingerprinting, with the [BLAKE2b hash function](https://blake2.net/), an improved version of the BLAKE sha-3 finalist.

I was worried that using a cryptographic hash function would make ICH computation noticeably slower, but after doing some performance tests, I'm not any more. Most of the time BLAKE2b is actually faster than using two SipHashes (in order to get 128 bits):

```
SipHash
libcore: 0.199 seconds
libstd:  0.090 seconds

BLAKE2b
libcore: 0.162 seconds
libstd:  0.078 seconds
```

If someone can prove that something like MetroHash128 provides a comparably low collision probability as BLAKE2, I'm happy to switch. But for now we are at least not taking a performance hit.

I also suggest that we throw out the sha-256 implementation in the compiler and replace it with BLAKE2, since our sha-256 implementation is two to three times slower than the BLAKE2 implementation in this PR (cc @alexcrichton @eddyb @brson)

r? @nikomatsakis (although there's not much incr. comp. specific in here, so feel free to re-assign)
bors added a commit that referenced this pull request Oct 19, 2016
@bors bors merged commit d07523c into rust-lang:master Oct 19, 2016
@brson
Copy link
Contributor

brson commented Oct 24, 2016

@michaelwoerister I'm fine with replacing sha with blake.

@tcosprojects
Copy link

Why Blake2b and not SHA3?

@michaelwoerister
Copy link
Member Author

@tcosprojects Because Blake2b is faster to compute if these measurements are to be believed: blake (from https://blake2.net)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants