-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace hash with faster and better finalized hash #37
Conversation
[DO NOT MERGE] perf run for rustc-hash candidate See rust-lang/rustc-hash#37.
Note that should this get merged for optimal performance the |
[DO NOT MERGE] perf run for rustc-hash candidate See rust-lang/rustc-hash#37.
My recollection is that rustc doesn't hash a lot of strings so issue 1 didn't seem very impactful in practice, did your investigations show the opposite maybe? FWIW, a while back I also tried this same hash function for the integers, but with a different constant from the same paper and various finalization ops -- as such a sequence is simple enough that my rust lookalike of hash-prospector finds it very quickly. It wasn't an obvious improvement over the current hash at the time, and had less promising results than #18. (Edit: for comparison, here are #18's results, cycles here https://perf.rust-lang.org/compare.html?start=d53f1e8fbf891cf84fcb11eb078a27e528df795a&end=71812917089730aa5abc2f6c5978c6718c9b0f5f&stat=cycles%3Au) |
The performance run for this PR: https://perf.rust-lang.org/compare.html?start=0160bff4b1bffa241299aba8c8c63e7a3cd871fe&end=c358310bb0ab238e25e60f1e77f96e0a2b270d0b |
@lqd The integer hash is the more important one for sure, but there are also some string hashes in the top 99% of usage:
|
It would be good to squash all the commits together before merging this, because they're not logically separate changes. |
With the additional clarifications via comments and squashing I am happy with this. I don't normally give r+ to PRs in this repo so I will leave that to @Nilstrieb(?) |
@orlp And thanks for answering my questions and helping me get past my initial skepticism :) There have been multiple unsuccessful attempts by multiple people to improve on FxHash, some mentioned in this blog post, so it's nice to finally see success! |
Do you think it might be worth renaming |
|
It's just a whole lot of pain for little gain. I think my comment in the readme is fine (and also explains why its called
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome, thank you very much! I doubted you at RustNL but I will never doubt you again😊😊
This was implemented in rust-lang/rustc-hash#37 by Orson Peters and benchmarked in rust-lang#125133.
This was implemented in rust-lang/rustc-hash#37 by Orson Peters and benchmarked in rust-lang#125133.
…ogaloo, r=<try> Use new faster fxhash. This was implemented in rust-lang/rustc-hash#37 by Orson Peters and benchmarked in rust-lang#125133. r? `@ghost` `@bors` try `@rust-timer` queue
Update rustc-hash to version 2 This brings in the new algorithm. see rust-lang/rustc-hash#37 and rust-lang#125133
Update rustc-hash to version 2 This brings in the new optimized algorithm that was shown to have small performance benefits for rustc. I haven't run the rust-analyzer benchmarks. See rust-lang/rustc-hash#37.
Update rustc-hash to version 2 This brings in the new optimized algorithm that was shown to have small performance benefits for rustc. I haven't run the rust-analyzer benchmarks. See rust-lang/rustc-hash#37.
Update rustc-hash to version 2 This brings in the new optimized algorithm that was shown to have small performance benefits for rustc. I haven't run the rust-analyzer benchmarks. See rust-lang/rustc-hash#37.
Update rustc-hash to version 2 This brings in the new optimized algorithm that was shown to have small performance benefits for rustc. I haven't run the rust-analyzer benchmarks. See rust-lang/rustc-hash#37.
The current FxHasher design has a few issues:
i
do not affect bits less significant thani
. Since most hash table implementations (including hashbrown) compute their bucket indices from the least significant bits, this means everything can collide for a hash table of size 2^n if all differences are in bits n or above.With this PR I aim to fix 1 by adding a proper string hash, and mitigate 2 by adding a bit rotation as a finalizer. This bit rotation moves the high-entropy high bits into the low bits expected by hash table implementations. I also simplified the hasher to a polynomial hash, which further improved compile times and makes the hasher less of an arbitrary mish-mash of operations and more a combination of justifiable and understandable components.
I will make a dummy PR on rust-lang to do a perf run for this PR.