C++: Use little-endian load for std::hash #561
Conversation
Force-pushed from 6bad478 to 8020d35.
TODO: std::hash unit tests are pretty bad - changing BE load to LE produces the same value for the given test cases.
FNV calls take a negligible fraction of Silkworm execution, so this change probably won't make a difference to the total block execution time.
include/evmc/evmc.hpp (outdated)
@@ -827,31 +815,25 @@ namespace std
 template <>
 struct hash<evmc::address>
 {
-    /// Hash operator using FNV1a-based folding.
+    /// Hash operator using (3a + b) folding of the address "words".
What is 3a + b? Some homebrew hashing?
Kind of. We have this progression of options:
- Fold all words with XOR.
- Fold all words with ADD. A bit better than XOR because it discards less information, but still symmetric.
- "Classic" multiply by a prime/odd number and add: fold(a, b) { return 3*a + b; }.

The multiplier 3 is used because it has the same performance as options 1 and 2: the multiply is done by a lea instruction, and because the CPU executes multiple instructions at the same time, the latency of the "multiply" is hidden.
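The "(3a + b)" fold described above can be sketched like this (names are hypothetical, not the actual evmc code; the multiply-by-3 compiles to a single lea on x86):

```cpp
#include <cstdint>

// Hypothetical sketch of the "(3a + b)" folding discussed above.
// 3*a compiles to one lea (lea rax, [rax + rax*2]) on x86, so the
// "multiply" latency hides among the surrounding instructions.
inline uint64_t fold(uint64_t a, uint64_t b) noexcept
{
    return 3 * a + b;
}

// Fold the four 64-bit "words" of a 32-byte value into one hash.
// Unlike XOR/ADD folding, this is not symmetric: swapping words
// changes the result.
inline uint64_t hash_words(const uint64_t w[4]) noexcept
{
    return fold(fold(fold(w[0], w[1]), w[2]), w[3]);
}
```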
How would 1) and 2) work? The same bytes in a different order would result in the same hash. Or do you mean not only xor/add but also some shifting, etc.?
Just word0 ^ word1 ^ word2 ^ word3. Similarly for add.
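A minimal sketch of that plain XOR fold (names hypothetical) makes the symmetry concern concrete: permuting the words does not change the result.

```cpp
#include <cstdint>

// Plain XOR folding (option 1 above). XOR is commutative and
// associative, so any permutation of the words hashes the same --
// the weakness raised in the question above.
inline uint64_t xor_fold(const uint64_t w[4]) noexcept
{
    return w[0] ^ w[1] ^ w[2] ^ w[3];
}
```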
If I remember correctly, we discussed that this is only used as a quick lookup; the actual data is then compared on a match, so clashes do not matter.
Force-pushed from 8020d35 to d0a8f5b.
Codecov Report

@@           Coverage Diff            @@
##           master     #561    +/-  ##
=======================================
- Coverage   91.31%   91.30%   -0.01%
=======================================
  Files          22       22
  Lines        3119     3118       -1
=======================================
- Hits         2848     2847       -1
  Misses        271      271
Using FNV is pretty solid. Would be nice to confirm if your hashmap is using std::hash and benchmark this change with silkworm.
I've checked and the hashmap does use std::hash. There's a tiny performance gain: 0.1 h win out of 16.5 h of executing the first 11M blocks.
I'm indifferent on this. At least the first commit for adding more tests should be merged.
Force-pushed from a0fdd25 to a138c51.
Force-pushed from a138c51 to f124cc8.
In the final version there is only the switch to little-endian loading. See the updated description.
This replaces the big-endian loads with little-endian loads in the hash functions for evmc::address and evmc::bytes32. Performance improvements are insignificant.

Originally, I also tried a much simpler word folding, fold(a, b): 3*a + b. These hashes do not look very random any more, and the hash of zero is zero. Furthermore, it only improves performance (over the little-endian version) for hash functions inlined in a loop, which is probably not the case for hash maps. We can revisit more optimizations here, but we should build some hashmap performance testing up front (e.g. see https://stackoverflow.com/a/62345875/725174).
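A hedged sketch of what little-endian word loading combined with FNV-1a folding might look like (names and structure are illustrative, not the exact evmc implementation; std::memcpy reads native byte order, which on x86/ARM is little-endian and avoids the byte swap a big-endian load requires):

```cpp
#include <cstdint>
#include <cstring>

// Load 8 bytes in native byte order (little-endian on x86/ARM).
// Illustrative helper, not the actual evmc code.
inline uint64_t load_le(const uint8_t* data) noexcept
{
    uint64_t word;
    std::memcpy(&word, data, sizeof(word));
    return word;
}

// FNV-1a folding over the four 64-bit words of a 32-byte value
// (e.g. a bytes32): XOR in each word, then multiply by the FNV prime.
inline uint64_t fnv1a_words(const uint8_t bytes[32]) noexcept
{
    constexpr uint64_t fnv_prime = 0x100000001b3ULL;
    uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
    for (int i = 0; i < 32; i += 8)
        h = (h ^ load_le(bytes + i)) * fnv_prime;
    return h;
}
```

On a little-endian machine this keeps the per-word work down to a plain 8-byte load, which is where the (small) win over the big-endian version comes from.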