
C++: Use little-endian load for std::hash #561

Merged: 2 commits, Feb 16, 2021

Conversation

@chfast (Member) commented Nov 2, 2020

This replaces the big-endian loads with little-endian loads in the hash functions for evmc::address and evmc::bytes32.
The performance improvements are significant:

hash_<evmc::bytes32, hash<evmc::bytes32>>_mean                           -0.2973         -0.2973          2335          1641          2335          1641
hash_<evmc::bytes32, noinline_hash<evmc::bytes32>>_mean                  -0.1559         -0.1559          3045          2571          3045          2571
hash_<evmc::address, hash<evmc::address>>_mean                           -0.4009         -0.4009          1323           793          1323           793
hash_<evmc::address, noinline_hash<evmc::address>>_mean                  -0.2762         -0.2762          1955          1415          1955          1415

Originally, I also tried a much simpler word folding fold(a, b): 3*a + b. These hashes do not look very random any more, and the hash of zero is zero. Furthermore, it only improves performance (over the little-endian version) for hash functions inlined in a loop, which is probably not the case for hash maps.

hash_<evmc::bytes32, hash<evmc::bytes32>>_mean                           -0.1432         -0.1432          1641          1406          1641          1406
hash_<evmc::bytes32, noinline_hash<evmc::bytes32>>_mean                  -0.0006         -0.0006          2571          2569          2571          2569
hash_<evmc::address, hash<evmc::address>>_mean                           -0.1896         -0.1896           793           643           793           643
hash_<evmc::address, noinline_hash<evmc::address>>_mean                  -0.0087         -0.0087          1415          1403          1415          1403

We can revisit more optimizations here, but we should build some hashmap performance testing up front (e.g. see https://stackoverflow.com/a/62345875/725174).
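For context, the approach can be sketched roughly as follows (this is an illustrative sketch, not the actual evmc code; `load_le64`, `fnv1a_step` and `hash_bytes32` are made-up names): the 32-byte value is read as four little-endian 64-bit words, which on little-endian hardware is a plain load with no byte swap, and the words are folded with FNV1a.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Load 8 bytes as a little-endian 64-bit word. On a little-endian host
// this compiles down to a single plain load; the previous big-endian
// variant required an additional byte swap.
inline uint64_t load_le64(const uint8_t* bytes) noexcept
{
    uint64_t word;
    std::memcpy(&word, bytes, sizeof(word));  // assumes a little-endian host
    return word;
}

// One FNV1a step over a 64-bit word: xor, then multiply by the FNV prime.
inline uint64_t fnv1a_step(uint64_t h, uint64_t x) noexcept
{
    constexpr uint64_t fnv_prime = 0x100000001b3ULL;
    return (h ^ x) * fnv_prime;
}

// Hash a 32-byte value word by word, starting from the FNV offset basis.
inline uint64_t hash_bytes32(const uint8_t bytes[32]) noexcept
{
    constexpr uint64_t fnv_offset_basis = 0xcbf29ce484222325ULL;
    uint64_t h = fnv_offset_basis;
    for (size_t i = 0; i < 32; i += 8)
        h = fnv1a_step(h, load_le64(bytes + i));
    return h;
}
```

Note that because the fold starts from a nonzero offset basis, the hash of the all-zero value is nonzero, unlike the 3*a + b variant discussed below.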

@chfast (Member, Author) commented Nov 2, 2020

TODO: the std::hash unit tests are pretty bad: changing the BE load to an LE load produces the same values for the given test cases.
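To illustrate why such test cases cannot catch the change: if the bytes within each 8-byte word form a palindrome (e.g. all bytes equal), a big-endian load and a little-endian load return the same value, so a distinguishing test vector must be byte-asymmetric within a word. A hypothetical sketch (`load_le64`/`load_be64` are illustrative names, assuming a little-endian host):

```cpp
#include <cstdint>
#include <cstring>

// Little-endian load: on a little-endian host, a plain memcpy load.
inline uint64_t load_le64(const uint8_t* p) noexcept
{
    uint64_t w;
    std::memcpy(&w, p, sizeof(w));  // assumes a little-endian host
    return w;
}

// Portable byte swap of a 64-bit word.
inline uint64_t byteswap64(uint64_t x) noexcept
{
    uint64_t r = 0;
    for (int i = 0; i < 8; ++i)
        r = (r << 8) | ((x >> (8 * i)) & 0xff);
    return r;
}

// Big-endian load: little-endian load followed by a byte swap.
inline uint64_t load_be64(const uint8_t* p) noexcept
{
    return byteswap64(load_le64(p));
}
```

A test vector of identical bytes passes under both loads, while one with distinct bytes per word tells them apart.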

@yperbasis (Member) left a comment:

FNV calls take a negligible fraction of Silkworm execution, so this change probably won't make a difference to the total block execution time.

@@ -827,31 +815,25 @@ namespace std
template <>
struct hash<evmc::address>
{
/// Hash operator using FNV1a-based folding.
/// Hash operator using (3a + b) folding of the address "words".
Member:

What is 3a + b? Some homebrew hashing?

@chfast (Member, Author):

Kind of. We have this progression of options:

  1. Fold all words with XOR.
  2. Fold all words with ADD. A bit better than XOR because it discards less information, but still symmetric.
  3. "Classic" multiply by a prime/odd number and add: fold(a, b) { return 3*a + b; }.

The 3 is used because it has the same performance as 1 and 2: the multiply is done by a lea instruction, and the throughput is the same because multiple instructions execute at the same time, i.e. the latency of the "multiply" is hidden.
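The three options above can be sketched as (illustrative, not the evmc source):

```cpp
#include <cstdint>

// 1. XOR fold: symmetric, so permuting the words does not change the result.
inline uint64_t fold_xor(uint64_t a, uint64_t b) noexcept { return a ^ b; }

// 2. ADD fold: slightly better than XOR (carries mix bits across positions),
//    but still symmetric in its arguments.
inline uint64_t fold_add(uint64_t a, uint64_t b) noexcept { return a + b; }

// 3. Multiply-by-odd-and-add: order-sensitive, fold(a, b) != fold(b, a)
//    in general. On x86 the *3 is typically emitted as a single lea.
inline uint64_t fold_3ab(uint64_t a, uint64_t b) noexcept { return 3 * a + b; }
```

The asymmetry of option 3 is what distinguishes it: swapping the operands generally changes the result, unlike XOR or ADD.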

Member:

How would 1) and 2) work? The same bytes in a different order would result in the same hash. Or do you mean not only xor/add, but also some shifting etc.?

@chfast (Member, Author):

Just word0 ^ word1 ^ word2 ^ word3. Similarly for add.
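A quick demonstration of the symmetry issue raised above (illustrative sketch): the plain XOR fold collides on any permutation of the four words, so two values that differ only in word order hash identically.

```cpp
#include <cstdint>

// Plain XOR fold of four 64-bit words. XOR is commutative and
// associative, so any reordering of the inputs yields the same hash.
inline uint64_t xor_fold4(uint64_t w0, uint64_t w1,
                          uint64_t w2, uint64_t w3) noexcept
{
    return w0 ^ w1 ^ w2 ^ w3;
}
```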

Member:

If I remember correctly, we discussed that this is only used as a quick lookup, but the actual data is then compared at a match, so clashes do not matter.

Base automatically changed from optimize_cpp_compare to master November 2, 2020 19:45
@codecov-io commented:

Codecov Report

Merging #561 into master will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #561      +/-   ##
==========================================
- Coverage   91.31%   91.30%   -0.01%     
==========================================
  Files          22       22              
  Lines        3119     3118       -1     
==========================================
- Hits         2848     2847       -1     
  Misses        271      271              

@chfast (Member, Author) commented Nov 2, 2020

> FNV calls take a negligible fraction of Silkworm execution, so this change probably won't make a difference to the total block execution time.

Using FNV is pretty solid. It would be nice to confirm whether your hashmap uses std::hash and to benchmark this change with silkworm.

@yperbasis (Member) commented:

> FNV calls take a negligible fraction of Silkworm execution, so this change probably won't make a difference to the total block execution time.
>
> Using FNV is pretty solid. Would be nice to confirm if your hashmap is using std::hash and benchmark this change with silkworm.

I've checked and the hashmap does use std::hash. There's a tiny performance gain: 0.1 h win out of 16.5 h of executing the first 11M blocks.

@axic (Member) left a comment:

I'm indifferent on this. At the least, the first commit adding more tests should be merged.

@chfast chfast force-pushed the optimize_cpp_hash branch 2 times, most recently from a0fdd25 to a138c51 Compare February 16, 2021 10:10
@chfast (Member, Author) commented Feb 16, 2021

In the final version there is only the switch to little-endian loading. See the updated description.

@chfast chfast merged commit b606331 into master Feb 16, 2021
@chfast chfast deleted the optimize_cpp_hash branch February 16, 2021 10:35
@chfast chfast changed the title C++: Use simpler 3a + b folding in std::hash C++: Use little-endian load for std::hash Feb 16, 2021