Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue on SHA3-Keccak256 since v0.10.7 #468

Closed
Slixe opened this issue Apr 20, 2023 · 12 comments
Closed

Performance issue on SHA3-Keccak256 since v0.10.7 #468

Slixe opened this issue Apr 20, 2023 · 12 comments

Comments

@Slixe
Copy link

Slixe commented Apr 20, 2023

Hey there,

After an update to the last version for this crate, I've found a strange problem, my performance have decreased to half of what I get with v0.10.6.

I'm using a Ryzen 3700X with 32GB on Debian.
In case you believe its my fault, here is what I saw:
To see the performance issue you can run cargo run --release --bin xelis_miner -- --benchmark --iterations=1000000
on this commit:
xelis-project/xelis-blockchain@7a31b0b
And on the previous commit to compare.

Thanks

@newpavlov
Copy link
Member

Could you run your benchmark with keccak downgraded to an older version? Changes in sha3 v0.10.7 are either disabled by default (the asm feature), or purely additive, so they should not influence performance. It also could be a compiler regression, so I suggest running benchmarks with older Rust versions.

@tarcieri
Copy link
Member

I'm very surprised to hear about any performance regressions on x86/x86_64, as all of the relevant changes I can think of were ARM-related

@tarcieri
Copy link
Member

@Slixe what would be really helpful is if you could git bisect the changes in this repo and see if you can narrow down where it occurred

@Slixe
Copy link
Author

Slixe commented Apr 21, 2023

Could you run your benchmark with keccak downgraded to an older version? Changes in sha3 v0.10.7 are either disabled by default (the asm feature), or purely additive, so they should not influence performance. It also could be a compiler regression, so I suggest running benchmarks with older Rust versions.

I've tried with the keccak v0.1.2 on sha3 v0.10.7 and still the same. I've also tried with lower version of digest too, same result.

I've tried also on rust toolchain 1.69.0 (latest), 1.68, 1.65, 1.64, same for these versions...

Performances are still much better on sha3 v0.10.6, and I don't understand why.

@Slixe
Copy link
Author

Slixe commented Apr 21, 2023

@Slixe what would be really helpful is if you could git bisect the changes in this repo and see if you can narrow down where it occurred

According to the git bisect, the bad commit is 9b218cf where my performance are bad.

This change is what reduced my performances: 9b218cf#diff-90cc61bbc907b3ec5bb54982793be9757502097a40a475873b6bab46d5874889L19

If I put keccak::f1600(&mut self.state); again in absorb_block my performances are good again.

@tarcieri
Copy link
Member

Interesting /cc @aewag

@aewag
Copy link
Contributor

aewag commented Apr 21, 2023

Hmm, my first guess: maybe it is because the round_count is no longer a const and this hinders the compiler to do some optimizations.

Could you change the call within absorb_block to the keccak_p as following:

keccak::keccak_p(&mut self.state, DEFAULT_ROUND_COUNT);

This should result in the same speed as with using f1600.

@Slixe
Copy link
Author

Slixe commented Apr 21, 2023

No, still not, it's improving only by ~1-5% but still not as much as I had before the update.

@aewag
Copy link
Contributor

aewag commented Apr 28, 2023

Thanks. Just looked again into it. The issue lies within the generic approach of keccak_p. If keccak_p is called LaneSize is evaluated at runtime. Thus, the high performance difference.

I just wrote a fix and tested it. My approach would be to add functions for keccak_p<S> and S out of [200, 400, 800, 1600]. Similar to the keccak_f routines.

@aewag
Copy link
Contributor

aewag commented Apr 28, 2023

Following are the benchmarks evaluated with keccak_p and p1600 functions:

current master of sha3 crate:

test sha3_256_10    ... bench:         141 ns/iter (+/- 45) = 70 MB/s
test sha3_256_1000  ... bench:      14,652 ns/iter (+/- 7,731) = 68 MB/s
test sha3_256_10000 ... bench:     144,600 ns/iter (+/- 53,622) = 69 MB/s

with p1600 routines added in keccak crate and used in sha3 crate:

running 3 tests
test sha3_256_10    ... bench:          40 ns/iter (+/- 2) = 250 MB/s
test sha3_256_1000  ... bench:       3,985 ns/iter (+/- 271) = 250 MB/s
test sha3_256_10000 ... bench:      39,327 ns/iter (+/- 1,277) = 254 MB/s

@tarcieri If the draft PR in the sponges directory looks good to you, I would add, after a keccak crate release, a PR to the hashes repository.

@tarcieri
Copy link
Member

tarcieri commented May 1, 2023

@aewag can you update this PR to use keccak from git so @Slixe can confirm the issue is fixed?

@tarcieri
Copy link
Member

tarcieri commented May 5, 2023

Fixed in v0.10.8: #474

@tarcieri tarcieri closed this as completed May 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants