Optimize the Final exponentiation #75

Merged: 12 commits merged on Sep 21, 2021

@yelhousni (Collaborator) commented Sep 15, 2021

#74
[WIP] [Ready to merge]

This PR implements Karabina's compressed cyclotomic squaring and the corresponding decompression. It also merges @gbotrel's work on the F_p inverse.

Mixing the Granger-Scott (GS) and Karabina cyclotomic squarings in Expt(), combined with the new inverse, speeds up the final exponentiation for BLS12-377, BLS12-381 and BLS24-315, but not for BW6-761 and BW6-633. Below, s is the length of the series of 0's in the seed that is handled with Karabina's squarings:

  • BLS12-377: s=46 --> 14% speedup
  • BLS12-381: s=32 --> 6% speedup
  • BLS24-315: s=20 --> 12% speedup (edit: 14% with a Montgomery batch inverse)

N.B. For BLS12-381, there is another series of 15 0's, but it is not worth handling it with Karabina. If we manage to make the inverse even faster (e.g. Pornin's inverse in assembly), it might yield some speedup with a Montgomery batch inverse.

@yelhousni (Collaborator, Author)

[WIP] because the assembly inverse behaves differently on AMD than on Intel. We might go back to a pure Go inverse or implement Pornin's inverse (in pure Go).

@yelhousni changed the base branch from master to develop on September 20, 2021 07:32
@yelhousni (Collaborator, Author)

With the assembly inverse removed (on an AWS z1d.large instance):

[ec2-user@ip-172-31-25-164 ~]$ benchcmp old-fe-bls381.log new-fe-bls381.log
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                          old ns/op     new ns/op     delta
BenchmarkFinalExponentiation-2     362464        354933        -2.08%
[ec2-user@ip-172-31-25-164 ~]$ benchcmp old-fe-bls377.log new-fe-bls377.log
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                          old ns/op     new ns/op     delta
BenchmarkFinalExponentiation-2     487347        434029        -10.94%
[ec2-user@ip-172-31-25-164 ~]$ benchcmp old-fe-bls24.log new-fe-bls24.log
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                          old ns/op     new ns/op     delta
BenchmarkFinalExponentiation-2     1149723       1008101       -12.32%

@yelhousni (Collaborator, Author)

N.B.: With the current tradeoff between the inverse cost and the size of the series of 0's, only BLS24-315 benefits from the Montgomery BatchDecompression() trick.
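The tradeoff can be made explicit with a simplified cost model (an assumption that ignores memory traffic and the surrounding multiplications). Let S_GS and S_K be the costs of one GS and one compressed Karabina squaring, I the cost of the inversion needed by decompression, and c the rest of the decompression cost. Handling a run of s squarings with Karabina pays off when

$$
s\,S_{\mathrm{K}} + I + c \;<\; s\,S_{\mathrm{GS}}
\quad\Longleftrightarrow\quad
s \;>\; \frac{I + c}{S_{\mathrm{GS}} - S_{\mathrm{K}}}
$$

With Montgomery's trick, decompressing m values together costs one inversion plus 3(m-1) multiplications M, so the inversion is amortized over the m values:

$$
D_m \;=\; I + 3(m-1)\,M + m\,c
$$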

@gbotrel merged commit 2faa637 into develop on Sep 21, 2021
@yelhousni deleted the feat/karabina branch on September 23, 2021 10:16
@yelhousni mentioned this pull request on Oct 14, 2021