This repository has been archived by the owner on Jul 1, 2022. It is now read-only.
element.Mul now uses ADOX, ADCX and MULX when available #13
Implements the approach described in Intel's white paper on large integer arithmetic: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-large-integer-arithmetic-paper.pdf
This new version is constant time.
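In this context, constant time usually means that no branch or memory access depends on the operand values. One common branch-free pattern for the final conditional subtraction of a Montgomery product is sketched below: compute t - q unconditionally, then use the final borrow as a mask to pick either t or t - q. `condSub` is a hypothetical helper, not code from this PR, and it assumes t < 2q fits in the same number of limbs as q. Note that Go itself gives no formal constant-time guarantee for such code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// condSub returns t - q if t >= q, and t otherwise, without branching on
// the values of t or q. Both slices have the same length (little-endian
// 64-bit limbs), and t is assumed to be < 2q.
func condSub(t, q []uint64) []uint64 {
	n := len(q)
	d := make([]uint64, n)
	var borrow uint64
	for i := 0; i < n; i++ {
		d[i], borrow = bits.Sub64(t[i], q[i], borrow)
	}
	// borrow == 1 means t < q (keep t); borrow == 0 means take t - q.
	mask := -borrow // all ones if borrow == 1, zero otherwise
	out := make([]uint64, n)
	for i := 0; i < n; i++ {
		out[i] = (t[i] & mask) | (d[i] &^ mask)
	}
	return out
}

func main() {
	q := []uint64{1, 1}                     // q = 2^64 + 1
	fmt.Println(condSub([]uint64{0, 2}, q)) // t >= q: prints [18446744073709551615 0]
	fmt.Println(condSub([]uint64{0, 1}, q)) // t < q: prints [0 1]
}
```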
This is implemented only when nbWords <= 6; larger moduli would require managing register spilling to memory.
Squaring also has an assembly implementation in this branch, but it is not plugged in, as it is currently slower than Mul.
Preliminary benchmarks:
- for a 4-word modulus (bn256):
- for a 6-word modulus (bls377, bls381): cycle count: 121 (vs 125 for herumi/mcl)

`internal/template/asm/mul.go` contains the code generation for the amd64 multiplication. It checks whether the ADX (ADCX, ADOX) and BMI2 (MULX) instruction set extensions are available and jumps to the standard version if they are not. To benefit from the two independent ADCX and ADOX carry chains, we split the inner loops in 2 (https://hackmd.io/@zkteam/modular_multiplication):
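One reading of splitting the inner loop in two is the classic CIOS (Coarsely Integrated Operand Scanning) form of Montgomery multiplication, where one inner loop computes t += a·b[i] and a second performs the reduction t = (t + m·q) / 2^64. A portable Go sketch of that form follows, under stated assumptions: `montMul` and `negInv` are hypothetical names, q is odd, a and b are already reduced below q, limbs are little-endian `uint64`, and each inner loop uses one ordinary carry where the PR's generated assembly targets the ADCX/ADOX chains.

```go
package main

import (
	"fmt"
	"math/bits"
)

// negInv returns -q0^(-1) mod 2^64 for odd q0 using Newton iteration;
// each step doubles the number of correct low bits (3, 6, 12, ..., 96).
func negInv(q0 uint64) uint64 {
	inv := q0 // correct mod 2^3, since q0*q0 == 1 mod 8 for odd q0
	for i := 0; i < 5; i++ {
		inv *= 2 - q0*inv
	}
	return -inv
}

// montMul returns a*b*2^(-64n) mod q (CIOS Montgomery multiplication).
// a, b and q have n little-endian limbs, q is odd, and a, b < q.
func montMul(a, b, q []uint64) []uint64 {
	n := len(q)
	qInvNeg := negInv(q[0])
	t := make([]uint64, n+2)
	for i := 0; i < n; i++ {
		// Inner loop 1: t += a * b[i].
		var c, carry uint64
		for j := 0; j < n; j++ {
			hi, lo := bits.Mul64(a[j], b[i])
			lo, carry = bits.Add64(lo, t[j], 0)
			hi += carry
			lo, carry = bits.Add64(lo, c, 0)
			hi += carry
			t[j], c = lo, hi
		}
		t[n], c = bits.Add64(t[n], c, 0)
		t[n+1] = c

		// Inner loop 2: t = (t + m*q) / 2^64, with m chosen so the low limb cancels.
		m := t[0] * qInvNeg
		hi, lo := bits.Mul64(m, q[0])
		_, carry = bits.Add64(lo, t[0], 0) // low limb is 0 by construction
		c = hi + carry
		for j := 1; j < n; j++ {
			hi, lo = bits.Mul64(m, q[j])
			lo, carry = bits.Add64(lo, t[j], 0)
			hi += carry
			lo, carry = bits.Add64(lo, c, 0)
			hi += carry
			t[j-1], c = lo, hi
		}
		t[n-1], c = bits.Add64(t[n], c, 0)
		t[n] = t[n+1] + c
	}
	// Now t < 2q: subtract q once and select the result with a mask (no branch).
	res := make([]uint64, n)
	var borrow uint64
	for j := 0; j < n; j++ {
		res[j], borrow = bits.Sub64(t[j], q[j], borrow)
	}
	_, borrow = bits.Sub64(t[n], 0, borrow) // borrow == 1 iff t < q
	mask := -borrow
	for j := 0; j < n; j++ {
		res[j] = (t[j] & mask) | (res[j] &^ mask)
	}
	return res
}

func main() {
	// q = 2^127 - 1 (two limbs); R = 2^128 and R mod q = 2,
	// so montMul(a, 2) returns a for any a < q.
	q := []uint64{^uint64(0), ^uint64(0) >> 1}
	fmt.Println(montMul([]uint64{5, 7}, []uint64{2, 0}, q)) // prints [5 7]
}
```

Keeping the multiply loop and the reduction loop separate means each loop is a single multiply-accumulate row, which is the shape that the dual ADCX/ADOX carry chains fit naturally.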