element.Mul now uses ADOX ADCX and MULX when available #13

gbotrel · 2020-04-07T05:41:21Z

As described in: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-large-integer-arithmetic-paper.pdf
This new version is constant time.
Implemented only when nbWords <= 6; larger moduli would require spilling registers to memory.
Squaring has an ASM implementation in this branch, but it is not wired in because it is currently slower than Mul.

Preliminary benchmarks:
for a 4 word modulus (bn256):

bn256 [develop] $ benchcmp old.txt new.txt 
benchmark                       old ns/op     new ns/op     delta
BenchmarkMulAssignELEMENT-8     28.1          21.8          -22.42%
BenchmarkMulAssignELEMENT-8     28.8          21.8          -24.31%
BenchmarkMulAssignELEMENT-8     28.9          21.8          -24.57%

for a 6 word modulus (bls377, bls381):
cycle count: 121 (vs 125 for herumi/mcl)

benchmark                       old ns/op     new ns/op     delta
BenchmarkMulAssignELEMENT-8     53.9          40.7          -24.49%
BenchmarkMulAssignELEMENT-8     54.0          40.9          -24.26%
BenchmarkMulAssignELEMENT-8     54.0          40.3          -25.37%

internal/template/asm/mul.go contains the code generation for the amd64 multiplication. It checks for the availability of ADX (ADCX/ADOX) and MULX and jumps to the standard version if they are not present.

To benefit from the ADCX and ADOX carry chains, we split the inner loop in two (https://hackmd.io/@zkteam/modular_multiplication):

    // for i=0 to N-1
    //     for j=0 to N-1
    //         (A,t[j]) := t[j] + a[j]*b[i] + A
    //     m := t[0]*q'[0] mod W
    //     C,_ := t[0] + m*q[0]
    //     for j=1 to N-1
    //         (C,t[j-1]) := t[j] + m*q[j] + C
    //     t[N-1] = C + A

gbotrel added 9 commits April 6, 2020 22:41

element/: Mul now uses ADCX, ADOX and MULX on amd64 when available

2052e6f

go vet: fix wrong label in generated assembly

ec75404

circleci: remove long test in benches (timeout)

d607307

asm mul: added noescape directive

461c5bf

mul asm: added comments in template

4a59a7f

fix hardcoded modulus in tests

259eb41

fix hardcoded modulus in tests

12e7f61

element.Mul: exposing MulAssign package function (asm version on amd6…

4df7f45

…4, clone to element.MulAssign)

fix non-asm exposed MulAssign when using CIOS method

6885d9a

gbotrel mentioned this pull request Apr 7, 2020

Benchmarks #12

Closed

gbotrel added 4 commits April 7, 2020 18:00

better .s (assembly) formatting

76948e0

cosmetics to make golint happy

9d688ff

square uses mulAssign on x64 as current implementation with ADX is sl…

d33deab

…ower than mul

rm .s files from previous generation

79bdded

gbotrel marked this pull request as ready for review April 8, 2020 03:40

gbotrel merged commit bac1925 into develop Apr 8, 2020

gbotrel deleted the asm branch April 8, 2020 03:43

This was referenced Apr 8, 2020

Merging develop into master #14

Merged

Merge develop into master Consensys/gnark-crypto#2

Merged

Feature/mimc7 goff iden3/go-iden3-crypto#16

Merged

arnaucube mentioned this pull request Apr 8, 2020

Update to goff v0.2.0 iden3/go-iden3-crypto#19

Closed
