-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Signed-digit multi-comb for ecmult_gen #546
Conversation
Oh wow! I wasn't expecting to see a big (relative to what we normally get) speedup any time soon. |
#if USE_COMB | ||
ctx->prec = (secp256k1_ge_storage (*)[COMB_BLOCKS][COMB_POINTS])secp256k1_ecmult_gen_ctx_prec; | ||
#if COMB_OFFSET | ||
secp256k1_ge_from_storage(&ctx->offset, &secp256k1_ecmult_gen_ctx_offset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
secp256k1_ecmult_gen_ctx_offset
is not declared except in gen_context.c, so this line doesn't compile for me when I set the parameters to 4/4/16.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a build system issue; there's no dependency of gen_context on ecmult_gen.h (and presumably others). At the moment, after changing comb parameters in ecmult_gen.h, you'd need to touch gen_context.c (or just make clean).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has burned me multiple times while testing -- @sipa can you advise how to fix this?
/* The remaining COMB_* parameters are derived values, don't modify these. */ | ||
#define COMB_BITS (COMB_BLOCKS * COMB_TEETH * COMB_SPACING) | ||
#define COMB_GROUPED ((COMB_SPACING == 1) && ((32 % COMB_TEETH) == 0)) | ||
#define COMB_OFFSET (COMB_BITS == 256) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some documentation somewhere about what the offset does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: e8beef9 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks!
The CI test failure (1407.7) seems spurious, might need investigation. |
Agreed it seems spurious - I had the same issue on #557. I kicked travis to see if it works this time. |
src/ecmult_gen.h
Outdated
|
||
/* COMB_BLOCKS, COMB_TEETH, COMB_SPACING must all be positive and the product of the three (COMB_BITS) | ||
* must evaluate to a value in the range [256, 288]. The resulting memory usage for precomputation | ||
* will be COMB_POINTS_TOTAL * sizeof(secp256k1_ge_storage). */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should comment that COMB_SPACING
should not exceed 32 or else the bit_pos
logic in secp256k1_ecmult_gen
stops working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t see why, and all cases in the table above passed the tests before benchmarking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I think I forgot to make clean
after recompiling again. The tests do pass now.
But my reasoning was that you're only looking at one word of recoded
at once, so if the bit_pos
index jumps forward by more than one word, eventually bit_pos >> 5
will increment by more than one per iteration and you'll have skipped an entire word. But I'm re-reading the code with fresher eyes and now I see that this is exactly the intended behaviour.
|
||
/* The remaining COMB_* parameters are derived values, don't modify these. */ | ||
#define COMB_BITS (COMB_BLOCKS * COMB_TEETH * COMB_SPACING) | ||
#define COMB_GROUPED ((COMB_SPACING == 1) && ((32 % COMB_TEETH) == 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should drop COMB_GROUPED
because it's impossible for 32 % COMB_TEETH
to be 0. (If COMB_TEETH
is zero, things obviously won't work...but if it's ≥ 32, then several other things break: COMB_MASK
overflows; the bits =
expression in ecmult_gen_impl.h:228 left-shifts too far; I also get this bizarre error about the size of prec
being negative in ecmult_gen.h:71.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, once COMB_TEETH
gets much past 10 we start running into trouble in secp256k1_ecmult_gen_context_build
because we put COMB_POINTS_TOTAL
-many gejs on the stack.
Given that this is a ridiculous thing to do I don't think we should try to support it, e.g. by using heap allocations rather than stack allocations in gen_context_build. We should just assume it doesn't happen and drop COMB_GROUPED
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you are misreading 32%T, which is 0 when T is a small power of 2. Still, practical values of COMB_TEETH probably peak at 8, and there is the stack concern as you say, so we should probably have the precompiler constrain it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I misread 32 % COMB_TEETH
(which is 0 for 1,2,4,8) as COMB_TEETH % 32
(which can't be 0). Let me reassess COMB_GROUPED in that light.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4c8ff9b adds precompiler guards on the comb constants.
Since Other than that, and my nits about extreme parameter settings, ACK. |
src/ecmult_gen_impl.h
Outdated
bit = (recoded[bit_pos >> 5] >> (bit_pos & 0x1F)) & 1; | ||
bits |= bit << tooth; | ||
bit = recoded[bit_pos >> 5] >> (bit_pos & 0x1F); | ||
bits &= ~(1 << tooth); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this bit-clearing line isn't actually needed, because bits
starts at 0 and 1 << tooth
is unique on each iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that ‘bit’ may now contain junk (or noise) bits beyond bit 0. Iterations after the first therefore need to clear the target bit (and we try to leave noise in the high bits). The change is in response to [1], which I think @gmaxwell mentioned on IRC a while back.
[1] https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-alam.pdf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! I understand, I was misreading bit << tooth
as (bit & 1) << tooth
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the variable is named bit
:P
ACK except that I'd really like to fix the |
bit_pos = comb_off; | ||
for (block = 0; block < COMB_BLOCKS; ++block) { | ||
#if COMB_GROUPED | ||
bits = recoded[bit_pos >> 5] >> (bit_pos & 0x1F); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be done as part of a cmov ladder, like the old algorithm, to avoid cache-timing attacks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is reading a window from the scalar. IIUC, the cmov ladder you mean is the _ge_storage_cmov loop at lines 255-257.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes! I missed that - you do indeed have a cmov ladder at the point where the secret data is actually used. Thanks.
I've opened a rebased PR with somewhat better integration added in #693. |
Another iteration: #1058. |
See section 3.3 of https://eprint.iacr.org/2012/309 for a description of the algorithm. Briefly, the scalar is recoded into signed-binary form, then divided into several blocks. A separate precomp. table is prepared for each block, and performing a multiplication is done using one comb per block, interleaved.
This implementation is constant-time, preserves the existing scalar blinding, but the NUMS group element is not yet used, perhaps not really useful (no zeroes in the signed-digit recoding).
Static precomputation is not yet implemented.Settings are overridden to let the exhaustive tests work.You can play with the comb parameters in ecmult_gen.h .
Compared to the existing approach, this gives improved performance/memory tradeoffs, and allows considerable flexibility in the parameters depending on platform details.
The following table gives an idea of the sort of tradeoffs available (bench_sign results - best "min" of 3, asm=no, 64bit field and scalar, -O3, Haswell):
For existing approach: 44.6us (64KiB precomp. data)