Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signed-digit multi-comb for ecmult_gen #546

Closed
wants to merge 8 commits into from

Conversation

peterdettman
Copy link
Contributor

@peterdettman peterdettman commented Aug 4, 2018

See section 3.3 of https://eprint.iacr.org/2012/309 for a description of the algorithm. Briefly, the scalar is recoded into signed-binary form, then divided into several blocks. A separate precomp. table is prepared for each block, and performing a multiplication is done using one comb per block, interleaved.

This implementation is constant-time, preserves the existing scalar blinding, but the NUMS group element is not yet used, perhaps not really useful (no zeroes in the signed-digit recoding). Static precomputation is not yet implemented. Settings are overridden to let the exhaustive tests work.

You can play with the comb parameters in ecmult_gen.h .

Compared to the existing approach, this gives improved performance/memory tradeoffs, and allows considerable flexibility in the parameters depending on platform details.

The following table gives an idea of the sort of tradeoffs available (bench_sign results - best "min" of 3, asm=no, 64bit field and scalar, -O3, Haswell):

Blocks Teeth Spacing Memory (KiB) Time (us)
43 6 1 86 39.2
22 6 2 44 39.7
11 6 4 22 40.1
4 6 11 8 41.0
4 5 13 4 42.6
2 5 26 2 44.6
2 4 32 1 48.4
1 4 64 0.5 53.3
1 3 86 0.25 63.1
1 2 128 0.125 82.3
1 1 256 0.0625 140

For existing approach: 44.6us (64KiB precomp. data)

@gmaxwell
Copy link
Contributor

Oh wow! I wasn't expecting to see a big (relative to what we normally get) speedup any time soon.

#if USE_COMB
ctx->prec = (secp256k1_ge_storage (*)[COMB_BLOCKS][COMB_POINTS])secp256k1_ecmult_gen_ctx_prec;
#if COMB_OFFSET
secp256k1_ge_from_storage(&ctx->offset, &secp256k1_ecmult_gen_ctx_offset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secp256k1_ecmult_gen_ctx_offset is not declared except in gen_context.c, so this line doesn't compile for me when I set the parameters to 4/4/16.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a build system issue; there's no dependency of gen_context on ecmult_gen.h (and presumably others). At the moment, after changing comb parameters in ecmult_gen.h, you'd need to touch gen_context.c (or just make clean).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has burned me multiple times while testing -- @sipa can you advise how to fix this?

/* The remaining COMB_* parameters are derived values, don't modify these. */
#define COMB_BITS (COMB_BLOCKS * COMB_TEETH * COMB_SPACING)
#define COMB_GROUPED ((COMB_SPACING == 1) && ((32 % COMB_TEETH) == 0))
#define COMB_OFFSET (COMB_BITS == 256)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some documentation somewhere about what the offset does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: e8beef9 .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

@peterdettman
Copy link
Contributor Author

peterdettman commented Sep 23, 2018

The CI test failure (1407.7) seems spurious, might need investigation.

@apoelstra
Copy link
Contributor

Agreed it seems spurious - I had the same issue on #557. I kicked travis to see if it works this time.

src/ecmult_gen.h Outdated

/* COMB_BLOCKS, COMB_TEETH, COMB_SPACING must all be positive and the product of the three (COMB_BITS)
* must evaluate to a value in the range [256, 288]. The resulting memory usage for precomputation
* will be COMB_POINTS_TOTAL * sizeof(secp256k1_ge_storage). */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should comment that COMB_SPACING should not exceed 32 or else the bit_pos logic in secp256k1_ecmult_gen stops working.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t see why, and all cases in the table above passed the tests before benchmarking.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think I forgot to make clean after recompiling again. The tests do pass now.

But my reasoning was that you're only looking at one word of recoded at once, so if the bit_pos index jumps forward by more than one word, eventually bit_pos >> 5 will increment by more than one per iteration and you'll have skipped an entire word. But I'm re-reading the code with fresher eyes and now I see that this is exactly the intended behaviour.


/* The remaining COMB_* parameters are derived values, don't modify these. */
#define COMB_BITS (COMB_BLOCKS * COMB_TEETH * COMB_SPACING)
#define COMB_GROUPED ((COMB_SPACING == 1) && ((32 % COMB_TEETH) == 0))
Copy link
Contributor

@apoelstra apoelstra Sep 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should drop COMB_GROUPED because it's impossible for 32 % COMB_TEETH to be 0. (If COMB_TEETH is zero, things obviously won't work...but if it's ≥ 32, then several other things break: COMB_MASK overflows; the bits = expression in ecmult_gen_impl.h:228 left-shifts too far; I also get this bizarre error about the size of prec being negative in ecmult_gen.h:71.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, once COMB_TEETH gets much past 10 we start running into trouble in secp256k1_ecmult_gen_context_build because we put COMB_POINTS_TOTAL-many gejs on the stack.

Given that this is a ridiculous thing to do I don't think we should try to support it, e.g. by using heap allocations rather than stack allocations in gen_context_build. We should just assume it doesn't happen and drop COMB_GROUPED.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you are misreading 32%T, which is 0 when T is a small power of 2. Still, practical values of COMB_TEETH probably peak at 8, and there is the stack concern as you say, so we should probably have the precompiler constrain it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I misread 32 % COMB_TEETH (which is 0 for 1,2,4,8) as COMB_TEETH % 32 (which can't be 0). Let me reassess COMB_GROUPED in that light.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4c8ff9b adds precompiler guards on the comb constants.

@apoelstra
Copy link
Contributor

Since USE_COMB is now always set, I think it's worth adding another commit that removes the old precomp code.

Other than that, and my nits about extreme parameter settings, ACK.

bit = (recoded[bit_pos >> 5] >> (bit_pos & 0x1F)) & 1;
bits |= bit << tooth;
bit = recoded[bit_pos >> 5] >> (bit_pos & 0x1F);
bits &= ~(1 << tooth);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this bit-clearing line isn't actually needed, because bits starts at 0 and 1 << tooth is unique on each iteration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that ‘bit’ may now contain junk (or noise) bits beyond bit 0. Iterations after the first therefore need to clear the target bit (and we try to leave noise in the high bits). The change is in response to [1], which I think @gmaxwell mentioned on IRC a while back.

[1] https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-alam.pdf

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! I understand, I was misreading bit << tooth as (bit & 1) << tooth.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the variable is named bit :P

@apoelstra
Copy link
Contributor

ACK except that I'd really like to fix the make clean issue.

bit_pos = comb_off;
for (block = 0; block < COMB_BLOCKS; ++block) {
#if COMB_GROUPED
bits = recoded[bit_pos >> 5] >> (bit_pos & 0x1F);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be done as part of a cmov ladder, like the old algorithm, to avoid cache-timing attacks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is reading a window from the scalar. IIUC, the cmov ladder you mean is the _ge_storage_cmov loop at lines 255-257.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes! I missed that - you do indeed have a cmov ladder at the point where the secret data is actually used. Thanks.

@sipa
Copy link
Contributor

sipa commented Nov 11, 2019

I've opened a rebased PR with somewhat better integration added in #693.

@real-or-random
Copy link
Contributor

I've opened a rebased PR with somewhat better integration added in #693.

Closing this in favor of #693.

@sipa
Copy link
Contributor

sipa commented Dec 29, 2021

Another iteration: #1058.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants