-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Low-footprint mode #189
Comments
Ideas:
|
Signing table is now static by default, which delivers a significant fraction of the need here for small devices. |
As a side note to PR #337, is there a security risk for using the wNAF algorithm (i.e.
The verification precomputed context tables (~1.375MB in code comments) can also be eliminated by adding the result of two wNAF
Overall, I see 3 different ecmult implementations: I am happy to give it a go if it makes sense. |
For security, the fixed-window NAF precludes one of the blinding techniques we're using (the nums blinding)-- which provides speculative improvements against power side-channel attacks. I'd be sad to lose it, but it's likely not critical. I've also personally scrutinized its side-channel behavior much less carefully than ecmult_gen (because its newish and it's also considerably more complex) though this can be fixed. I'm surprised your signing with ecmult_const was only half the speed, esp since ecmult_const precomputes its own table on the fly. Haven't tried it myself. If so, it suggests to me that making ecmult_gen also use the fixed window NAF but with the current table size (which would take 32kb) would actually be faster than ecmult_gen as is, in spite of the NAF conversion costs and constant time negations (presumably due to needing half the number of constant time conditional moves). In verify; Changing to 2x secp256k1_ecmult_const() is a 2.36x slowdown on my laptop-- does your version pass the tests? ecdsa_verify: min 57.2us / avg 57.4us / max 57.7us Probably better would be ecmult_gen + ecmult_const for this-- for me thats taking only 106 microseconds, so only a 1.84x slowdown; and is sharing the same table with signing. It's still needlessly losing a lot of performance due to constant size operation. But I suppose if you're being very size miserly you wouldn't want code bloat for non-constant time versions of these functions. Similarly, some further code size reduction by replacing some of the specialized code with more generic versions (e.g. the weaker normalization with the full normalization). Alternatively, ecmult_const could be made to support performing multiple multiplies concurrently, which would also be much faster than calling it twice... esp if for G it used a pre-computed table. The design of ecmult_const wouldn't make it easy to use different table sizes for different bases in a multi-exp, so I suspect a multi-exp-ified ecmult_const with precomputed G would be slower than ecmult_gen + ecmult_const_variable_base_fixed_base. We probably need to do some more IPR analysis of the fixed window NAF used in _const before deciding if we could use it in a non-module role.
Take care not to conflate ROM tables with RAM ones, ecmult also only uses a window A sized dynamic table, the rest is constant (and ecmult_gen uses almost no dynamic memory). For many cases this makes a big difference; because ram is usually quite limited while rom usually much less so. |
Thank you for the detailed reply.
Yes, it passes the tests. I just checked again and got the same performance numbers. This is the quick hack I used: https://github.com/dbitbox/secp256k1/blob/smallverify/src/ecmult_impl.h#L269 A 32kB ecmult_gen table would be workable for me. Then an ecmult_gen + ecmult_const sounds attractive for verification. One other situation to give motivation for making the code size as small as possible is the use case of verifying the authenticity of signed firmware updates. In this case, the verification code would need to share space inside a bootloader, which of course should be small in order to give more room for the firmware. |
Hi - any progress on this? I see it's hanging for soo long. Need it desperately for an embedded device with 256KiB RAM and 1MiB flash and trying to find the right way and balance to reduce RAM footprint of ecmult precompute - as it tries to fill all RAM and kills the app. |
@vhnatyk |
This issue is related to #622. |
@real-or-random Yes - sorry that was a bit lazy question in hopes someone competent and focused would reply.
Y, by the time of your reply I got some bits of info from connected PRs about WINDOW_A and WINDOW_G tuning 😄 but, since this page was the only I got after a day or two of googling, hopefully, your reply as a contributor will save somebody else's good few hours, not to mention a couple more of research through the confusing tree of PR's. Btw I came here from rust-secp256k1 noticed you there as well. Hopefully, it will get update sources soon as I see PR is open already. Not sure I'm fit for PR or review, but going to work on that and willing to help - saw there some discussion/PR's started in rust-secp256k1#115 - will continue there. Thanx 👍 |
Isn't it possible to verify/sign without any precumputed tables? |
Yes, probably - but what about security of constant time and/or performance. Hmm need to find out with tests, what exactly from secp256k1 I'm using |
hmm I'm not big on side channel attacks, |
@elichai There really isn't any need for that. We have perfectly fine variable time and constant time algorithms. There is also no need to run without tables entirely; just make them small enough so it's ok (you can very reasonably make them just a few kB or even less). Doing so will still be vastly more efficient than using algorithm with no tables at all. |
Yes, exactly 👍 - the problem I couldn't make it work with all the defines like |
#1058 will support signing & key generation with a 2 KiB precomputed table. |
The current strategy is optimized for large fast chips with huge cache. Especially signing would be useful on some embedded devices where multiple megabytes of pre-computation is not acceptable. Reasonably low memory can be accomplished just by lowering window usage, but we could also consider a separate compile-time low-footprint option.
The text was updated successfully, but these errors were encountered: