pointMultiply performance has a problem #5

Open
junderw opened this issue Jul 22, 2023 · 7 comments

@junderw

junderw commented Jul 22, 2023

I was adding this library to our benchmarks (we had noble before but only implemented one of the functions)

It does very well, and surprisingly almost matches WASM performance on some of the less intensive operations.

But there was a huge outlier in pointMultiply.

Compared to the other JS implementations (elliptic-based), it is an order of magnitude slower.

Maybe looking into it is warranted; if it doesn't matter, please ignore. (I'm not sure where pointMultiply is used, to be honest. Maybe some taproot stuff.)

====================================================================================================
Benchmarking function: pointMultiply
----------------------------------------------------------------------------------------------------
tiny-secp256k1 (WASM): 53.07 us/op (18841.54 op/s), ±2.95 %
[email protected] (C++ addon, NAN/V8): 13.94 us/op (71724.94 op/s), ±2.41 %
[email protected] (elliptic): 326.44 us/op (3063.31 op/s), ±7.54 %
[email protected] (C++ addon, N-API): 14.48 us/op (69076.89 op/s), ±2.81 %
[email protected] (elliptic): 165.79 us/op (6031.57 op/s), ±2.57 %
[email protected] (BigInt): 2404.29 us/op (415.92 op/s), ±1.21 %
----------------------------------------------------------------------------------------------------
Fastest: [email protected] (C++ addon, NAN/V8)
====================================================================================================
@landabaso
Member

Hey @junderw,

Thanks for flagging this! I've been able to replicate your results. Your benchmark findings align with my observations: the pointMultiply function is noticeably slower than the others.

Quick note though, the log entries list "[email protected]", but we're actually testing with "@noble/secp256k1": "^1.7.1", which might lead to some confusion.

To rule out any issues with my wrapper functions around noble, I removed the data assertions and similar elements.

After profiling the code, I've discovered that around 86% of the execution time is taken up specifically by P.multiply(t) in:

const _pointMultiply = (p, tweak, isCompressed) => {
  // Parse the encoded input point into a noble Point.
  const P = necc.Point.fromHex(p);
  // Normalize the tweak to a hex string, then to a BigInt scalar.
  const h = typeof tweak === 'string' ? tweak : necc.utils.bytesToHex(tweak);
  const t = BigInt(`0x${h}`);
  // P.multiply(t) is where ~86% of the time is spent.
  return P.multiply(t).toRawBytes(isCompressed);
};

I'm wondering, @paulmillr, if you might have any insights into this. I'm certainly open to the idea that I might not be using noble optimally. Would it be possible to gain some performance improvement by moving to noble 2.0, or is this slowdown just an inherent part of the process?

I'll keep this thread open for a bit in case anyone else has suggestions or potential fixes. Cheers!

@paulmillr
Contributor

It's worth looking into why elliptic has this advantage. However, elliptic is unsafe. For one, it exposes multiplication exponent bits: shorter private keys would take less time to calculate. (https://minerva.crocs.fi.muni.cz)

With js implementations, there's always a question of "how fast do we actually need to be?". Non-base-point multiplication is used mostly in ECDH. So far we've received 0 reports about its slowness.

@paulmillr
Contributor

Also, I think the elliptic author acknowledges the need for noble, i.e. safer crypto libs, since he is a GitHub sponsor.

@junderw
Author

junderw commented Jul 25, 2023

One way to sort-of test whether that's the issue is to restrict the benchmark's set of scalars from a random distribution to lower values (maybe 1 to 255) and see if that closes the gap. If so, that's definitely the cause. (Edit: oops, I mean the other way around, we should restrict to higher values like n - 255 to n - 1.)

I'll try messing around with it later today. (It might also help catch similar issues in the other implementations; even though they are compiled from libsecp256k1, optimizers have been known to break constant-time-ness before.)
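A minimal sketch of what that fixture change might look like (the helper name and shape are mine, not the benchmark's actual code):

```javascript
// secp256k1 group order n.
const N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141n;

// Hypothetical fixture generator: scalars from the top of the group order
// (n-1, n-2, ...) instead of small values like 1, 2, 3, 4, so that a
// variable-time multiply has to process (nearly) all 256 bits.
function highScalars(count) {
  const out = [];
  for (let i = 1; i <= count; i++) out.push(N - BigInt(i));
  return out;
}
```

With scalars this large, an implementation whose running time tracks the scalar's bit length should no longer have an advantage.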

@junderw
Author

junderw commented Jul 25, 2023

It looks like the fixtures used to measure the benchmark were using low scalar values (like 1, 2, 3, 4, etc.), so I switched them to n-1, n-2, n-3, n-4 and re-ran. (I'm doing some other computations on my PC right now, so the variance % of the runs is a bit unstable, but most are within 10%.)

I would have assumed this would close the gap a bit, but interestingly enough, tiny didn't change, and secp256k1 changed a bit (+25%).

bitcoinerlab (using noble) stayed the same, at about 7x-8x slower than elliptic.

I guess this is just a product of:

elliptic uses a theoretically non-constant-time multiplication algorithm.

vs.

noble uses a theoretically constant-time multiplication algorithm that is more expensive.
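The difference between the two approaches can be sketched in plain BigInt JS (an illustrative toy on secp256k1, not elliptic's or noble's actual code; real constant-time code also avoids the secret-dependent branch inside the loop, which this sketch does not):

```javascript
// secp256k1 parameters: field prime, group order, generator (affine).
const P = 2n ** 256n - 2n ** 32n - 977n;
const N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141n;
const G = [
  0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798n,
  0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8n,
];

const mod = (a, m = P) => ((a % m) + m) % m;

// Modular inverse via the extended Euclidean algorithm.
function invert(a, m = P) {
  let [r0, r1] = [mod(a, m), m];
  let [s0, s1] = [1n, 0n];
  while (r1 !== 0n) {
    const q = r0 / r1;
    [r0, r1] = [r1, r0 - q * r1];
    [s0, s1] = [s1, s0 - q * s1];
  }
  return mod(s0, m);
}

// Affine point addition; null represents the point at infinity.
function add(p, q) {
  if (p === null) return q;
  if (q === null) return p;
  const [px, py] = p, [qx, qy] = q;
  if (px === qx && py === mod(-qy)) return null; // p + (-p) = O
  const lam = px === qx
    ? mod(3n * px * px * invert(2n * py))        // doubling (curve a = 0)
    : mod((qy - py) * invert(qx - px));          // distinct-point addition
  const x = mod(lam * lam - px - qx);
  return [x, mod(lam * (px - x) - py)];
}

// Variable-time double-and-add: the loop runs once per bit of k, so small
// scalars finish sooner -- the timing leak discussed above.
function multiplyVartime(p, k) {
  let acc = null, addend = p;
  for (; k > 0n; k >>= 1n) {
    if (k & 1n) acc = add(acc, addend);
    addend = add(addend, addend);
  }
  return acc;
}

// Fixed-iteration variant: always walks all 256 bit positions regardless
// of k, trading extra work for scalar-independent loop length.
function multiplyFixed(p, k) {
  let acc = null, addend = p;
  for (let i = 0; i < 256; i++, k >>= 1n) {
    if (k & 1n) acc = add(acc, addend);
    addend = add(addend, addend);
  }
  return acc;
}
```

Both functions compute the same result; the fixed-iteration version simply always pays for 256 doublings, which is one reason a constant-time multiply costs more on average.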

But the weird thing: pointFromScalar (which is basically just pointMultiply with the point fixed to the generator G) is actually faster with noble... which leads me to believe the reason is somewhere in the bitcoinerlab wrapper. (See benchmarks below.)

====================================================================================================
Benchmarking function: pointMultiply
----------------------------------------------------------------------------------------------------
tiny-secp256k1 (WASM): 52.79 us/op (18941.55 op/s), ±8.08 %
[email protected] (elliptic): 334.93 us/op (2985.68 op/s), ±3.84 %
[email protected] (elliptic): 198.63 us/op (5034.50 op/s), ±6.39 %
[email protected] (BigInt): 2546.22 us/op (392.74 op/s), ±7.05 %
----------------------------------------------------------------------------------------------------
Fastest: tiny-secp256k1 (WASM)
====================================================================================================
====================================================================================================
Benchmarking function: pointFromScalar
----------------------------------------------------------------------------------------------------
tiny-secp256k1 (WASM): 185.52 us/op (5390.29 op/s), ±1.86 %
[email protected] (elliptic): 291.36 us/op (3432.16 op/s), ±2.94 %
[email protected] (elliptic): 294.77 us/op (3392.52 op/s), ±3.36 %
[email protected] (BigInt): 199.11 us/op (5022.39 op/s), ±2.43 %
----------------------------------------------------------------------------------------------------
Fastest: tiny-secp256k1 (WASM)
====================================================================================================

@paulmillr
Contributor

paulmillr commented Jul 26, 2023

pointFromScalar is actually faster with noble

Yes, because noble precomputes multiples of base point. Just like native libsecp256k1.
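What base-point precomputation buys can be illustrated with modular exponentiation as a stand-in for point multiplication (a sketch with toy parameters of my own choosing, not noble's actual implementation):

```javascript
// Toy analogue of fixed-base precomputation: for a fixed base g we cache
// g^(2^i) mod p once up front; each later exponentiation then needs only
// multiplications for the set bits of the exponent, no repeated squaring.
// (EC libraries do the same with precomputed multiples of G.)
const p = 1000003n; // toy prime modulus, chosen for illustration
const g = 5n;       // fixed base, analogue of the generator G

// One-time table: powers[i] = g^(2^i) mod p, for i = 0..19.
const powers = [g];
for (let i = 1; i < 20; i++) {
  const prev = powers[i - 1];
  powers.push((prev * prev) % p);
}

// Exponentiation using the cached table (exponents below 2^20 here).
function powFixedBase(k) {
  let acc = 1n;
  for (let i = 0; k > 0n; i++, k >>= 1n) {
    if (k & 1n) acc = (acc * powers[i]) % p;
  }
  return acc;
}
```

An arbitrary base gets no such table, which is why G * k can be much faster than P * k even in the same library.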

@paulmillr
Contributor

noble
G * a x 6,639 ops/sec @ 150μs/op
G * b x 6,615 ops/sec @ 151μs/op
G * c x 6,597 ops/sec @ 151μs/op
P * a x 554 ops/sec @ 1ms/op
P * b x 552 ops/sec @ 1ms/op
P * c x 554 ops/sec @ 1ms/op

noble unsafe
P * a x 84,480 ops/sec @ 11μs/op ± 3.84% (min: 10μs, max: 174μs)
P * b x 1,173 ops/sec @ 851μs/op
P * c x 1,107 ops/sec @ 902μs/op

elliptic
G * a x 21,039 ops/sec @ 47μs/op ± 2.17% (min: 42μs, max: 919μs)
G * b x 22,470 ops/sec @ 44μs/op
G * c x 3,038 ops/sec @ 329μs/op
P * a x 19,363 ops/sec @ 51μs/op ± 2.90% (min: 47μs, max: 737μs)
P * b x 1,280 ops/sec @ 780μs/op
P * c x 1,276 ops/sec @ 783μs/op

gist https://gist.github.com/paulmillr/cccaf4e57aa54f5d6946922a3c36fb4a
