Potential private key derivation optimization #5125

joostjager · 2021-03-19T14:59:25Z

I've been running load tests for receiving payments and generated flame graphs to get a feel for where time is spent. One thing that stood out is how expensive BtcWalletKeyRing.DerivPrivKey is. That method not only derives a key, but also accesses the database for update.

To find out if the derived keys are all different or the same, I patched lnd to add an extra marker-method to the call tree deriveRootPrivKey. Execution flows through that marker-method only when the root of a key family is requested.

Below is the resulting flame graph where deriveRootPrivKey is marked in purple.

The same root key is requested over and over again for signing the invoice and ECDH onion operations. This seems like a candidate for optimization, for example through caching.

Another observation is that invoice add and update are quite costly. It is unfortunate that bbolt only offers full database locks which cause concurrent payments to block each other.

The text was updated successfully, but these errors were encountered:

Crypt-iQ · 2021-03-19T15:30:46Z

Nice flame graphs -- are you suggesting caching private keys?

joostjager · 2021-03-19T15:32:35Z

Yes I am, possibly with a timer to clear them again if that really makes it more secure. Caching one key is probably enough: family 6, index 0.

Crypt-iQ · 2021-03-19T18:42:15Z

The private key is already in memory because we extract it which is already kind of bad, but I believe the plan is to have the option to remove all private key handling from lnd (via pubkey import). So I don't think caching private keys is in line with that approach. Plus any application on the host machine could attempt to flush the computer's cache and read these private keys (even unprivileged applications).

@wpaulino @guggero thoughts?

joostjager · 2021-03-19T19:06:36Z

If all key handling will be handled by an HSM, it won't even be possible to cache I'd say. But I don't know how far off that is. I had the impression that pubkey import was intended just for funding and not for things like signing invoices and decrypting the onion payload, or am I mistaking?

Another option is perhaps to only cache the database part, but still rederive the key every time.

Plus any application on the host machine could attempt to flush the computer's cache and read these private keys (even unprivileged applications).

Not sure what you mean by 'flushing the computer's cache', but is this any different from reading the private key that is in memory already now?

guggero · 2021-03-22T09:22:48Z

We fetch the encrypted extended private key of the account in question from the DB every time. Maybe it would be enough to cache the encrypted version of it, if the DB part is indeed slow? Then only the decryption and derivation would happen every time. Not sure how much that reduces the call time though.

We made a big step towards the signing abstraction, see @wpaulino's PR: #5047
Now the only thing that's left is to put every signing call behind an RPC call. Then our HSM would be a secondary lnd that has all keys and would sign requests it gets over RPC. So that secondary lnd would still be able to implement said caching above.

joostjager · 2021-03-22T10:16:45Z

We fetch the encrypted extended private key of the account in question from the DB every time.

Is that really the case? I looked a bit at the code (maybe not good enough) and it seems that ScopedKeyManager is cached across multiple DerivePrivKey calls as well as the accountInfo instances (containing private key material) that are kept inside ScopedKeyManager.

Crypt-iQ · 2021-03-22T19:50:13Z

Not sure what you mean by 'flushing the computer's cache', but is this any different from reading the private key that is in memory already now?

Processes can't just read other processes memory unless they have special permissions (gdb for example). So the way that unprivileged processes can read memory is via side-channel cache attacks (see: Flush + Reload). Ideally private keys are zero-ed out after use, since it may take a while for the GC to run or for that memory to be re-used in non-GC'd languages. From a theoretical standpoint, the less time private keys reside in memory, the better. That said, if the computer is already compromised, there are just-as-bad things a malicious program can do besides trying to extract a private key. If the key is going in-and-out of memory as seems to be the case now, there's not too much of a difference security-wise between what we do now and caching it.

wpaulino · 2021-03-22T20:15:17Z

Another option is perhaps to only cache the database part, but still rederive the key every time.

Looking at the code path, this is already the case. We cache the account information (which includes the private/public keys among other things) and derive from there. The database transaction is only really there for when we haven't cached the account information yet. Maybe it's worth re-working that to see the performance gains without the database transaction?

joostjager · 2021-03-22T21:46:23Z

I am running a benchmark with a lot of parallel threads. Because of bbolt's global transaction mechanism, I can imagine a lot of blocking there because of all kinds of other, unrelated db writes such as invoice updates and channel updates.

wpaulino · 2021-03-22T22:47:26Z

Sounds like it could be worthwhile to try a version of DerivePrivKey that doesn't open a database transaction then.

Gossip messages shouldn't have an effect on the wallet's database.

joostjager · 2021-03-23T06:30:04Z

Gossip messages shouldn't have an effect on the wallet's database.

With channel updates, I meant channel state updates. But that and the invoice updates are indeed happening in a different db, forgot about that. Concurrent calls to DerivePrivKey may be blocking each other still.

joostjager · 2021-04-06T17:10:48Z

@wpaulino I can look at trying that alternative version, but why not return the private key from cache straight away if it is cached in memory already? It seems to me that there is no point in having that db transaction then.

Also something to be cognizant of is that every bbolt transaction requires two fsync calls. For example on google cloud, one such call takes about 2.5 ms, so this (redundant) key lookup blocks all other db txes in walletdb for 5 ms. Just this by itself already caps the max tps at 200.

wpaulino · 2021-04-06T22:15:04Z

@wpaulino I can look at trying that alternative version, but why not return the private key from cache straight away if it is cached in memory already? It seems to me that there is no point in having that db transaction then.

The only private keys cached are for the different accounts of each key scope, so you'd still have to derive from that. If the derivation process is too expensive, especially after done multiple times for the same key, then you could look into adding a cache at that level as well.

Roasbeef · 2021-08-14T00:40:41Z

Went with the option of caching things internally in btcwallet here: #5629. This'll also speed up the multi-sig+HTLC signing operations, as no DB transaction will be requried for those as well.

In this commit, we start to optimistically use the new private key cache that was added to btcwallet. As is, btcwallet will cache the decrypted account keys for each scope in memory. However, the existing methods to derive a child key from those account keys requires a write database transaction, and will re-derive the private key using BIP-32 each time. The newly added `DeriveFromKeyPathCache` lets us skip all this and directly use a cache assuming the account info is already cached. The new logic will try to use this method, but if it fails fall back to the existing `DeriveFromKeyPath` method. All calls after this will use this new cached key. Fixes lightningnetwork#5125. Basic benchmark before the btcwallet change and after: ``` benchmark old ns/op new ns/op delta BenchmarkDerivePrivKey-8 22840583 125 -100.00% benchmark old allocs new allocs delta BenchmarkDerivePrivKey-8 89 2 -97.75% benchmark old bytes new bytes delta BenchmarkDerivePrivKey-8 10225 24 -99.77% ```

Roasbeef added enhancement Improvements to existing features / behaviour optimization wallet The wallet (lnwallet) which LND uses labels Mar 19, 2021

Roasbeef added this to the v0.14.0 milestone Jun 16, 2021

Roasbeef assigned wpaulino Jul 1, 2021

Roasbeef mentioned this issue Aug 13, 2021

keychain+lnwallet: when fetching priv keys or signing try to use cache #5629

Merged

Roasbeef closed this as completed in #5629 Aug 27, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential private key derivation optimization #5125

Potential private key derivation optimization #5125

joostjager commented Mar 19, 2021

Crypt-iQ commented Mar 19, 2021

joostjager commented Mar 19, 2021 •

edited

Loading

Crypt-iQ commented Mar 19, 2021 •

edited

Loading

joostjager commented Mar 19, 2021

guggero commented Mar 22, 2021

joostjager commented Mar 22, 2021 •

edited

Loading

Crypt-iQ commented Mar 22, 2021

wpaulino commented Mar 22, 2021

joostjager commented Mar 22, 2021

wpaulino commented Mar 22, 2021

joostjager commented Mar 23, 2021

joostjager commented Apr 6, 2021

wpaulino commented Apr 6, 2021

Roasbeef commented Aug 14, 2021

Potential private key derivation optimization #5125

Potential private key derivation optimization #5125

Comments

joostjager commented Mar 19, 2021

Crypt-iQ commented Mar 19, 2021

joostjager commented Mar 19, 2021 • edited Loading

Crypt-iQ commented Mar 19, 2021 • edited Loading

joostjager commented Mar 19, 2021

guggero commented Mar 22, 2021

joostjager commented Mar 22, 2021 • edited Loading

Crypt-iQ commented Mar 22, 2021

wpaulino commented Mar 22, 2021

joostjager commented Mar 22, 2021

wpaulino commented Mar 22, 2021

joostjager commented Mar 23, 2021

joostjager commented Apr 6, 2021

wpaulino commented Apr 6, 2021

Roasbeef commented Aug 14, 2021

joostjager commented Mar 19, 2021 •

edited

Loading

Crypt-iQ commented Mar 19, 2021 •

edited

Loading

joostjager commented Mar 22, 2021 •

edited

Loading