Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential private key derivation optimization #5125

Closed
joostjager opened this issue Mar 19, 2021 · 14 comments · Fixed by #5629
Closed

Potential private key derivation optimization #5125

joostjager opened this issue Mar 19, 2021 · 14 comments · Fixed by #5629
Assignees
Labels
enhancement Improvements to existing features / behaviour optimization wallet The wallet (lnwallet) which LND uses
Milestone

Comments

@joostjager
Copy link
Contributor

I've been running load tests for receiving payments and generated flame graphs to get a feel for where time is spent. One thing that stood out is how expensive BtcWalletKeyRing.DerivPrivKey is. That method not only derives a key, but also accesses the database for update.

To find out if the derived keys are all different or the same, I patched lnd to add an extra marker-method to the call tree deriveRootPrivKey. Execution flows through that marker-method only when the root of a key family is requested.

Below is the resulting flame graph where deriveRootPrivKey is marked in purple.

flamereceive

The same root key is requested over and over again for signing the invoice and ECDH onion operations. This seems like a candidate for optimization, for example through caching.

Another observation is that invoice add and update are quite costly. It is unfortunate that bbolt only offers full database locks which cause concurrent payments to block each other.

@Crypt-iQ
Copy link
Collaborator

Nice flame graphs -- are you suggesting caching private keys?

@joostjager
Copy link
Contributor Author

joostjager commented Mar 19, 2021

Yes I am, possibly with a timer to clear them again if that really makes it more secure. Caching one key is probably enough: family 6, index 0.

@Crypt-iQ
Copy link
Collaborator

Crypt-iQ commented Mar 19, 2021

The private key is already in memory because we extract it which is already kind of bad, but I believe the plan is to have the option to remove all private key handling from lnd (via pubkey import). So I don't think caching private keys is in line with that approach. Plus any application on the host machine could attempt to flush the computer's cache and read these private keys (even unprivileged applications).

@wpaulino @guggero thoughts?

@joostjager
Copy link
Contributor Author

If all key handling will be handled by an HSM, it won't even be possible to cache I'd say. But I don't know how far off that is. I had the impression that pubkey import was intended just for funding and not for things like signing invoices and decrypting the onion payload, or am I mistaking?

Another option is perhaps to only cache the database part, but still rederive the key every time.

Plus any application on the host machine could attempt to flush the computer's cache and read these private keys (even unprivileged applications).

Not sure what you mean by 'flushing the computer's cache', but is this any different from reading the private key that is in memory already now?

@Roasbeef Roasbeef added enhancement Improvements to existing features / behaviour optimization wallet The wallet (lnwallet) which LND uses labels Mar 19, 2021
@guggero
Copy link
Collaborator

guggero commented Mar 22, 2021

We fetch the encrypted extended private key of the account in question from the DB every time. Maybe it would be enough to cache the encrypted version of it, if the DB part is indeed slow? Then only the decryption and derivation would happen every time. Not sure how much that reduces the call time though.

We made a big step towards the signing abstraction, see @wpaulino's PR: #5047
Now the only thing that's left is to put every signing call behind an RPC call. Then our HSM would be a secondary lnd that has all keys and would sign requests it gets over RPC. So that secondary lnd would still be able to implement said caching above.

@joostjager
Copy link
Contributor Author

joostjager commented Mar 22, 2021

We fetch the encrypted extended private key of the account in question from the DB every time.

Is that really the case? I looked a bit at the code (maybe not good enough) and it seems that ScopedKeyManager is cached across multiple DerivePrivKey calls as well as the accountInfo instances (containing private key material) that are kept inside ScopedKeyManager.

@Crypt-iQ
Copy link
Collaborator

Not sure what you mean by 'flushing the computer's cache', but is this any different from reading the private key that is in memory already now?

Processes can't just read other processes memory unless they have special permissions (gdb for example). So the way that unprivileged processes can read memory is via side-channel cache attacks (see: Flush + Reload). Ideally private keys are zero-ed out after use, since it may take a while for the GC to run or for that memory to be re-used in non-GC'd languages. From a theoretical standpoint, the less time private keys reside in memory, the better. That said, if the computer is already compromised, there are just-as-bad things a malicious program can do besides trying to extract a private key. If the key is going in-and-out of memory as seems to be the case now, there's not too much of a difference security-wise between what we do now and caching it.

@wpaulino
Copy link
Contributor

Another option is perhaps to only cache the database part, but still rederive the key every time.

Looking at the code path, this is already the case. We cache the account information (which includes the private/public keys among other things) and derive from there. The database transaction is only really there for when we haven't cached the account information yet. Maybe it's worth re-working that to see the performance gains without the database transaction?

@joostjager
Copy link
Contributor Author

I am running a benchmark with a lot of parallel threads. Because of bbolt's global transaction mechanism, I can imagine a lot of blocking there because of all kinds of other, unrelated db writes such as invoice updates and channel updates.

@wpaulino
Copy link
Contributor

Sounds like it could be worthwhile to try a version of DerivePrivKey that doesn't open a database transaction then.

Gossip messages shouldn't have an effect on the wallet's database.

@joostjager
Copy link
Contributor Author

Gossip messages shouldn't have an effect on the wallet's database.

With channel updates, I meant channel state updates. But that and the invoice updates are indeed happening in a different db, forgot about that. Concurrent calls to DerivePrivKey may be blocking each other still.

@joostjager
Copy link
Contributor Author

@wpaulino I can look at trying that alternative version, but why not return the private key from cache straight away if it is cached in memory already? It seems to me that there is no point in having that db transaction then.

Also something to be cognizant of is that every bbolt transaction requires two fsync calls. For example on google cloud, one such call takes about 2.5 ms, so this (redundant) key lookup blocks all other db txes in walletdb for 5 ms. Just this by itself already caps the max tps at 200.

@wpaulino
Copy link
Contributor

wpaulino commented Apr 6, 2021

@wpaulino I can look at trying that alternative version, but why not return the private key from cache straight away if it is cached in memory already? It seems to me that there is no point in having that db transaction then.

The only private keys cached are for the different accounts of each key scope, so you'd still have to derive from that. If the derivation process is too expensive, especially after done multiple times for the same key, then you could look into adding a cache at that level as well.

@Roasbeef
Copy link
Member

Went with the option of caching things internally in btcwallet here: #5629. This'll also speed up the multi-sig+HTLC signing operations, as no DB transaction will be requried for those as well.

Roasbeef added a commit to Roasbeef/lnd that referenced this issue Aug 26, 2021
In this commit, we start to optimistically use the new private key cache
that was added to btcwallet. As is, btcwallet will cache the decrypted
account keys for each scope in memory. However, the existing methods
to derive a child key from those account keys requires a write database
transaction, and will re-derive the private key using BIP-32 each time.

The newly added `DeriveFromKeyPathCache` lets us skip all this and
directly use a cache assuming the account info is already cached. The
new logic will try to use this method, but if it fails fall back to the
existing `DeriveFromKeyPath` method. All calls after this will use this
new cached key.

Fixes lightningnetwork#5125.

Basic benchmark before the btcwallet change and after:
```
benchmark                    old ns/op     new ns/op     delta
BenchmarkDerivePrivKey-8     22840583      125           -100.00%

benchmark                    old allocs     new allocs     delta
BenchmarkDerivePrivKey-8     89             2              -97.75%

benchmark                    old bytes     new bytes     delta
BenchmarkDerivePrivKey-8     10225         24            -99.77%
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improvements to existing features / behaviour optimization wallet The wallet (lnwallet) which LND uses
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants