
Speed up dynamodb List() by only getting keys #21159

Merged
5 commits merged into hashicorp:main on Jun 7, 2024

Conversation

thomashargrove
Contributor

List() in physical.Backend only returns keys, so there is no need to retrieve values from DynamoDB. ProjectionExpression tells the query what fields we want, but since Key and Path are reserved words in DynamoDB we need to alias them using ExpressionAttributeNames. This change reduces the amount of traffic from DynamoDB to Vault.
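Roughly, the change shapes the query like this. This is a simplified sketch using aws-sdk-go v1 rather than the exact diff; the table name variable and the helper function are illustrative:

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

// listKeysQuery builds a Query that returns only the Key attribute for
// every item under a path prefix, instead of returning whole items.
func listKeysQuery(table, prefix string) *dynamodb.QueryInput {
	return &dynamodb.QueryInput{
		TableName:              aws.String(table),
		KeyConditionExpression: aws.String("#path = :path"),
		// "Path" and "Key" are reserved words in DynamoDB, so they must
		// be aliased via ExpressionAttributeNames before they can be
		// referenced in any expression.
		ExpressionAttributeNames: map[string]*string{
			"#path": aws.String("Path"),
			"#key":  aws.String("Key"),
		},
		ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
			":path": {S: aws.String(prefix)},
		},
		// Project only the key; the stored value never crosses the wire.
		ProjectionExpression: aws.String("#key"),
	}
}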

Tested on my local instance, and I verified in a debugger that values are no longer on the wire. I don't see any unit tests covering dynamodb.List(), and making a full mock of dynamodb would be quite a bit of work.
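(If someone does want to add tests later: aws-sdk-go ships a dynamodbiface.DynamoDBAPI interface that can be embedded to stub out individual calls, roughly as below. As far as I can tell the backend holds a concrete client today, so a mock would still mean refactoring, which is the work I'm referring to. Names here are illustrative.)

import (
	"github.com/aws/aws-sdk-go/service/dynamodb"
	"github.com/aws/aws-sdk-go/service/dynamodb/dynamodbiface"
)

// fakeDynamoDB satisfies the large DynamoDBAPI interface by embedding it
// and overriding only Query; calling any other method would panic.
type fakeDynamoDB struct {
	dynamodbiface.DynamoDBAPI
	out *dynamodb.QueryOutput
}

func (f *fakeDynamoDB) Query(in *dynamodb.QueryInput) (*dynamodb.QueryOutput, error) {
	return f.out, nil
}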

@hashicorp-cla

hashicorp-cla commented Jun 12, 2023

CLA assistant check
All committers have signed the CLA.

@Rahul-Manglani left a comment

LGTM

@VioletHynes
Contributor

Hi there @thomashargrove !

Sorry that this has lingered so long. I did some research, and it looks like this approach makes sense. I'm not the biggest DynamoDB expert, but everything seems good and I'd love to get this in.

If you're still interested, we'd need a changelog entry associated with this PR. It would need to be named 21159.txt in the changelog directory and contain something like storage/dynamodb: improved speed of the list operation, or whatever you think is valuable to note (I'm not sure where hasChildren is used). You can look at other recent entries for inspiration.
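For reference, entries in that directory use a fenced release-note format, so 21159.txt would look roughly like this (with my suggested wording; adjust as you see fit):

```release-note:improvement
storage/dynamodb: Improved speed of the List operation by requesting only keys from DynamoDB
```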

It would also be great if you could show a before/after with the output of a command that uses this method, just for my own sanity. I'll also kick off CI to make sure everything's green.

Thanks in advance!

@thomashargrove
Contributor Author

Thanks for taking a look at this. I made two builds of Vault, one with my changes and one without, and ran them one at a time, both pointing at the same backing DynamoDB table. I pre-populated 500 secrets under one path, each filled with random data to give them some size.

On the build with my change I did a directory listing twice:

$ time vault kv list -tls-skip-verify kv/listtest/
...
real	0m2.255s
user	0m0.053s
sys	0m0.032s
$ time vault kv list -tls-skip-verify kv/listtest/
...
real	0m0.327s
user	0m0.047s
sys	0m0.023s

Using nettop (-P aggregates per process, -L 1 logs a single CSV sample, -p filters by PID), I captured bytes_in for the Vault server before and after each list call:

$ nettop -P -L 1 -p 91974
time,,interface,state,bytes_in,bytes_out,rx_dupe,rx_ooo,re-tx,rtt_avg,rcvsize,tx_win,tc_class,tc_mgt,cc_algo,P,C,R,W,arch,
10:35:59.072582,vault.91974,,,503347,357841,1318,0,2136,,,,,,,,,,,,

$ nettop -P -L 1 -p 91974
time,,interface,state,bytes_in,bytes_out,rx_dupe,rx_ooo,re-tx,rtt_avg,rcvsize,tx_win,tc_class,tc_mgt,cc_algo,P,C,R,W,arch,
10:36:03.095878,vault.91974,,,643276,406383,1318,0,4772,,,,,,,,,,,,

$ nettop -P -L 1 -p 91974
time,,interface,state,bytes_in,bytes_out,rx_dupe,rx_ooo,re-tx,rtt_avg,rcvsize,tx_win,tc_class,tc_mgt,cc_algo,P,C,R,W,arch,
10:36:20.229671,vault.91974,,,645604,421238,1318,0,7408,,,,,,,,,,,,
 
$ nettop -P -L 1 -p 91974
time,,interface,state,bytes_in,bytes_out,rx_dupe,rx_ooo,re-tx,rtt_avg,rcvsize,tx_win,tc_class,tc_mgt,cc_algo,P,C,R,W,arch,
10:36:23.338528,vault.91974,,,773622,427347,1318,0,7408,,,,,,,,,,,,

The average bytes_in per list call on this build was 133,974 (deltas of 139,929 and 128,018 between the readings taken before and after each call).

The same test against Vault without my change:

$ time vault kv list -tls-skip-verify kv/listtest
real	0m2.329s
user	0m0.050s
sys	0m0.030s

$ time vault kv list -tls-skip-verify kv/listtest
real	0m0.384s
user	0m0.051s
sys	0m0.026s

and

$ nettop -P -L 1 -p 91525
time,,interface,state,bytes_in,bytes_out,rx_dupe,rx_ooo,re-tx,rtt_avg,rcvsize,tx_win,tc_class,tc_mgt,cc_algo,P,C,R,W,arch,
10:33:34.887110,vault.91525,,,828415,669808,1318,0,4835,,,,,,,,,,,,
$ nettop -P -L 1 -p 91525
time,,interface,state,bytes_in,bytes_out,rx_dupe,rx_ooo,re-tx,rtt_avg,rcvsize,tx_win,tc_class,tc_mgt,cc_algo,P,C,R,W,arch,
10:33:51.521955,vault.91525,,,1090191,810069,1318,0,6482,,,,,,,,,,,,

$ nettop -P -L 1 -p 91525
time,,interface,state,bytes_in,bytes_out,rx_dupe,rx_ooo,re-tx,rtt_avg,rcvsize,tx_win,tc_class,tc_mgt,cc_algo,P,C,R,W,arch,
10:34:32.048074,vault.91525,,,1067321,857685,1318,0,11580,,,,,,,,,,,,
$ nettop -P -L 1 -p 91525
time,,interface,state,bytes_in,bytes_out,rx_dupe,rx_ooo,re-tx,rtt_avg,rcvsize,tx_win,tc_class,tc_mgt,cc_algo,P,C,R,W,arch,
10:34:35.509749,vault.91525,,,1288377,863724,1561,0,11580,,,,,,,,,,,,

Average bytes_in per list call is 241,416 (deltas of 261,776 and 221,056). Overall that's about a 45% reduction in bytes on the wire, but almost no change in latency. With a larger directory and bigger secrets I'd expect more of a speedup; ~100 KiB is pretty quick to move either way.

As for hasChildren(), it's used by the recursive delete to clean up placeholder directory entries in DynamoDB (a sketch of that style of check follows the transcript below). I verified it still works by creating a key with nested directories and then removing it:

$ vault kv put -tls-skip-verify kv/deletetest/a/b/c/d k=v
======= Secret Path =======
kv/data/deletetest/a/b/c/d

======= Metadata =======
Key                Value
---                -----
created_time       2024-06-07T17:41:12.179886Z
custom_metadata    <nil>
deletion_time      n/a
destroyed          false
version            1

$ vault kv list -tls-skip-verify kv/deletetest/a/b/
Keys
----
c/

$ vault kv metadata delete -tls-skip-verify kv/deletetest/a/b/c/d
Success! Data deleted (if it existed) at: kv/metadata/deletetest/a/b/c/d

$ vault kv list -tls-skip-verify kv/deletetest/a/b/
No value found at kv/metadata/deletetest/a/b
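For context, here is a minimal sketch of what a has-children style check can look like against DynamoDB. This assumes aws-sdk-go v1 with the same imports as the sketch above, and is illustrative rather than Vault's exact implementation:

// hasChildren reports whether any item exists under the given path
// prefix. Limit(1) plus a key-only projection keeps the read cheap.
func hasChildren(svc *dynamodb.DynamoDB, table, prefix string) (bool, error) {
	out, err := svc.Query(&dynamodb.QueryInput{
		TableName:              aws.String(table),
		KeyConditionExpression: aws.String("#path = :path"),
		ExpressionAttributeNames: map[string]*string{
			"#path": aws.String("Path"),
			"#key":  aws.String("Key"),
		},
		ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
			":path": {S: aws.String(prefix)},
		},
		ProjectionExpression: aws.String("#key"),
		// One matching item is enough to answer the question.
		Limit: aws.Int64(1),
	})
	if err != nil {
		return false, err
	}
	return len(out.Items) > 0, nil
}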

@VioletHynes (Contributor) left a comment

Thank you! This looks great, and thanks for the super detailed response. It helps me feel confident about merging this, which is especially important given the lack of tests.

Thank you for the contribution; we really appreciate your patience and your work here. I'll get CI running and try to get this merged as soon as I can.

@VioletHynes VioletHynes merged commit 2756303 into hashicorp:main Jun 7, 2024
66 of 67 checks passed
@VioletHynes
Contributor

Thanks again, and thanks for your patience in awaiting a review for this :)
