Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linear cache: Add new UpdateResources method to update multiple resources at once without doing a stow update and limit unneeded allocations when processing delta watches #546

Merged
merged 3 commits into from
Mar 16, 2022

Conversation

valerian-roche
Copy link
Contributor

In use cases where linear cache is used with eds and delta watches, the user can currently only update the cache in two ways:

  • full stow set (SetResources): this requires the user to build the entire list to then pass it to the cache, which then hashes everything to figure out what changed. This is very resource intensive (in our case ~1s of CPU). After that it will only check once for delta watches and return all changed endpoints at once
  • provide detailed updates one by one (through UpdateResource and DeleteResource): this does not require the user to keep a full view of the data, and does not consume much cpu in hashing (only hashing the resources actually updated), but it will check all delta watches for each call, and will forward endpoints changed a key at a time (more traffic with clients)

In our case the main issue switching to the second part is the cpu usage (and time spent) on updates when updating multiple resources (though with still num_updates << total_keys, e.g. 50 vs 5000) while multiple delta watches exist (~20)
We then end up spending more than 120ms of CPU (for ~50 updates/deletes) with profiles like (this is aggregated for updates between 0 and 50):
image

In order to improve this, two changes are in this PR:

  • Do not pre-allocate the slice of filtered resources in most cases
    • When the delta watch is non-wildcard, this is irrelevant to pre-allocate for the total list of resources, and most checks will return no change at all
    • When the delta watch is wildcard, pre-allocate when there is no version passed, but not in other cases when most resources have not been updated
  • Provide a method to update/delete multiple resources at once. This allows only one iteration across watches, improving resource usage and limiting traffic between the control-plane and the clients

After the fix, runtime of the update (only includes iterating through calls to UpdateResource/DeleteResource before and on call to UpdateResources after) is greatly improved and stable (blue line below)
image

@alecholmez
Copy link
Contributor

alecholmez commented Mar 16, 2022

Thanks for this PR! I'll review this first thing tomorrow. I just approved the run, can you go ahead and fix up the lint checks?

@valerian-roche
Copy link
Contributor Author

Thanks for this PR! I'll review this first thing tomorrow. I just approved the run, can you go ahead and fix up the lint checks?

Just pushed the fix for the lint (at least working locally)

…urces at once without doing a stow update and limit unneeded allocations when processing delta watches

Signed-off-by: Valerian Roche <[email protected]>
Copy link
Contributor

@alecholmez alecholmez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor nits/questions. Overall looks great thanks for the enhancements!

pkg/cache/v3/linear.go Show resolved Hide resolved
pkg/cache/v3/delta.go Outdated Show resolved Hide resolved
pkg/cache/v3/delta.go Outdated Show resolved Hide resolved
…ses as the version of deleted objects is not kept anyway

Signed-off-by: Valerian Roche <[email protected]>
Copy link
Contributor

@alecholmez alecholmez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for the PR! Once Github fixes actions we'll let the build run then get this merged if all is well.

@valerian-roche
Copy link
Contributor Author

Would an owner be able to trigger the checks to confirm it is all good?

@alecholmez
Copy link
Contributor

Would an owner be able to trigger the checks to confirm it is all good?

I am an owner :) unfortunately the APIs are actually down: https://www.githubstatus.com/ Github seems to be having an outage. I'm watching that page to see when they get resolved. Sorry for the delay!

@valerian-roche
Copy link
Contributor Author

My bad, didn't see that it was down, sorry about that!

@alecholmez alecholmez self-assigned this Mar 16, 2022
@alecholmez
Copy link
Contributor

@valerian-roche can you push a dummy commit? I think CI needs a trigger. Some instructions here to do so: https://github.com/envoyproxy/go-control-plane/blob/main/CONTRIBUTING.md#triggering-ci-re-run-without-making-changes

Signed-off-by: Valerian Roche <[email protected]>
@alecholmez alecholmez merged commit 91e32d4 into envoyproxy:main Mar 16, 2022
@alecholmez
Copy link
Contributor

@valerian-roche just merged! Thanks again nice work

@valerian-roche
Copy link
Contributor Author

Thanks for the fast review and merge, greatly appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants