Scope watch on "pods" to only pods associated with the local node #716

jacksontj · 2019-11-12T22:37:20Z

Prior to this patch the aws-cni plugin would create a watch for all
pods in the cluster to then client-side filter them to only the ones
assigned to the node. This is not only wasteful of resources (as we're
throwing away the majority of data once we get it) this causes some
significant performance/capacity issues on sizeable clusters. In some
cases we've seen the initial watch call return ~500MB! This is a
significant savings of space and resources (both client and server side)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Prior to this patch the aws-cni plugin would create a watch for *all* pods in the cluster to then client-side filter them to only the ones assigned to the node. This is not only wasteful of resources (as we're throwing away the majority of data once we get it) this causes some significant performance/capacity issues on sizeable clusters. In some cases we've seen the initial `watch` call return ~500MB! This is a significant savings of space and resources (both client and server side)

jaypipes

I'm actually having trouble understanding why this was ever like this... :/

Big +1 from me.

jacksontj · 2019-11-12T23:22:02Z

After reading through it again, the only place that uses this is cni-metrics-helper which uses the list of all CNIs to aggregate the metrics. If that is still necessary it seems that it should have its own separate watch for all pods in the same daemonset (instead of requiring watching all pods in the entire cluster).

mogren · 2019-11-12T23:39:41Z

@jacksontj You are absolutely correct, thanks a lot for this PR.

I was just about to comment saying that we should definitely do this change for workerPods in the CNI, and then handle cniPods differently in the cni-metrics-helper. There I guess we can just filter on the DaemonSet/aws-node (or ns kube-system?) instead. @jaypipes?

jacksontj · 2019-11-12T23:40:34Z

Glad to be of help :)

IMO you should be watching for a little as necessary. So for the cni-metrics-helper I think you want to filter for both kube-system and DaemonSet/aws-node to avoid pulling more data than necessary.

jacksontj · 2019-11-13T18:11:51Z

@mogren are you guys okay to merge this as-is and cleanup the cni-metrics-helper in another PR? Or do we need to address that in this PR? If it needs to be addressed here, any suggestions? (I don't use that helper, so not sure how to test it easily).

mogren · 2019-11-13T18:20:59Z

@jacksontj I'm fine with merging this first, then fixing the metrics helper. The current v1.5.5 build will work with the v1.6.x CNI as well, since we have not changed or added any prometheus metrics. I'll add a follow up issue to fix it.

jacksontj · 2019-11-13T18:21:42Z

Sounds good to me, thanks for the quick responses!

@mogren any plans to kick this out in a release soon? Checking in as this is a pretty large liability to close up :)

mogren · 2019-11-13T18:37:18Z

@jacksontj New releases take quite a while to verify. If you need this urgently, I would check out the release-1.5.5 branch, apply this fix, then build your own image.

To build:

git co release-1.5.5
<apply patch>

git tag v1.5.5-watcher-fix
export VERSION=v1.5.5-watcher-fix

make docker

docker tag amazon/amazon-k8s-cni:v1.5.5-watcher-fix <your repository>:v1.5.5-watcher-fix
docker push <your repository>:v1.5.5-watcher-fix

kubectl patch daemonset aws-node -n kube-system \
-p '{"spec": {"template": {"spec": {"containers": [{"image": " <your repository>:v1.5.5-watcher-fix","name":"aws-node"}]}}}}'

jacksontj · 2019-11-13T18:38:20Z

I have already created a build :) I was just checking in on release cadence there as I don't like running a fork, but I'll just keep an eye out for a new release.

For anyone else that needs this and can't wait I have a forked build of 1.5.3 on quay: quay.io/jacksontj/aws-cni:v1.5.3-2-g70bf43d7

mogren self-requested a review November 12, 2019 22:56

mogren added the enhancement label Nov 12, 2019

jaypipes approved these changes Nov 12, 2019

View reviewed changes

nckturner approved these changes Nov 13, 2019

View reviewed changes

mogren merged commit d73d118 into aws:master Nov 13, 2019

jacksontj deleted the discover_pods branch November 13, 2019 18:23

mogren mentioned this pull request Nov 13, 2019

Update cni-metrics-helper watcher #717

Closed

mogren mentioned this pull request Dec 20, 2019

Fix metrics watcher #768

Merged

jacksontj mentioned this pull request Jul 7, 2020

first commit wish/nodetaint#1

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Scope watch on "pods" to only pods associated with the local node #716

Scope watch on "pods" to only pods associated with the local node #716

jacksontj commented Nov 12, 2019

jaypipes left a comment

jacksontj commented Nov 12, 2019

mogren commented Nov 12, 2019

jacksontj commented Nov 12, 2019 •

edited

Loading

jacksontj commented Nov 13, 2019

mogren commented Nov 13, 2019

jacksontj commented Nov 13, 2019 •

edited

Loading

mogren commented Nov 13, 2019

jacksontj commented Nov 13, 2019 •

edited

Loading

Scope watch on "pods" to only pods associated with the local node #716

Scope watch on "pods" to only pods associated with the local node #716

Conversation

jacksontj commented Nov 12, 2019

jaypipes left a comment

Choose a reason for hiding this comment

jacksontj commented Nov 12, 2019

mogren commented Nov 12, 2019

jacksontj commented Nov 12, 2019 • edited Loading

jacksontj commented Nov 13, 2019

mogren commented Nov 13, 2019

jacksontj commented Nov 13, 2019 • edited Loading

mogren commented Nov 13, 2019

jacksontj commented Nov 13, 2019 • edited Loading

jacksontj commented Nov 12, 2019 •

edited

Loading

jacksontj commented Nov 13, 2019 •

edited

Loading

jacksontj commented Nov 13, 2019 •

edited

Loading