
DependencyViolation: resource sg has a dependent object #194

Closed
metral opened this issue Jul 10, 2019 · 19 comments · Fixed by #214
Labels: kind/bug (Some behavior is incorrect or out of spec)

@metral (Contributor) commented Jul 10, 2019

We're hitting an issue on EKS cluster tear-downs: the security group deletion request fails (HTTP 400) because of a lingering ENI that AWS created for the worker instance and that does not get deleted first.

A manual workaround is to go into the console and delete the ENI first; the security group can then be deleted as expected.
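For reference, that manual cleanup can also be scripted. Below is a minimal sketch (not part of this repo; the helper name is made up) using the AWS SDK for JavaScript: it finds ENIs that still reference the security group, detaches any that are in use, and deletes them. In practice you would likely also need to poll until the detachment completes before the delete succeeds.

import * as AWS from "aws-sdk";

// Hypothetical helper: remove any ENIs that still reference the given
// security group so that the group itself can then be deleted.
async function deleteLingeringEnis(securityGroupId: string): Promise<void> {
    const ec2 = new AWS.EC2();

    // Find network interfaces that still reference the security group.
    const { NetworkInterfaces = [] } = await ec2
        .describeNetworkInterfaces({
            Filters: [{ Name: "group-id", Values: [securityGroupId] }],
        })
        .promise();

    for (const eni of NetworkInterfaces) {
        // Detach first if the ENI is still attached to an instance.
        if (eni.Status === "in-use" && eni.Attachment && eni.Attachment.AttachmentId) {
            await ec2
                .detachNetworkInterface({
                    AttachmentId: eni.Attachment.AttachmentId,
                    Force: true,
                })
                .promise();
            // Note: detachment is asynchronous; a real helper would wait for
            // the ENI to report "available" before attempting the delete.
        }
        await ec2
            .deleteNetworkInterface({ NetworkInterfaceId: eni.NetworkInterfaceId! })
            .promise();
    }
}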

The only difference between this setup and a vanilla EKS cluster is that a workload with a classic ELB is deployed to Kubernetes, and everything gets torn down together on pulumi destroy. However, the ENI shows that it belongs to the instance, so I doubt it's related to the LB.

Semi-related / same error message: hashicorp/terraform-provider-aws#1671 (comment).

[ /aws-ts-eks-migrate-nodegroups ] Diagnostics:
[ /aws-ts-eks-migrate-nodegroups ]   aws:ec2:SecurityGroup (migrate-nodegroups-nodeSecurityGroup):
[ /aws-ts-eks-migrate-nodegroups ]     error: Plan apply failed: deleting urn:pulumi:p-it-argon-aws-ts-eks-d389e1a1::migrate-nodegroups::eks:index:Cluster$aws:ec2/securityGroup:SecurityGroup::migrate-nodegroups-nodeSecurityGroup: DependencyViolation: resource sg-07d3e15caa6895678 has a dependent object
[ /aws-ts-eks-migrate-nodegroups ]      status code: 400, request id: 5c77f9c1-6b9c-4691-9556-1776c831d86
metral added the bug label on Jul 10, 2019
@lukehoban (Member) commented:
> a lingering ENI created by AWS

Do you have details on what this ENI is?

> The only difference between this setup and a vanilla EKS cluster is that a workload with a classic ELB is deployed to Kubernetes, and everything gets torn down together on pulumi destroy. However, the ENI shows that it belongs to the instance, so I doubt it's related to the LB.

This feels suggestive that the ENI is created by Kubernetes. Are you sure this is not something that the Kubernetes load balancer support is doing?

@lukehoban (Member) commented Jul 10, 2019

Ahh - so CNI allocates ENIs on instances dynamically as pods are scheduled. Likely related to that?

https://github.com/aws/amazon-vpc-cni-k8s#eni-allocation

@metral (Contributor, author) commented Jul 10, 2019

> Ahh - so CNI allocates ENIs on instances dynamically as pods are scheduled. Likely related to that?

Correct. Pod IPs are attached via the ENI, so pods sticking around during deletion could be the root cause of this. We've seen other instances of this problem in general, where Pulumi does not know about resources stood up and managed by k8s and/or EKS, which creates similar scenarios.

@metral (Contributor, author) commented Jul 11, 2019

Note that this DependencyViolation does not occur only for security groups. I hit the same error on another run, but this time the VPC was blocked from deletion. Oddly enough, rerunning the deletion in Pulumi produced the error a second time, but attempting the VPC deletion in the AWS console succeeded immediately.

Further deep dives lead me to conclude that this is a bug of sorts in AWS/EKS, k8s, and/or the aws-k8s-cni plugin.

@metral (Contributor, author) commented Jul 19, 2019

We're seeing this DependencyViolation issue in other places:

@metral (Contributor, author) commented Jul 19, 2019

Updates:

  • It's possible, but not definitive, that aws-cni is a source of this issue: Leaking Network Interfaces (ENI) aws/amazon-vpc-cni-k8s#69 (comment)
  • In eksctl, they closed out delete VPC resources eksctl-io/eksctl#103 with the PR Delete LoadBalancer Services before the cluster eksctl-io/eksctl#1010, which deletes Service LBs first, as they believe that is the source of their issue.
    • pulumi/examples/aws-ts-eks-hello-world and pulumi/eks/tests/migrate-nodegroups both have LBs in their stacks, deployed through k8s Services, and the Service does get deleted on pulumi destroy before a security group or VPC is. Maybe this is because there isn't enough time between deletion of the LB and deletion of the rest of the VPC networking stack? The LB Service is deleted before the security group deletion is attempted and errors out.
    • That said, we primarily see leaked ENIs that were once attached to an instance of an ASG used for these Service LBs' pods - regardless of whether we drained & deleted - so this could be LB-related in k8s/AWS/EKS?

@lukehoban thoughts?

@lukehoban (Member) commented:
From the two most recent comments, I can't quite tell - is the assumption that this is a CNI-related issue, or a Kubernetes LB issue? There seem to be references to both - but they seem like they would be different things with different fixes.

@metral (Contributor, author) commented Jul 19, 2019

TBH I'm not sure. Both cases seem valid and possibly related, but I can't definitively say.

That said, leaked ENIs have always been the common result of the failure to delete the security group that causes the DependencyViolation.

I mention the LB issues because they may have to do with this issue, and they are the common denominator between a failed CI run in pulumi/examples#348 (pulumi/examples with no nodegroup migration, draining, or deletion) and similar test runs we saw for eks #195 (pulumi/eks with nodegroup migration, draining, and deletion). But it could also be an orthogonal issue that has nothing to do with this.

@metral (Contributor, author) commented Jul 22, 2019

Output of a pulumi/eks CI run from today that hit this issue (Pulumi test debugging is enabled): https://gist.github.com/metral/aa5ffa6dd07be202a0dd99eece2b71f2.

Again, the 400 on the security group deletion request is caused by a leaked ENI - it is in the available state but not deleted, and it still references the security group, which holds up the security group deletion.

Having seen this exact issue a couple of times now, aws-cni#69 seems to be the likely source.

Here's the ENI attached to the security group, from the linked CI output:
(screenshot of the ENI)

@lukehoban (Member) commented:
I wonder if for tests we should try setting WARM_ENI_TARGET=0 on the CNI plug-in to prevent it from adding extra ENIs to its warm pool? It feels like that might help avoid this.
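For what it's worth, a rough sketch of what that setting would look like on the cluster, assuming the pulumi/eks vpcCniOptions API (the cluster name here is just illustrative):

import * as eks from "@pulumi/eks";

// Sketch: turn off the CNI's warm ENI pool for a test cluster.
const cluster = new eks.Cluster("migrate-nodegroups", {
    vpcCniOptions: {
        warmEniTarget: 0,
    },
});

export const kubeconfig = cluster.kubeconfig;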

@metral (Contributor, author) commented Jul 31, 2019

@lukehoban I set WARM_ENI_TARGET=0 on the cluster (vpcCniOptions: { warmEniTarget: 0 }) for the migrate-nodegroups test and ran 10 tests. All 10 failed because Pods did not become ready due to timeouts, and one run still hit the original DependencyViolation issue - see below.

It does not seem that dropping WARM_ENI_TARGET from its default of 1 to 0 is recommended, nor is it a path forward for fixing this issue.

[ mples/tests/migrate-nodegroups ]   kubernetes:apps:Deployment (echoserver):
[ mples/tests/migrate-nodegroups ]     error: Plan apply failed: 1 error occurred:
[ mples/tests/migrate-nodegroups ]      * resource apps-4jmmj0kn/echoserver-60vmf11f was successfully created, but the Kubernetes API server reported that it failed to fully initialize or become live: Timeout occurred for 'echoserver-60vmf11f'
[ mples/tests/migrate-nodegroups ]
[ mples/tests/migrate-nodegroups ]   kubernetes:apps:Deployment (nginx-ing-cntlr):
[ mples/tests/migrate-nodegroups ]     error: Plan apply failed: 1 error occurred:
[ mples/tests/migrate-nodegroups ]      * resource apps-4jmmj0kn/nginx-ing-cntlr-ek5rz1dn was successfully created, but the Kubernetes API server reported that it failed to fully initialize or become live: Timeout occurred for 'nginx-ing-cntlr-ek5rz1dn'
[ mples/tests/migrate-nodegroups ]
[ mples/tests/migrate-nodegroups ]   kubernetes:core:Service (echoserver):
[ mples/tests/migrate-nodegroups ]     error: Plan apply failed: 1 error occurred:
[ mples/tests/migrate-nodegroups ]      * resource apps-4jmmj0kn/echoserver-ynz5h0ky was successfully created, but the Kubernetes API server reported that it failed to fully initialize or become live: Timeout occurred for 'echoserver-ynz5h0ky'
[ mples/tests/migrate-nodegroups ]
[ mples/tests/migrate-nodegroups ]   kubernetes:core:Service (nginx-ing-cntlr):
[ mples/tests/migrate-nodegroups ]     error: Plan apply failed: 1 error occurred:
[ mples/tests/migrate-nodegroups ]      * resource apps-4jmmj0kn/nginx-ing-cntlr was successfully created, but the Kubernetes API server reported that it failed to fully initialize or become live: Timeout occurred for 'nginx-ing-cntlr'

@lukehoban (Member) commented:
That's unfortunate...

Looking into aws/amazon-vpc-cni-k8s#69 a bit more, as well as the other symptoms here and the IPAMD implementation, it does appear that this must be the case we are hitting:

> Given that we need to create the ENI, attach it to the instance, then set the termination policy, the current design will always have a gap where the plugin is either creating (and has not set the termination policy) or deleting (removed the termination policy, and possibly detached) when the process/instance is killed, causing the resource to be leaked.

> There has been some talk of creating a centralized container to allocate ENIs and IPs and assign them outside of the local daemon. This would allow for clean up after termination. There is some more design that needs to happen, but it is something I think is worth the investment. I don't have a timeline for it at this point though.

In particular, if we kill the instance at any point in this code it looks like it will leak the ENI.

My best guess for a workaround would be to wait for the CNI warm pool population to quiesce before killing the instances/workloads. Can we just delay 5 minutes in ExtraRuntimeValidation in tests, perhaps?

We might also be able to get IPAMD logs to understand more about what ENIs it is trying to allocate when?

@metral (Contributor, author) commented Jul 31, 2019

> In particular, if we kill the instance at any point in this code it looks like it will leak the ENI.

> My best guess for a workaround would be to wait for the CNI warm pool population to quiesce before killing the instances/workloads. Can we just delay 5 minutes in ExtraRuntimeValidation in tests, perhaps?

Yes, I'll give this a shot.

For context, in CloudTrail we're seeing the error coming from aws-cni's use of aws-sdk-go as it attempts to DeleteNetworkInterface:

...
    "userAgent": "aws-sdk-go/1.19.21 (go1.12.5; linux; amd64)",
    "errorCode": "Client.InvalidParameterValue",
    "errorMessage": "Network interface 'eni-05b60027f352a12b2' is currently in use.",
    "requestParameters": {
        "networkInterfaceId": "eni-05b60027f352a12b2"
    },
...

> We might also be able to get IPAMD logs to understand more about what ENIs it is trying to allocate when?

I've tried tailing the aws-cni logs as well as kube-proxy on cluster tear-downs, and nothing meaningful is logged before the log stream gets clipped by the tear-down.

I've also considered adding fluentd to the setup to funnel container logs to CloudWatch, but there is always about a one-minute delay to flush the logs to CW, and it would also expand the footprint of the test. That said, I'll look into it if it isn't too much trouble and if it can help gather more insight into what's taking place.

@mogren commented Aug 2, 2019

Hey, sorry about the WARM_ENI_TARGET=0 issue. The way the CNI works, setting WARM_ENI_TARGET to 0 while also having WARM_IP_TARGET=0 will make it not allocate any IPs to the ENIs at all. Since pods only get IPs from the internal "data store" that keeps track of attached IPs, all pods will just be stuck.

I can think of two ways to fix this issue. When WARM_ENI_TARGET=0:

  • Give a warning and set it to 1. (Setting WARM_IP_TARGET still takes precedence.)
  • If WARM_IP_TARGET is also 0, allocate all IPs to the first attached ENI. Once all IPs on that one are used up, attach another ENI and allocate all IPs on it.
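As an interim configuration, since WARM_IP_TARGET takes precedence, one could keep a small warm pool of IPs rather than zeroing the ENI target. A hedged sketch, assuming pulumi/eks exposes warmIpTarget alongside warmEniTarget in vpcCniOptions (the cluster name and values are illustrative only):

import * as eks from "@pulumi/eks";

// Sketch: keep a small warm pool of pod IPs instead of disabling warm ENIs
// entirely, so pods can still get IPs from the CNI's internal data store.
const cluster = new eks.Cluster("test-cluster", {
    vpcCniOptions: {
        // WARM_IP_TARGET takes precedence over WARM_ENI_TARGET, per the comment above.
        warmIpTarget: 1,
    },
});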

@metral (Contributor, author) commented Aug 6, 2019

Thanks for the reply and the follow-up, @mogren - much appreciated!

> • Give a warning and set it to 1. (Setting WARM_IP_TARGET still takes precedence.)
> • If WARM_IP_TARGET is also 0, allocate all IPs to the first attached ENI. Once all IPs on that one are used up, attach another ENI and allocate all IPs on it.

This makes sense. IIUC though, I don't believe these settings deal with the leaked-ENI issue behind the OP.

Were there any follow-up issues to track these items?

metral added this to the 0.26 milestone on Aug 7, 2019
@mogren commented Aug 13, 2019

@metral Opened a PR to handle the WARM_ENI_TARGET=0 case

Also released v1.5.3 with some more startup fixes. (aws/amazon-vpc-cni-k8s@65873cf might be of interest to you, solves aws/amazon-vpc-cni-k8s#537)

@metral (Contributor, author) commented Aug 13, 2019

> Opened a PR to handle the WARM_ENI_TARGET=0 case

Thank you @mogren!

> Also released v1.5.3 with some more startup fixes. (aws/amazon-vpc-cni-k8s@65873cf might be of interest to you, solves aws/amazon-vpc-cni-k8s#537)

I appreciate the update! aws/amazon-vpc-cni-k8s@65873cf resolved our leaked ENI issue in v1.5.2. I'll make sure to check out the new fixes in v1.5.3 👍
