CrashLoopBackOff due to EC2MetadataError #573

Closed
janavenkat opened this issue Dec 14, 2022 · 7 comments

@janavenkat

Related issue #455

For security reasons I changed the EC2 instance metadata hop limit from 2 to 1. This causes the ingress controller to crash, because it gets access denied from the AWS instance metadata endpoint.

The repository documentation says:

This ingress controller uses the EC2 instance metadata of the worker node where it's currently running to find the additional details about the cluster provisioned by Kubernetes on top of AWS. This information is used to manage AWS resources for each ingress objects of the cluster.

I am using an EKS cluster, and the ingress controller is set up with an IAM role for service accounts. Is there any way to stop the ingress controller from requesting the EC2 instance metadata?

@szuecs
Member

szuecs commented Dec 19, 2022

@janavenkat what is the security impact of having the hop limit set to 2?
To me this does not sound really relevant, so it is likely a won't-fix.

@janavenkat
Author

@szuecs thank you for the response.

  1. https://youtu.be/_VcmdlV6xaY?t=875 is the recommendation from AWS to set the hop limit to 1.

  2. Does the controller need to connect to the instance metadata? That is how I understand the docs:

This ingress controller uses the EC2 instance metadata of the worker node where it's currently running to find the additional details about the cluster provisioned by Kubernetes on top of AWS. This information is used to manage AWS resources for each ingress objects of the cluster.

I didn't provision the cluster by using Kubernetes on top of AWS.

@jbilliau-rcd
Contributor

We are having the exact same issue, but on only one cluster out of 80+; not sure why. Debug logs:

time="2022-12-30T17:23:58Z" level=info msg="starting /kube-ingress-aws-controller v0.14.0"
--
Fri, Dec 30 2022 12:23:58 pm | time="2022-12-30T17:23:58Z" level=debug msg=aws.NewAdapter
Fri, Dec 30 2022 12:23:58 pm | time="2022-12-30T17:23:58Z" level=debug msg=aws.ec2metadata.GetMetadata
Fri, Dec 30 2022 12:23:58 pm | 2022/12/30 17:23:58 DEBUG: Request ec2metadata/GetToken Details:
Fri, Dec 30 2022 12:23:58 pm | ---[ REQUEST POST-SIGN ]-----------------------------
Fri, Dec 30 2022 12:23:58 pm | PUT /latest/api/token HTTP/1.1
Fri, Dec 30 2022 12:23:58 pm | Host: 169.254.169.254
Fri, Dec 30 2022 12:23:58 pm | User-Agent: aws-sdk-go/1.44.102 (go1.19.3; linux; amd64)
Fri, Dec 30 2022 12:23:58 pm | Content-Length: 0
Fri, Dec 30 2022 12:23:58 pm | X-Aws-Ec2-Metadata-Token-Ttl-Seconds: 21600
Fri, Dec 30 2022 12:23:58 pm | Accept-Encoding: gzip
Fri, Dec 30 2022 12:23:58 pm |  
Fri, Dec 30 2022 12:23:58 pm |  
Fri, Dec 30 2022 12:23:58 pm | -----------------------------------------------------
Fri, Dec 30 2022 12:26:02 pm | 2022/12/30 17:26:02 DEBUG: Send Request ec2metadata/GetToken failed, attempt 0/3, error RequestError: send request failed
Fri, Dec 30 2022 12:26:02 pm | caused by: Put "http://169.254.169.254/latest/api/token": read tcp 10.150.25.8:42566->169.254.169.254:80: read: connection reset by peer
Fri, Dec 30 2022 12:26:02 pm | 2022/12/30 17:26:02 DEBUG: Request ec2metadata/GetMetadata Details:
Fri, Dec 30 2022 12:26:02 pm | ---[ REQUEST POST-SIGN ]-----------------------------
Fri, Dec 30 2022 12:26:02 pm | GET /latest/meta-data/instance-id HTTP/1.1
Fri, Dec 30 2022 12:26:02 pm | Host: 169.254.169.254
Fri, Dec 30 2022 12:26:02 pm | User-Agent: aws-sdk-go/1.44.102 (go1.19.3; linux; amd64)
Fri, Dec 30 2022 12:26:02 pm | Accept-Encoding: gzip
Fri, Dec 30 2022 12:26:02 pm |  
Fri, Dec 30 2022 12:26:02 pm |  
Fri, Dec 30 2022 12:26:02 pm | -----------------------------------------------------
Fri, Dec 30 2022 12:26:02 pm | 2022/12/30 17:26:02 DEBUG: Response ec2metadata/GetMetadata Details:
Fri, Dec 30 2022 12:26:02 pm | ---[ RESPONSE ]--------------------------------------
Fri, Dec 30 2022 12:26:02 pm | HTTP/1.1 401 Unauthorized
Fri, Dec 30 2022 12:26:02 pm | Connection: close
Fri, Dec 30 2022 12:26:02 pm | Content-Type: text/plain
Fri, Dec 30 2022 12:26:02 pm | Date: Fri, 30 Dec 2022 17:26:02 GMT
Fri, Dec 30 2022 12:26:02 pm | Server: EC2ws
Fri, Dec 30 2022 12:26:02 pm | Content-Length: 0
Fri, Dec 30 2022 12:26:02 pm |  
Fri, Dec 30 2022 12:26:02 pm |  
Fri, Dec 30 2022 12:26:02 pm | -----------------------------------------------------
Fri, Dec 30 2022 12:26:02 pm | 2022/12/30 17:26:02 DEBUG: Validate Response ec2metadata/GetMetadata failed, attempt 0/3, error EC2MetadataError: failed to make EC2Metadata request
Fri, Dec 30 2022 12:26:02 pm |  
Fri, Dec 30 2022 12:26:02 pm | status code: 401, request id:
Fri, Dec 30 2022 12:26:02 pm | time="2022-12-30T17:26:02Z" level=fatal msg="EC2MetadataError: failed to make EC2Metadata request\n\n\tstatus code: 401, request id: "

We are using an explicit IAM role though, so I am not sure why it needs to connect to the EC2 instance metadata. Doesn't it only need to do that when using the worker node IAM role, in cases where you AREN'T using an explicit, controller-only role via OIDC?
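
For readers following the log above: it shows the two-step IMDSv2 exchange, a `PUT /latest/api/token` followed by a `GET /latest/meta-data/instance-id`. In the log the token request fails and the following GET, sent without a token, is answered with 401. The sketch below reproduces the same two requests with only the Go standard library, so the exchange can be tested from inside a pod. The function names `newTokenRequest`, `newMetadataRequest` and `fetchInstanceID` are made up for this example and are not part of the controller or the AWS SDK.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// newTokenRequest builds the IMDSv2 session token request (step 1).
func newTokenRequest(baseURL string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPut, baseURL+"/latest/api/token", nil)
	if err != nil {
		return nil, err
	}
	// Same TTL as in the debug log above (6 hours).
	req.Header.Set("X-aws-ec2-metadata-token-ttl-seconds", "21600")
	return req, nil
}

// newMetadataRequest builds a metadata GET that carries the session token (step 2).
func newMetadataRequest(baseURL, token, path string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/latest/meta-data/"+path, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-aws-ec2-metadata-token", token)
	return req, nil
}

// do sends req and returns the trimmed body, treating any non-200 status as an error.
func do(client *http.Client, req *http.Request) (string, error) {
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("%s %s: status %d", req.Method, req.URL.Path, resp.StatusCode)
	}
	return strings.TrimSpace(string(body)), nil
}

// fetchInstanceID runs both steps against baseURL.
func fetchInstanceID(client *http.Client, baseURL string) (string, error) {
	tokenReq, err := newTokenRequest(baseURL)
	if err != nil {
		return "", err
	}
	token, err := do(client, tokenReq)
	if err != nil {
		return "", fmt.Errorf("token request failed: %w", err)
	}
	idReq, err := newMetadataRequest(baseURL, token, "instance-id")
	if err != nil {
		return "", err
	}
	return do(client, idReq)
}

func main() {
	// A short timeout so a blocked token request fails fast instead of hanging.
	client := &http.Client{Timeout: 2 * time.Second}
	id, err := fetchInstanceID(client, "http://169.254.169.254")
	if err != nil {
		fmt.Println("metadata lookup failed:", err)
		return
	}
	fmt.Println("instance ID:", id)
}
```

If the token step fails from inside a pod but succeeds on the node itself, the hop limit is a likely cause (an assumption to verify, not something this log proves).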

@szuecs
Member

szuecs commented Jan 5, 2023

@jbilliau-rcd we use the metadata to auto-detect the vpcId and clusterId, call stack:

https://github.com/zalando-incubator/kube-ingress-aws-controller/blob/master/aws/adapter.go#L812

adapter.manifest, err = buildManifest(adapter, clusterID, vpcID)

awsAdapter, err = aws.NewAdapter(clusterID, controllerID, vpcID, debugFlag, disableInstrumentedHttpClient)

if err = loadSettings(); err != nil {

func loadSettings() error {

You can pass these flags to skip auto-detection:

kingpin.Flag("cluster-id", "ID of the Kubernetes cluster used to lookup cluster related resources tagged with `kubernetes.io/cluster/<cluster-id>` tags. Auto discovered from the EC2 instance where the controller is running if not specified.").
StringVar(&clusterID)
kingpin.Flag("vpc-id", "VPC ID for where the cluster is running. Used to lookup relevant subnets. Auto discovered from the EC2 instance where the controller is running if not specified.").

@szuecs
Member

szuecs commented May 7, 2024

@jbilliau-rcd did this happen again to you?

You showed a connection reset by peer, which likely means some AWS internal issue happened at the time.

Fri, Dec 30 2022 12:26:02 pm | 2022/12/30 17:26:02 DEBUG: Send Request ec2metadata/GetToken failed, attempt 0/3, error RequestError: send request failed
Fri, Dec 30 2022 12:26:02 pm | caused by: Put "http://169.254.169.254/latest/api/token": read tcp 10.150.25.8:42566->169.254.169.254:80: read: connection reset by peer

@jbilliau-rcd
Contributor

Hmm, nope, it must have been transient. We run this controller on 170 clusters and all seem healthy.

@szuecs
Member

szuecs commented May 7, 2024

Most likely we cannot do anything here.

@szuecs szuecs closed this as completed May 7, 2024