
CrashLoopBackOff due to EC2MetadataError: failed to make EC2Metadata request, status code: 401 #455

Open
SarumObjects opened this issue Nov 30, 2021 · 25 comments

Comments

@SarumObjects

I've followed the guidelines here: https://github.com/zalando-incubator/kube-ingress-aws-controller/blob/master/deploy/kops.md but kube-ingress-aws-controller restarts every 2-10 minutes.
When I follow the log of the pod I get this error: "EC2MetadataError: failed to make EC2Metadata request".
I have deleted and rebuilt the cluster several times and cannot create the load balancer or target groups - although I have in the past. One of our clusters is still running, so I have compared the two in detail and found no differences except the name.

We are blocked. This is our development environment. The instances have public & private IPs and the VPCs & SGs have been generated correctly.

Where should I look now please?
John

@AlexanderYastrebov
Member

Hello. What is the controller version you are using? Could you provide a more detailed error log message?

@szuecs
Member

szuecs commented Dec 2, 2021

@SarumObjects What kind of controller version and AWS integration do you use?
kube2iam and all the others had issues like jtblin/kube2iam#130.
The error message looks like aws/aws-sdk-go#870, which is quite old and should be fixed by recent Kubernetes AWS IAM integrations.

@SarumObjects
Author

@szuecs v0.12 (I downloaded :latest) and created the cluster with Kops (1.22.22). I've built several similar clusters in the last 24 months (we're running one as prod) and I have burned and rebuilt a QA cluster (same script) some 4 times. The cluster validates successfully, but when I install kube-ingress-aws-controller/skipper (same manifest as our prod cluster - different name) I get this error: "EC2MetadataError: failed to make EC2Metadata request"
@AlexanderYastrebov: This is the total log! I don't know how to debug this controller. I've searched the documentation for 'debug' and 'verbose' - and I have been stuck for over a week.

@szuecs
Member

szuecs commented Dec 2, 2021

@SarumObjects I think just pasting the logs here up until the crash would be great!

Latest version meaning v0.12.12?
We just merged updates to aws-sdk; maybe you want to try v0.12.14 when it's released in a few minutes (automated process).

It would also be interesting if you could paste the output of kubectl describe pods kube-ingress-aws-controller-....

We don't really have much knowledge about Kops. Is the version you are referring to the same as the Kubernetes version?

@SarumObjects
Author

@szuecs the 'latest' still restarts.
here's the output from kubectl describe pods kube-ingress-aws-controller-..
kiac-describe.txt

kops is version 1.22.2
kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:41:42Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:42:41Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}

@AlexanderYastrebov
Member

@SarumObjects Could you get ingress controller logs as well (kubectl logs kube-ingress-aws-controller-...)?

@SarumObjects
Author

@AlexanderYastrebov this is the command and the complete log:
kubectl -n kube-system logs -f kube-ingress-aws-controller-5fbcd9fff8-vqrvg
time="2021-12-03T11:52:45Z" level=info msg="starting /kube-ingress-aws-controller v0.12.14"
time="2021-12-03T11:54:48Z" level=fatal msg="EC2MetadataError: failed to make EC2Metadata request\n\n\tstatus code: 401, request id:

@AlexanderYastrebov
Member

AlexanderYastrebov commented Dec 3, 2021

Could you try to run with the --debug option (it would print more details in the logs)?
401 suggests some kind of problem with AWS credentials.

@SarumObjects
Author

there's no --debug at the command line.

@AlexanderYastrebov
Member

there's no --debug at the command line.

~$ docker run -it --rm registry.opensource.zalan.do/teapot/kube-ingress-aws-controller:latest --help
INFO[0000] starting /kube-ingress-aws-controller v0.12.14 
usage: kube-ingress-aws-controller [<flags>]

Flags:
  --help                         Show context-sensitive help (also try --help-long and --help-man).
  --version                      Print version and exit
  --debug                        Enables debug logging level
...

kingpin.Flag("debug", "Enables debug logging level").Default("false").BoolVar(&debugFlag)

@AlexanderYastrebov changed the title from "CrashLoopBackOff" to "CrashLoopBackOff due to EC2MetadataError: failed to make EC2Metadata request, status code: 401" on Dec 3, 2021
@SarumObjects
Author

kubectl -n kube-system logs -f pod/kube-ingress-aws-controller-65775b947-dx9tl --ignore-errors=false
time="2021-12-03T14:50:47Z" level=debug msg=aws.NewAdapter
time="2021-12-03T14:50:47Z" level=debug msg=aws.ec2metadata.GetMetadata
2021/12/03 14:50:47 DEBUG: Request ec2metadata/GetToken Details:
---[ REQUEST POST-SIGN ]-----------------------------
PUT /latest/api/token HTTP/1.1
Host: 169.254.169.254
User-Agent: aws-sdk-go/1.42.16 (go1.17.1; linux; amd64)
Content-Length: 0
X-Aws-Ec2-Metadata-Token-Ttl-Seconds: 21600
Accept-Encoding: gzip


time="2021-12-03T14:50:47Z" level=info msg="starting /kube-ingress-aws-controller v0.12.14"
2021/12/03 14:52:50 DEBUG: Send Request ec2metadata/GetToken failed, attempt 0/3, error RequestError: send request failed
caused by: Put "http://169.254.169.254/latest/api/token": read tcp 100.96.4.21:34662->169.254.169.254:80: read: connection reset by peer
2021/12/03 14:52:50 DEBUG: Request ec2metadata/GetMetadata Details:
---[ REQUEST POST-SIGN ]-----------------------------
GET /latest/meta-data/instance-id HTTP/1.1
Host: 169.254.169.254
User-Agent: aws-sdk-go/1.42.16 (go1.17.1; linux; amd64)
Accept-Encoding: gzip


2021/12/03 14:52:50 DEBUG: Response ec2metadata/GetMetadata Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 401 Unauthorized
Connection: close
Content-Type: text/plain
Date: Fri, 03 Dec 2021 14:52:50 GMT
Server: EC2ws
Content-Length: 0


2021/12/03 14:52:50 DEBUG: Validate Response ec2metadata/GetMetadata failed, attempt 0/3, error EC2MetadataError: failed to make EC2Metadata request

status code: 401, request id: 

time="2021-12-03T14:52:50Z" level=fatal msg="EC2MetadataError: failed to make EC2Metadata request\n\n\tstatus code: 401, request id: "

@szuecs
Member

szuecs commented Dec 3, 2021

This log here:

 caused by: Put "http://169.254.169.254/latest/api/token": read tcp 100.96.4.21:34662->169.254.169.254:80: read: connection reset by peer

169.254.169.254 is the AWS metadata service. It sent a TCP RST packet instead of sending us the data required to access AWS APIs.

What Kubernetes IAM integration do you use?
To me this looks like it is not an issue with the controller, but rather with AWS or with the IAM integration that is supposed to handle the IAM part.
Maybe your EC2 nodes also don't have the right permissions to access the metadata service, which is needed to call AWS APIs via sts:AssumeRole - something all Kubernetes AWS IAM integrations require.
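A quick way to check what the node currently allows (a sketch; the instance id is a placeholder):

    # Inspect the IMDS settings of the node the pod runs on; look at
    # HttpTokens and HttpPutResponseHopLimit in the output.
    aws ec2 describe-instances \
        --instance-ids i-1234567898abcdef0 \
        --query 'Reservations[].Instances[].MetadataOptions'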

@SarumObjects
Author

That's helpful. I'll look into the IAM permissions.

@szuecs
Member

szuecs commented Dec 7, 2021

@SarumObjects let us know what the error was, so we can share it with other folks who might find this issue. After that we can close it.

@SarumObjects
Author

still investigating: https://kops.sigs.k8s.io/releases/1.22-notes/

@SarumObjects
Author

In the end, I simply had to change the nodes' instanceMetadata from httpPutResponseHopLimit: 1 to httpPutResponseHopLimit: 3, and then the metadata query can run - but I'm blocked again (failed to get ingress list).
Closing this one with thanks.
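For reference, a sketch of where that setting lives, assuming a kops 1.22 InstanceGroup spec (the instance group name "nodes" is a placeholder):

    # The kops 1.22 release notes linked above describe the new IMDSv2 defaults
    # (httpTokens: required, hop limit 1); raising the hop limit lets pods on the
    # pod network reach the metadata service.
    kops edit ig nodes
    #   spec:
    #     instanceMetadata:
    #       httpPutResponseHopLimit: 3
    #       httpTokens: required
    kops update cluster --yes
    kops rolling-update cluster --yes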

@jbilliau-rcd
Contributor

I'm having this exact same issue, out of nowhere, on ONE out of 80 clusters... makes no sense. Where exactly did you change that setting, @SarumObjects? Did you get it to work?

@SarumObjects
Author

@jbilliau-rcd I had to make the change (httpPutResponseHopLimit: 3) with "kops edit cluster" rather than update it with a script (I have only 4 clusters of 3 nodes each). They continue to work, but if I upgrade the clusters I now have to terminate the nodes - which I do with a script, giving the replacement nodes time to start.
It's very odd behaviour - but I haven't got enough time to explore it. (If it ain't broke, don't fix it.)

@szuecs
Member

szuecs commented Jan 3, 2023

@SarumObjects @jbilliau-rcd can you create a docs PR for the kops update instructions to highlight that a Kubernetes version update can trigger this?

Our current cluster setup is Kubernetes 1.21 and not kops, so I cannot test on our side whether it's kops related or not. We have been migrating from CRD v1beta1 and Ingress v1beta1 for more than half a year, and soon we will update to 1.22.

@szuecs szuecs reopened this Jan 3, 2023
@jbilliau-rcd
Contributor

@szuecs apologies, I don't quite understand what you are asking. You want me to put in a PR to update the docs for what exactly? That this can happen if you go to 1.22? Do we know that for sure? I have plenty of clusters running EKS 1.22 just fine with 0.14.0 of this controller, with the following argument set in the pod spec: --ingress-api-version=networking.k8s.io/v1.

So we are already on 1.22, already using the new v1 Ingress API, and it works on all clusters except one. Mind you... that one isn't even on 1.22! It's on 1.21, so I don't think this has anything to do with 1.22; it looks more OIDC/IAM related.

@szuecs
Member

szuecs commented Jan 3, 2023

@jbilliau-rcd oh interesting, so we need to investigate more. Right now we have to rely on you, the contributors.

@jbilliau-rcd
Contributor

So I ended up running this command:

aws ec2 modify-instance-metadata-options \
    --instance-id i-1234567898abcdef0 \
    --http-put-response-hop-limit 3 \
    --http-endpoint enabled

With the instance-id being the EC2 node that the Zalando pod was running on, and that fixed it! How this (so far) has only happened on one node is still puzzling to me, but that is the issue. It seems like the fix would need to be that the pod should never contact (or at least have the configuration option to never contact) the EC2 instance metadata service, and instead only ever use OIDC with its own IAM role rather than the role of the worker node. We give our Zalando ingress its own role, so the fact that it broke due to not being able to call the worker node's metadata URL itself (presumably to use its own if it needed to) kinda sucked :(
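If it helps anyone hitting this, a sketch of reproducing the symptom from inside a pod on the affected node (assumes curl is available in the image):

    # IMDSv2: fetch a token first, then query metadata with it.
    # With a hop limit of 1 the PUT below fails from the pod network, and plain
    # GETs return 401 while httpTokens is required - the same 401 the controller logs.
    TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
    curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
        http://169.254.169.254/latest/meta-data/instance-id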

@mikkeloscar
Collaborator

@jbilliau-rcd With this it should be possible to run the controller without needing to contact the ec2 instance metadata service: #376

@jbilliau-rcd
Contributor

Ah interesting... looks like that was merged 2 years ago!? Has this hidden argument always been available? I don't see it in any documentation anywhere.

@mikkeloscar
Collaborator

Yeah, we should get this documented so it's more clear.
