Bug: Envoy proxy failing with AWS STS credential errors #464

Open
shekharpalit opened this issue Jun 1, 2023 · 3 comments
Labels
Bug Something isn't working

Comments

@shekharpalit

SECURITY NOTICE: If you think you’ve found a potential security issue, please do not post it in the Issues. Instead, please follow the instructions here or email AWS security directly.

Summary
Our application, running on an AWS EKS cluster using AWS AppMesh, is experiencing connectivity issues. The application's pods are not able to reach out to the internet. We have set the egressFilter in AppMesh to ALLOW_ALL and the service account attached to the pods has the necessary IAM policies (AWSCloudMapFullAccess, AWSAppMeshFullAccess, and AWSAppMeshEnvoyAccess) associated.

When checking the logs of the Envoy proxy, we observed the following error message:

[2023-06-01 00:43:50.278][21][error][aws] [source/extensions/common/aws/credentials_provider_impl.cc:279] Could not load AWS credentials document from STS
[2023-06-01 00:43:50.288][15][warning][config] [./source/common/config/grpc_stream.h:163] StreamAggregatedResources gRPC config stream to appmesh-envoy-management.us-east-1.amazonaws.com:443 closed: 7, Unauthorized to perform appmesh:StreamAggregatedResources for arn:aws:appmesh:us-east-1:513177627844:mesh/example/virtualNode/user-management-service-virtual-node.

We have tried several troubleshooting steps, including verifying the IAM policies, the IAM role's trust relationship, the service account assignments, and the system time on the EKS nodes, but the issue persists.

This is the error from the Micronaut app we are trying to run inside the pod after we activate the mesh:

{"timeMillis":1685493940463,"thread":"main","level":"ERROR","loggerName":"io.micronaut.runtime.Micronaut","message":"Error starting Micronaut server: Unable to execute HTTP request: Network is unreachable","thrown":{"commonElementCount":0,"localizedMessage":"Unable to execute HTTP request: Network is unreachable","message":"Unable to execute HTTP request: Network is unreachable","name":"software.amazon.awssdk.core.exception.SdkClientException","cause":{"commonElementCount":45,"localizedMessage":"Network is unreachable","message":"Network is unreachable","name":"java.net.SocketException","extendedStackTrace":"java.net.SocketException: Network is unreachable\n\tat sun.nio.ch.Net.connect0(Native Method) ~[?:?]\n\tat sun.nio.ch.Net.connect(Net.java:579) ~[?:?]\n\tat sun.nio.ch.Net.connect(Net.java:568) ~[?:?]\n\tat sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:588) ~[?:?]\n\tat java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327) ~[?:?]\n\tat java.net.Socket.connect(Socket.java:633) ~[?:?]\n\tat org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:368) ~[httpclient-4.5.13.jar:4.5.13]\n\tat software.amazon.awssdk.http.apache.internal.conn.SdkTlsSocketFactory.connectSocket(SdkTlsSocketFactory.java:65) ~[apache-client-2.20.51.jar:?]\n\tat 

This is my YAML file, which creates the virtual node, virtual router, and virtual service:

---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: {{ include "helm.fullname" . }}-vn
  namespace: {{ .Release.Namespace }}
spec:
  awsName: {{ include "helm.fullname" . }}-virtual-node
  podSelector:
    matchLabels:
      app: {{ include "helm.name" . }}
  listeners:
    - portMapping:
        port: {{ .Values.service.port }}
        protocol: http
  serviceDiscovery:
    dns:
      hostname: {{ include "helm.fullname" . }}.{{ .Release.Namespace }}.svc.cluster.local
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualRouter
metadata:
  namespace: {{ .Release.Namespace }}
  name: {{ include "helm.fullname" . }}-vr
spec:
  awsName: {{ include "helm.fullname" . }}-virtual-router
  listeners:
    - portMapping:
        port: {{ .Values.service.port }}
        protocol: http
  routes:
    - name: {{ include "helm.fullname" . }}-route
      httpRoute:
        match:
          prefix: /
        action:
          weightedTargets:
            - virtualNodeRef:
                name: {{ include "helm.fullname" . }}-vn
              weight: 1
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualService
metadata:
  namespace: {{ .Release.Namespace }}
  name: {{ include "helm.fullname" . }}.{{ .Release.Namespace }}.svc.cluster.local
spec:
  awsName: {{ include "helm.fullname" . }}.{{ .Release.Namespace }}.svc.cluster.local
  provider:
    virtualRouter:
      virtualRouterRef:
        name: {{ include "helm.fullname" . }}-vr


And this is the serviceaccount.yaml file I am using in the Helm chart:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ include "helm.fullname" . }}
  namespace: {{ .Release.Namespace }}
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::XXXX:role/eksctl-eks-addon-iamserviceaccount-default-d-Role1-XXXX
    

Sidecar injection is enabled in my deployment.yaml file in the Helm chart:

  template:
    metadata:
      annotations:
        appmesh.k8s.aws/sidecarInjectorWebhook: enabled

Note:

When we deploy the service without the mesh, it deploys successfully.

Steps to Reproduce
- Run the application on the EKS cluster with AppMesh enabled.
- Try to make a network request from the application to an external resource.

Expected behavior
The application should be able to reach out to the internet and not present any STS credential-related errors in the Envoy logs.

Actual behavior
The application fails to reach the internet and the Envoy logs present STS credential-related errors.

Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)

@shekharpalit shekharpalit added the Bug Something isn't working label Jun 1, 2023
@suniltheta

Can you please create a Support ticket for this issue? The issue seems specific to your setup.

@shekharpalit

Can you please create a Support ticket for this issue? The issue seems specific to your setup.

Support is not being helpful here.
Can you please guide me on what I am missing and how to resolve this issue?

@suniltheta

suniltheta commented Jun 1, 2023

We need to understand why it is failing to load the AWS credentials document from STS. Can you enable debug logs to get more detail on why it fails?

Sometimes AWS_WEB_IDENTITY_TOKEN_FILE will be missing if AWS_ROLE_ARN is manually specified.
“By design the EKS pod identity webhook will not overwrite customer-defined AWS_ROLE_ARN/AWS_WEB_IDENTITY_TOKEN_FILE.” https://github.com/aws/amazon-eks-pod-identity-webhook/blob/master/pkg/handler/handler.go#L142-L154

AWS Troubleshooting docs: https://docs.aws.amazon.com/app-mesh/latest/userguide/troubleshooting-kubernetes.html#ts-kubernetes-irsa-not-working

This is just one known issue, and I am not sure it is the cause in your case. A support ticket would let us get into the details of your setup. Can you please let me know if you already have an open ticket for this issue?
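
To rule out the known issue described above, one option is to check whether the Envoy container in the pod defines both AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE. The following is only a minimal sketch, not an official tool: the function name `missing_irsa_env` and the sample pod data are made up for illustration, and it assumes the injected sidecar container is named `envoy` and that the pod spec has been parsed from the output of `kubectl get pod <pod-name> -n <namespace> -o json`.

```python
from typing import List

# Environment variables the EKS pod identity webhook normally injects for IRSA.
REQUIRED_ENV = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def missing_irsa_env(pod: dict, container_name: str = "envoy") -> List[str]:
    """Return the IRSA variable names the named container does not define.

    `pod` is a pod object parsed from JSON. The default container name
    "envoy" is an assumption about how the sidecar is named in your cluster.
    """
    for container in pod.get("spec", {}).get("containers", []):
        if container.get("name") == container_name:
            defined = {entry.get("name") for entry in container.get("env", [])}
            return [name for name in REQUIRED_ENV if name not in defined]
    raise ValueError("container %r not found in pod spec" % container_name)


# Hypothetical pod spec in which AWS_ROLE_ARN was set by hand, so the
# token file variable was never injected into the Envoy container.
sample_pod = {
    "spec": {
        "containers": [
            {"name": "app", "env": []},
            {
                "name": "envoy",
                "env": [
                    {"name": "AWS_ROLE_ARN", "value": "arn:aws:iam::XXXX:role/example"},
                ],
            },
        ]
    }
}

print(missing_irsa_env(sample_pod))  # ['AWS_WEB_IDENTITY_TOKEN_FILE']
```

An empty list means both variables are present in that container, in which case this particular known issue is probably not the cause and the debug logs are the next place to look.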
