Support for AWS IAM Kubernetes service account permissions #52625

Closed
ErikvanDuren opened this issue Feb 21, 2020 · 21 comments · Fixed by #81255
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed Meta label for distributed team

Comments

@ErikvanDuren

Amazon EKS recently added support for IAM permissions for Kubernetes service accounts.

It would be nice to have service account permissions supported in (at least) the S3 repository plugin, so that snapshots can be written to an S3 repository without having to update access keys and tokens on a regular basis.

@DaveCTurner DaveCTurner added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement labels Feb 21, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@DaveCTurner
Contributor

I had a quick check and this looks to be implemented by the WebIdentityTokenCredentialsProvider which is in the SDK version we're already using. The tricky bit will be to construct a test fixture that sets up an environment similar enough to EKS to show that it does integrate properly.

@chonton

chonton commented May 5, 2020

What are the security concerns with using system properties, environment variables, or profiles to store credentials? A web search reveals many opinions both for and against each method. Should Elastic be enforcing a particular security policy or should Elastic make deployers aware of the pros/cons of each method?

In Kubernetes, mapped files holding the credentials seem to be one of the recommended practices at this time (the profile method).

@chonton

chonton commented May 15, 2020

Elasticsearch team please chime in on which methods will be acceptable.

@jim-barber-he

For this to work this file needs changing: https://github.com/elastic/elasticsearch/blob/master/distribution/src/bin/elasticsearch-env-from-file

When I try to use IAM Roles for Service accounts it currently errors with:

ERROR: File ..data/token (target of symlink /var/run/secrets/eks.amazonaws.com/serviceaccount/token from AWS_WEB_IDENTITY_TOKEN_FILE) must have file permissions 400 or 600, but actually has: 640

The token gets created with the following permissions:

$ ls -l /var/run/secrets/eks.amazonaws.com/serviceaccount/..data/token
-rw-r----- 1 root 1001 1001 May 21 23:49 /var/run/secrets/eks.amazonaws.com/serviceaccount/..data/token

By default this file has permission 600 and is owned by root:root, but when this is set:

podSecurityContext:
  fsGroup: 1001
  runAsUser: 1001

Then the group ownership of the token is changed to match fsGroup and the group read bit is set so that the token can be read.
Without that change the token wouldn't be readable by the Elasticsearch user, so the hard check for permission 400 or 600 enforced in the elasticsearch-env-from-file script isn't suitable for this case.
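
To illustrate the point, here is a minimal sketch of a more lenient permission check. It is not taken from the elasticsearch-env-from-file script and is not the change that eventually shipped; the function name `token_mode_ok` and the list of accepted modes are assumptions made for this example, and it relies on GNU `stat`.

```shell
#!/bin/sh
# Hypothetical helper: accept 400/600 as well as the group-readable
# variants (440/640) that appear once fsGroup is set on the pod.
# Assumes GNU coreutils `stat`; -L follows the ..data/token symlink.
token_mode_ok() {
  mode=$(stat -L -c '%a' "$1") || return 2
  case "$mode" in
    400|600|440|640) return 0 ;;
    *) echo "unexpected mode $mode on $1" >&2; return 1 ;;
  esac
}
```

Calling `token_mode_ok /var/run/secrets/eks.amazonaws.com/serviceaccount/token` would then succeed for the 640 file shown above, while still rejecting a world-readable token.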

@jim-barber-he

jim-barber-he commented Jul 1, 2020

The distribution/src/bin/elasticsearch-env-from-file file was fixed in Elasticsearch 7.6.1, so that is no longer a blocker.
I've been able to get Elasticsearch 7.7.0 to start with IAM Roles for Service Accounts. The token is mounted correctly and the environment variables are injected correctly, but it still doesn't seem to be working at this point.
I'm seeing errors that contain the following when AWS access is required:

      "caused_by" : {
        "type" : "sdk_client_exception",
        "reason" : "sdk_client_exception: The requested metadata is not found at http://169.254.169.254/latest/meta-data/iam/security-credentials/"
      }

@hamishforbes

Has there been any progress on this issue?
I've not been able to get Elasticsearch 7.9.2 to use EKS IAM roles for S3 snaps even though the pod is correctly authenticated.

I'd very much like to not have to inject static credentials as this would be a step backwards from my current non-k8s deployment.
So far Elasticsearch seems to be the only application I've found that doesn't correctly support web identities for AWS.

What's the reasoning behind not just using the default AWS SDK credential chain here?

@pugnascotia pugnascotia removed their assignment Oct 21, 2020
@fculpo

fculpo commented Nov 5, 2020

Hi,

ES is also one of the last applications in our infrastructure that cannot use EKS IRSA (OIDC).

ES does not try to read the EKS IRSA token mounted in the pods:

ServiceAccount has the following annotation:

Annotations:   eks.amazonaws.com/role-arn: arn:aws:iam::xxx:role/ekssa-es-snapshots

This results in the following env vars being injected into the pods:

Environment:                                                                                                                
AWS_ROLE_ARN: arn:aws:iam::xxx:role/ekssa-es-snapshots
AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token

But ES keeps trying to use EC2 Metadata:

curl -X PUT -H 'Content-Type: application/json' "localhost:9200/_snapshot/s3_backup_bucket/backup-1" -d '{
    "include_global_state": true,
    "include_aliases": true
  }'

...

{"type":"sdk_client_exception","reason":"The requested metadata is not found at http://169.254.169.254/latest/meta-data/iam/security-credentials/"}}},"status":500}
...

Is there any way to make this work?
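
Before digging further, it can help to confirm from inside the pod that the IRSA inputs listed above are really visible to the process. The following is a minimal diagnostic sketch; the function name `check_irsa_env` and its messages are made up for this example and are not part of Elasticsearch or the AWS tooling.

```shell
#!/bin/sh
# Hypothetical diagnostic: report whether the IRSA inputs are visible
# to the current process. Returns non-zero at the first missing piece.
check_irsa_env() {
  if [ -z "${AWS_ROLE_ARN:-}" ]; then
    echo "AWS_ROLE_ARN is not set" >&2
    return 1
  fi
  if [ -z "${AWS_WEB_IDENTITY_TOKEN_FILE:-}" ]; then
    echo "AWS_WEB_IDENTITY_TOKEN_FILE is not set" >&2
    return 1
  fi
  if [ ! -r "$AWS_WEB_IDENTITY_TOKEN_FILE" ]; then
    echo "token file is not readable: $AWS_WEB_IDENTITY_TOKEN_FILE" >&2
    return 1
  fi
  if [ ! -s "$AWS_WEB_IDENTITY_TOKEN_FILE" ]; then
    echo "token file is empty: $AWS_WEB_IDENTITY_TOKEN_FILE" >&2
    return 1
  fi
  echo "IRSA environment looks complete"
}
```

A passing check only shows that the pod is set up correctly. As this thread shows, it does not mean the S3 repository plugin will actually pick the token up.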

@therealdwright

therealdwright commented Jun 7, 2021

For anyone who stumbles upon this, there is a workaround posted here.

First, create a secret containing AWS credentials.
kubectl create secret generic aws-credentials --from-literal=s3.client.default.access_key=YOUR_ACCESS_KEY --from-literal=s3.client.default.secret_key=YOUR_SECRET_KEY

Then refer to the secret in the manifest.

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: therealdwright-testing
spec:
  version: 7.13.0
  secureSettings:
    - secretName: aws-credentials
  nodeSets:
    - name: default
      count: 1
      podTemplate:
        spec:
          initContainers:
            - name: install-plugins
              command: ["/bin/sh", "-c"]
              args:
                - |
                  bin/elasticsearch-plugin install --batch repository-s3

I hope this saves someone a couple of hours googling and we can merge #65923 soon.

@Aloshi

Aloshi commented Jun 24, 2021

+1, we just ran into this today :(

@pedro-brentan

@therealdwright the point is that it should use the service account. There are plenty of ways to use an access key and secret key, so that isn't the problem, but service accounts are far more secure than static credentials.

Is there any ETA for this issue?

@therealdwright

> @therealdwright the point is that it should use the service account. There are plenty of ways to use an access key and secret key, so that isn't the problem, but service accounts are far more secure than static credentials.
>
> Is there any ETA for this issue?

I agree, and I would also like to use IRSA. I posted this as a workaround because I had assumed (incorrectly) that the service account approach would work.

@andronux

🆙
will it be possible to have the ability to use IRSA in the future?

@Axelcouty

It seems the repository-s3 plugin uses AWS SDK v1.

I highly doubt that SDK v1 supports the environment variable AWS_WEB_IDENTITY_TOKEN_FILE, even with the STS dependency. (I actually tested this a few days ago for a Java app.)

That means the fix for this would be to:

  1. Upgrade to AWS SDK v2
  2. Add the STS dependency

@aydasraf

I am having the exact same issue. We are trying to use service accounts in EKS, and although the pod is correctly initiated and we can see the environment variables below:

AWS_DEFAULT_REGION=eu-west-1
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token ( token is there)
AWS_REGION=eu-west-1
AWS_ROLE_ARN=arn:aws:iam::xxxx:role/zzzzz_role

the S3 connections still fail. Digging into the logs, I could see the following error from the S3 plugin, which may help in closing this issue. [Note: the file is not there!]

{"type": "server", "timestamp": "2021-10-11T11:31:45,720Z", "level": "WARN", "component": "c.a.a.p.i.BasicProfileConfigFileLoader", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-0", "message": "Unable to load config file null", 
"stacktrace": ["java.security.AccessControlException: access denied (\"java.io.FilePermission\" \"/usr/share/elasticsearch/.aws/config\" \"read\")",
"at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) ~[?:?]",
"at java.security.AccessController.checkPermission(AccessController.java:1036) ~[?:?]",
"at java.lang.SecurityManager.checkPermission(SecurityManager.java:408) ~[?:?]",
"at java.lang.SecurityManager.checkRead(SecurityManager.java:747) ~[?:?]",
"at java.io.File.exists(File.java:826) ~[?:?]",
"at com.amazonaws.profile.path.config.SharedConfigDefaultLocationProvider.getLocation(SharedConfigDefaultLocationProvider.java:36) ~[aws-java-sdk-core-1.11.749.jar:?]",
"at com.amazonaws.profile.path.AwsProfileFileLocationProviderChain.getLocation(AwsProfileFileLocationProviderChain.java:41) ~[aws-java-sdk-core-1.11.749.jar:?]",
"at com.amazonaws.auth.profile.internal.BasicProfileConfigFileLoader.getProfilesConfigFile(BasicProfileConfigFileLoader.java:69) [aws-java-sdk-core-1.11.749.jar:?]",
"at com.amazonaws.auth.profile.internal.BasicProfileConfigFileLoader.getProfile(BasicProfileConfigFileLoader.java:55) [aws-java-sdk-core-1.11.749.jar:?]",
"at com.amazonaws.retry.internal.RetryModeResolver.profile(RetryModeResolver.java:92) [aws-java-sdk-core-1.11.749.jar:?]",
"at com.amazonaws.retry.internal.RetryModeResolver.resolveRetryMode(RetryModeResolver.java:83) [aws-java-sdk-core-1.11.749.jar:?]",
"at com.amazonaws.retry.internal.RetryModeResolver.<init>(RetryModeResolver.java:46) [aws-java-sdk-core-1.11.749.jar:?]",
"at com.amazonaws.retry.RetryPolicy.<clinit>(RetryPolicy.java:35) [aws-java-sdk-core-1.11.749.jar:?]",
"at com.amazonaws.retry.PredefinedRetryPolicies.<clinit>(PredefinedRetryPolicies.java:30) [aws-java-sdk-core-1.11.749.jar:?]",
"at com.amazonaws.ClientConfiguration.<clinit>(ClientConfiguration.java:89) [aws-java-sdk-core-1.11.749.jar:?]",
"at java.lang.Class.forName0(Native Method) [?:?]",
"at java.lang.Class.forName(Class.java:375) [?:?]",
"at org.elasticsearch.repositories.s3.S3RepositoryPlugin.lambda$static$0(S3RepositoryPlugin.java:48) [repository-s3-7.15.0.jar:7.15.0]",
"at java.security.AccessController.doPrivileged(AccessController.java:312) [?:?]",
"at org.elasticsearch.repositories.s3.S3RepositoryPlugin.<clinit>(S3RepositoryPlugin.java:42) [repository-s3-7.15.0.jar:7.15.0]",
"at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]",
"at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:78) ~[?:?]",
"at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]",
"at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]",
"at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]",
"at org.elasticsearch.plugins.PluginsService.loadPlugin(PluginsService.java:753) [elasticsearch-7.15.0.jar:7.15.0]",
"at org.elasticsearch.plugins.PluginsService.loadBundle(PluginsService.java:695) [elasticsearch-7.15.0.jar:7.15.0]",
"at org.elasticsearch.plugins.PluginsService.loadBundles(PluginsService.java:496) [elasticsearch-7.15.0.jar:7.15.0]",
"at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:158) [elasticsearch-7.15.0.jar:7.15.0]",
"at org.elasticsearch.node.Node.<init>(Node.java:367) [elasticsearch-7.15.0.jar:7.15.0]",
"at org.elasticsearch.node.Node.<init>(Node.java:288) [elasticsearch-7.15.0.jar:7.15.0]",
"at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:219) [elasticsearch-7.15.0.jar:7.15.0]",
"at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:219) [elasticsearch-7.15.0.jar:7.15.0]",
"at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:399) [elasticsearch-7.15.0.jar:7.15.0]",
"at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:167) [elasticsearch-7.15.0.jar:7.15.0]",
"at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:158) [elasticsearch-7.15.0.jar:7.15.0]",
"at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:75) [elasticsearch-7.15.0.jar:7.15.0]",
"at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:114) [elasticsearch-cli-7.15.0.jar:7.15.0]",
"at org.elasticsearch.cli.Command.main(Command.java:79) [elasticsearch-cli-7.15.0.jar:7.15.0]",
"at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:123) [elasticsearch-7.15.0.jar:7.15.0]",
"at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:81) [elasticsearch-7.15.0.jar:7.15.0]"] }

@DaveCTurner
Contributor

That's unrelated to this issue @aydasraf, this message is just noise. By default Elasticsearch hides it (see #56346) but it seems you've adjusted your logging config to show it anyway.

@arteam arteam self-assigned this Nov 30, 2021
arteam added a commit that referenced this issue Jan 19, 2022
…1255)

There have been many requests to support repository-s3 authentication via IAM roles in Kubernetes service accounts.

The AWS SDK is supposed to support them out of the box with the aws-java-sdk-sts library. Unfortunately, we can't use WebIdentityTokenCredentialsProvider from the SDK. It reads the token from AWS_WEB_IDENTITY_TOKEN_FILE environment variable which is usually mounted to /var/run/secrets/eks.amazonaws.com/serviceaccount/token and the S3 repository doesn't have the read permission to read it. We don't want to hard-code a file permission for the repository, because the location of AWS_WEB_IDENTITY_TOKEN_FILE can change at any time in the future and we would also generally prefer to restrict the ability of plugins to access things outside of their config directory.

To overcome this limitation, this change adds a custom WebIdentityCredentials provider that reads the service account from a symlink to AWS_WEB_IDENTITY_TOKEN_FILE created in the repository's config directory. We expect the end user to create the symlink to indicate that they want to use service accounts for authentication.

Service accounts are checked and exchanged for session tokens by the AWS STS. To test the authentication flow, this change adds a test fixture which mocks the assume-role-with-web-identity call to the service and returns a response with test credentials.

Fixes #52625
@hamishforbes

So either I'm doing something wrong or the latest changes in 8.0 make this worse.
Posting a comment here as I assume most people who are interested are watching this issue.

My setup still had the role mapping enabled (and therefore the AWS_WEB_IDENTITY_TOKEN_FILE env var in the container). With no other changes, this causes Elasticsearch to crashloop when upgraded to 8.0.1: the S3 plugin sees the env var, doesn't find a matching symlink, and throws an error rather than just falling back to the existing static creds that have been working for the last x years this issue has been open.

Be aware when upgrading!

I added the symlink as per the doco

mkdir -p "/usr/share/elasticsearch/config/repository-s3"
ln -s "$AWS_WEB_IDENTITY_TOKEN_FILE" "/usr/share/elasticsearch/config/repository-s3/aws-web-identity-token-file"

This lets Elasticsearch start correctly with the s3 plugin installed and the env var injected.
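
For anyone scripting this step, here is a slightly more defensive sketch of the same symlink creation. The function name `link_web_identity_token` and the `ES_CONFIG_DIR` override are hypothetical conveniences for this example; the default path is the one used in the commands above.

```shell
#!/bin/sh
# Sketch: create the symlink that the repository-s3 plugin looks for.
# ES_CONFIG_DIR is a hypothetical override for the config directory.
link_web_identity_token() {
  config_dir=${ES_CONFIG_DIR:-/usr/share/elasticsearch/config}
  if [ -z "${AWS_WEB_IDENTITY_TOKEN_FILE:-}" ]; then
    echo "AWS_WEB_IDENTITY_TOKEN_FILE is not set; nothing to link" >&2
    return 1
  fi
  mkdir -p "$config_dir/repository-s3" || return 1
  # -f replaces a stale link so the function can be re-run safely;
  # -n stops ln from following an existing link at the destination.
  ln -sfn "$AWS_WEB_IDENTITY_TOKEN_FILE" \
    "$config_dir/repository-s3/aws-web-identity-token-file"
}
```

Quoting the variable and using `-sfn` makes the step safe to repeat on container restarts, which matters if it runs in an init container or entrypoint wrapper.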

Of course if I remove the old secureSettings workaround I still can't access S3 with IAM roles.

arteam added a commit that referenced this issue Mar 9, 2022
…oken (#84697)

Make sure users can use the static credentials even if there is a service account with IAM roles configured on the system.

See #52625 (comment)
arteam added a commit to arteam/elasticsearch that referenced this issue Mar 9, 2022
…oken (elastic#84697)

Make sure users can use the static credentials even if there is a service account with IAM roles configured on the system.

See elastic#52625 (comment)

(cherry picked from commit d965595)
arteam added a commit that referenced this issue Mar 9, 2022
…oken (#84697) (#84824)

Make sure users can use the static credentials even if there is a service account with IAM roles configured on the system.

See #52625 (comment)

(cherry picked from commit d965595)
arteam added a commit that referenced this issue Mar 9, 2022
…oken (#84697) (#84825)

Make sure users can use the static credentials even if there is a service account with IAM roles configured on the system.

See #52625 (comment)
@arteam
Contributor

arteam commented Mar 9, 2022

Hi @hamishforbes!

Thank you very much for the report! It's indeed a bug: Elasticsearch shouldn't crash if the user didn't create a symlink to AWS_WEB_IDENTITY_TOKEN_FILE and should fall back to the static credentials in that case. It was fixed in #84697, which will be shipped in 8.0.2, 8.1.1 and 8.2.0.
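
The intended behaviour can be pictured with a small decision sketch. This is illustrative shell, not the plugin's Java code: the function name, its arguments, and in particular the precedence of static keys over the web identity token are assumptions made for the example.

```shell
#!/bin/sh
# Illustrative only: choose a credential source without failing when
# the symlink is absent.
#   $1 = Elasticsearch config directory
#   $2 = "yes" if static keys are configured for the client
credential_source() {
  link="$1/repository-s3/aws-web-identity-token-file"
  if [ "$2" = "yes" ]; then
    echo "static"
  elif [ -n "${AWS_WEB_IDENTITY_TOKEN_FILE:-}" ] && [ -e "$link" ]; then
    echo "web-identity"
  else
    echo "instance-metadata"
  fi
}
```

The important property is that a missing symlink selects another branch instead of aborting startup, which is the behaviour described in the comment above.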

arteam added a commit that referenced this issue Mar 14, 2022
…84585)

If we don't instruct to look up the region from the Endpoint URL, AWSSecurityTokenServiceClient tries to look it up implicitly in a custom way which requires reading the /.aws/config file for which we don't have a file permission.

The same approach is used for the general AmazonS3ClientBuilder

Resolves #83826, #52625
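
As a rough illustration of what "looking up the region from the endpoint URL" means, the sketch below derives a region from a regional STS hostname. It is not the SDK's parsing logic; the function name `region_from_sts_endpoint` is hypothetical, and it deliberately handles only hostnames of the form `sts.<region>.amazonaws.com`.

```shell
#!/bin/sh
# Sketch only: derive a region from a regional STS endpoint such as
# https://sts.eu-west-1.amazonaws.com/. Returns non-zero for anything
# that does not look like sts.<region>.amazonaws.com.
region_from_sts_endpoint() {
  host=${1#*://}     # drop the scheme, if present
  host=${host%%/*}   # drop any path
  case "$host" in
    sts.*.amazonaws.com)
      region=${host#sts.}
      printf '%s\n' "${region%.amazonaws.com}"
      ;;
    *)
      return 1
      ;;
  esac
}
```

Because the region comes from a string the caller already supplied, nothing in this approach needs to read a file such as /.aws/config.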
arteam added a commit to arteam/elasticsearch that referenced this issue Mar 14, 2022
…lastic#84585)

If we don't instruct to look up the region from the Endpoint URL, AWSSecurityTokenServiceClient tries to look it up implicitly in a custom way which requires reading the /.aws/config file for which we don't have a file permission.

The same approach is used for the general AmazonS3ClientBuilder

Resolves elastic#83826, elastic#52625
arteam added a commit that referenced this issue Mar 14, 2022
…84585) (#84946)

If we don't instruct to look up the region from the Endpoint URL, AWSSecurityTokenServiceClient tries to look it up implicitly in a custom way which requires reading the /.aws/config file for which we don't have a file permission.

The same approach is used for the general AmazonS3ClientBuilder

Resolves #83826, #52625
arteam added a commit that referenced this issue Mar 14, 2022
…84585) (#84947)

If we don't instruct to look up the region from the Endpoint URL, AWSSecurityTokenServiceClient tries to look it up implicitly in a custom way which requires reading the /.aws/config file for which we don't have a file permission.

The same approach is used for the general AmazonS3ClientBuilder

Resolves #83826, #52625
@hamishforbes

hamishforbes commented Apr 22, 2022

OK, after much banging of heads against walls, I think I've made this work.

You must manually set an additional env var that the EKS / IAM integration does not set automatically.
The S3 repository code requires AWS_ROLE_SESSION_NAME to be set in addition to the automatically injected AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN env vars.

The SDK does not normally require the session name; it just makes up a default value.
Not so the Elastic S3 plugin.

Depending on how you look at it, there's either a key piece of information missing from the documentation or this is just the incorrect implementation.
IMO the S3 plugin should mirror the default SDK behaviour as closely as possible and generate a default session name.
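
A minimal sketch of that fallback, written in shell for illustration rather than as the plugin's implementation. The function name `role_session_name` and the `repository-s3-` prefix with a timestamp suffix are arbitrary choices for this example, not the SDK's actual default.

```shell
#!/bin/sh
# Sketch: use AWS_ROLE_SESSION_NAME when it is set, otherwise generate
# a default instead of failing. Prefix and suffix are illustrative.
role_session_name() {
  if [ -n "${AWS_ROLE_SESSION_NAME:-}" ]; then
    printf '%s\n' "$AWS_ROLE_SESSION_NAME"
  else
    printf 'repository-s3-%s\n' "$(date +%s)"
  fi
}
```

With a fallback like this, users who rely only on the variables injected by EKS would not need to set anything extra, while an explicitly configured session name is still honoured.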

@arteam thoughts?

edit: For those using the ECK operator you need to do something like

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: your-elastic-cluster
spec:
  nodeSets:
  - name: default
    podTemplate:
      spec:
        containers:
        - env:
          - name: AWS_ROLE_SESSION_NAME
            value: repository-s3
          name: elasticsearch
        initContainers:
        - command:
          - sh
          - -c
          - mkdir -p "/usr/share/elasticsearch/config/repository-s3"; ln -s $AWS_WEB_IDENTITY_TOKEN_FILE
            "/usr/share/elasticsearch/config/repository-s3/aws-web-identity-token-file"
          name: symlink-token

Note that you don't need to install the S3 plugin in an init container anymore either.

arteam added a commit to arteam/elasticsearch that referenced this issue Apr 28, 2022
… not provided

The AWS SDK actually doesn't require the session name to be set and generates one if
it's not provided via the `AWS_ROLE_SESSION_NAME` environment variable.

See elastic#52625 (comment).
arteam added a commit that referenced this issue May 6, 2022
… not provided (#86255)

The AWS SDK actually doesn't require the session name to be set and generates one if
it's not provided via the AWS_ROLE_SESSION_NAME environment variable.

See #52625
arteam added a commit to arteam/elasticsearch that referenced this issue May 6, 2022
… not provided (elastic#86255)

The AWS SDK actually doesn't require the session name to be set and generates one if
it's not provided via the AWS_ROLE_SESSION_NAME environment variable.

See elastic#52625
arteam added a commit that referenced this issue May 6, 2022
… not provided (#86255) (#86502)

The AWS SDK actually doesn't require the session name to be set and generates one if
it's not provided via the AWS_ROLE_SESSION_NAME environment variable.

See #52625
@jerryguowei

It's surprising that this feature was not backported to 7.17. 😞
