Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] [Python] Add functionality of STSProfileCredentialsProvider to default credentials chain for S3FileSystem #41794

Open
fjetter opened this issue May 23, 2024 · 4 comments

Comments

@fjetter
Copy link
Contributor

fjetter commented May 23, 2024

Describe the enhancement requested

Given a typical AWS credentials setup that defines IAM roles like the following

# ~/.aws/config
[default]
region=us-east-2
role_arn=arn:aws:iam::123456789012:role/RoleName
source_profile=default

# ~/.aws/credentials
[default]
aws_access_key_id=XXXXXXXXXXX
aws_secret_access_key=YYYYYYYYYYYY

almost all AWS sdks are interpreting this correctly as an assume-role method that generates a temporary STS token pair.

For example, using python this looks like

import boto3
b3sess = boto3.Session()
creds = b3sess.get_credentials()
{
    "method": creds.method,
    "secret": creds.secret_key[:5] + "...",
    "token": creds.token[:5] + "...",
}

{'method': 'assume-role', 'secret': 'jALbI...', 'token': 'IQoJb...'}

The C++ sdk is deviating from how the default credentials chain is implemented and is not supporting this kind of configuration but instead uses the plain access key + secret key pair that is found in the configuration which does not necessarily provide sufficient permissions.

Dask adopted the S3FileSystem as a more performant alternative to the existing default fsspec filesystem for its parquet reader but this lack of support in the C++ sdk is a bit of a nasty blocker for further adoption. We ended up writing a workaround for our benchmarking by using boto to read the credentials and initialize the S3FileSystem manually but this has a couple of flaws. For starters, this is pretty unergonomic and nontrivial but more importantly this prohibits the refresh of the token after expiration (max duration is 1hr)

There's been some discussion on the aws-sdk-cpp repo about this with a suggestion to implement an amended credentials chain, see here that includes the STSProfileCredentialsProvider but it's also pointed out that this is flawed as well.

Also related

I know this is ultimately a aws-sdk-cpp problem but end users of the arrow S3FileSystem do not have this transparency and expect things to "just work", particularly when consuming the python API and they are used from how boto and other libraries are parsing credentials.

cc @pitrou since you've been poking in this area recently

Component(s)

C++, Python

@pitrou
Copy link
Member

pitrou commented May 23, 2024

I know this is ultimately a aws-sdk-cpp problem but end users of the arrow S3FileSystem do not have this transparency and expect things to "just work", particularly when consuming the python API and they are used from how boto and other libraries are parsing credentials.

Which concrete resolution would you recommend we apply in our S3FileSystem implementation?

@fjetter
Copy link
Contributor Author

fjetter commented May 23, 2024

Which concrete resolution would you recommend we apply in our S3FileSystem implementation?

I have to admit that I haven't understood exactly what the flaws of the new toolchain is. I have a prototype locally and can confirm that the simple new toolchain as suggested here aws/aws-sdk-cpp#150 (comment) is a little buggy and I couldn't get it to working (I got it compiling / running but the way they parse the config is somehow different / broken).

I also saw that in dynamodb-shell they implemented a fix for this and they are also describing how the simple chain wasn't sufficient but the PR is a little complex and I haven't had the time to look through it, see awslabs/dynamodb-shell#13

@pitrou
Copy link
Member

pitrou commented May 23, 2024

I will admit to understanding nothing about this, but we do expose role-related parameters in S3FileSystem (e.G. role_arn). Does that not work for this?

@fjetter
Copy link
Contributor Author

fjetter commented May 23, 2024

If one provides a role_arn and a session_name it does work as expected because it is not using the default credentials provider chain and is in fact not even loading the aws config.

The issue is only ocuring if one is initializing the filesystem with its default constructor which I assume is the most frequently used way to use the filesystem. Many users will not want to remember or copy the ARN

Brainstorming possibilities to address this...

  • For a python-only fix this could actually use boto3 to do the config parsing and forward whatever boto spits out to the C++ backend which is then not using the c++ sdk
  • For a Python/C++ we may have to implement something similar as Add support for role assumption (delegated access). awslabs/dynamodb-shell#13 After a second thought, the changes over there are not that complex but they are re-implementing parts of the config parser

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants