Do not deprecate Botocore Session in upcoming release (0.8) #1104
Comments
Thanks for raising this issue @BTheunissen
The ability to refresh AWS credentials is important for long-running jobs. Let's open a ticket to track this feature. See this comment for the reason to deprecate
@kevinjqliu Definitely fair enough that the reason for deprecation is how the catalog settings are generally exposed. I'd be fine with removing it if a ticket to track credential refresh were written up. I'd take a crack at implementing it, but honestly the workarounds I've had to do to support it for both the Python boto clients and the underlying filesystem implementations are pretty hacky. There are some existing issues on the same topic open against the Arrow project, as the guidance from AWS on properly supporting refreshable credentials is very spotty.
@BTheunissen +1, opened #1129 to track this feature. It can be hacky for now. This feature is generally nice to have for the project.
@BTheunissen, I'm in the same situation as you, trying to use PyIceberg with automatically refreshable AWS credentials. Would you be able to share how you made this work with the current version of PyIceberg? The Glue catalog picks up the session correctly, but it doesn't use it for accessing S3.
You can either set Glue and S3 credentials separately or use the unified AWS credential configs.
I'm setting
@cshenrik Sorry about the lateness. I actually made a small internal fork of the library and added the following logic to
Passing the role_arn and session_name will let the S3 File System automatically refresh the credentials of the AWS C++ client used by the PyArrow file system. Pretty tedious, but working so far!
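The fork's logic is not reproduced above. As a rough illustration of the approach described, here is a minimal sketch that hands an assumed role to PyArrow's S3FileSystem through its role_arn, session_name and load_frequency arguments. The helper names s3_filesystem_kwargs and assume_role_filesystem are hypothetical, and the 900-second refresh interval is an assumption based on the PyArrow documentation's stated default; this is not the code from the fork.

```python
from typing import Optional


def s3_filesystem_kwargs(
    role_arn: str,
    session_name: str,
    region: Optional[str] = None,
    load_frequency: int = 900,
) -> dict:
    """Build keyword arguments for pyarrow.fs.S3FileSystem (hypothetical helper).

    load_frequency is the interval, in seconds, at which PyArrow reloads the
    temporary credentials of the assumed role; 900 is assumed to be the default.
    """
    kwargs = {
        "role_arn": role_arn,
        "session_name": session_name,
        "load_frequency": load_frequency,
    }
    # Only pass a region when one is given, so PyArrow can otherwise resolve it.
    if region is not None:
        kwargs["region"] = region
    return kwargs


def assume_role_filesystem(role_arn: str, session_name: str, region: Optional[str] = None):
    """Create an S3FileSystem that assumes the given role (hypothetical helper)."""
    # Imported lazily so the kwargs helper stays usable without pyarrow installed.
    from pyarrow.fs import S3FileSystem

    return S3FileSystem(**s3_filesystem_kwargs(role_arn, session_name, region))
```

With this shape, the credential refresh is left to the AWS C++ client inside PyArrow rather than to Python code, which matches the behaviour described in the comment above.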
Thanks for sharing that, @BTheunissen. I have to call a bespoke webservice for retrieving AWS credentials, so I can't use that implementation directly, but it's still good to see what others did.
#1296 added the option to pass
@BTheunissen do you know if passing the role_arn will automatically refresh S3 credentials for long-running jobs? The PyArrow doc just mentions
Feature Request / Improvement
The AWS parameter botocore_session has been flagged as deprecated as of #922, and is due to be removed at Milestone 0.8. I'd like to request that this parameter is not deprecated, and I'd be happy to add a PR to bring the credential name in line with the rest of the updated client configuration.
botocore_session is helpful to make available to override in order to support automatically refreshable credentials for long-running jobs. For example, in my project I have the following boto3 utility code:
Which can be used as follows:
This lets the user get past the 1-hour IAM role-chaining session limit, which is very useful for reading extremely large tables.
I'd also like to contribute some of this code upstream at some point to support refreshable botocore sessions in both the AWS Glue/DynamoDB clients, as well as the underlying S3 file system code.
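The utility code referred to in this issue is not reproduced above. For readers unfamiliar with the pattern, a minimal sketch of a refreshable botocore session follows. It is not the author's code: the function names to_refresh_metadata and refreshable_botocore_session are hypothetical, and it assumes the role can be assumed with the ambient AWS credentials via STS AssumeRole.

```python
from datetime import datetime, timezone


def to_refresh_metadata(sts_credentials: dict) -> dict:
    """Map the Credentials dict of an STS AssumeRole response to the
    metadata format that botocore's RefreshableCredentials expects."""
    expiration = sts_credentials["Expiration"]
    # boto3 returns a datetime here; botocore expects an ISO 8601 string.
    if isinstance(expiration, datetime):
        expiration = expiration.astimezone(timezone.utc).isoformat()
    return {
        "access_key": sts_credentials["AccessKeyId"],
        "secret_key": sts_credentials["SecretAccessKey"],
        "token": sts_credentials["SessionToken"],
        "expiry_time": expiration,
    }


def refreshable_botocore_session(role_arn: str, session_name: str):
    """Return a botocore session whose credentials re-assume the role on expiry."""
    # Imported lazily so the helper above stays usable without AWS libraries.
    import boto3
    from botocore.credentials import RefreshableCredentials
    from botocore.session import get_session

    def refresh() -> dict:
        # Called once up front and again whenever the credentials near expiry.
        sts = boto3.client("sts")
        response = sts.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
        return to_refresh_metadata(response["Credentials"])

    credentials = RefreshableCredentials.create_from_metadata(
        metadata=refresh(),
        refresh_using=refresh,
        method="sts-assume-role",
    )
    session = get_session()
    # botocore has no public setter for this; assigning the private attribute
    # is a commonly used workaround and may break in future releases.
    session._credentials = credentials
    return session
```

A session built this way can be wrapped with boto3.Session(botocore_session=session), or handed to whatever accepts a botocore session, such as the botocore_session parameter discussed in this issue.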