pass aws credentials to the executor #387
Comments
In general it's hard to pass options through to the handlers, as they're initialized independently of the execution control plane. We could look at adding the ability to pass options from `execute` down to the handlers, but I'd have to think about that some more. Today there are a few ways around this problem:

1. Map the credentials to ENV variables for the papermill execution. This doesn't appear to be possible with the airflow operator as implemented.
2. Implement your own s3 handler and register it over the base handler, with your credentials passed in however you deem fit. You can extend the existing S3 class fairly easily in the process.
3. Generate the session, client, and s3 objects and assign them ahead of time over https://github.com/nteract/papermill/blob/master/papermill/s3.py#L143 so the class caches the connection objects. This is not a recommended pattern, as we might change that contract in the future.

I'd also suggest opening an airflow issue to see if we can make the operators configure iorw options a little more natively in the future.
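The first workaround (mapping credentials to ENV variables before papermill runs) might look like the sketch below. `set_aws_env` is a hypothetical helper, not papermill API; the env var names are the standard ones boto3 reads, and the `execute_notebook` call is shown only in a comment for context.

```python
import os

def set_aws_env(access_key, secret_key, region=None):
    """Hypothetical helper: export AWS credentials so boto3 (which
    papermill's S3 handler uses) picks them up from the environment."""
    os.environ["AWS_ACCESS_KEY_ID"] = access_key
    os.environ["AWS_SECRET_ACCESS_KEY"] = secret_key
    if region:
        os.environ["AWS_DEFAULT_REGION"] = region

# Credentials here are placeholders:
set_aws_env("AKIA...EXAMPLE", "secret-example", region="us-east-1")

# With the environment set, papermill can resolve s3:// paths, e.g.:
#   import papermill as pm
#   pm.execute_notebook("s3://bucket/in.ipynb", "s3://bucket/out.ipynb")
```

Note this is exactly what the airflow operator makes awkward, since the operator controls the process environment.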
@MSeal wouldn't something like this work? Since self.client will always be 's3' and self.s3 is fetched from params already, we can use a user-provided session instance, or follow the current flow if it's None. Edit: Link to relevant code snippet Edit2: Since the S3 methods are only called way deep in the stack, we could use either constants like papermill.AWS_KEY or env vars (like BOTO3_ENDPOINT_URL is already used) to set overrideable Session() kwargs.
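The Edit2 idea (env-var-driven, overrideable `Session()` kwargs) can be sketched with stdlib only. The `PAPERMILL_*` variable names below are hypothetical, and the boto3 call is shown only in a comment:

```python
import os

def session_kwargs_from_env():
    """Sketch: collect overrideable boto3 Session() kwargs from env
    vars. The PAPERMILL_* names are assumptions, not a real contract."""
    mapping = {
        "aws_access_key_id": "PAPERMILL_AWS_ACCESS_KEY_ID",
        "aws_secret_access_key": "PAPERMILL_AWS_SECRET_ACCESS_KEY",
        "region_name": "PAPERMILL_AWS_REGION",
    }
    # Only include kwargs that are actually set, so boto3's normal
    # credential chain still applies when nothing is overridden.
    return {kw: os.environ[env] for kw, env in mapping.items() if env in os.environ}

os.environ["PAPERMILL_AWS_REGION"] = "eu-west-1"
kwargs = session_kwargs_from_env()
# The collected kwargs would then be splatted into the session, e.g.:
#   session = boto3.session.Session(**kwargs)
```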
@Gurulhu As your second edit notes, it's fairly deep in the stack, so the read call in the middle and the calls above it don't pass arguments down. In theory, if you opened all those levels up to *args/**kwargs you could sort of do it, but there are some kwarg contentions in there that might get confusing. The ENV presence variable is a cleaner way to tackle this, though a static class variable could work too. I'd be happy with someone adding paths here that folks see as helpful, but passing the arg through the whole stack is still awkward in the code imho if the static options aren't sufficient.
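The "register your own handler over the base handler" workaround discussed above can be sketched generically. Everything here is a stand-in: `PapermillIOStub` only mimics the shape of papermill's scheme-to-handler registry, and `CredentialedS3Handler` is hypothetical; real code would subclass papermill's S3 handler and register it via papermill's iorw registry instead.

```python
# Stdlib-only sketch of the register-over-the-base-handler pattern.
# PapermillIOStub and both handler classes are stand-ins, not papermill API.

class PapermillIOStub:
    """Mimics a scheme -> handler registry where a later registration
    for the same scheme shadows an earlier one."""
    def __init__(self):
        self._handlers = []

    def register(self, scheme, handler):
        self._handlers.append((scheme, handler))

    def get_handler(self, path):
        # Walk registrations newest-first, so re-registering "s3://"
        # overrides the base handler.
        for scheme, handler in reversed(self._handlers):
            if path.startswith(scheme):
                return handler
        raise ValueError(f"no handler for {path!r}")

class BaseS3Handler:
    def read(self, path):
        return f"read {path} with default credential chain"

class CredentialedS3Handler:
    """Hypothetical handler holding explicit credentials (e.g. pulled
    from an airflow connection) instead of ~/.aws or the environment."""
    def __init__(self, access_key, secret_key):
        self.access_key = access_key
        self.secret_key = secret_key

    def read(self, path):
        return f"read {path} as {self.access_key}"

io = PapermillIOStub()
io.register("s3://", BaseS3Handler())
# Re-register over the base handler with explicit credentials:
io.register("s3://", CredentialedS3Handler("AKIA...EXAMPLE", "secret"))
result = io.get_handler("s3://bucket/nb.ipynb").read("s3://bucket/nb.ipynb")
```

The design point is that the executor never needs to learn about credentials: the handler you register already carries them.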
Hey, I'm trying to use the papermill airflow operator to read and execute notebooks with airflow. In our company we use a central airflow instance, so we cannot rely on having aws credentials in `~/.aws/credentials`, nor can we use environment variables. Because our airflow instance is hosted in kubernetes, we cannot rely on ec2 roles either. Our most feasible option is to use airflow connections and explicitly pass aws credentials to papermill. I assume different users will use different credentials, so this would be a good addition to have.

Currently, `S3Handler` does not accept arguments, so there is no way to pass those through to the `boto3` instance it abstracts. IMO it's a good addition to have.