[AIRFLOW-5335] Update GCSHook methods so they need min IAM perms #5939

Merged: kaxil merged 2 commits into apache:master from gcs-nec-perms on Aug 29, 2019

Conversation

kaxil
Member

@kaxil kaxil commented Aug 28, 2019

Make sure you have checked all steps below.

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    After we refactored GCSHook to use the Storage client, the hook needs more IAM permissions.

This is because we use the get_bucket method, which requires the storage.buckets.get permission. If we instead use the bucket method, which only creates a Bucket object, we don't need that permission.

This restores the IAM permission requirements of Airflow <=1.10.3.
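
A minimal sketch of the idea described above, assuming the google-cloud-storage client. The helper name object_exists is hypothetical and is not the hook's actual code; client is assumed to be a storage.Client instance passed in by the caller.

```python
def object_exists(client, bucket_name, object_name):
    """Return True if the object exists, without reading bucket metadata.

    Hypothetical helper for illustration only. `client` is assumed to be
    a google.cloud.storage.Client instance.
    """
    # client.bucket() only builds a local Bucket handle and sends no
    # request, so no bucket-level permission is needed at this point.
    bucket = client.bucket(bucket_name)
    # bucket.blob() is local as well; only exists() calls the API.
    blob = bucket.blob(object_name)
    return blob.exists()
```

With the real library, client would typically be created with storage.Client(). Swapping client.bucket for client.get_bucket in this helper would add a bucket metadata request before the object check.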

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    Tests updated

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and classes in the PR contain docstrings that explain what they do
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to an appropriate release

Code Quality

  • Passes flake8

@kaxil kaxil requested review from potiuk, Fokko and mik-laj August 28, 2019 16:27
@mik-laj
Member

mik-laj commented Aug 28, 2019

client = storage.Client()

bucket = client.get_bucket('gcp-transfer-first-target')

This immediately sends a request to the server.

bucket = client.bucket('gcp-transfer-second-target')

This does not send a request immediately. A request is sent only when you call bucket.reload()

Related discussion: #5054 (comment)

@mik-laj
Member

mik-laj commented Aug 28, 2019

Full example code:

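import logging
from google.cloud import storage

# DEBUG logging prints every HTTP request the client makes.
logging.basicConfig(level=logging.DEBUG)

client = storage.Client()
print("Execute get_bucket")
# get_bucket() fetches the bucket metadata right away (one GET request).
bucket = client.get_bucket('gcp-transfer-first-target')

print("-"*80)

print("Execute bucket")
# bucket() only builds a local Bucket object; no request is sent here.
bucket = client.bucket('gcp-transfer-second-target')

print("Execute reload")
# reload() is the call that actually fetches the metadata from the server.
bucket.reload()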

Sample output:

Execute get_bucket
DEBUG:urllib3.util.retry:Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
DEBUG:google.auth.transport.requests:Making request: POST https://oauth2.googleapis.com/token
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): oauth2.googleapis.com:443
DEBUG:urllib3.connectionpool:https://oauth2.googleapis.com:443 "POST /token HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.googleapis.com:443
DEBUG:urllib3.connectionpool:https://www.googleapis.com:443 "GET /storage/v1/b/gcp-transfer-first-target?projection=noAcl HTTP/1.1" 200 563
--------------------------------------------------------------------------------
Execute bucket
Execute reload
DEBUG:urllib3.connectionpool:https://www.googleapis.com:443 "GET /storage/v1/b/gcp-transfer-second-target?projection=noAcl HTTP/1.1" 200 561

@kaxil
Member Author

kaxil commented Aug 28, 2019

@mik-laj Are you suggesting that we add bucket.reload()?


@mik-laj
Member

mik-laj commented Aug 28, 2019

No, I don't want you to add that call, because it would restore the previous behavior. I was just checking the difference between these two methods.

@kaxil
Member Author

kaxil commented Aug 28, 2019

Agree with @mik-laj.

My reason for this change is not exactly the same, but it is closely related.

Using get_bucket requires storage.buckets.get.
Creating the Bucket object with bucket and then, for example, running get_blob needs only storage.objects.* level permissions, not permissions across all buckets.

This is an advantage when you use multiple buckets and manage IAM at the bucket level.
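
To make the get_blob case concrete, here is a small sketch, assuming the google-cloud-storage client. The helper name get_object_size is hypothetical; client is assumed to be a storage.Client instance.

```python
def get_object_size(client, bucket_name, object_name):
    """Return the object's size in bytes, or None if it does not exist.

    Hypothetical helper for illustration only. `client` is assumed to be
    a google.cloud.storage.Client instance.
    """
    # Local Bucket handle: no request and no bucket-level permission.
    bucket = client.bucket(bucket_name)
    # get_blob() requests the object's metadata, which is an
    # object-level operation. It returns None for a missing object.
    blob = bucket.get_blob(object_name)
    if blob is None:
        return None
    return blob.size
```

The only request in this sketch targets the object, so an identity with object-level access on that one bucket should be enough to run it.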

@mik-laj mik-laj added the provider:google Google (including GCP) related issues label Aug 28, 2019
Member

@mik-laj mik-laj left a comment

I've looked at the code and it looks correct, but I think it's worth testing it thoroughly using system tests. Unfortunately, we do not have such tests yet, but I hope that they will be created tomorrow ;-) Let's hold off on merging until then.

@turbaszek
Member

Confirmed. With bucket I can use roles/storage.objectViewer to list and download objects. With get_bucket this role is not sufficient, because get_bucket fetches the bucket itself before any blob is accessed, which requires storage.buckets.get access.
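
The listing case confirmed above could look roughly like this, assuming the google-cloud-storage client. The helper name list_object_names is hypothetical; client is assumed to be a storage.Client instance.

```python
def list_object_names(client, bucket_name, prefix=None):
    """Return the names of the objects in a bucket, optionally by prefix.

    Hypothetical helper for illustration only. `client` is assumed to be
    a google.cloud.storage.Client instance.
    """
    # Local Bucket handle: client.bucket() sends no request.
    bucket = client.bucket(bucket_name)
    # list_blobs() pages through the objects; it is the only call here
    # that reaches the API, and it is an object-level operation.
    return [blob.name for blob in bucket.list_blobs(prefix=prefix)]
```

Because the bucket metadata is never fetched, this sketch stays within the object-level access that the comment above reports as sufficient.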

@kaxil
Member Author

kaxil commented Aug 29, 2019

Thanks @nuclearpinguin

@kaxil kaxil merged commit b1d3d55 into apache:master Aug 29, 2019
@kaxil kaxil deleted the gcs-nec-perms branch August 29, 2019 15:40
Jerryguo pushed a commit to Jerryguo/airflow that referenced this pull request Sep 2, 2019