Add impersonation_scopes to BigQuery #38169

Merged: 10 commits, Mar 30, 2024


Contributor

@ying-w ying-w commented Mar 15, 2024

closes: #33400

Currently, scopes for Google libraries are taken from the connection parameters. This PR adds a parameter, impersonation_scopes, that is used when impersonation_chain is set.

The flow to get credentials for GCP providers goes something like this:

  1. A library like bigquery will call get_credentials()
    credentials=self.get_credentials(),
  2. This is provided by GoogleBaseHook
    credentials, _ = self.get_credentials_and_project_id()
  3. This then makes a call to self.scopes()
  4. scopes will then return scopes from connection extras or default (in _get_scopes())
    def scopes(self) -> Sequence[str]:
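
The call chain above can be modelled roughly as follows. This is a toy sketch, not the actual GoogleBaseHook code: the class names, the dictionary standing in for a credentials object, the "scope" and "project" extras keys, and the default scope are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Sequence

# Assumed default, used when the connection extras carry no scopes.
DEFAULT_SCOPES = ("https://www.googleapis.com/auth/cloud-platform",)


class SketchBaseHook:
    """Toy stand-in for GoogleBaseHook (hypothetical, for illustration only)."""

    def __init__(self, connection_extras: dict[str, str] | None = None) -> None:
        self.connection_extras = connection_extras or {}

    @property
    def scopes(self) -> Sequence[str]:
        # Step 4: scopes come from the connection extras, else the default.
        raw = self.connection_extras.get("scope")
        if raw:
            return [s.strip() for s in raw.split(",")]
        return list(DEFAULT_SCOPES)

    def get_credentials_and_project_id(self) -> tuple[dict, str | None]:
        # Step 3: credentials are built with whatever self.scopes returns.
        credentials = {"scopes": list(self.scopes)}
        return credentials, self.connection_extras.get("project")

    def get_credentials(self) -> dict:
        # Step 2: thin wrapper that drops the project id.
        credentials, _ = self.get_credentials_and_project_id()
        return credentials


class SketchBigQueryHook(SketchBaseHook):
    def get_client(self) -> dict:
        # Step 1: the service hook asks the base hook for credentials.
        return {"credentials": self.get_credentials()}
```

In this model the caller has no way to influence the scopes except through the connection, which is the limitation the PR addresses.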

Possible alternative solutions

  1. Override the scopes() definition in a custom hook, as detailed in a comment on the issue above (#33400, "BigQuery with impersonation_chain does not accept custom scopes")
  2. Introduce a parameter called scopes. It would be unclear which takes precedence if scopes were set both in the parameter and in the connection
  3. Create a class for impersonation https://google-auth.readthedocs.io/en/master/reference/google.auth.impersonated_credentials.html
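
For comparison, the first alternative (overriding scopes in a custom hook) might look roughly like this. Both classes are hypothetical stand-ins rather than Airflow classes, and treating scopes as a property is an assumption.

```python
from typing import List


class BaseHookStandIn:
    """Hypothetical stand-in for the provider's base hook."""

    @property
    def scopes(self) -> List[str]:
        # Stand-in for "scopes from connection extras or default".
        return ["https://www.googleapis.com/auth/cloud-platform"]


class DriveScopedHook(BaseHookStandIn):
    """Custom hook that appends the Drive scope to the inherited ones."""

    @property
    def scopes(self) -> List[str]:
        return [*super().scopes, "https://www.googleapis.com/auth/drive"]
```

The drawback of this route is that every user who needs extra scopes has to maintain a subclass, whereas a constructor parameter keeps the change at the call site.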

I would need some help testing this and bringing it over the finish line.

With this change, something like the following should work:

from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook

bq_hook = BigQueryHook(
    gcp_conn_id="my_con",
    impersonation_chain="[email protected]",
    use_legacy_sql=False,
    location="us",
    api_resource_configs={"query": {"useQueryCache": False, "priority": "BATCH"}},
    # Scopes to request when impersonation_chain is used.
    impersonation_scopes=(
        "https://www.googleapis.com/auth/cloud-platform",
        "https://www.googleapis.com/auth/drive",
    ),
)
bq_hook.scopes  # check that it got set
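
One way to picture the intended precedence is the small helper below. It is a hypothetical function, not part of the provider; the PR description only says that impersonation_scopes is used when impersonation_chain is set, so the fallback to the connection scopes in every other case is an assumption.

```python
from __future__ import annotations

from typing import Sequence


def resolve_scopes(
    connection_scopes: Sequence[str],
    impersonation_chain: str | Sequence[str] | None = None,
    impersonation_scopes: Sequence[str] | None = None,
) -> list[str]:
    """Pick the scopes to request (hypothetical helper, not Airflow code)."""
    # Impersonation scopes only apply when a chain is actually configured.
    if impersonation_chain and impersonation_scopes:
        return list(impersonation_scopes)
    # Assumed fallback: keep the scopes taken from the connection.
    return list(connection_scopes)
```

Keeping the new parameter tied to impersonation avoids the ambiguity noted for a plain scopes parameter, since the connection scopes still govern the non-impersonated case.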

@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Mar 15, 2024
@eladkal eladkal changed the title add impersonation_scopes Add impersonation_scopes to BigQuery Mar 16, 2024
Contributor

eladkal commented Mar 16, 2024

Can you add a unit test to cover the change?

Collaborator

@dirrao dirrao left a comment


Checks are failing. Can you fix them? Can you also add the unit test for the change, as suggested by @eladkal?

Contributor Author

ying-w commented Mar 18, 2024

I'll do another pass at it sometime this week.

Contributor Author

ying-w commented Mar 21, 2024

I need some help figuring out how to write a test. I see that there is a cloud/operators/test_bigquery.py file with an impersonation_chain call, but there is no impersonation_chain call in cloud/hooks/test_bigquery.py.

The other example I looked at

google_impersonation_chain=IMPERSONATION_CHAIN,

but I'm not quite sure what that line is doing

Contributor

shahar1 commented Mar 21, 2024

> I need some help figuring out how to write a test. I see that there is a cloud/operators/test_bigquery.py file with an impersonation_chain call, but there is no impersonation_chain call in cloud/hooks/test_bigquery.py.
>
> The other example I looked at
>
> google_impersonation_chain=IMPERSONATION_CHAIN,
>
> but I'm not quite sure what that line is doing

You could mock the impersonation chain as done in other tests of GCP hooks (hooks/test_compute.py, for example; there are more).
As for the other example, it is called google_impersonation_chain to differentiate it from a similar parameter in the transfer operator's target/source cloud platform (which is not necessarily GCP).
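
A rough sketch of the mocking pattern, using only unittest.mock and a toy hook. ToyHook, _build_credentials, and the constant values are hypothetical and do not mirror the real test files; the point is only to show how a patched credential factory lets a test assert that the chain and scopes are forwarded.

```python
from unittest import mock

IMPERSONATION_CHAIN = "sa-to-impersonate@example.invalid"  # hypothetical value
IMPERSONATION_SCOPES = ["https://www.googleapis.com/auth/drive"]


class ToyHook:
    """Hypothetical hook that delegates credential creation to a factory."""

    def __init__(self, impersonation_chain=None, impersonation_scopes=None):
        self.impersonation_chain = impersonation_chain
        self.impersonation_scopes = impersonation_scopes

    @staticmethod
    def _build_credentials(target_principal, target_scopes):
        # Real code would call into google-auth; tests should never reach this.
        raise RuntimeError("credential factory should be mocked in unit tests")

    def get_credentials(self):
        return self._build_credentials(
            target_principal=self.impersonation_chain,
            target_scopes=self.impersonation_scopes,
        )


def test_impersonation_scopes_are_forwarded():
    # Replace the factory so no real credentials are requested.
    with mock.patch.object(ToyHook, "_build_credentials") as mock_build:
        hook = ToyHook(IMPERSONATION_CHAIN, IMPERSONATION_SCOPES)
        credentials = hook.get_credentials()

    mock_build.assert_called_once_with(
        target_principal=IMPERSONATION_CHAIN,
        target_scopes=IMPERSONATION_SCOPES,
    )
    assert credentials is mock_build.return_value
```

In the real test suite the patch target would be whatever function the hook uses to obtain credentials, which this sketch does not attempt to name.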

Contributor Author

ying-w commented Mar 30, 2024

I think I fixed the test.

@potiuk potiuk merged commit 0f51347 into apache:main Mar 30, 2024
40 checks passed
@ying-w ying-w deleted the scopes-for-impersonation branch March 31, 2024 17:32
mathiaHT pushed a commit to mathiaHT/airflow that referenced this pull request Apr 4, 2024
utkarsharma2 pushed a commit to astronomer/airflow that referenced this pull request Apr 22, 2024
Labels
area:providers provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BigQuery with impersonation_chain does not accept custom scopes
5 participants