[Airflow 4923] Fix Databricks hook leaks API secret in logs #5635
Looks good to me.
Two questions:
- do you know why BaseHook.get_connection has this behavior of logging connections ONLY when there is a host set? Is this going to be true long term?
- would you guys consider having a third option like setting username to 'token' and using the password field? I believe this JSON field is always visible in plaintext in the Airflow UI. (Can be a future PR, just a thought.)
Hi @zenyui
Basically it is now similar to the aws_hook functionality.
* Update databricks operator
* Updated token auth to get from extra_dejson
* Update test DatabricksHookTokenTest to use get host from 'extra'
(cherry picked from commit db770cf)
If somebody stumbles upon this PR, ignore point 2; it is incorrect. Putting the token in Login and Password causes the hook to authenticate via Basic Auth. A future PR could add this ability by putting an identifier in the extra JSON that tells the hook the password field holds a token, so that it authenticates via token rather than Basic Auth. That would keep the token out of plain text.
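To make the distinction in the comment above concrete, here is a minimal sketch of the two authentication paths. It does not use the Airflow API; `build_auth_header` is a hypothetical helper, and sending the token as a `Bearer` header is an assumption about how token auth is put on the wire.

```python
import base64
from typing import Dict


def build_auth_header(login: str, password: str, extra: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: pick token auth or Basic Auth for a connection."""
    token = extra.get("token")
    if token:
        # Token path: the token comes from the extra JSON (assumed Bearer scheme).
        return {"Authorization": f"Bearer {token}"}
    # Fallback path: login and password are sent as HTTP Basic Auth.
    credentials = base64.b64encode(f"{login}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {credentials}"}
```

With this shape, a token placed in the password field would take the Basic Auth branch, which is why point 2 does not work without an extra identifier.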
Jira
https://issues.apache.org/jira/browse/AIRFLOW-4923
Description
When users store a token in the extra field of a Databricks connection, the DatabricksHook leaks the token to the Airflow logs. This PR adds the ability (and updates the docs) for the hook to read the host from the extra JSON. Because the 'host' field is then empty, BaseHook.get_connection does not send the extra JSON to the Airflow logs.
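As an illustration of the host lookup described above, here is a minimal sketch that runs without Airflow. `resolve_host` is a hypothetical helper, and the order of precedence (extra JSON first, then the host field) is an assumption rather than a statement of what the hook does.

```python
import json
from typing import Optional


def resolve_host(host: Optional[str], extra: Optional[str]) -> str:
    """Hypothetical helper: find the Databricks host for a connection."""
    # The extra field is stored as a JSON string; treat empty as no extras.
    extra_dejson = json.loads(extra) if extra else {}
    if "host" in extra_dejson:
        # Host kept next to the token, so the connection's host field can stay empty.
        return extra_dejson["host"]
    if host:
        return host
    raise ValueError("No host found in the connection's host field or extra JSON")
```

For example, a connection with an empty host field and an extra value of `{"host": "example.cloud.databricks.com", "token": "..."}` resolves to the host from the extra JSON (the hostname here is a placeholder).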
Tests
Updated testcase to look for host in extra json
tests.contrib.hooks.test_databricks_hook:DatabricksHookTokenTest.test_submit_run
Commits
Documentation
Code Quality
flake8