Mask passwords and sensitive info in task logs and UI #15599
Conversation
The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*. |
@task
def my_func():
    from airflow.utils.log.secrets_masker import mask_secret
I'm not 100% sold that this is the right place for the public API to live.
I'm half tempted to make this a lazy import in airflow/__init__.py
so this becomes
- from airflow.utils.log.secrets_masker import mask_secret
+ from airflow import mask_secret
Anyone have any thoughts?
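For illustration, the lazy-import idea could use PEP 562 module-level `__getattr__`. A self-contained sketch follows; it builds a stand-in module dynamically (with `math.sqrt` standing in for the real secrets-masker import), since this is not the actual PR code:

```python
# Sketch of the "lazy import in airflow/__init__.py" idea via PEP 562
# module-level __getattr__.  A stand-in module is built dynamically so
# the example is runnable without Airflow installed.
import sys
import types

lazy_src = """
def __getattr__(name):
    # Heavy import deferred until the attribute is first accessed.
    if name == "mask_secret":
        from math import sqrt as mask_secret  # stand-in for the real import
        return mask_secret
    raise AttributeError(f"module 'lazy_pkg' has no attribute {name!r}")
"""

mod = types.ModuleType("lazy_pkg")
exec(lazy_src, mod.__dict__)
sys.modules["lazy_pkg"] = mod

import lazy_pkg

# The underlying import only happens on first attribute access:
print(lazy_pkg.mask_secret(9.0))  # → 3.0
```

The same `__getattr__` placed in `airflow/__init__.py` would make `from airflow import mask_secret` work without importing the logging machinery at package-import time.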
This doesn’t feel like a top level import to me. Maybe the import path should be shorter, but probably not that short.
I couldn't think of any ideas for what a shorter import path might look like without being top level 😁
I guess it could be airflow.utils.mask_secret
Or airflow.log.mask_secret? (Is it generally frowned upon to add a new module under airflow?)
Nope, though as soon as we do that I'm going to want to move a whole load of modules out of airflow.utils 😁 (probably no bad thing)
I’m not going to complain if you do.
Tell you what -- I'll leave this here for this PR, then follow up with a refactor.
Naming: airflow.log vs airflow.logs vs airflow.logging. Any preference @uranusjr?
Would this be masking too much, and could it provide an attack vector by changing another field to contain the password string? For example, if I have username
@uranusjr Possibly, but I don't think we need to defend against that level of attack -- it would only show up in the UI like that if it matches a connection the task has accessed. So yes, there's a theoretical attack surface here, but with the planned work of per-Connection ACLs etc., I think that is mitigated. Plus it only lets you validate the password if you already know it, and if you can change the DAG code, there are easier ways of exfiltrating the credentials.
This isn't used anywhere yet, but it is the first step towards not printing passwords in the logs.
This masks secret values in logs for Connections and Variables. It behaves as follows:
- Connection passwords are always masked, wherever they appear. This means that if a connection has a password of `a`, then _every_ `a` in log messages gets replaced with `***`.
- "Sensitive" keys from extra_dejson are also masked. "Sensitive" is defined by the existing mechanism the UI used, based on the name of the key.
- "Sensitive" Variables are also masked.
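The behaviour described can be sketched as a logging filter that rewrites records before they are emitted. This is a minimal illustration of the idea, not Airflow's actual SecretsMasker implementation:

```python
# Minimal sketch of the masking behaviour described above: a logging.Filter
# that replaces every occurrence of a registered secret with "***".
# Illustrative only; Airflow's real masker is more involved.
import logging

_secrets = set()

def mask_secret(value):
    """Register a value to be redacted from all later log records."""
    if value:
        _secrets.add(str(value))

class DemoSecretsMasker(logging.Filter):
    def filter(self, record):
        msg = record.getMessage()  # render msg % args first
        for secret in _secrets:
            msg = msg.replace(secret, "***")
        record.msg, record.args = msg, None
        return True  # never drop the record, only rewrite it

logger = logging.getLogger("demo")
handler = logging.StreamHandler()
handler.addFilter(DemoSecretsMasker())
logger.addHandler(handler)

mask_secret("hunter2")
logger.warning("connecting with password=%s", "hunter2")
# emits: connecting with password=***
```

Attaching the filter at the handler level means every record passing through that handler is redacted, regardless of which logger produced it.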
task. It will also mask the value of a Variable, or the field of a Connection's extra JSON blob if the name contains any words in ('password', 'secret', 'passwd', 'authorization', 'api_key', 'apikey', 'access_token'). This list
Can we do an "include"/"exampleinclude" of it from the code, so that we don't have to maintain this list in two places if we add another word here in future?
This was moved from elsewhere, so I'll leave it as it is for now
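For reference, the name-based check amounts to a substring test over that word list. A sketch follows; the word list is copied from the documentation snippet above, and the function name here is illustrative, not Airflow's actual helper:

```python
# Sketch of the name-based sensitivity test the docs describe: a field is
# "sensitive" if its (lowercased) name contains any word from the list.
# Word list copied from the documentation quoted above; illustrative only.
SENSITIVE_WORDS = (
    "password", "secret", "passwd", "authorization",
    "api_key", "apikey", "access_token",
)

def is_sensitive(name):
    """True if any sensitive word appears in the lowercased field name."""
    name = name.lower()
    return any(word in name for word in SENSITIVE_WORDS)

print(is_sensitive("db_password"))  # → True
print(is_sensitive("vault_role"))   # → False
```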
(cherry picked from commit d295e70)
(cherry picked from commit 76e3cc20606c05418d8824ec6b47629048c021cf)
Maybe just have a quick check on this: on 2.1.0, with the setting hide_sensitive_var_conn_fields set to True, the connection extra information still appears in clear text. Thanks.
Still have it where?
hide_sensitive_var_conn_fields is set, and I assume the above field is for the extra JSON. The documentation mentions connection passwords are masked by default, which is not the case for me, using Airflow 2.1.
Which log file?
For me it's in the task logs; example:
Airflow DAG task log file. ^ In this line it prints out the connection password as well. Quoting the source code which causes it to be logged in the DAG task log without masking:
@thierryturpin Nothing in this line looks sensitive -- what did you expect to be masked? @raajpackt Can you give me a reproduction case that shows this behaviour please?
Sorry, my bad. My understanding was that instead of extra: {'vault_role': 'RL123'} I would have extra: XXX, i.e. that everything in the extra would be masked in the logs.
@thierryturpin Ah, as part of this change it now aims to mask only actually sensitive values.
@ashb ok, thanks for the clarification.
I'm having the same issue as @raajpackt: the passwords are showing in the logs when using get_connection().
Need more info please. Do you have any custom logging configured? Can you provide a minimal reproduction case please?
I am trying to get my installation of Airflow to mask passwords in the rendered template UI of a given task. It seems this pull request was merged and made available somewhere between versions 2.0.0 and 2.1.0.
I have Airflow 2.1.2 installed, but I still see passwords being rendered and freely visible to anyone with access to Airflow. Is there something I have to do to enable masking of passwords rendered in templates in the UI? I'm using an Admin account locally to test. Here is the variable I would like to be masked in templates; Airflow correctly masks it in the Variable UI. On the rendered template page of a given task, however, it freely shows the password to me. What am I doing wrong here? How can I allow dockerized tasks to have access to the systems they need without broadcasting the credentials to everyone? Thanks!
@cpdean Have you customized the Airflow logging at all?
I don't think that I have, but I'm not sure. Here's the
In case it's relevant, I also have this in my
You still need to set the below two parameters in airflow.cfg in order to hide the secrets in logs and rendered templates: hide_sensitive_var_conn_fields = True
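Put together, the relevant airflow.cfg fragment would look roughly like this. The extra names are illustrative placeholders; both options live under `[core]`:

```ini
[core]
# Mask sensitive Variable/Connection fields in logs and the Rendered
# Template view.
hide_sensitive_var_conn_fields = True
# Extra comma-separated names to treat as sensitive, on top of the
# built-in list (password, secret, passwd, api_key, ...).
sensitive_var_conn_names = comma,separated,sensitive,names
```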
Same issue as cpdean: I have hide_sensitive_var_conn_fields set to true, with sensitive_var_conn_names as you listed. I was under the impression those apply to the JSON extras, not the native connection login/password. It is the native connection password that is being exposed in the logs.
@Jujubug88 Alternatively, you can try using your own masking as suggested here: when you use Vault to get the secret, you can mask it before passing it as params. It will then display as **** in the log as well as in the rendered template. You can use the below code to mask the secret from Vault:
from airflow.utils.log.secrets_masker import mask_secret
openssl_service_account_key_read_response = client.secrets.kv.read_secret_version(path=openssl_service_account_secret_path, mount_point=vault_mount_point)
sample_commands = dedent('''
When I apply this config under the
I am noticing in 2.1.2 that passwords are not masked when running in an interactive console locally ... but in the deployed environment they are 🤔 i.e. passwords from this log line in the base hook:
@ashb I am using Airflow version 2.2.4 in an AKS cluster. In our Airflow logs I can see the usernames and passwords for connections. I already encrypt the passwords using the Fernet key, and I also changed the Airflow config file: sensitive_var_conn_names = comma,separated,sensitive,names,password,secret,extra,passwd. But I can still see the passwords in the Airflow logs. Can anyone help with how to mask the connection passwords in Airflow logs?
I think (hard to say) that your problem may be solved when #24362 is merged.
Mask secret values from connections and variables
This masks secret values in logs for Connections and Variables.
It behaves as follows:
- Connection passwords are always masked, wherever they appear. This means, if a connection has a password of `a`, then every `a` in log messages would get replaced with `***`.
- "Sensitive" keys from extra_dejson are also masked. "Sensitive" is defined by the "existing" mechanism that the UI used, based upon the name of the key.
- "Sensitive" Variables are also masked.
Still to do:
Example
Given this dag:
We get the following lines in the log for the python operator:
and for the Bash operator:
And in the UI it looks like this:
Closes #8421