Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hide sensitive data in UI #8421

Closed
n4rk0o opened this issue Apr 17, 2020 · 24 comments · Fixed by #15599
Closed

Hide sensitive data in UI #8421

n4rk0o opened this issue Apr 17, 2020 · 24 comments · Fixed by #15599
Assignees
Labels
kind:feature Feature Requests
Milestone

Comments

@n4rk0o
Copy link

n4rk0o commented Apr 17, 2020

Description

I'm using Airflow for 2 years now and I have a plugin that get password for a specific account in a Vault and then push it through a XCOM to reuse it on another tasks.

The fact is that if the value is sensitive like a password, I can't hide it in the UI except for XCOM if I add an underscore in the prefix name of the key value.

Eg: kwargs['ti'].xcom_push('key':'_password', 'value':'my_value')

But for rendered template UI page, I didn't find anything similar, so if I try to pull a XCOM, it will show the value in the UI and I want to avoid it.

Maybe is it possible to add a condition in https://github.com/apache/airflow/blob/master/airflow/www/views.py after line 635

elif template_field.startswith('_'):
    html_dict[template_field] = ("<pre><code>sensitive data will not be exposed here</pre></code>")

Use case / motivation

I know that I can use connections but in my case, and due to security politic in my company, we have to store it in a dedicated Vault.

Related Issues

N/A

@n4rk0o n4rk0o added the kind:feature Feature Requests label Apr 17, 2020
@boring-cyborg
Copy link

boring-cyborg bot commented Apr 17, 2020

Thanks for opening your first issue here! Be sure to follow the issue template!

@kaxil
Copy link
Member

kaxil commented Apr 17, 2020

Airflow 1.10.10 allows getting connections from a Vault: https://airflow.apache.org/blog/airflow-1.10.10/#allow-retrieving-airflow-connections-variables-from-various-secrets-backend

Does that help your use-case?

@n4rk0o
Copy link
Author

n4rk0o commented Apr 17, 2020

Airflow 1.10.10 allows getting connections from a Vault: https://airflow.apache.org/blog/airflow-1.10.10/#allow-retrieving-airflow-connections-variables-from-various-secrets-backend

Does that help your use-case?

So basically, with Airflow 1.10.10, if I configure the airflow.cfg to use Hashicorp Vault, I can use connections and variables as usual but instead of getting data from Airflow database, it will got it from Vault?

@kaxil
Copy link
Member

kaxil commented Apr 17, 2020

Airflow 1.10.10 allows getting connections from a Vault: https://airflow.apache.org/blog/airflow-1.10.10/#allow-retrieving-airflow-connections-variables-from-various-secrets-backend
Does that help your use-case?

So basically, with Airflow 1.10.10, if I configure the airflow.cfg to use Hashicorp Vault, I can use connections and variables as usual but instead of getting data from Airflow database, it will got it from Vault?

Exactly. Here is one of the guide: https://www.astronomer.io/guides/airflow-and-hashicorp-vault/ to test it out locally and following docs:

@marcusianlevine
Copy link
Contributor

marcusianlevine commented Aug 10, 2020

@kaxil I'm trying to implement a very similar workflow using the new VaultBackend for Variables storage, but I noticed that sensitive Variables fetched from Vault can be leaked out from the UI via Rendered Templates if you include a Variable value in a field that is templated

Could we take a similar approach to AIRFLOW-45 and mask any Variables that match a set of pattern names in the Rendered Templates view?

@kaxil
Copy link
Member

kaxil commented Aug 11, 2020

aah I see, apologies @n4rk0o I should have read your description more carefully. Yes the Rendered UI Field currently exposes everything. We should have a way of hiding this.

I see two options here:

  1. Same as what @marcusianlevine described
  2. Have a flag in airflow.cfg to hide rendered templated_fields globally. Sometime though users would love to check if the field was rendered correctly and hence it acts as a good debugging tool.

@kaxil kaxil added this to the Airflow 2.0.0 milestone Aug 11, 2020
@marcusianlevine
Copy link
Contributor

marcusianlevine commented Aug 11, 2020

I do like the idea of having this be an optional behavior

I think it would be really great if we could selectively obscure Variables too - some are sensitive while others are not, which is why I like the approach in #1530 of using a list of patterns

Also, I noticed that Variables can be leaked via the Task Instance Details page too, even if the fields aren't rendered templates, so we should be sure to mask sensitive variables everywhere we can in the UI (obviously if they are logged by a task it's still possible to deliberately leak them through the logs...)

@kaxil
Copy link
Member

kaxil commented Aug 11, 2020

100% agree. Would you like to work on a proposal or a PR for that one, it would be definitely good to get this in for 2.0

@marcusianlevine
Copy link
Contributor

marcusianlevine commented Aug 11, 2020

Hmm it's been awhile since I've worked on the UI

I'll do some digging in the code and see if I can come up with a reasonable design...

Do you happen to know off-hand if there is any way to tell at run-time whether DAG code is being compiled by the webserver?

Because we need the Variable's key to do the conditional masking, we really need to mask the Variable values at the time that Variable.get is called, otherwise we'd have to search for every possible Variable value in every field's contents...

I'm also not sure how this would work with serialized DAGs: once they're serialized they will have the unmasked values, so if the webserver just reads the serialized DAG directly we'd have no way to know whether to mask certain Variables since they would already be just plain values

Maybe we need a more dynamic way of injecting Variables rather than reading them at DAG compile time and writing their values into the DB? I'm thinking some kind of Variable pointer syntax like <Var key=my-variable> that gets rendered differently based on whether the code is being run on the webserver or not

@kaxil
Copy link
Member

kaxil commented Aug 26, 2020

This should happen before the templates are rendered I think for hiding sensitive info in Rendered Fields

@ausiddiqui
Copy link

I have a similar use case, but passing the password from a connection object I created in the UI to the environment variable in the KubernetesPodOperator and it is appearing in plain text in the Rendered Template part of the UI for the task. There should be a way to avoid this being printed and visible.

@marcusianlevine
Copy link
Contributor

I should have some time to work on a proposal for this in the next few weeks, but I'm still not sure the best way to approach the design

The key question I'm stuck on is at what point the templates get filled in... obviously the executors need to have access to the unmasked values, but if they get injected at DAG compile time then they will be visible in the UI, so I'm guessing the unmasked injection needs to happen at task execution time

@rafaelpierre
Copy link

+1

@marcusianlevine have you succeeded on implementing this? I can also help if needed.

@uranusjr
Copy link
Member

uranusjr commented Apr 2, 2021

I’m thinking about a combination of the two approaches:

  1. airflow.cfg will gain a flag (show_templated_fields) to control the behaviour on this.
  2. The value can also be set to all (show all fields), none (hide all), or public (hides fields that starts with an underscore).
  3. One of them is the default, probably all for backwards compatibility?

All proposed names are subject to discussion. I’m going to look into the implementation.

@uranusjr
Copy link
Member

uranusjr commented Apr 2, 2021

WIP PR at #15151.

@kaxil
Copy link
Member

kaxil commented Apr 2, 2021

@uranusjr Actually how about controlling the template_field view via FAB Permissions?

That will allow Admins (and the roles that they allow) to still check those values for debugging but not all.

@uranusjr
Copy link
Member

uranusjr commented Apr 2, 2021

Makes sense as well. Do you think there should be some kind of sensitive field naming convention (e.g. leading underscore as suggested previously) and separate permissions to control “public” and sensitive field visibilities, or just one blanket permission for all fields would suffice?

@kaxil
Copy link
Member

kaxil commented Apr 2, 2021

Makes sense as well. Do you think there should be some kind of sensitive field naming convention (e.g. leading underscore as suggested previously) and separate permissions to control “public” and sensitive field visibilities, or just one blanket permission for all fields would suffice?

re: Naming convention

The problem with that is that it is too big of a change and affects all the Operators as template_field, example:

template_fields = ('bash_command', 'env')
template_fields_renderers = {'bash_command': 'bash', 'env': 'json'}
template_ext = (
'.sh',
'.bash',
)
ui_color = '#f0ede4'
@apply_defaults
def __init__(
self,
*,
bash_command: str,
env: Optional[Dict[str, str]] = None,
output_encoding: str = 'utf-8',
skip_exit_code: int = 99,
**kwargs,
) -> None:
super().__init__(**kwargs)
self.bash_command = bash_command

and that it varies sometimes per user, as they might pass a sensitive value to BashOperator via a connection/variable to use an environment variable containing sensitive information.

Hence I think the blanket permission for all field should suffice

@uranusjr
Copy link
Member

uranusjr commented Apr 2, 2021

Alternative solution based on permission is in #15158.

@rcramblit
Copy link

rcramblit commented Apr 16, 2021

I have a similar need that might not be covered by the solutions posed in this thread:
I am using the DockerOperator in Airflow 1.10.15 and want to pass some secrets to the docker container via DockerOperator's environment argument, but this argument is rendered in the AirflowUI in the Task Instance Details view. Is there any way to control what arguments are rendered there?

edit: I see there is a private_environment in the DockerOperator plugin for 2.x, so I am copying that functionality by overriding the _run_image method in a sublcass - in case anyone else happens along this issue.

@kaxil
Copy link
Member

kaxil commented Apr 27, 2021

Let's include hiding sensitive data everywhere including logs & rendered templated fields

@uranusjr
Copy link
Member

What needs to be done to hide sensitive data in logs? There’s already permission to hide task logs (RESOURCE_TASK_LOG).

@kaxil
Copy link
Member

kaxil commented Apr 29, 2021

What needs to be done to hide sensitive data in logs? There’s already permission to hide task logs (RESOURCE_TASK_LOG).

See #15599

@ppatel8-wooliex
Copy link

aah I see, apologies @n4rk0o I should have read your description more carefully. Yes the Rendered UI Field currently exposes everything. We should have a way of hiding this.

I see two options here:

  1. Same as what @marcusianlevine described
  2. Have a flag in airflow.cfg to hide rendered templated_fields globally. Sometime though users would love to check if the field was rendered correctly and hence it acts as a good debugging tool.

when you use vault to get the secret then before passing as params you can mask it. It will display as **** in log as well as rendered template.

you can use the below code to mask the secret from the Vault.

from airflow.utils.log.secrets_masker import mask_secret

openssl_service_account_key_read_response = client.secrets.kv.read_secret_version(path=openssl_service_account_secret_path,mount_point=vault_mount_point)
service_account_secret = openssl_service_account_key_read_response['data']['data'][openssl_service_account_key]
mask_secret(service_account_secret)

  sample_commands=dedent('''
    set +x;
    pwd;
    cd /home/airflow/gcs/data/bin/
    chmod 775 -R .;
    ./bin/test.sh '{{ params.service_account_secret }}';
    ''')
    task_1 = BashOperator(
        task_id='',
        bash_command=sample_commands,
        params={
        'service_account_secret' : service_account_secret
        }
        )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:feature Feature Requests
Projects
None yet
9 participants