Sensitive variables don't get masked when rendered with airflow tasks test #17476

Closed
marclamberti opened this issue Aug 6, 2021 · 8 comments · Fixed by #24362
Labels
affected_version:2.1 Issues Reported for 2.1 good first issue kind:bug This is a clearly a bug priority:high High priority bug that should be patched quickly but does not require immediate new release

@marclamberti

marclamberti commented Aug 6, 2021

Apache Airflow version: 2.1.2

Kubernetes version (if you are using kubernetes) (use kubectl version): No

Environment:

  • Cloud provider or hardware configuration: No
  • OS (e.g. from /etc/os-release): MacOS Big Sur 11.4
  • Kernel (e.g. uname -a): -
  • Install tools: -
  • Others: -

What happened:

With the following code:

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator

from datetime import datetime

def _extract():
    partner = Variable.get("my_dag_partner_secret")
    print(partner)

with DAG("my_dag", start_date=datetime(2021, 1, 1), schedule_interval="@daily") as dag:

    extract = PythonOperator(
        task_id="extract",
        python_callable=_extract
    )

Executing the command:

airflow tasks test my_dag extract 2021-01-01

The value of the variable my_dag_partner_secret is printed unmasked in the logs, although it should be masked:

[2021-08-06 19:05:30,088] {taskinstance.py:1303} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=my_dag
AIRFLOW_CTX_TASK_ID=extract
AIRFLOW_CTX_EXECUTION_DATE=2021-01-01T00:00:00+00:00
partner_a
[2021-08-06 19:05:30,091] {python.py:151} INFO - Done. Returned value was: None
[2021-08-06 19:05:30,096] {taskinstance.py:1212} INFO - Marking task as SUCCESS. dag_id=my_dag, task_id=extract, execution_date=20210101T000000, start_date=20210806T131013, end_date=20210806T190530

What you expected to happen:

The value should be masked like on the UI or in the logs

How to reproduce it:

DAG given above

Anything else we need to know:

Nop

@marclamberti marclamberti added the kind:bug This is a clearly a bug label Aug 6, 2021
@ashb ashb added the priority:high High priority bug that should be patched quickly but does not require immediate new release label Aug 6, 2021
@ashb ashb added this to the Airflow 2.1.3 milestone Aug 6, 2021
@ashb ashb added the affected_version:2.1 Issues Reported for 2.1 label Aug 6, 2021
@ShakaibKhan
Contributor

I can take a look into this. Will try to reproduce.

@kaxil kaxil modified the milestones: Airflow 2.1.3, Airflow 2.2 Aug 20, 2021
@ShakaibKhan
Contributor

I repeated the above instructions and was able to reproduce. It looks like airflow/utils/log/secrets_masker.py:should_hide_value_for_key() is where the check for hiding a sensitive variable happens. I am now looking into the workflow of running DAGs from the CLI and whether it hits this function/class.
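To make the kind of check discussed above concrete, here is a minimal, standalone sketch of a key-name test. It is not Airflow's implementation: the function name `looks_sensitive` and the keyword list are assumptions chosen for illustration, and the real should_hide_value_for_key() may apply additional configuration.

```python
# Illustrative stand-in only, not Airflow's actual code.
# SENSITIVE_KEYWORDS is an assumed list chosen for this example.
SENSITIVE_KEYWORDS = frozenset(
    {"password", "passwd", "secret", "token", "api_key", "apikey"}
)


def looks_sensitive(name: str) -> bool:
    """Return True if the key name contains any sensitive keyword."""
    lowered = name.strip().lower()
    return any(keyword in lowered for keyword in SENSITIVE_KEYWORDS)


# The variable from the report contains "secret", so it would qualify.
print(looks_sensitive("my_dag_partner_secret"))
print(looks_sensitive("my_dag_partner"))
```

Under this sketch, the variable name from the report qualifies for masking because it contains the substring "secret", which matches the expectation stated in the issue.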

@ShakaibKhan
Contributor

So from [1]: "The automatic masking is triggered by Connection or Variable access. This means that if you pass a sensitive value via XCom or any other side-channel it will not be masked when printed in the downstream task." Going into task-command::task_test, it looks like logs are being propagated, so they are no longer masked. This seems to be working as intended (WAI).
[1] https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/index.html#masking-sensitive-data

@artful88533

Hello!
Any news?
Displaying secrets in the CI log is a critical problem for us :(

@potiuk
Member

potiuk commented Nov 18, 2021

@artful88533 -> suggestion: maybe you or @ShakaibKhan can take a look and provide a fix for that? If you care about it, this is the most "certain" way to make it land in 2.2.3, way more certain than pinging here, and becoming one of the > 1800 contributors to Airflow is a great way to pay back for the free software you use.

IMHO it is not critical (when you have access to the CLI you can already read all the secrets in whatever way you want), but if you need it for your CI process, maybe that's a good incentive for you to fix it? It seems that this is just a question of adding the secrets masker as a filter when the CLI commands are run.
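As a rough illustration of what "adding a masker as a filter" means, here is a sketch using only the standard library logging module. The names `RedactingFilter`, `build_logger` and the logger name `masking_demo` are hypothetical and are not Airflow APIs; Airflow's own secrets masker is more elaborate.

```python
import io
import logging


class RedactingFilter(logging.Filter):
    """Replace known secret values in log messages (hypothetical helper)."""

    def __init__(self, secrets):
        super().__init__()
        self._secrets = [s for s in secrets if s]

    def filter(self, record):
        # Render the message first so secrets passed as args are covered too.
        message = record.getMessage()
        for secret in self._secrets:
            message = message.replace(secret, "***")
        record.msg = message
        record.args = None
        return True  # keep the record, only its text was changed


def build_logger(secrets, stream):
    """Create a demo logger that writes redacted output to `stream`."""
    logger = logging.getLogger("masking_demo")
    logger.handlers.clear()
    logger.filters.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.StreamHandler(stream))
    logger.addFilter(RedactingFilter(secrets))
    return logger


buffer = io.StringIO()
log = build_logger(["partner_a"], buffer)
log.info("partner is %s", "partner_a")
print(buffer.getvalue())
```

Note that a filter like this only sees text that passes through the logger. A bare print() inside the task callable, as in the report, writes to stdout directly and would bypass it unless stdout is also routed through the logger.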

@alex-astronomer
Contributor

Actually, I think this might be more critical than it looks at first glance @potiuk. By running the airflow tasks test ... command through a BashOperator in a separate DAG, which tests the task from the DAG that Marc linked in the original issue, it's actually possible to display unmasked secrets through the Airflow Web UI.

@potiuk
Member

potiuk commented Feb 1, 2022

> Actually, I think this might be more critical than it looks at first glance @potiuk. By running the airflow tasks test ... command through a BashOperator in a separate DAG, which tests the task from the DAG that Marc linked in the original issue, it's actually possible to display unmasked secrets through the Airflow Web UI.

Why would you want to run "airflow tasks test" in a DAG? Is this a valid case that is likely? Maybe I am not understanding something, but I am not sure I see a case where it could be used in "production" in a valid scenario.

Just to give a bit of context: there are many ways you could print the unmasked values for connections, variables, even secrets. For example, you could easily launch a subprocess calling "python -c print(Connection.get('conn_id'))" or just run "airflow connections list" as a command to print unmasked passwords.

The way masking is currently done will not prevent this if the Connection has not been used in the task before. So "masking" is not "total prevention" of showing secret values; it only prevents accidentally printing them in the "regular use cases". There are many ways a DAG writer could print them and deliberately bypass the secrets masker. So my question is: how likely and "normal" is it to run "airflow tasks test" inside the "execute" method of a task? I think very unlikely.
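The point that masking only covers values the masker has already seen can be illustrated with a toy model. `MaskingStore` below is a hypothetical class written for this example, not an Airflow API: a value is registered for redaction only when it is fetched through get().

```python
class MaskingStore:
    """Toy secret store: values become maskable only once accessed."""

    def __init__(self, values):
        self._values = dict(values)
        self.known_secrets = set()

    def get(self, key):
        value = self._values[key]
        # Access is what registers the value with the masker.
        self.known_secrets.add(value)
        return value

    def redact(self, text):
        for secret in self.known_secrets:
            text = text.replace(secret, "***")
        return text


store = MaskingStore({"used_secret": "partner_a", "unused_secret": "partner_b"})
store.get("used_secret")  # only this value is now known to the masker
masked = store.redact("values: partner_a partner_b")
print(masked)
```

In this model the second value passes through unredacted because nothing ever fetched it through the store, which mirrors the bypass described above: output produced by a separate process or side channel is not covered.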

And BTW, in the future, when we implement DB-less mode, and maybe even (as a follow-up) further harden Airflow so that tasks cannot reach out to read the Airflow DB at all, this might become more "hardened". But currently we have no mechanism to prevent DAG writers from printing any secret they want to the task log. That's simply impossible.

@alex-astronomer
Contributor

That's very true. Thanks for explaining your reasoning about this.

alex-astronomer added a commit to alex-astronomer/airflow that referenced this issue Feb 2, 2022
…he#17476)

Add a context manager to secrets_masker inside
of which stdout is captured, redacted according
to the filters on 'airflow.task' logger and then
spit back into stdout.  Filters stdout that doesn't
go through logger according to airflow.task logger's
filters.
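The idea in this commit message can be sketched with the standard library alone. This is not the code from the referenced commit: `redact_stdout` is a hypothetical name, and it takes an explicit list of secrets instead of reusing the filters on the 'airflow.task' logger.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def redact_stdout(secrets, replacement="***"):
    """Capture stdout inside the block, redact secrets, then re-emit it."""
    captured = io.StringIO()
    try:
        with contextlib.redirect_stdout(captured):
            yield
    finally:
        # redirect_stdout has already restored the previous sys.stdout here.
        text = captured.getvalue()
        for secret in secrets:
            if secret:
                text = text.replace(secret, replacement)
        sys.stdout.write(text)


with redact_stdout(["partner_a"]):
    print("partner_a")  # emitted as *** once the block exits
```

One trade-off of this simple version is that output is buffered until the block exits, so nothing appears while the wrapped code is still running.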
@ashb ashb modified the milestones: Airflow 2.3.0, Airflow 2.3.1 Apr 22, 2022
9 participants