Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: operator_args are modified in place in the Airflow converter #833

Closed
jbandoro opened this issue Feb 5, 2024 · 1 comment · Fixed by #835
Closed

Bug: operator_args are modified in place in the Airflow converter #833

jbandoro opened this issue Feb 5, 2024 · 1 comment · Fixed by #835
Labels
area:config Related to configuration, like YAML files, environment variables, or executer configuration dbt:run Primarily related to dbt run command or functionality execution:local Related to Local execution environment parsing:custom Related to custom parsing, like custom DAG parsing, custom DBT parsing, etc

Comments

@jbandoro
Copy link
Collaborator

jbandoro commented Feb 5, 2024

A user reported this in the Slack thread here. Even though we deprecated operator_args vars and env in favor of ProjectConfig.dbt_vars and ProjectConfig.env_vars, there is now a problem where if a user supplies operator_args to multiple DbtDag or DbtTaskGroup's with vars on env, they will only be used in the first dag/task group object.

Example problem (copied from the user report)

OSMOS_COMMON_RENDER_CONF = dict(
    emit_datasets=False, test_behavior=TestBehavior.AFTER_EACH, load_method=LoadMode.CUSTOM, dbt_deps=False
)


with DAG(
        dag_id="my-amazing-dag",
        default_args=default_args,
        schedule="50 4 * * *",
) as dag:
    dbt_task_group_base_config = {
        "project_config": MY_WONDERFUL_PROJECT_COSMOS_PROJ_CONF,
        "profile_config": MY_WONDERFUL_PROJECT_COSMOS_PROFILE_CONF,
        "execution_config": BI_PLATFORM_COSMOS_EXEC_CONF,
        "operator_args": {
            "vars": {"logical_date": "{{ execution_date }}"},
        },
    }

    dbt_tasks_google_data = DbtTaskGroup(
        group_id="dbt_tasks_google_data",
        render_config=RenderConfig(
            **COSMOS_COMMON_RENDER_CONF,
            select=[
                "config.tags:gam",
                "config.tags:dfp_perf",
                "path:models/staging/google",
                "path:models/marts/google",
                "path:models/reporting",
            ]
        ),
        **dbt_task_group_base_config,
    )

    dbt_tasks_facebook_data = DbtTaskGroup(
        group_id="dbt_tasks_facebook_data",
        render_config=RenderConfig(
            **COSMOS_COMMON_RENDER_CONF,
            select=[
                "path:models/staging/facebook",
                "path:models/marts/facebook",
            ]
        ),
        **dbt_task_group_base_config,  # logical_date doesn't get passed
    )

    dbt_tasks_customer_portal = DbtTaskGroup(
        group_id="dbt_tasks_customer_portal",
        render_config=RenderConfig(
            **COSMOS_COMMON_RENDER_CONF,
            select=[
                "path:seeds/customer_portal",
                "path:models/staging/customer_portal",
                "path:models/marts/customer_portal",
            ]
        ),
        **dbt_task_group_base_config, # logical_date doesn't get passed
    )

The responsible lines for this bug are in the converter is below:

env_vars = project_config.env_vars or operator_args.pop("env", None)
dbt_vars = project_config.dbt_vars or operator_args.pop("vars", None)

The fix is really straightforward, to use get instead of pop in the lines above, this is ok because Cosmos will already raise an exception if both operator_args and ProjectConfig with env vars or dbt vars are both set here.

Copy link

dosubot bot commented Feb 5, 2024

🤖

Thank you for bringing this issue to our attention! It's great to see that you've identified a straightforward fix for the problem. Your contribution to the repository would be highly valuable. We encourage you to go ahead and create a pull request with your proposed fix. Your effort will help improve the overall quality of the project. Keep up the good work!


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@dosubot dosubot bot added area:config Related to configuration, like YAML files, environment variables, or executer configuration dbt:run Primarily related to dbt run command or functionality execution:local Related to Local execution environment parsing:custom Related to custom parsing, like custom DAG parsing, custom DBT parsing, etc labels Feb 5, 2024
jbandoro added a commit that referenced this issue Feb 16, 2024
## Description

This is a fix for #833 where a user in Slack reported a bug where if the
same `operator_args` are used in multiple `DbtTaskGroup`'s the `vars`
were only used in the first task group.

I added a test which would have caught the bug, so that it isn't
reintroduced later.

## Related Issue(s)

closes #833 

## Breaking Change?

None

## Checklist

- [ ] I have made corresponding changes to the documentation (if
required)
- [x] I have added tests that prove my fix is effective or that my
feature works
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this issue Jul 14, 2024
…er#835)

## Description

This is a fix for astronomer#833 where a user in Slack reported a bug where if the
same `operator_args` are used in multiple `DbtTaskGroup`'s the `vars`
were only used in the first task group.

I added a test which would have caught the bug, so that it isn't
reintroduced later.

## Related Issue(s)

closes astronomer#833 

## Breaking Change?

None

## Checklist

- [ ] I have made corresponding changes to the documentation (if
required)
- [x] I have added tests that prove my fix is effective or that my
feature works
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:config Related to configuration, like YAML files, environment variables, or executer configuration dbt:run Primarily related to dbt run command or functionality execution:local Related to Local execution environment parsing:custom Related to custom parsing, like custom DAG parsing, custom DBT parsing, etc
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant