[Feature] Select the nodes to run or skip within a DbtDAG at runtime #1228

Open
1 task done
SoheilSalmani opened this issue Sep 25, 2024 · 0 comments
Labels
area:rendering: Related to rendering, like Jinja, Airflow tasks, etc
area:selector: Related to selector, like DAG selector, DBT selector, etc
enhancement: New feature or request
execution:virtualenv: Related to Virtualenv execution environment
parsing:custom: Related to custom parsing, like custom DAG parsing, custom DBT parsing, etc
triage-needed: Items need to be reviewed / assigned to milestone

SoheilSalmani commented Sep 25, 2024

Description

I would like to be able to select the tasks to run and to skip within a DbtDAG at runtime.

Use case/motivation

Suppose I have four massive tables built with incremental materialization in dbt, all in the same DbtDag. If I introduce a breaking change to one of them, I would like to full-refresh it without running the other nodes. It would be great to be able to select the nodes to run at runtime when I only want to full-refresh one of them (and its dependencies), typically by passing dbt selection syntax in an Airflow Param when manually triggering the DAG.

I was thinking about adding select and exclude parameters to operator_args. We would then be able to select the nodes to run in the DAG at runtime (among the nodes rendered by the render_config) and skip the others:

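from datetime import datetime

from airflow.models.param import Param
from cosmos import DbtDag, ProjectConfig, RenderConfig

# `select` and `exclude` in operator_args are the proposed parameters; they do not exist yet.
dag = DbtDag(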
    dag_id=dag_id,
    project_config=ProjectConfig(dbt_project_path=dbt_project_path),
    render_config=RenderConfig(
        # Render every node in the `marketing` folder
        select=["marketing"],
        emit_datasets=False,
    ),
    profile_config=profile_config,
    execution_config=venv_execution_config,
    schedule=None,
    operator_args={"select": "{{ params.select }}", "exclude": "{{ params.exclude }}"},
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["dbt"],
    params={
        "full_refresh": Param(
            False, type="boolean", description="Full-refresh models?"
        ),
        # Passing `source:google_analytics+` at runtime will only run the tasks that
        # depend on the `google_analytics` source, and skip the rest.
        "select": Param(
            None,
            type=["string", "null"],
            description="dbt selection syntax to select tasks to run within the DAG.",
        ),
        "exclude": Param(
            None,
            type=["string", "null"],
            description="dbt selection syntax to select tasks to skip within the DAG.",
        ),
    },
    render_template_as_native_obj=True,
)
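
With the Params above, each operator would receive either None or a string of dbt selectors once the templates are rendered. Below is a minimal sketch of how such a value could be normalized before use. `parse_selectors` is a hypothetical helper, not an existing Cosmos function, and it assumes that multiple selectors are separated by spaces, as on the dbt command line.

```python
from typing import List, Optional


def parse_selectors(value: Optional[str]) -> List[str]:
    """Split a rendered Param value into individual dbt selectors.

    Hypothetical helper (not part of Cosmos). `None` or a blank string
    is treated as "no runtime filter" and yields an empty list.
    """
    if value is None:
        return []
    # str.split() with no argument drops leading, trailing and repeated whitespace.
    return value.split()


if __name__ == "__main__":
    print(parse_selectors(None))
    print(parse_selectors("source:google_analytics+ tag:daily"))
```

An empty list would mean that the DAG runs every rendered node, which matches the current behavior when no Param is supplied.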

I assume one of the main challenges would be to restrict select and exclude to the subset of nodes that were rendered in the DAG, so that we do not run a dbt command that affects models outside the DAG's scope.
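
One way to picture this scoping constraint is as a set intersection. The sketch below assumes the runtime selectors have already been resolved to node names by some other means (for example by dbt itself); `partition_rendered_nodes` and the node names are hypothetical and only illustrate the idea, not how Cosmos would implement it.

```python
from typing import Iterable, Optional, Set, Tuple


def partition_rendered_nodes(
    rendered: Iterable[str],
    selected: Optional[Iterable[str]] = None,
    excluded: Optional[Iterable[str]] = None,
) -> Tuple[Set[str], Set[str]]:
    """Return (to_run, to_skip), both limited to nodes rendered in the DAG.

    Hypothetical helper. `selected` and `excluded` are node names that the
    runtime selectors resolved to; names outside `rendered` are ignored.
    """
    rendered_set = set(rendered)
    # No runtime selection means every rendered node is a candidate.
    candidates = rendered_set if selected is None else rendered_set & set(selected)
    to_run = candidates - set(excluded or ())
    # Everything rendered but not chosen would be marked as skipped.
    to_skip = rendered_set - to_run
    return to_run, to_skip


if __name__ == "__main__":
    rendered = {"stg_ga_sessions", "ga_daily", "crm_contacts"}
    # "finance_report" matches the selector but is not part of this DAG.
    selected = {"stg_ga_sessions", "ga_daily", "finance_report"}
    to_run, to_skip = partition_rendered_nodes(rendered, selected)
    print(sorted(to_run))
    print(sorted(to_skip))
```

In this example the node outside the DAG is dropped from the selection, the two matching nodes would run, and the remaining rendered node would be skipped.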

Related issues

No response

Are you willing to submit a PR?

  • Yes, I am willing to submit a PR!
@SoheilSalmani SoheilSalmani added enhancement New feature or request triage-needed Items need to be reviewed / assigned to milestone labels Sep 25, 2024
@dosubot dosubot bot added area:rendering Related to rendering, like Jinja, Airflow tasks, etc area:selector Related to selector, like DAG selector, DBT selector, etc execution:virtualenv Related to Virtualenv execution environment parsing:custom Related to custom parsing, like custom DAG parsing, custom DBT parsing, etc labels Sep 25, 2024