[Bug] Render source nodes #1229

Open
1 task
was-av opened this issue Sep 27, 2024 · 5 comments
Labels
  • area:execution - Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc
  • area:rendering - Related to rendering, like Jinja, Airflow tasks, etc
  • dbt:list - Primarily related to dbt list command or functionality
  • execution:local - Related to Local execution environment
  • parsing:dbt_ls - Issues, questions, or features related to dbt_ls parsing
  • profile:clickhouse - Related to Clickhouse ProfileConfig
  • triage-needed - Items need to be reviewed / assigned to milestone

Comments

@was-av

was-av commented Sep 27, 2024

Astronomer Cosmos Version

Other Astronomer Cosmos version (please specify below)

If "Other Astronomer Cosmos version" selected, which one?

1.6.0

dbt-core version

1.3.2

Versions of dbt adapters

dbt-clickhouse==1.3.3
dbt-core==1.3.2
dbt-extractor==0.4.1

LoadMode

AUTOMATIC

ExecutionMode

LOCAL

InvocationMode

DBT_RUNNER

airflow version

2.10.1

Operating System

Debian GNU/Linux 11 (bullseye)

If you think it's a UI issue, what browsers are you seeing the problem on?

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened?

I added sources to the Airflow DAG by setting source_rendering_behavior to SourceRenderingBehavior.ALL, and got the error shown below.

Relevant log output

[2024-09-27T09:53:31.951+0000] {graph.py:136} INFO - Running command: `/opt/airflow/dbt_venv/bin/dbt ls --output json --output-keys name unique_id resource_type depends_on original_file_path tags config freshness --project-dir /tmp/tmp4fja0ww3 --profiles-dir /workspace/transform/clickhouse_dbt --profile clickhouse_dbt --target prod --vars {"logical_date": "{{ ds }}"} --selector daily`
Traceback (most recent call last):
  File "/workspace/dags/dbt_cosmos.py", line 124, in <module>
    globals()[dag_id] = build_dbt_dag(dag_id, config)
  File "/workspace/dags/dbt_cosmos.py", line 106, in build_dbt_dag
    return dbt_cosmos()
  File "/usr/local/lib/python3.10/site-packages/airflow/models/dag.py", line 4307, in factory
    f(**f_kwargs)
  File "/workspace/dags/dbt_cosmos.py", line 85, in dbt_cosmos
    dbt_run_and_test = DbtTaskGroup(
  File "/usr/local/lib/python3.10/site-packages/cosmos/airflow/task_group.py", line 28, in __init__
    DbtToAirflowConverter.__init__(self, *args, **specific_kwargs(**kwargs))
  File "/usr/local/lib/python3.10/site-packages/cosmos/converter.py", line 261, in __init__
    self.dbt_graph.load(method=render_config.load_method, execution_mode=execution_config.execution_mode)
  File "/usr/local/lib/python3.10/site-packages/cosmos/dbt/graph.py", line 402, in load
    self.load_via_dbt_ls()
  File "/usr/local/lib/python3.10/site-packages/cosmos/dbt/graph.py", line 461, in load_via_dbt_ls
    self.load_via_dbt_ls_without_cache()
  File "/usr/local/lib/python3.10/site-packages/cosmos/dbt/graph.py", line 581, in load_via_dbt_ls_without_cache
    nodes = self.run_dbt_ls(dbt_cmd, self.project_path, tmpdir_path, env)
  File "/usr/local/lib/python3.10/site-packages/cosmos/dbt/graph.py", line 442, in run_dbt_ls
    stdout = run_command(ls_command, tmp_dir, env_vars)
  File "/usr/local/lib/python3.10/site-packages/cosmos/dbt/graph.py", line 156, in run_command
    raise CosmosLoadDbtException(f"Unable to run {command} due to the error:\n{details}")
cosmos.dbt.graph.CosmosLoadDbtException: Unable to run ['/opt/airflow/dbt_venv/bin/dbt', 'ls', '--output', 'json', '--output-keys', 'name', 'unique_id', 'resource_type', 'depends_on', 'original_file_path', 'tags', 'config', 'freshness', '--project-dir', '/tmp/tmp4fja0ww3', '--profiles-dir', '/workspace/transform/clickhouse_dbt', '--profile', 'clickhouse_dbt', '--target', 'prod', '--vars', '{"logical_date": "{{ ds }}"}', '--selector', 'daily'] due to the error:
usage: dbt [-h] [--version] [-r RECORD_TIMING_INFO] [-d]
           [--log-format {text,json,default}] [--no-write-json]
           [--use-colors | --no-use-colors] [--printer-width PRINTER_WIDTH]
           [--warn-error] [--no-version-check]
           [--partial-parse | --no-partial-parse] [--use-experimental-parser]
           [--no-static-parser] [--profiles-dir PROFILES_DIR]
           [--no-anonymous-usage-stats] [-x]
           [--event-buffer-size EVENT_BUFFER_SIZE] [-q] [--no-print]
           [--cache-selected-only | --no-cache-selected-only]
           {docs,source,init,clean,debug,deps,list,ls,build,snapshot,run,compile,parse,test,seed,run-operation}
           ...
dbt: error: unrecognized arguments: unique_id resource_type depends_on original_file_path tags config freshness

How to reproduce

  1. Create dbt project with sources
  2. Create airflow dag with DbtTaskGroup
  3. Use RenderConfig with parameter source_rendering_behavior=SourceRenderingBehavior.ALL
  4. Run python file with that dag

Anything else :)?

The error is related to this code in cosmos/dbt/graph.py:

    def run_dbt_ls(
        self, dbt_cmd: str, project_path: Path, tmp_dir: Path, env_vars: dict[str, str]
    ) -> dict[str, DbtNode]:
        """Runs dbt ls command and returns the parsed nodes."""
        if self.render_config.source_rendering_behavior != SourceRenderingBehavior.NONE:
            ls_command = [
                dbt_cmd,
                "ls",
                "--output",
                "json",
                "--output-keys",
                "name",
                "unique_id",
                "resource_type",
                "depends_on",
                "original_file_path",
                "tags",
                "config",
                "freshness",
            ]
        else:
            ls_command = [dbt_cmd, "ls", "--output", "json"]

The value of the --output-keys option must be passed as a single string.
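
To illustrate the failure mode outside dbt, here is a minimal sketch using Python's argparse. It assumes, based on the usage message in the log above, that this dbt version parses its CLI with argparse and that --output-keys takes exactly one value. The parser and the leftover_args helper are hypothetical stand-ins, not dbt's real definitions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for a CLI whose --output-keys accepts ONE value.
    parser = argparse.ArgumentParser(prog="demo")
    subparsers = parser.add_subparsers(dest="command")
    ls = subparsers.add_parser("ls")
    ls.add_argument("--output", default="name")
    ls.add_argument("--output-keys")  # default nargs: consumes a single token
    return parser


def leftover_args(argv: list[str]) -> list[str]:
    """Return the tokens the parser could not attribute to any option."""
    _, unknown = build_parser().parse_known_args(argv)
    return unknown


# Keys passed as separate tokens: only the first one is consumed.
print(leftover_args(["ls", "--output", "json", "--output-keys", "name", "unique_id", "tags"]))

# Keys passed as one token: nothing is left over.
print(leftover_args(["ls", "--output", "json", "--output-keys", "name unique_id tags"]))
```

With a strict parse_args call instead of parse_known_args, the first case would exit with an "unrecognized arguments" error, which matches the shape of the message in the log.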

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Contact Details

[email protected]

@was-av was-av added bug Something isn't working triage-needed Items need to be reviewed / assigned to milestone labels Sep 27, 2024

dosubot bot commented Sep 27, 2024

Hey @was-av! 👋 I'm here to help you with bug fixes, answer your questions, and guide you on becoming a contributor.

The error you're encountering is due to the --output-keys argument in the dbt ls command being passed as separate strings instead of a single string. This causes dbt to misinterpret the arguments.

To fix this, you need to modify the run_dbt_ls function in cosmos/dbt/graph.py to concatenate the --output-keys values into a single string. Here is the corrected code snippet:

def run_dbt_ls(
    self, dbt_cmd: str, project_path: Path, tmp_dir: Path, env_vars: dict[str, str]
) -> dict[str, DbtNode]:
    """Runs dbt ls command and returns the parsed nodes."""
    if self.render_config.source_rendering_behavior != SourceRenderingBehavior.NONE:
        ls_command = [
            dbt_cmd,
            "ls",
            "--output",
            "json",
            "--output-keys",
            "name unique_id resource_type depends_on original_file_path tags config freshness",
        ]
    else:
        ls_command = [dbt_cmd, "ls", "--output", "json"]

This change ensures that the --output-keys argument is correctly formatted as a single string, which should resolve the error you're seeing [1].
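
As a rough illustration of the difference, the sketch below builds the argument list both ways. The helper names and the OUTPUT_KEYS constant are hypothetical and not part of Cosmos. Whether the single-string form is accepted depends on the installed dbt version, which this sketch does not verify.

```python
import shlex

# Keys taken from the snippet quoted in this issue.
OUTPUT_KEYS = [
    "name",
    "unique_id",
    "resource_type",
    "depends_on",
    "original_file_path",
    "tags",
    "config",
    "freshness",
]


def ls_command_separate(dbt_cmd: str) -> list[str]:
    # Each key becomes its own argv token (the current behaviour).
    return [dbt_cmd, "ls", "--output", "json", "--output-keys", *OUTPUT_KEYS]


def ls_command_joined(dbt_cmd: str) -> list[str]:
    # All keys travel as a single argv token (the proposed change).
    return [dbt_cmd, "ls", "--output", "json", "--output-keys", " ".join(OUTPUT_KEYS)]


# shlex.join quotes tokens that contain spaces, so the boundaries are visible.
print(shlex.join(ls_command_separate("dbt")))
print(shlex.join(ls_command_joined("dbt")))
```

Printing both lists with a plain " ".join(...) gives identical text, so a log line formatted that way does not show how the keys were passed. shlex.join makes the token boundaries visible.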


@dosubot dosubot bot added area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc area:rendering Related to rendering, like Jinja, Airflow tasks, etc dbt:list Primarily related to dbt list command or functionality execution:local Related to Local execution environment parsing:dbt_ls Issues, questions, or features related to dbt_ls parsing profile:clickhouse Related to Clickhouse ProfileConfig labels Sep 27, 2024
@pankajastro
Contributor

Hey @was-av, the source node rendering feature is available only for dbt-core >= 1.5: https://astronomer.github.io/astronomer-cosmos/configuration/source-nodes-rendering.html

@was-av was-av changed the title [Bug] [Bug] Render source nodes Sep 27, 2024
@was-av
Author

was-av commented Sep 27, 2024

@pankajastro thanks for your quick answer.
Could you explain why the source node rendering feature requires a minimum dbt-core version of 1.5?

@tatiana
Collaborator

tatiana commented Sep 30, 2024

@was-av would there be any blocker for you to upgrade from dbt 1.3.2 to 1.5?

The source feature should also work with older versions of dbt as long as you use Cosmos LoadMode.DBT_MANIFEST. Would you like to try it?

With LoadMode.DBT_LS, as in your setup, Cosmos relies on the flags and output of the dbt ls command. When implementing the source feature, we realised that dbt-core had changed the CLI interface to expose additional keys, such as freshness, that were necessary to implement the source nodes.

Supporting both the pre-dbt-1.5 and post-dbt-1.5 interfaces would add a performance cost to running Cosmos, since we'd need to find out which version of dbt-core the end user has. With LoadMode.DBT_LS, that would mean running an additional subprocess in the scheduler/DAG processor. Since approximately 84% of Cosmos users are on dbt 1.5 or higher, we believed the cost of supporting both was higher than the benefit.

We're happy to be persuaded otherwise!
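
For context, the version check discussed above could look roughly like the sketch below. It is not Cosmos code: the function names are hypothetical, and it assumes the `dbt --version` output contains an "installed: X.Y.Z" line like the one pasted later in this thread.

```python
import re
import subprocess

# Minimum dbt-core version for source rendering, as stated in this thread.
MIN_SOURCE_RENDERING_VERSION = (1, 5)


def parse_installed_core_version(version_output: str) -> tuple[int, ...]:
    """Extract the installed dbt-core version from `dbt --version`-style text."""
    match = re.search(r"installed:\s*(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find an 'installed:' version line")
    return tuple(int(part) for part in match.groups())


def detect_core_version(dbt_cmd: str) -> tuple[int, ...]:
    # This is the extra subprocess: one more dbt invocation per DAG parse.
    result = subprocess.run(
        [dbt_cmd, "--version"], capture_output=True, text=True, check=False
    )
    # Assumption: depending on the dbt release, the text may be written to
    # stdout or stderr, so both streams are searched.
    return parse_installed_core_version(result.stdout + result.stderr)


def supports_source_rendering(version: tuple[int, ...]) -> bool:
    # Compare only major and minor against the threshold.
    return version[:2] >= MIN_SOURCE_RENDERING_VERSION
```

The parsing itself is cheap. The cost mentioned above comes from detect_core_version, which starts a separate dbt process every time the DAG file is parsed.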

@tatiana tatiana removed the bug Something isn't working label Oct 1, 2024
@was-av
Author

was-av commented Oct 4, 2024

I had no blockers and upgraded to the latest dbt version.

Core:
  - installed: 1.8.7
  - latest:    1.8.7 - Up to date!

Plugins:
  - clickhouse: 1.8.4 - Up to date!

I still get the error:

[2024-10-04T20:34:17.939+0000] {graph.py:136} INFO - Running command: `/opt/airflow/dbt_venv/bin/dbt ls --output json --output-keys name unique_id resource_type depends_on original_file_path tags config freshness --project-dir /tmp/tmp5kgd4hdo --profiles-dir /workspace/transform/clickhouse_dbt --profile clickhouse_dbt --target prod --vars {"logical_date": "{{ ds }}"} --selector daily`
Traceback (most recent call last):
  File "/workspace/dags/edu/az_dbt_cosmos.py", line 124, in <module>
    globals()[dag_id] = build_dbt_dag(dag_id, config)
  File "/workspace/dags/edu/az_dbt_cosmos.py", line 106, in build_dbt_dag
    return dbt_cosmos()
  File "/usr/local/lib/python3.10/site-packages/airflow/models/dag.py", line 4307, in factory
    f(**f_kwargs)
  File "/workspace/dags/edu/az_dbt_cosmos.py", line 85, in dbt_cosmos
    dbt_run_and_test = DbtTaskGroup(
  File "/usr/local/lib/python3.10/site-packages/cosmos/airflow/task_group.py", line 28, in __init__
    DbtToAirflowConverter.__init__(self, *args, **specific_kwargs(**kwargs))
  File "/usr/local/lib/python3.10/site-packages/cosmos/converter.py", line 261, in __init__
    self.dbt_graph.load(method=render_config.load_method, execution_mode=execution_config.execution_mode)
  File "/usr/local/lib/python3.10/site-packages/cosmos/dbt/graph.py", line 402, in load
    self.load_via_dbt_ls()
  File "/usr/local/lib/python3.10/site-packages/cosmos/dbt/graph.py", line 461, in load_via_dbt_ls
    self.load_via_dbt_ls_without_cache()
  File "/usr/local/lib/python3.10/site-packages/cosmos/dbt/graph.py", line 581, in load_via_dbt_ls_without_cache
    nodes = self.run_dbt_ls(dbt_cmd, self.project_path, tmpdir_path, env)
  File "/usr/local/lib/python3.10/site-packages/cosmos/dbt/graph.py", line 442, in run_dbt_ls
    stdout = run_command(ls_command, tmp_dir, env_vars)
  File "/usr/local/lib/python3.10/site-packages/cosmos/dbt/graph.py", line 156, in run_command
    raise CosmosLoadDbtException(f"Unable to run {command} due to the error:\n{details}")
cosmos.dbt.graph.CosmosLoadDbtException: Unable to run ['/opt/airflow/dbt_venv/bin/dbt', 'ls', '--output', 'json', '--output-keys', 'name', 'unique_id', 'resource_type', 'depends_on', 'original_file_path', 'tags', 'config', 'freshness', '--project-dir', '/tmp/tmp5kgd4hdo', '--profiles-dir', '/workspace/transform/clickhouse_dbt', '--profile', 'clickhouse_dbt', '--target', 'prod', '--vars', '{"logical_date": "{{ ds }}"}', '--selector', 'daily'] due to the error:
usage: dbt [-h] [--version] [-r RECORD_TIMING_INFO] [-d]
           [--log-format {text,json,default}] [--no-write-json]
           [--use-colors | --no-use-colors] [--printer-width PRINTER_WIDTH]
           [--warn-error] [--no-version-check]
           [--partial-parse | --no-partial-parse] [--use-experimental-parser]
           [--no-static-parser] [--profiles-dir PROFILES_DIR]
           [--no-anonymous-usage-stats] [-x]
           [--event-buffer-size EVENT_BUFFER_SIZE] [-q] [--no-print]
           [--cache-selected-only | --no-cache-selected-only]
           {docs,source,init,clean,debug,deps,list,ls,build,snapshot,run,compile,parse,test,seed,run-operation}
           ...
dbt: error: unrecognized arguments: unique_id resource_type depends_on original_file_path tags config freshness

Also, I could not use LoadMode.DBT_MANIFEST because it does not support dbt selectors.

@tatiana, could you review the fix proposed by dosubot for how ls_command is built?
#1229 (comment)
