Refactor `dbt ls` to run from a temporary directory #414
Codecov Report

| | main | #414 | +/- |
|---|---|---|---|
| Coverage | 91.43% | 91.51% | +0.07% |
| Files | 50 | 50 | |
| Lines | 1752 | 1768 | +16 |
| Hits | 1602 | 1618 | +16 |
| Misses | 150 | 150 | |
As of Cosmos 1.0.0, `LoadMode.DBT_LS` ran `dbt ls` from within the original dbt project directory. `dbt ls` writes files to the directory it runs from unless the environment variables `DBT_LOG_PATH` and `DBT_TARGET_PATH` are specified. Depending on the deployment, the Airflow worker may not have write permissions to the dbt project directory.

This PR changes `dbt ls` to copy the original project directory into a temporary directory and run the command from there.

Closes: #411
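The copy-and-run idea from this commit message can be sketched roughly as follows. This is a minimal illustration, not the code merged in the PR: `run_in_temp_copy` is a hypothetical helper name, and the command is passed in by the caller so the sketch does not depend on dbt being installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_in_temp_copy(project_dir, command):
    """Copy project_dir into a temporary directory and run command from the copy.

    Hypothetical helper for illustration only. Anything the command writes
    relative to its working directory lands in the temporary copy, which is
    deleted when the context manager exits.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_project = Path(tmp_dir) / "project"
        # copytree requires that the destination does not exist yet
        shutil.copytree(project_dir, tmp_project)
        process = subprocess.run(
            command, cwd=tmp_project, capture_output=True, text=True
        )
        return process.returncode, process.stdout, process.stderr
```

A call such as `run_in_temp_copy("/path/to/jaffle_shop", ["dbt", "ls", "--output", "json"])` would then leave the original directory untouched, at the cost of copying the whole project on every parse.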
…a separate dir. Unfortunately, this does not work in dbt 1.5 or earlier versions.
Thanks, @jlaneve, I think I addressed all the feedback!
A different approach we could adopt to solve this issue is not to pass

Still, this change will not solve the limitation that ATM local operators run things locally. @jlaneve @harels is the scope of this ticket only
Feature (pending documentation!)

* Support dbt global flags (via dbt_cmd_global_flags in `operator_args`) by @tatiana in #469

Enhancements

* Hide sensitive field when using BigQuery keyfile_dict profile mapping by @jbandoro in #471

Bug fixes

* Fix bug on select node add exclude selector subset ids logic by @jensenity in #463
* Refactor dbt ls to run from a temporary directory, to avoid Read-only file system errors during DAG parsing, by @tatiana in #414

Others

* Docs: Fix RenderConfig load argument by @jbandoro in #466
* Enable CI integration tests from external forks by @tatiana in #458
* Improve CI tests runtime by @tatiana in #457
* Change CI to run coverage after tests pass by @tatiana in #461
* Fix forks code revision in code coverage by @tatiana in #472
* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #467
Since Cosmos 1.0, `load_method.DBT_LS` is the default dbt project parsing method, unless the user gives a manifest. Using the original dbt project path has been a source of issues when that path is read-only. The issue first appeared when running commands that generate `{project-dir}/target/` and `{project-dir}/logs/`, which was solved as part of #414. It is particularly problematic if we want to run `dbt deps` from the original project directory, since dbt 1.6 saves packages to `{project_dir}/dbt_packages` unless specified otherwise in the user's `dbt_project.yml`. To our knowledge, dbt currently does not allow users to define this directory via flags or environment variables, as discussed in #481.

This change aims to solve these issues by creating a temporary directory and creating symbolic links to the original directory.

Finally, during the development of this task, it was observed that when running `dbt ls` in a project with `packages.yml`, dbt raises a 'Compilation Error'. Since dbt may raise other errors in stdout, this PR captures "Errors" more generically, making potential issues more evident to end users.
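The symbolic-link approach and the more generic error check described above could look roughly like the sketch below. It is an illustration under assumptions, not the code from the PR: `link_project` and `has_error` are hypothetical names, the list of skipped folders is assumed from the directories mentioned in the description, and treating any stdout that contains the word "Error" as a failure is only one possible reading of "more generically".

```python
import os
from pathlib import Path

# Assumption: folders dbt writes to, taken from the description above.
WRITABLE_DIRS = ("logs", "target", "dbt_packages")


def link_project(project_dir, tmp_dir, ignore=WRITABLE_DIRS):
    """Symlink each top-level entry of project_dir into tmp_dir.

    Entries named in `ignore` are skipped, so a command run from tmp_dir
    creates them there instead of in the (possibly read-only) original.
    """
    for entry in Path(project_dir).resolve().iterdir():
        if entry.name not in ignore:
            os.symlink(entry, Path(tmp_dir) / entry.name)


def has_error(stdout):
    """Return True if the captured stdout mentions an error of any kind."""
    return "Error" in stdout
```

Compared with copying the project, linking avoids duplicating files on every parse, while writes to the skipped folders still land in the temporary directory.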
**Features**

* Support dbt global flags (via dbt_cmd_global_flags in operator_args) by @tatiana in #469
* Support parsing DAGs when there are no connections by @jlaneve in #489

**Enhancements**

* Hide sensitive field when using BigQuery keyfile_dict profile mapping by @jbandoro in #471
* Consistent Airflow Dataset URIs, inlets and outlets with `Openlineage package <https://pypi.org/project/openlineage-integration-common/>`_ by @tatiana in #485. `Read more <https://astronomer.github.io/astronomer-cosmos/configuration/lineage.html>`_.
* Refactor ``LoadMethod.DBT_LS`` to run from a temporary directory with symbolic links by @tatiana in #488
* Run ``dbt deps`` when using ``LoadMethod.DBT_LS`` by @DanMawdsleyBA in #481
* Update Cosmos log color to purple by @harels in #494
* Change operators to log ``dbt`` commands output as opposed to recording to XCom by @tatiana in #513

**Bug fixes**

* Fix bug on select node add exclude selector subset ids logic by @jensenity in #463
* Refactor dbt ls to run from a temporary directory, to avoid Read-only file system errors during DAG parsing, by @tatiana in #414
* Fix profile_config arg in DbtKubernetesBaseOperator by @david-mag in #505
* Fix SnowflakePrivateKeyPemProfileMapping private_key reference by @nacpacheco in #501
* Fix incorrect temporary directory creation in VirtualenvOperator init by @tatiana in #500
* Fix log propagation issue by @tatiana in #498
* Fix PostgresUserPasswordProfileMapping to retrieve port from connection by @jlneve in #511

**Others**

* Docs: Fix RenderConfig load argument by @jbandoro in #466
* Enable CI integration tests from external forks by @tatiana in #458
* Improve CI tests runtime by @tatiana in #457
* Change CI to run coverage after tests pass by @tatiana in #461
* Fix forks code revision in code coverage by @tatiana in #472
* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #467
* Drop support to Python 3.7 in the CI test matrix by @harels in #490
* Add Airflow 2.7 to the CI test matrix by @tatiana in #487
* Add MyPy type checks to CI since we exceeded pre-commit disk quota usage by @tatiana in #510
As of Cosmos 1.0.0, `LoadMode.DBT_LS` runs `dbt ls` from within the original dbt project directory.

`dbt ls` writes files to the directory it runs from unless the environment variables `DBT_LOG_PATH` and `DBT_TARGET_PATH` are specified (as of dbt 1.6).

Depending on the deployment, the Airflow worker may not have write permissions to the dbt project directory. This can lead to a read-only file system error during DAG parsing.

This PR changes the behavior of `dbt ls` to try to keep the `dbt ls` artifacts (logs and target directory) out of the original project directory.

In addition to the introduced test, this change was validated using Airflow 2.6 and dbt 1.6, by following these steps:
(1) Delete the folders `logs` and `target` from `astronomer-cosmos/dev/dags/dbt/jaffle_shop`

(2) Add a breakpoint after `stdout, stderr = process.communicate()` in `dbt/graph.py`

(3) Run a DAG that uses `astronomer-cosmos/dev/dags/dbt/jaffle_shop`

(4) When the breakpoint is hit, check that no `target` or `logs` folder was created in `astronomer-cosmos/dev/dags/dbt/jaffle_shop` after running `dbt ls`
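The check in step (4) can also be done from the debugger prompt with a small helper like the one below. `created_artifacts` is a hypothetical name used only for this sketch; it is not part of Cosmos.

```python
from pathlib import Path


def created_artifacts(project_dir, names=("logs", "target")):
    """Return the artifact folders that currently exist in project_dir."""
    return [name for name in names if (Path(project_dir) / name).is_dir()]
```

If the change behaves as intended, `created_artifacts("astronomer-cosmos/dev/dags/dbt/jaffle_shop")` returns an empty list when called at the breakpoint.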
A limitation with the current approach is that, although `dbt ls` is not creating these directories in the given circumstances, if the user is using the local executor or an earlier version of `dbt`, the files will still be written to the project directory.

Closes: #411
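The environment-variable redirection described in this PR can be sketched as below. This is a minimal illustration rather than the merged implementation: `build_artifact_env` and `run_with_redirected_artifacts` are hypothetical names, and, as the description notes, the `DBT_LOG_PATH` and `DBT_TARGET_PATH` variables are only expected to take effect as of dbt 1.6.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def build_artifact_env(artifacts_dir, base_env=None):
    """Return a copy of the environment with dbt artifact paths redirected."""
    env = dict(os.environ if base_env is None else base_env)
    env["DBT_LOG_PATH"] = str(Path(artifacts_dir) / "logs")
    env["DBT_TARGET_PATH"] = str(Path(artifacts_dir) / "target")
    return env


def run_with_redirected_artifacts(command, project_dir):
    """Run command from project_dir, pointing dbt artifacts at a temp dir."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        process = subprocess.Popen(
            command,
            cwd=project_dir,
            env=build_artifact_env(tmp_dir),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
        stdout, stderr = process.communicate()
        return process.returncode, stdout, stderr
```

With an older dbt that ignores these variables, the command still runs from the original directory, which matches the limitation stated above.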