Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write logs to a temp dir when using dbt_ls parsing mode #411

Closed
jlaneve opened this issue Jul 27, 2023 · 9 comments · Fixed by #414
Closed

Write logs to a temp dir when using dbt_ls parsing mode #411

jlaneve opened this issue Jul 27, 2023 · 9 comments · Fixed by #414
Assignees

Comments

@jlaneve
Copy link
Collaborator

jlaneve commented Jul 27, 2023

In Slack, a user was getting an issue where the dbt ls command for generating the manifest was trying to write logs to the same directory, which is typically read-only.

20:43:06  Encountered an error:
[Errno 30] Read-only file system: '/usr/local/airflow/dags/dbt_grindr/logs/dbt.log'

We should instead have dbt write to a temp dir and paste the logs to stdout if there are any written to disk

@tatiana tatiana self-assigned this Jul 28, 2023
@tatiana
Copy link
Collaborator

tatiana commented Jul 28, 2023

Users could resolve this by setting the environment variable DBT_LOG_PATH (https://docs.getdbt.com/reference/project-configs/log-path). This would give them control of where these logs are written and the power to troubleshoot further issues if necessary.

@tatiana
Copy link
Collaborator

tatiana commented Jul 28, 2023

I just found the original bug report by Adam Underwood:
https://apache-airflow.slack.com/archives/C059CC42E9W/p1690490672020429?thread_ts=1690473607.733569&cid=C059CC42E9W

The problem goes beyond the logs directory since the dbt ls command also creates a target directory, where several files are attempted to be written, including: target/partial_parse.msgpack. This could be overcome by setting DBT_TARGET_PATH.

@tatiana
Copy link
Collaborator

tatiana commented Jul 28, 2023

A solution to fix both issues if the user does not set DBT-specific environment variables can be to create a temporary directory from where we'll run dbt ls.

This would assume that the user/process running cosmos has to write permissions to create/write temporary directories.

tatiana added a commit that referenced this issue Jul 28, 2023
Previously, LoadMode.DBT_LS used the original dbt project directory.
This command outputs to DBT_LOG_PATH and DBT_TARGET_PATH, if defined, otherwise to the original directory.
Depending on the deployment, the Airflow worker does not have write permissions to the dbt project directory.
This PR changes the behaviour of  to output to copy the original project directory into a temporary directory and run the command from there.

Closes: #411
tatiana added a commit that referenced this issue Jul 28, 2023
As of Cosmos 1.0.0, `LoadMode.DBT_LS` ran `dbt ls` from within the original dbt project directory.

The `dbt ls` outputs files to the directory it's running from unless the environment variables `DBT_LOG_PATH` and `DBT_TARGET_PATH` are specified.

Depending on the deployment, the Airflow worker does not have write permissions to the dbt project directory. This PR changes the behavior of `dbt ls` to make a copy of the original project directory into a temporary directory and run the command `dbt ls` from there.

Closes: #411
tatiana added a commit that referenced this issue Aug 7, 2023
As of Cosmos 1.0.0, `LoadMode.DBT_LS` ran `dbt ls` from within the original dbt project directory.

The `dbt ls` outputs files to the directory it's running from unless the environment variables `DBT_LOG_PATH` and `DBT_TARGET_PATH` are specified.

Depending on the deployment, the Airflow worker does not have write permissions to the dbt project directory. This PR changes the behavior of `dbt ls` to make a copy of the original project directory into a temporary directory and run the command `dbt ls` from there.

Closes: #411
@tatiana
Copy link
Collaborator

tatiana commented Aug 7, 2023

@jlaneve let's align here on the approach, given the feedback you gave in:
#414

A proposal:

  1. If the user defines DBT_TARGET_PATH and/or DBT_LOG_PATH, we should use them when running dbt ls.
  2. If the user doesn't define one/both of them, we'll be creating a tmp directory where we'll output both target and logs
  3. By the end of running dbt ls, we'll log everything that is in the given dbt logs file (we can start by logging it using the debug mode, in future we can use the Airflow log level to decide which dbt log levels we want to print)

Are you happy with this approach?

@jlaneve
Copy link
Collaborator Author

jlaneve commented Aug 7, 2023

@jlaneve let's align here on the approach, given the feedback you gave in: #414

A proposal:

  1. If the user defines DBT_TARGET_PATH and/or DBT_LOG_PATH, we should use them when running dbt ls.
  2. If the user doesn't define one/both of them, we'll be creating a tmp directory where we'll output both target and logs
  3. By the end of running dbt ls, we'll log everything that is in the given dbt logs file (we can start by logging it using the debug mode, in future we can use the Airflow log level to decide which dbt log levels we want to print)

Are you happy with this approach?

yes, this sounds great - thank you!

@tatiana
Copy link
Collaborator

tatiana commented Aug 7, 2023

Some additional comments on this. I played a bit with dbt ls, and this is what I found so far:

Logs path

  • Neither dbt 1.5 nor 1.6 support the --log-path parameter
  • dbt 1.6 supports DBT_LOG_PATH, but dbt 1.5 doesn't (probably older versions also don't)

Target path

  • dbt 1.6 supports --target-path, dbt 1.5 doesn't (probably older versions also don't)
  • dbt 1.6 supports DBT_TARGET_PATH, dbt 1.5 doesn't (probably older versions also don't)

So, the one thing to remember with this approach is that if users use an older version of dbt, the custom environment variables approach will not work.

@tatiana
Copy link
Collaborator

tatiana commented Aug 7, 2023

What we agreed is that we're going to attempt to run dbt ls from a temporary directory, pointing to the models/snapshots/seed etc directoried in the original place.

tatiana added a commit that referenced this issue Aug 10, 2023
As of Cosmos 1.0.0, `LoadMode.DBT_LS` ran `dbt ls` from within the original dbt project directory.

The `dbt ls` outputs files to the directory it's running from unless the environment variables `DBT_LOG_PATH` and `DBT_TARGET_PATH` are specified.

Depending on the deployment, the Airflow worker does not have write permissions to the dbt project directory. This PR changes the behavior of `dbt ls` to make a copy of the original project directory into a temporary directory and run the command `dbt ls` from there.

Closes: #411
tatiana added a commit that referenced this issue Aug 14, 2023
As of Cosmos 1.0.0, `LoadMode.DBT_LS` runs `dbt ls` from within the
original dbt project directory.

The `dbt ls` outputs files to the directory it's running from unless the
environment variables `DBT_LOG_PATH` and `DBT_TARGET_PATH` are specified
(as of dbt 1.6).

Depending on the deployment, the Airflow worker does not have write
permissions to the dbt project directory. This can lead to an error
message similar to the following:

```
20:43:06  Encountered an error:
[Errno 30] Read-only file system: '/usr/local/airflow/dags/dbt_grindr/logs/dbt.log'
```

This PR changes the behavior of `dbt ls` to try to make the `dbt ls`
artifacts (logs and target directory) not be written to the original
project directory.

In addition to the introduced test, this change was validated using
airflow 2.6 and dbt 1.6, by following these steps:
(1) Delete folders `logs` and `target` from
`astronomer-cosmos/dev/dags/dbt/jaffle_shop`
(2) Add a breakpoint after `stdout, stderr = process.communicate()` in
`dbt/graph.py`
(3) Run a DAG that uses `astronomer-cosmos/dev/dags/dbt/jaffle_shop`,
e.g.:
```
airflow dags test basic_cosmos_dag `date -Iseconds`
```
(4) When the breakpoint happens, check that no `target` or `logs` folder
was created after running `dbt ls` in
`astronomer-cosmos/dev/dags/dbt/jaffle_shop`

A limitation with the current approach is that, although `dbt ls` is not
creating these directories in the given circumstances, if the user is
using the local execution mode or an earlier version of `dbt`, the files will
still, be written to the project directory.

Closes: #411
@DanMawdsleyBA
Copy link
Contributor

DanMawdsleyBA commented Aug 20, 2023

After further debugging I was able to get it to work. Turns out the issue was with dbt packages the code need to run dbt deps before running dbt ls (or dbt ls not be dependent on packages which I don't think it needs to be). Removing all dbt packages fixed the issue

@binhnq94
Copy link
Contributor

binhnq94 commented Aug 30, 2023

@DanMawdsleyBA I did a hot fix for my work here:
binhnq94@177cebd

And config in dbt_project.yml:
packages-install-path: /tmp/dbt_package

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants