-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug] Cosmos stale temporary directories #958
Comments
We are experiencing a similar issue in our MWAA environment:
I have done some investigation:
|
Rodrigo Rabioglio reported the same issue in the #airflow-dbt Slack channel:
|
I believe the issue Rodrigo is facing is likely due to
This will only be executed in case the operator execution succeeds: astronomer-cosmos/cosmos/operators/virtualenv.py Lines 106 to 107 in 3c9cf6f
We should change this to use context manager. From Cosmos 1.4 onwards, we also received reports that caching locally the partial parse file was leading to issues in MWAA: This will hopefully be solved once #927 is implemented. |
We are encountering the same issue with temporary directories not being deleted.
It looks like none of the To reproduce this issue locally, I have been using composer-local-dev. I added a line to manually remove the directories using shutil.rmtree(self.virtualenv_dir, ignore_errors=True) after the
which resolved the issue of stale directories. However, I am unsure why this manual removal is necessary or why the directories are not automatically deleted. Should I create a pull request with this fix, or should we investigate further to identify the root cause? What we know so far:
|
I noticed that the temporary directories |
I've observed the following behavior related to this issue in version 1.6.0: |
…lenv_dir=None` and `is_virtualenv_dir_temporary=True` (#1200) ## Description This PR optimizes the `DbtVirtualenvBaseOperator` by implementing virtualenv reuse within a single task execution. It reduces the overhead of creating new virtualenvs for each dbt command. The `DbtVirtualenvBaseOperator` in [email protected] creates a temporary directory and prepares a virtualenv twice when `virtualenv_dir` is `None` and `is_virtualenv_dir_temporary` is `True`. This PR modifies it to create a directory and a virtualenv only once at the beginning of the `run_command` method, avoiding this overhead. Additionally, I have added tests to ensure the directory for virtualenv will be deleted after the task execution. This is related to the issue reported in #958. The changes include: - Reusing virtualenv in a single task execution - Improving temporary directory management - Adding tests to ensure proper handling of virtualenv directories (deletion or persistence) ## Related Issue(s) #958 ## Breaking Change? I believe this is not a breaking change. ## Checklist - [x] I have made corresponding changes to the documentation (if required) - [x] I have added tests that prove my fix is effective or that my feature works
Context
It seems sometimes Cosmos is creating and not deleting temporary directories.
An example of a report, from 30 April 2024 in the #airflow-dbt Slack channel:
https://apache-airflow.slack.com/archives/C059CC42E9W/p1714484749579599
There have been previous discussions in the core Airflow apache/airflow#22404 about tempfile.TemporaryDirectory doesn't necessarily behaving as expected.
I was not able to reproduce this issue yet, but one possibility is that there is some exception or error, and the context manager
tempfile.TemporaryDirectory
is not being able to clean things after those scenarios.Acceptance criteria
The text was updated successfully, but these errors were encountered: