[runtime env] Add garbage collection for conda envs #20072

Merged on Nov 5, 2021 (16 commits)

Conversation

Contributor

@architkulkarni architkulkarni commented Nov 4, 2021

Why are these changes needed?

Adds garbage collection for conda envs in the following two cases:

  • Job-level runtime env
  • Detached actor runtime env

In a followup PR we will add GC for per-task and per-actor runtime envs. The issue tracking this is here: #19602

Related issue number

Closes #19958

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@architkulkarni
Contributor Author

@edoakes I'm working on getting tests to pass, but it would be good to get a review of the overall approach. Once this is working, we can get started on unifying all the "Managers" into the plugin API.

Contributor

@edoakes edoakes left a comment

Looks great! Just not sure about being able to remove the file lock.

Ping me when you have tests passing!

python/ray/_private/runtime_env/conda.py (outdated)
Comment on lines +33 to +34
@pytest.fixture(scope="function", params=["ray_client", "no_ray_client"])
def start_cluster(ray_start_cluster, request):
Contributor

should we make this shared between the tests somehow?

@architkulkarni
Contributor Author

@edoakes Tests are passing locally!

Contributor

@edoakes edoakes left a comment

LGTM

Comment on lines -90 to +91
- conda_dir, f"requirements-{pip_hash_str}.txt")
+ resources_dir, f"requirements-{pip_hash_str}.txt")
Contributor

do we clean this file up btw?

Contributor Author

Currently no, but we should

create_conda_env(
    conda_yaml_file, prefix=conda_env_name, logger=logger)
self._created_envs.add(conda_env_name)
os.remove(conda_yaml_file)
Contributor

in general we should have these cleanups in a finally block so they run even if the creation fails

Contributor Author

I see, makes sense
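
For illustration, the suggestion above could be sketched roughly as follows. This is not the code from the PR: `create_env` is a hypothetical callable standing in for `create_conda_env`, and the temporary file stands in for `conda_yaml_file`.

```python
import os
import tempfile


def create_env_from_yaml(yaml_text, create_env):
    """Write a temporary env YAML file, create the env, and always clean up.

    `create_env` is a hypothetical stand-in for create_conda_env; it takes
    the path of the YAML file and may raise if creation fails.
    """
    fd, yaml_path = tempfile.mkstemp(suffix=".yml")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(yaml_text)
        create_env(yaml_path)
    finally:
        # Runs whether create_env returned normally or raised, so the
        # temporary YAML file is never left behind.
        if os.path.exists(yaml_path):
            os.remove(yaml_path)
    return yaml_path
```

With this shape, an exception from env creation still propagates to the caller, but the YAML file is removed first.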

Comment on lines +117 to +118
delete_cmd = [conda_path, "remove", "-p", prefix, "--all", "-y"]
exit_code, output = exec_cmd_stream_to_logger(delete_cmd, logger)
Contributor

out of curiosity, could we also just rm -rf the directory? or is there some special cleanup that this does?

Contributor Author

I took a look at the source https://github.com/conda/conda/blob/master/conda/cli/main_remove.py and yeah, it looks like the main ingredient is rm -rf. It looks like it edits some metadata too, though I'm not sure how important that is.
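
As a rough sketch of the two options discussed in this thread: the function below first tries `conda remove -p <prefix> --all -y` and falls back to deleting the directory if that fails. The fallback is an assumption added here for illustration and is not part of the PR; the function name and the injectable `run` parameter are hypothetical, and `subprocess.run` is used in place of the PR's `exec_cmd_stream_to_logger`.

```python
import os
import shutil
import subprocess


def delete_conda_env(prefix, conda_path="conda", run=subprocess.run):
    """Remove the conda env at `prefix`, falling back to a plain directory delete.

    `run` is injectable (hypothetical parameter) so the command can be faked.
    Returns True if nothing exists at `prefix` afterwards.
    """
    delete_cmd = [conda_path, "remove", "-p", prefix, "--all", "-y"]
    try:
        succeeded = run(delete_cmd).returncode == 0
    except OSError:
        # For example, the conda executable could not be found.
        succeeded = False
    if not succeeded or os.path.isdir(prefix):
        # Fallback: the equivalent of `rm -rf`. This skips whatever
        # metadata updates `conda remove` would have made.
        shutil.rmtree(prefix, ignore_errors=True)
    return not os.path.exists(prefix)
```

Trying `conda remove` first keeps conda's own bookkeeping consistent when it works; the directory delete only covers the case where the command fails or leaves files behind.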

@edoakes edoakes merged commit c517507 into ray-project:master Nov 5, 2021
@architkulkarni architkulkarni deleted the conda-gc branch November 5, 2021 05:06
Development

Successfully merging this pull request may close these issues.

[runtime env] Implement local GC for conda/pip
2 participants