[tune] Fix trial cleanup after x seconds, set default to 600 #28449

Merged (8 commits) on Sep 14, 2022

krfricke (Contributor)

Signed-off-by: Kai Fricke [email protected]

Why are these changes needed?

Forced trial cleanup after a timeout currently fails in three places: 1) the actor must be killed explicitly, because garbage collection does not work while futures are in flight; 2) the _stop_actor method must be triggered after the futures are cleared, because it creates a new future; 3) the future was not fetched correctly.

We also set the default cleanup time to 10 minutes, which should suffice for most cases and avoids deadlocks in long-running tasks.
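
For reviewers who want the intended ordering spelled out, here is a minimal, self-contained sketch of the first two points and the 600-second default. It does not use Ray and is not the code in this PR: `FakeActor`, `TrialCleanup`, `stop_trial`, `on_stop_finished`, `force_cleanup`, and `DEFAULT_CLEANUP_TIMEOUT_S` are hypothetical names chosen for illustration, and futures are modeled as plain string ids. The third point (fetching the future correctly) is not modeled.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical constant: grace period before a stopping trial is
# force-killed (10 minutes, matching the default described in this PR).
DEFAULT_CLEANUP_TIMEOUT_S = 600.0


@dataclass
class FakeActor:
    """Stand-in for a remote actor (assumption: not a real Ray actor)."""
    name: str
    killed: bool = False

    def kill(self) -> None:
        # Explicit termination; the sketch does not rely on garbage collection.
        self.killed = True


@dataclass
class TrialCleanup:
    """Hypothetical bookkeeping for stopping trials with a forced timeout."""
    timeout_s: float = DEFAULT_CLEANUP_TIMEOUT_S
    clock: Callable[[], float] = time.monotonic
    # future id -> actor that owns the future
    in_flight: Dict[str, FakeActor] = field(default_factory=dict)
    # stop-future id -> (actor, deadline on the clock)
    stopping: Dict[str, Tuple[FakeActor, float]] = field(default_factory=dict)

    def stop_trial(self, actor: FakeActor) -> str:
        # Step 1: clear the futures this actor still has in flight.
        for fid in [f for f, a in self.in_flight.items() if a is actor]:
            del self.in_flight[fid]
        # Step 2: only then request the stop, because it registers a new
        # future that must survive the clearing step above.
        stop_id = f"stop-{actor.name}"
        self.in_flight[stop_id] = actor
        self.stopping[stop_id] = (actor, self.clock() + self.timeout_s)
        return stop_id

    def on_stop_finished(self, stop_id: str) -> None:
        # The actor stopped gracefully; nothing is left to force.
        self.in_flight.pop(stop_id, None)
        self.stopping.pop(stop_id, None)

    def force_cleanup(self) -> List[FakeActor]:
        # Kill every actor whose stop request has outlived the grace period.
        now = self.clock()
        expired = [sid for sid, (_, deadline) in self.stopping.items()
                   if deadline <= now]
        killed = []
        for sid in expired:
            actor, _ = self.stopping.pop(sid)
            self.in_flight.pop(sid, None)
            actor.kill()
            killed.append(actor)
        return killed
```

Injecting `clock` keeps the timeout testable without sleeping; in the sketch, a trial that reports back through `on_stop_finished` before the deadline is never force-killed.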

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Kai Fricke added 2 commits September 12, 2022 19:03
@krfricke changed the title from "[tune] Fix trial cleanup after x seconds, set default" to "[tune] Fix trial cleanup after x seconds, set default to 600" on Sep 12, 2022
@xwjiang2010 (Contributor) left a comment:

Thanks Kai for fixing this!

python/ray/tune/execution/ray_trial_executor.py (outdated, resolved)
python/ray/tune/trainable/trainable.py (resolved)
Kai Fricke added 2 commits September 13, 2022 10:04
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
@Yard1 (Member) left a comment:

Looks great!

Kai Fricke added 2 commits September 13, 2022 15:06
Signed-off-by: Kai Fricke <[email protected]>
Nit
Signed-off-by: Kai Fricke <[email protected]>
Kai Fricke added 2 commits September 13, 2022 18:47
@krfricke krfricke merged commit 2d8ce25 into ray-project:master Sep 14, 2022
@krfricke krfricke deleted the tune/actor-cleanup branch September 14, 2022 08:54
PaulFenton pushed a commit to PaulFenton/ray that referenced this pull request Sep 19, 2022
…ject#28449)

Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: PaulFenton <[email protected]>
4 participants