Tune test autoscaler / fix stale node detection bug #21516

Merged: 83 commits, Jan 19, 2022

@krfricke krfricke commented Jan 10, 2022

Why are these changes needed?

See #21458. Currently, Tune keeps its own list of alive node IPs, but this list is only updated every 10 seconds, so it is usually stale when a new node is added. As a result, the first trial scheduled on the new node is usually marked as failed. This PR adds a test confirming this behavior.
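
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not Tune's actual code: `CachedAliveNodes`, the injected clock, and the IP addresses are all hypothetical stand-ins. Only the 10-second refresh interval comes from the description above.

```python
from typing import Callable, Set


class CachedAliveNodes:
    """Hypothetical cache of alive node IPs (not Tune's real implementation).

    The cached set is refreshed at most once every `interval_s` seconds.
    """

    def __init__(
        self,
        fetch_ips: Callable[[], Set[str]],
        clock: Callable[[], float],
        interval_s: float = 10.0,
    ):
        self._fetch_ips = fetch_ips
        self._clock = clock
        self._interval_s = interval_s
        self._ips = set(fetch_ips())  # snapshot, not a live view
        self._last_refresh = clock()

    def is_alive(self, ip: str) -> bool:
        now = self._clock()
        # Only re-query the cluster once the refresh interval has elapsed.
        if now - self._last_refresh >= self._interval_s:
            self._ips = set(self._fetch_ips())
            self._last_refresh = now
        return ip in self._ips


# Simulated cluster state and clock (both stand-ins for illustration).
cluster_ips = {"10.0.0.1"}
now = [0.0]

cache = CachedAliveNodes(lambda: cluster_ips, lambda: now[0])

# A new node joins 2 seconds after the last refresh.
now[0] = 2.0
cluster_ips.add("10.0.0.2")
stale_result = cache.is_alive("10.0.0.2")  # cache has not refreshed yet

# Once the refresh interval has elapsed, the new node is visible.
now[0] = 10.0
fresh_result = cache.is_alive("10.0.0.2")

print(stale_result, fresh_result)  # False True
```

In this sketch, a trial placed on `10.0.0.2` right after the node joins would be checked against the stale snapshot and look as if it were running on a dead node, which matches the symptom described in #21458.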

This PR should be updated to fix the issue. cc @xwjiang2010

Related issue number

This PR includes #20256 and requires it to be merged.

This tests #21458 and should be updated to close this issue.

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@xwjiang2010 xwjiang2010 changed the title [wip] Tune test autoscaler / fix stale node detection bug Tune test autoscaler / fix stale node detection bug Jan 18, 2022

@krfricke krfricke left a comment

This looks great. Thanks for removing this code path.

I approve these changes - technically you'll have to approve the PR though as I originally filed it ;-)

@krfricke krfricke merged commit 8fd5b7a into ray-project:master Jan 19, 2022
@krfricke krfricke deleted the tune/test-autoscaler branch January 19, 2022 00:20