[observability][autoscaler] Ensure pending nodes is reset to 0 after scaling #32085

Merged 3 commits into ray-project:master on Jan 31, 2023


wuisawesome
Contributor

Why are these changes needed?

The previous way of calculating pending_nodes was prone to race conditions. Instead, let's always publish it in the main thread along with the other metrics.
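
As a rough illustration of the idea (not the actual autoscaler code), the sketch below keeps launcher threads away from the metric entirely: they only update a lock-protected counter, and the main thread publishes the gauge once per update iteration, writing 0 explicitly. All names here (`FakeGauge`, `PendingLaunches`, `publish_pending_nodes`, `simulate_scale_up`) are hypothetical and exist only for this example.

```python
import threading
from collections import Counter


class FakeGauge:
    """Hypothetical stand-in for a metrics gauge: keeps the last value per node type."""

    def __init__(self):
        self.values = {}

    def set(self, node_type, value):
        self.values[node_type] = value


class PendingLaunches:
    """Lock-protected count of node launches requested but not yet finished."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = Counter()

    def inc(self, node_type, n=1):
        with self._lock:
            self._counts[node_type] += n

    def dec(self, node_type, n=1):
        with self._lock:
            self._counts[node_type] -= n

    def snapshot(self):
        with self._lock:
            return dict(self._counts)


def publish_pending_nodes(gauge, pending, known_node_types):
    """Meant to be called from the main thread only, once per update iteration."""
    counts = pending.snapshot()
    for node_type in known_node_types:
        # Always write a value, including 0, so a stale count cannot linger.
        gauge.set(node_type, counts.get(node_type, 0))


def simulate_scale_up(num_nodes, node_type="worker"):
    """Request num_nodes launches, finish them on worker threads, and
    return the gauge value before and after the launches complete."""
    gauge = FakeGauge()
    pending = PendingLaunches()

    pending.inc(node_type, num_nodes)
    publish_pending_nodes(gauge, pending, [node_type])
    before = gauge.values[node_type]

    def finish_launch():
        # Worker thread: updates shared state only, never touches the gauge.
        pending.dec(node_type)

    threads = [threading.Thread(target=finish_launch) for _ in range(num_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    publish_pending_nodes(gauge, pending, [node_type])
    return before, gauge.values[node_type]
```

Because the gauge has a single writer, the order in which launcher threads finish cannot leave a non-zero value behind once scaling is done.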

Related issue number

Closes #31982

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@wuisawesome
Contributor Author

cc @gvspraveen

{
"autoscaler_cluster_resources": 0,
"autoscaler_pending_resources": 0,
"autoscaler_pending_nodes": 0,
Contributor

Any way to write a test that could've caught this?

Contributor Author

Isn't this precisely a test that catches this?

@rkooo567 rkooo567 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 31, 2023
@wuisawesome
Contributor Author

Remaining test failures look unrelated/flaky. Merging.

@wuisawesome wuisawesome merged commit 7573d49 into ray-project:master Jan 31, 2023
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
…scaling (ray-project#32085)

The previous way pending_nodes was calculated was prone to race conditions, instead, let's just always publish it in the main thread with other metrics.

Closes ray-project#31982

---------

Co-authored-by: Alex <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>

Successfully merging this pull request may close these issues.

[Dashboard] dashboard showing stale pending nodes while the cluster is fully autoscaled.