Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Placement Group] Fix placement group high cpu usage part 1 #18652

Merged
merged 24 commits into from
Sep 28, 2021

Conversation

rkooo567
Copy link
Contributor

@rkooo567 rkooo567 commented Sep 15, 2021

Why are these changes needed?

Implement;

  • [P0]Cache bundle spec (in review)
  • [P0] Optimize string manipulation logic in the ComputeResources function (in review)
  • [P0] Increase gcs server RPC timeout to 20 seconds (in review)

This also makes GetBundle to return const which is a more ideal semantic.

This makes many_ppo runs 6+ hours rather than 1 hour.

With this PR #18650, it should make many_ppo run for a day without an issue.

For more detail please check the related issue.

Related issue number

Part of #18651
With this PR and #18650, this should fix the P0 release blocker #18541. But we should also fix the fundamental issue (I marked it as P1, and I can work on it next sprint)

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

src/ray/common/bundle_spec.cc Outdated Show resolved Hide resolved
src/ray/gcs/gcs_server/gcs_placement_group_manager.cc Outdated Show resolved Hide resolved
@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Sep 16, 2021
@rkooo567 rkooo567 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Sep 20, 2021
@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Sep 20, 2021
@@ -109,12 +123,12 @@ std::string FormatPlacementGroupResource(const std::string &original_resource_na

bool IsBundleIndex(const std::string &resource, const PlacementGroupID &group_id,
const int bundle_index) {
return resource.find("_group_" + std::to_string(bundle_index) + "_" + group_id.Hex()) !=
std::string::npos;
return resource.find(kGroupKeyword + std::to_string(bundle_index) + "_" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think std::to_string here will benefit from this (#18538)

Are you seeing anything different after that PR merged?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fishbone
Copy link
Contributor

I think overall it looks good. Some comments there. Please let me know if you have any concerns.

@rkooo567 rkooo567 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Sep 27, 2021
@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Sep 27, 2021
@ericl
Copy link
Contributor

ericl commented Sep 27, 2021

Tests / lint broken

@rkooo567 rkooo567 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Sep 28, 2021
@ericl ericl merged commit a0a02f4 into ray-project:master Sep 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants