Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[resource-reporting 3/n] further clean up LocalResourceManager #21927

Merged
merged 2 commits into from
Jan 28, 2022

Conversation

scv119
Copy link
Contributor

@scv119 scv119 commented Jan 27, 2022

Why are these changes needed?

Further clean up LocalResourceManager.

  • add comments
  • rename UpdateLatestResource -> ReplaceResources
  • Move helper function to private

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@@ -587,7 +587,7 @@ void NodeManager::FillResourceReport(rpc::ResourcesData &resources_data) {
resources_data.set_node_manager_address(initial_config_.node_manager_address);
// Update local cache from gcs remote cache, this is needed when gcs restart.
// We should always keep the cache view consistent.
cluster_resource_scheduler_->GetLocalResourceManager().UpdateLastResourceUsage(
cluster_resource_scheduler_->GetLocalResourceManager().ResetResources(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this name is confusing: for local resource, current node should be the source of truth. Why do we need to update local resource usage from gcs client?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the better name should be ResetLastReportResourceUsage or something similar.

void LocalResourceManager::UpdateLastResourceUsage(
std::shared_ptr<SchedulingResources> gcs_resources) {
void LocalResourceManager::ResetResources(
std::shared_ptr<SchedulingResources> replacement) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A const& is better.

gcs_resources->GetAvailableResources().GetResourceMap()));
resource_name_to_id_, replacement->GetTotalResources().GetResourceMap(),
replacement->GetAvailableResources().GetResourceMap()));
OnResourceChanged();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't actually change local resource, we only change the the one we report to gcs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh you are right!... was a bit confused

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need to call OnResourceChanged() here right since local resource is not changed

/// It is responsible for allocating/deallocating resources for (task) resource request;
/// it also supports creating a new resource or delete an existing resource.
/// Whenever the resouce changes, it notifies the subscriber of the change.
/// This class is not thread safe.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add which thread it is expected to be running in? like which event loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can revisit this once we finished the cluster resource manager part. I'm not sure if this class should be aware of the event loop.. yet.

@scv119 scv119 added the do-not-merge Do not merge this PR! label Jan 27, 2022
@scv119 scv119 merged commit 0ff8bfa into ray-project:master Jan 28, 2022
Copy link
Collaborator

@jjyao jjyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems not all comments are addressed

gcs_resources->GetAvailableResources().GetResourceMap()));
resource_name_to_id_, replacement->GetTotalResources().GetResourceMap(),
replacement->GetAvailableResources().GetResourceMap()));
OnResourceChanged();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need to call OnResourceChanged() here right since local resource is not changed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge Do not merge this PR!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants