[core] Fix the placement group stress test regression. (#34192) (#34249)
Signed-off-by: Yi Cheng <[email protected]>


The regression was caused by enabling the ray syncer. Whenever a placement group is created or deleted, the raylet actively sends a message to GCS. This adds a lot of load to GCS and slows the test down.
With the ray syncer disabled, the raylet neither creates this message nor sends it to GCS.

There is no need to do this: when a new resource is added to the local node, the ray syncer notices the change and pushes the resource to GCS after 100ms.

This PR deletes this logic and thus fixes the regression.
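To illustrate why dropping the on-demand broadcast reduces GCS load, here is a toy model that counts messages under the two strategies. It is only a sketch: `SyncStats`, `SimulateOnDemand` and `SimulatePeriodic` are hypothetical names invented for this example and are not part of Ray, and the model assumes that all changes seen within one polling period are coalesced into a single message.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model, NOT Ray's syncer API.
struct SyncStats {
  std::uint64_t messages_sent = 0;
};

// Strategy A: one message to GCS for every resource change
// (what the removed OnDemandBroadcasting calls amounted to).
SyncStats SimulateOnDemand(const std::vector<std::uint64_t> &change_times_ms) {
  SyncStats stats;
  stats.messages_sent = change_times_ms.size();
  return stats;
}

// Strategy B: a poller fires every period_ms and sends at most one message
// per tick, covering every change since the previous tick.
// Assumes change_times_ms is sorted in ascending order.
SyncStats SimulatePeriodic(const std::vector<std::uint64_t> &change_times_ms,
                           std::uint64_t period_ms) {
  assert(period_ms > 0);
  SyncStats stats;
  bool have_last_tick = false;
  std::uint64_t last_tick = 0;
  for (std::uint64_t t : change_times_ms) {
    // First tick strictly after the change happened.
    const std::uint64_t tick = (t / period_ms + 1) * period_ms;
    if (!have_last_tick || tick != last_tick) {
      ++stats.messages_sent;
      last_tick = tick;
      have_last_tick = true;
    }
  }
  return stats;
}
```

In this model, one placement group change every 10ms for one second produces 100 messages with the on-demand strategy and 10 messages with a 100ms polling period.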

```
before: placement group create/removal per second 1271.32 +- 8.27
after: placement group create/removal per second 1282.83 +- 3.99
```

For release test:
```
perf_metrics = [
    {'perf_metric_name': 'pgs_per_second', 'perf_metric_value': 17.061243668170643, 'perf_metric_type': 'THROUGHPUT'},
    {'perf_metric_name': 'dashboard_p50_latency_ms', 'perf_metric_value': 3.261, 'perf_metric_type': 'LATENCY'},
    {'perf_metric_name': 'dashboard_p95_latency_ms', 'perf_metric_value': 129.682, 'perf_metric_type': 'LATENCY'},
    {'perf_metric_name': 'dashboard_p99_latency_ms', 'perf_metric_value': 141.648, 'perf_metric_type': 'LATENCY'},
]
```
fishbone authored Apr 11, 2023
1 parent 5ca1b8c commit cb6f1c2
Showing 1 changed file with 0 additions and 8 deletions.
8 changes: 0 additions & 8 deletions src/ray/raylet/node_manager.cc
@@ -1935,10 +1935,6 @@ void NodeManager::HandleCommitBundleResources(
   RAY_LOG(DEBUG) << "Request to commit resources for bundles: "
                  << GetDebugStringForBundles(bundle_specs);
   placement_group_resource_manager_->CommitBundles(bundle_specs);
-  if (RayConfig::instance().use_ray_syncer()) {
-    // To reduce the lag, we trigger a broadcasting immediately.
-    RAY_CHECK(ray_syncer_.OnDemandBroadcasting(syncer::MessageType::RESOURCE_VIEW));
-  }
   send_reply_callback(Status::OK(), nullptr, nullptr);

   cluster_task_manager_->ScheduleAndDispatchTasks();
@@ -1979,10 +1975,6 @@ void NodeManager::HandleCancelResourceReserve(

   // Return bundle resources.
   placement_group_resource_manager_->ReturnBundle(bundle_spec);
-  if (RayConfig::instance().use_ray_syncer()) {
-    // To reduce the lag, we trigger a broadcasting immediately.
-    RAY_CHECK(ray_syncer_.OnDemandBroadcasting(syncer::MessageType::RESOURCE_VIEW));
-  }
   cluster_task_manager_->ScheduleAndDispatchTasks();
   send_reply_callback(Status::OK(), nullptr, nullptr);
 }