[PERF] Update number of cores on every iteration #1480
Conversation
LGTM, thank you!
```python
# This call takes about 0.3ms and hits a locally in-memory cached record of cluster resources
cores: int = int(ray.cluster_resources()["CPU"]) - self.reserved_cores
max_inflight_tasks = cores + self.max_task_backlog

while True:  # Loop: Dispatch (get tasks -> batch dispatch).
    tasks_to_dispatch: list[PartitionTask] = []
```
You might even want to do it here; this is where batches are dispatched up to the limit.
GitHub UI is being unclear, but I mean the last line of that block, 458/456.
Moved it into the inner loop and guarded it with a TTL.
Codecov Report
Additional details and impacted files:

```diff
@@           Coverage Diff           @@
##             main    #1480   +/-   ##
========================================
+ Coverage   74.70%   74.86%   +0.15%
========================================
  Files          60       60
  Lines        6061     6102      +41
========================================
+ Hits         4528     4568      +40
- Misses       1533     1534       +1
```
Updates the number of cores available before/after every batch dispatch.
This should let us take better advantage of Ray cluster autoscaling: as the cluster scales up, we will schedule larger batches of tasks and allow more total inflight tasks.
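The refresh described above (re-reading the core count inside the dispatch loop, guarded by a TTL so it runs at most once per interval) can be sketched roughly as follows. This is a hedged illustration, not the PR's actual code: `CoreCountCache`, `fetch_cores`, and `ttl_seconds` are hypothetical names, and `fetch_cores` stands in for a call like `ray.cluster_resources()["CPU"]`.

```python
import time
from typing import Callable, Optional


class CoreCountCache:
    """TTL-guarded cache for a cluster core count (illustrative sketch).

    The TTL keeps the refresh cheap on the dispatch hot path: between
    refreshes, get() returns the cached value without calling the fetcher.
    """

    def __init__(self, fetch_cores: Callable[[], int], ttl_seconds: float = 1.0):
        self._fetch_cores = fetch_cores
        self._ttl = ttl_seconds
        self._cached: Optional[int] = None
        self._last_refresh: float = float("-inf")

    def get(self) -> int:
        now = time.monotonic()
        # Refresh only when the cached value is missing or stale.
        if self._cached is None or now - self._last_refresh >= self._ttl:
            self._cached = self._fetch_cores()
            self._last_refresh = now
        return self._cached


# Usage: inside a tight dispatch loop, the fetcher is hit at most once per TTL.
calls = []


def fake_fetch() -> int:
    calls.append(1)
    return 8  # pretend the cluster currently has 8 CPUs


cache = CoreCountCache(fake_fetch, ttl_seconds=60.0)
cores_seen = [cache.get() for _ in range(100)]
```

With a 60-second TTL, all 100 loop iterations above read the cached value and the underlying fetch runs only once; a shorter TTL trades a little per-iteration cost for picking up autoscaler changes sooner.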