No spreading if a node is selected for lease request due to locality #22015
Conversation
I need to add tests, etc., but I'd like to request an early review to make sure the overall approach is good.
  // If no raylet address is given, find the best worker for our next lease request.
- best_node_address = lease_policy_->GetBestNodeForTask(resource_spec);
+ std::tie(best_node_address, is_selected_based_on_locality) =
+     lease_policy_->GetBestNodeForTask(resource_spec);
Shouldn't we only run this logic if the scheduling strategy is DEFAULT?
Currently that check is inside lease_policy: it checks the scheduling strategy and decides whether it needs to run the locality logic.
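For context, a minimal sketch of the shape of that check, using simplified stand-in types (these names are illustrative, not Ray's actual lease-policy API):

```cpp
#include <string>
#include <utility>

// Simplified stand-ins for Ray's internal types; illustrative only.
enum class SchedulingStrategy { DEFAULT, SPREAD };

struct ResourceSpec {
  SchedulingStrategy strategy = SchedulingStrategy::DEFAULT;
};

// Returns the chosen raylet address plus whether it was chosen for locality.
// Locality is only consulted for the DEFAULT strategy; SPREAD skips it.
std::pair<std::string, bool> GetBestNodeForTask(const ResourceSpec &spec,
                                                const std::string &best_locality_node,
                                                const std::string &fallback_node) {
  if (spec.strategy == SchedulingStrategy::DEFAULT && !best_locality_node.empty()) {
    return {best_locality_node, /*is_selected_based_on_locality=*/true};
  }
  return {fallback_node, /*is_selected_based_on_locality=*/false};
}
```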
Just to make sure, the desired precedence is:
- SPREAD strategy => always spread, ignore locality completely
- DEFAULT / locality present => set spread threshold to 1.0 (no spreading at all)
- DEFAULT / no locality => default spread threshold
Is this right?
That's right.
Is there a way we can set the spread threshold only on the owner side? That way the scheduler would just take the spread threshold directly from the TaskSpec?
@ericl @stephanie-wang Updated; now only the scheduler decides the spread threshold.
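For reference, a minimal sketch of the scheduler-side threshold selection under the precedence described above; the function name and the exact threshold values for "always spread" are assumptions, not Ray's actual identifiers:

```cpp
// Illustrative sketch only; names are assumptions based on the discussion above.
// Precedence:
//   SPREAD strategy                  -> always spread (locality ignored entirely)
//   DEFAULT + locality-selected node -> threshold 1.0 (no spreading at all)
//   DEFAULT + no locality preference -> the default spread threshold
float ChooseSpreadThreshold(bool is_spread_strategy,
                            bool is_selected_based_on_locality,
                            float default_spread_threshold) {
  if (is_spread_strategy) {
    return 0.0f;  // Assumed encoding of "always spread" in the hybrid policy.
  }
  if (is_selected_based_on_locality) {
    return 1.0f;  // Pack onto the locality-preferred node; never spread away from it.
  }
  return default_spread_threshold;
}
```

Keeping this decision entirely on the scheduler side avoids baking a threshold into the TaskSpec.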
@@ -1558,7 +1558,8 @@ std::string ClusterTaskManager::GetBestSchedulableNode(const internal::Work &work,
                                                        bool *is_infeasible) {
   // If the local node is available, we should directly return it instead of
   // going through the full hybrid policy since we don't want spillback.
-  if (work.grant_or_reject && !force_spillback && IsLocallySchedulable(work.task)) {
+  if ((work.grant_or_reject || work.is_selected_based_on_locality) && !force_spillback &&
+      IsLocallySchedulable(work.task)) {
Would it make sense to unify grant_or_reject and is_selected_based_on_locality?
They are the same if the node is available. But if the node is not available, their behavior differs: one spills to another node, the other rejects the lease request. I think it's clearer to keep them separate.
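To illustrate the distinction, here is a self-contained sketch under simplified assumptions (the `Work` struct and the return convention are stand-ins, not the real ClusterTaskManager code):

```cpp
#include <optional>
#include <string>

// Simplified stand-in for the scheduler's work item; illustrative only.
struct Work {
  bool grant_or_reject = false;
  bool is_selected_based_on_locality = false;
};

// Returns the chosen node, or std::nullopt to signal "reject the lease".
std::optional<std::string> PickNode(const Work &work,
                                    bool local_node_available,
                                    const std::string &local_node,
                                    const std::string &hybrid_policy_choice) {
  if (local_node_available) {
    // Both flags behave the same here: keep the task on the requested node.
    return local_node;
  }
  if (work.grant_or_reject) {
    // Reject: the owner is told to retry the lease request elsewhere.
    return std::nullopt;
  }
  // Locality-selected (or neither flag set): spill via the normal hybrid policy.
  return hybrid_policy_choice;
}
```

The two flags only diverge in the unavailable-node branch, which is the reason given above for keeping them separate.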
This improves locality-aware scheduling but doesn't fix the fundamental problem: the resource view and the object view are kept in two separate places (the owner core worker and the raylet). Without having this information in a single place, we have the following issues:
It's possible those issues won't cause much of a problem in real workloads and this PR is enough, but it's something to keep in mind when we refactor the scheduler. cc @scv119 @iycheng
Don't merge it yet since I want to run nightly tests.
FYI @jjyao for (1), we have an open issue for it from the original locality-aware scheduling work and a basic implementation idea (on spillback, raylet returns all available nodes and the core worker chooses the best locality node from that set) but it has never been prioritized since we haven't created a test workload that demonstrates the need for it. It also predates the hybrid scheduling policy and wouldn't work very well with it (breaks under-threshold round-robin packing), so implementing a locality-aware spillback policy within the hybrid scheduling policy would probably yield much better results. (2) is a great point that's always bothered me about this design, and we've heard (3) requested from users before. Really looking forward to addressing all of these with the redesign!
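A rough sketch of that unimplemented spillback idea, purely for illustration (the types and the per-node byte-count map are assumptions; nothing like this exists in Ray today):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// On spillback, the raylet would return all available candidate nodes and the
// owner would pick the one holding the most bytes of the task's arguments.
std::string ChooseSpillbackNodeByLocality(
    const std::vector<std::string> &candidate_nodes,
    const std::unordered_map<std::string, int64_t> &arg_bytes_per_node) {
  std::string best_node;
  int64_t best_bytes = -1;
  for (const auto &node : candidate_nodes) {
    const auto it = arg_bytes_per_node.find(node);
    const int64_t bytes = (it == arg_bytes_per_node.end()) ? 0 : it->second;
    if (bytes > best_bytes) {  // Ties keep the earliest candidate.
      best_bytes = bytes;
      best_node = node;
    }
  }
  return best_node;  // Empty string if there were no candidates.
}
```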
LGTM, great tests!
Ah... I didn't see the label, sorry! Hopefully it works with the nightly tests.
The nightly tests are broken right now. Hopefully this works; otherwise we can revert it. I'll keep an eye on the nightly tests once they've recovered.
Why are these changes needed?
Related issue number
Closes #18581
Checks
I've run scripts/format.sh to lint the changes in this PR.