
No spreading if a node is selected for lease request due to locality #22015

Merged — 7 commits merged into ray-project:master on Feb 3, 2022

Conversation

@jjyao (Collaborator) commented Jan 31, 2022

Why are these changes needed?

  1. If a node is selected based on locality, we always run the task on that node as long as it is available.
  2. For the SPREAD scheduling strategy, we always select the local node as the first raylet to request a lease from; locality is not involved.

Related issue number

Closes #18581

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@jjyao (Collaborator, Author) commented Jan 31, 2022

I still need to add tests, etc., but I'd like to request an early review to make sure the overall approach is sound.

  // If no raylet address is given, find the best worker for our next lease request.
- best_node_address = lease_policy_->GetBestNodeForTask(resource_spec);
+ std::tie(best_node_address, is_selected_based_on_locality) =
+     lease_policy_->GetBestNodeForTask(resource_spec);
Contributor:

Shouldn't we only run this logic if the scheduling strategy is DEFAULT?

jjyao (Collaborator, Author):

Currently that check is inside the lease policy: it checks the scheduling strategy and decides whether it needs to run the locality logic.
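As jjyao notes, the strategy check lives inside the lease policy itself. A minimal Python sketch of that shape — all names (`ResourceSpec`, `get_best_node_for_task`, the location map) are illustrative, not Ray's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class ResourceSpec:
    scheduling_strategy: str            # e.g. "DEFAULT" or "SPREAD"
    dependencies: list = field(default_factory=list)

def get_best_node_for_task(spec, local_node, object_locations):
    """Return (best_node, is_selected_based_on_locality).

    object_locations: dependency id -> node currently holding that object.
    """
    if spec.scheduling_strategy != "DEFAULT":
        # Non-DEFAULT strategies (e.g. SPREAD) skip the locality logic entirely.
        return local_node, False
    for dep in spec.dependencies:
        node = object_locations.get(dep)
        if node is not None:
            # A node holding a dependency wins; mark it as locality-selected.
            return node, True
    return local_node, False

locs = {"obj1": "nodeB"}
assert get_best_node_for_task(ResourceSpec("DEFAULT", ["obj1"]), "nodeA", locs) == ("nodeB", True)
assert get_best_node_for_task(ResourceSpec("SPREAD", ["obj1"]), "nodeA", locs) == ("nodeA", False)
```

The caller never needs its own strategy check: the boolean in the returned pair tells it whether locality drove the choice.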

@ericl (Contributor) left a comment:

To make sure, the desired precedence is:

  1. SPREAD strategy => always spread, ignore locality completely
  2. DEFAULT / locality present => set spread threshold to 1.0 (no spreading at all)
  3. DEFAULT / no-locality => default spread threshold

Is this right?

@ericl added the @author-action-required label Jan 31, 2022
@jjyao (Collaborator, Author) commented Feb 1, 2022

To make sure, the desired precedence is:

  1. SPREAD strategy => always spread, ignore locality completely
  2. DEFAULT / locality present => set spread threshold to 1.0 (no spreading at all)
  3. DEFAULT / no-locality => default spread threshold

Is this right?

That's right.
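The confirmed precedence can be sketched as a single selection function. The 0.5 default and the 0.0 used for SPREAD below are placeholder assumptions for illustration, not Ray's actual constants:

```python
DEFAULT_SPREAD_THRESHOLD = 0.5  # assumed default; Ray's real value may differ

def choose_spread_threshold(strategy, is_selected_based_on_locality):
    """Map (strategy, locality flag) to a hybrid-policy spread threshold."""
    if strategy == "SPREAD":
        # 1. SPREAD: always spread; locality is ignored completely.
        #    (Modeled here as threshold 0.0 for the sketch.)
        return 0.0
    if is_selected_based_on_locality:
        # 2. DEFAULT with a locality-selected node: threshold 1.0 means
        #    pack onto that node until full, i.e. no spreading at all.
        return 1.0
    # 3. DEFAULT without locality: fall back to the default threshold.
    return DEFAULT_SPREAD_THRESHOLD

assert choose_spread_threshold("SPREAD", True) == 0.0   # locality flag ignored
assert choose_spread_threshold("DEFAULT", True) == 1.0
assert choose_spread_threshold("DEFAULT", False) == DEFAULT_SPREAD_THRESHOLD
```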

@stephanie-wang (Contributor):

Is there a way we can set the spread threshold only on the owner side? That way the scheduler could just take the spread threshold directly from the TaskSpec.

@jjyao changed the title from [WIP] Set spread threshold to 1.0 for locality scheduling to [WIP] No spreading if a node is selected for lease request due to locality Feb 2, 2022
@jjyao (Collaborator, Author) commented Feb 2, 2022

@ericl @stephanie-wang Updated; now only the scheduler decides the spread threshold.

@@ -1558,7 +1558,8 @@ std::string ClusterTaskManager::GetBestSchedulableNode(const internal::Work &work,
                                                        bool *is_infeasible) {
   // If the local node is available, we should directly return it instead of
   // going through the full hybrid policy since we don't want spillback.
-  if (work.grant_or_reject && !force_spillback && IsLocallySchedulable(work.task)) {
+  if ((work.grant_or_reject || work.is_selected_based_on_locality) && !force_spillback &&
+      IsLocallySchedulable(work.task)) {
Contributor:

Would it make sense to unify grant_or_reject and is_selected_based_on_locality?

jjyao (Collaborator, Author):

They are the same if the node is available. But if the node is not available, their behaviors differ: one spills back, the other rejects. I think it's clearer to keep them separate.
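A rough sketch of that distinction, with illustrative names rather than Ray's actual code: both flags pin the task to the local node while it is available, but they diverge when it is not:

```python
def schedule_decision(grant_or_reject, is_selected_based_on_locality,
                      local_node_available):
    """Toy model of the branch discussed above.

    Returns one of: "run_locally", "reject", "spill", "hybrid_policy".
    """
    if local_node_available:
        if grant_or_reject or is_selected_based_on_locality:
            # Identical behavior while the local node can take the task.
            return "run_locally"
        return "hybrid_policy"
    if grant_or_reject:
        # Lease request is rejected; the caller retries on another raylet.
        return "reject"
    if is_selected_based_on_locality:
        # Locality-selected task falls through and spills to another node.
        return "spill"
    return "hybrid_policy"

# Same behavior when the node is available...
assert schedule_decision(True, False, True) == "run_locally"
assert schedule_decision(False, True, True) == "run_locally"
# ...different behavior when it is not.
assert schedule_decision(True, False, False) == "reject"
assert schedule_decision(False, True, False) == "spill"
```

Unifying the flags would lose exactly the unavailable-node branch, which is why keeping them separate is arguably clearer.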

@jjyao added the do-not-merge label Feb 2, 2022
@jjyao changed the title from [WIP] No spreading if a node is selected for lease request due to locality to No spreading if a node is selected for lease request due to locality Feb 2, 2022
@jjyao removed the @author-action-required label Feb 2, 2022
@jjyao (Collaborator, Author) commented Feb 2, 2022

This improves locality-aware scheduling but doesn't fix the fundamental problem: the resource view and the object view are kept in two separate places (the owner core worker and the raylet). Without this information in a single place, we have the following issues:

  1. Spillback is not locality aware: ideally, if the best locality node is not available, we may want to spill back to the second-best locality node.
  2. The best locality node may not have the resources needed by the task (e.g., the task needs custom resource A but the best locality node doesn't have it), yet the core worker will still request the lease from that node.
  3. Actor scheduling is not locality aware.

These issues may not cause much trouble in real workloads, and this PR may be enough, but they are something to keep in mind when we refactor the scheduler. cc @scv119 @iycheng

@jjyao (Collaborator, Author) commented Feb 2, 2022

Don't merge it yet; I want to run the nightly tests first.

@clarkzinzow (Contributor):

FYI @jjyao, for (1) we have an open issue from the original locality-aware scheduling work, along with a basic implementation idea: on spillback, the raylet returns all available nodes and the core worker chooses the best locality node from that set. It has never been prioritized since we haven't created a test workload that demonstrates the need for it. It also predates the hybrid scheduling policy and wouldn't work very well with it (it breaks under-threshold round-robin packing), so implementing a locality-aware spillback policy within the hybrid scheduling policy would probably yield much better results.

(2) is a great point that's always bothered me about this design, and we've heard (3) requested from users before. Really looking forward to addressing all of these with the redesign!
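The spillback idea above — the core worker choosing the best locality node from the set the raylet returns — could look roughly like this. This is a hypothetical sketch; `best_locality_node` and its inputs are not Ray APIs:

```python
def best_locality_node(candidate_nodes, arg_sizes, object_locations):
    """Pick the candidate node holding the most bytes of the task's arguments.

    candidate_nodes: available nodes reported by the raylet on spillback.
    arg_sizes: argument object id -> size in bytes.
    object_locations: argument object id -> set of nodes holding that object.
    """
    def local_bytes(node):
        # Total argument bytes already resident on this node.
        return sum(size for obj, size in arg_sizes.items()
                   if node in object_locations.get(obj, ()))
    return max(candidate_nodes, key=local_bytes)

nodes = ["nodeA", "nodeB", "nodeC"]
sizes = {"x": 100, "y": 400}
locs = {"x": {"nodeA"}, "y": {"nodeB"}}
# nodeB holds 400 bytes of arguments vs. nodeA's 100 and nodeC's 0.
assert best_locality_node(nodes, sizes, locs) == "nodeB"
```

Maximizing locally resident argument bytes is the usual objective in locality-aware placement, since it minimizes the data that must be transferred before the task can run.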

@clarkzinzow (Contributor) left a comment:

LGTM, great tests!

python/ray/tests/test_scheduling.py (review thread resolved)
@ericl merged commit 44db41c into ray-project:master Feb 3, 2022
@ericl (Contributor) commented Feb 3, 2022

Ah... I didn't see the label, sorry! Hopefully it works with the nightly tests.

@jjyao (Collaborator, Author) commented Feb 3, 2022

The nightly tests are broken right now. Hopefully this works; otherwise we can revert it. I'll keep an eye on the nightly tests once they recover.

@jjyao deleted the jjyao/locality branch February 3, 2022 20:47
@jjyao restored the jjyao/locality branch February 4, 2022 16:22
@jjyao deleted the jjyao/locality branch February 4, 2022 17:55
@jjyao removed the do-not-merge label Feb 22, 2022
Successfully merging this pull request may close these issues.

[core][scheduler] Data locality scheduling doesn't always work with distributed scheduler policy
5 participants