Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] get or create logic for placement groups #20196

Closed
1 of 2 tasks
matthewdeng opened this issue Nov 9, 2021 · 11 comments
Closed
1 of 2 tasks

[Feature] get or create logic for placement groups #20196

matthewdeng opened this issue Nov 9, 2021 · 11 comments
Labels
core-placement-group enhancement Request for new feature and/or capability stale The issue is stale. It will be closed within 7 days unless there are further conversation

Comments

@matthewdeng
Copy link
Contributor

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

Currently, the logic to "get or create" a placement group feels a little complex and requires understanding of how placement_group_capture_child_tasks operates.

Example:

import ray
from ray.util.placement_group import get_current_placement_group

current_placement_group = get_current_placement_group()
should_capture_child_tasks_in_placement_group = ray.worker.global_worker.should_capture_child_tasks_in_placement_group
should_create_placement_group = current_placement_group is None or not should_capture_child_tasks_in_placement_group

Use case

For Ray Train, we want to be able to configure whether workers should be colocated or spread - this is done by creating a placement group with SPREAD or PACK strategy. If the caller of Train (e.g. Ray Tune) has already specified a placement group, it should take precedence and a new placement group should not be created.

Related issues

#18524 - for the described use-case, this solution would allow us to not need to create a placement group altogether. This would also be a preferred solution for elastic training!

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@matthewdeng matthewdeng added the enhancement Request for new feature and/or capability label Nov 9, 2021
@scv119
Copy link
Contributor

scv119 commented Nov 10, 2021

@rkooo567 any thought on this?

@rkooo567
Copy link
Contributor

A couple comments;

One note is that #18524 will be just a hint, and collocation or spread is not guaranteed. This might be a difference from placement group which can guarantee collocation if you place an actor to the same bundle.

About

import ray
from ray.util.placement_group import get_current_placement_group

current_placement_group = get_current_placement_group()
should_capture_child_tasks_in_placement_group = ray.worker.global_worker.should_capture_child_tasks_in_placement_group
should_create_placement_group = current_placement_group is None or not should_capture_child_tasks_in_placement_group

Can you give me more context? Driver (caller) never needs to use get_current_placement_group cuz the driver doesn't have the current placement group. Is it called within an actor? Also, why do you need should_capture_child_tasks_in_placement_group to decide whether or not you should create a pg?

@matthewdeng
Copy link
Contributor Author

One note is that #18524 will be just a hint, and collocation or spread is not guaranteed. This might be a difference from placement group which can guarantee collocation if you place an actor to the same bundle.

Yep, for Ray Train best effort is okay. If we want strict scheduling placement groups is a viable solution (though maybe we would need the ability to add more bundles to a placement group).

Can you give me more context? Driver (caller) never needs to use get_current_placement_group cuz the driver doesn't have the current placement group. Is it called within an actor? Also, why do you need should_capture_child_tasks_in_placement_group to decide whether or not you should create a pg?

Yeah, this is called within an actor. At a high level the flows are:

Ray Train:

  1. Ray Train job should create a placement group.
  2. Ray Train will spawn workers in this placement group.

Ray Train as part of Ray Tune:

  1. Ray Tune creates 1 placement group per trial.
  2. Ray Train job will be created within the trial.
  3. Ray Train should inherit this placement group.
  4. Ray Train will spawn workers in this placement group.

Note that the Ray Train workers will always be added to a placement group, but the creator of the placement group will depend on whether or not it is contained in a Ray Tune job. We need to check should_capture_child_tasks_in_placement_group within Ray Train to determine if a new placement group should be created.

@rkooo567
Copy link
Contributor

That makes sense! Thanks for the detail here...

Here, isn't should_capture_child_tasks_in_placement_group always set to be True from the parent? Maybe you can just

def get_or_create_pg():
    current_placement_group = get_current_placement_group()
    if current_placement_group:
        assert ray.get_runtime_context().should_create_placement_group
        return current_placement_group

    # create

?

@rkooo567
Copy link
Contributor

Btw, I feel like should_capture_child_tasks_in_placement_group is a really bad name. Do you think renaming it to inherit_placement_group is more straightforward?

@rkooo567
Copy link
Contributor

cc @matthewdeng

@matthewdeng
Copy link
Contributor Author

Ah isn't should_capture_child_tasks_in_placement_group set to True only if the parent specifies placement_group_capture_child_tasks=True?

@rkooo567
Copy link
Contributor

Yes. Are there cases where the parent doesn’t set that as True?

@matthewdeng
Copy link
Contributor Author

Yeah the default value is now False: #17527

@rkooo567 rkooo567 removed their assignment Feb 22, 2022
@stale
Copy link

stale bot commented Jun 23, 2022

Hi, I'm a bot from the Ray team :)

To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

@stale stale bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 23, 2022
@stale
Copy link

stale bot commented Aug 12, 2022

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

@stale stale bot closed this as completed Aug 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-placement-group enhancement Request for new feature and/or capability stale The issue is stale. It will be closed within 7 days unless there are further conversation
Projects
None yet
Development

No branches or pull requests

3 participants