[Core] Tasks/actors with no resource requirements are not scheduled using placement groups as expected #31034

Yard1 · 2022-12-12T20:22:54Z

What happened + What you expected to happen

I create an actor that is scheduled into a STRICT_PACK placement group with placement_group_capture_child_tasks=True. The actor launches several tasks that have no resource requirements (num_cpus=0). Because the strategy is STRICT_PACK, I expect these tasks to be scheduled on the same node as the actor. Instead, they run on arbitrary nodes in the cluster. This only happens when the tasks have no resource requirements; tasks with num_cpus=1 are placed correctly.
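
The symptom above can be illustrated with a toy model. This is a sketch under an assumption, not Ray's actual scheduler code: it assumes placement-group affinity is enforced only by rewriting a task's resource demands into bundle-scoped resources that exist solely on the node holding the group's bundles. The names `rewrite_for_placement_group`, `feasible_nodes`, and resource keys such as `CPU_group_pg1` are hypothetical. Under that assumption, a num_cpus=0 task has nothing to rewrite, so every node looks feasible.

```python
# Toy model (hypothetical, NOT Ray's implementation): placement-group affinity
# is assumed to be enforced only through bundle-scoped resource demands.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str
    resources: Dict[str, float]


def rewrite_for_placement_group(demand: Dict[str, float], pg_id: str) -> Dict[str, float]:
    """Map each nonzero resource in the demand to a hypothetical bundle-scoped name.

    A demand with no nonzero entries (e.g. num_cpus=0) becomes an empty dict,
    so it carries no placement-group constraint at all.
    """
    return {f"{name}_group_{pg_id}": amount for name, amount in demand.items() if amount > 0}


def feasible_nodes(demand: Dict[str, float], nodes: List[Node]) -> List[str]:
    """Return the names of nodes that can satisfy every entry of the demand."""
    return [
        node.name
        for node in nodes
        if all(node.resources.get(res, 0) >= amt for res, amt in demand.items())
    ]


# STRICT_PACK put all 8 one-CPU bundles on node_a; node_b is an ordinary node.
CLUSTER = [
    Node("node_a", {"CPU_group_pg1": 8}),
    Node("node_b", {"CPU": 8}),
]

if __name__ == "__main__":
    for label, demand in [("num_cpus=1", {"CPU": 1}), ("num_cpus=0", {"CPU": 0})]:
        rewritten = rewrite_for_placement_group(demand, "pg1")
        print(label, feasible_nodes(rewritten, CLUSTER))
```

In this model the 1-CPU task can only land on node_a, while the 0-CPU task's empty demand is trivially satisfied everywhere, which matches the reported behavior.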

Versions / Dependencies

master (df13a1d)

Reproduction script

```python
import ray
import ray.util
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy


@ray.remote
def task(ip):
    # Check that this task landed on the same node as the actor that launched it.
    print(ip, ray.util.get_node_ip_address())
    assert ip == ray.util.get_node_ip_address()


@ray.remote
class Actor:
    def __init__(self, task_cpus):
        self.task_cpus = task_cpus

    def run(self):
        task_with_cpus_set = task.options(num_cpus=self.task_cpus)
        # IP of the node hosting this actor (and the STRICT_PACK placement group).
        ip = ray.util.get_node_ip_address()
        tasks = [task_with_cpus_set.remote(ip) for _ in range(32)]
        ray.get(tasks)


# Eight 1-CPU bundles, all required to be packed onto a single node.
pg = placement_group([{"CPU": 1}] * 8, strategy="STRICT_PACK")
ray.get(pg.ready())

# Child tasks launched by the actor should inherit the placement group.
ActorWithPlacementGroup = Actor.options(
    scheduling_strategy=PlacementGroupSchedulingStrategy(
        placement_group=pg, placement_group_capture_child_tasks=True
    )
)

# Works: each task requests 1 CPU and stays on the placement group's node.
actor = ActorWithPlacementGroup.remote(task_cpus=1)
ray.get(actor.run.remote())

print("fail")

# Fails: tasks request 0 CPUs, get scheduled on other nodes,
# and the assertion in `task` raises.
actor = ActorWithPlacementGroup.remote(task_cpus=0)
ray.get(actor.run.remote())
```

Run on a cluster with multiple nodes, each with at most 8 CPUs (e.g. https://console.anyscale-staging.com/o/anyscale-internal/workspaces/expwrk_j4rphgl1yttb36lhzw59svlf/ses_subxaevyczw11qxaylf3m9qa).

Issue Severity

High: It blocks me from completing my task.


amogkam · 2022-12-12T20:34:51Z

Looks like the same issue as #27931

ericl · 2022-12-12T20:35:26Z

Closed (duplicates #27931)

Yard1 added the bug, triage, and core labels Dec 12, 2022

ericl closed this as completed Dec 12, 2022
