Skip to content

Commit

Permalink
[Cherry Pick] Support placement_group=None in PlacementGroupSchedulin…
Browse files Browse the repository at this point in the history
…gStrategy (#27370) (#27416)
  • Loading branch information
jjyao authored Aug 3, 2022
1 parent 2ad2cb2 commit fea2595
Show file tree
Hide file tree
Showing 4 changed files with 24 additions and 39 deletions.
36 changes: 18 additions & 18 deletions doc/source/ray-core/gotchas.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3,11 +3,11 @@
Ray Gotchas
===========

Ray sometimes has some aspects of its behavior that might catch
users off guard. There may be sound arguments for these design choices.
Ray sometimes has some aspects of its behavior that might catch
users off guard. There may be sound arguments for these design choices.

In particular, users think of Ray as running on their local machine, and
while this is mostly true, this doesn't work.
In particular, users think of Ray as running on their local machine, and
while this is mostly true, this doesn't work.

Environment variables are not passed from the driver to workers
---------------------------------------------------------------
Expand Down Expand Up @@ -39,9 +39,9 @@ Filenames work sometimes and not at other times
-----------------------------------------------

**Issue**: If you reference a file by name in a task or actor,
it will sometimes work and sometimes fail. This is
because if the task or actor runs on the head node
of the cluster, it will work, but if the task or actor
it will sometimes work and sometimes fail. This is
because if the task or actor runs on the head node
of the cluster, it will work, but if the task or actor
runs on another machine it won't.

**Example**: Let's say we do the following command:
Expand All @@ -50,7 +50,7 @@ runs on another machine it won't.
% touch /tmp/foo.txt
And I have this code:
And I have this code:

.. code-block:: python
Expand All @@ -61,15 +61,15 @@ And I have this code:
def check_file():
foo_exists = os.path.exists("/tmp/foo.txt")
print(f"Foo exists? {foo_exists}")
futures = []
for _ in range(1000):
for _ in range(1000):
futures.append(check_file.remote())
ray.get(futures)
then you will get a mix of True and False. If
then you will get a mix of True and False. If
``check_file()`` runs on the head node, or we're running
locally it works. But if it runs on a worker node, it returns ``False``.

Expand All @@ -78,7 +78,7 @@ It's the same code after all.

**Fix**

- Use only shared paths for such applications -- e.g. if you are using a network file system you can use that, or the files can be on s3.
- Use only shared paths for such applications -- e.g. if you are using a network file system you can use that, or the files can be on s3.
- Do not rely on file path consistency.


Expand All @@ -87,10 +87,10 @@ Placement groups are not composable
-----------------------------------

**Issue**: If you have a task that is called from something that runs in a placement
group, the resources are never allocated and it hangs.
group, the resources are never allocated and it hangs.

**Example**: You are using Ray Tune which creates placement groups, and you want to
apply it to an objective function, but that objective function makes use
apply it to an objective function, but that objective function makes use
of Ray Tasks itself, e.g.

.. code-block:: python
Expand All @@ -111,18 +111,18 @@ of Ray Tasks itself, e.g.
tuner = tune.Tuner(objective, param_space={"a": 1})
tuner.fit()
This will error with message:
ValueError: Cannot schedule create_task_that_uses_resources.<locals>.sample_task with the placement group
This will error with message:
ValueError: Cannot schedule create_task_that_uses_resources.<locals>.sample_task with the placement group
because the resource request {'CPU': 10} cannot fit into any bundles for the placement group, [{'CPU': 1.0}].

**Expected behavior**: The above executes.

**Fix**: In the ``@ray.remote`` declaration of tasks
called by ``create_task_that_uses_resources()`` , include a
``placement_group=None``.
``scheduling_strategy=PlacementGroupSchedulingStrategy(placement_group=None)``.

.. code-block:: diff
def create_task_that_uses_resources():
+ @ray.remote(num_cpus=10, placement_group=None)
+ @ray.remote(num_cpus=10, scheduling_strategy=PlacementGroupSchedulingStrategy(placement_group=None))
- @ray.remote(num_cpus=10)
12 changes: 6 additions & 6 deletions python/ray/tests/test_placement_group_2.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,11 +18,7 @@
)
from ray.util.client.ray_client_helpers import connect_to_client_or_not
from ray.util.placement_group import get_current_placement_group

try:
import pytest_timeout
except ImportError:
pytest_timeout = None
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy


@ray.remote
Expand Down Expand Up @@ -376,7 +372,11 @@ def schedule_nested_actor(self):

def schedule_nested_actor_outside_pg(self):
# Don't use placement group.
actor = NestedActor.options(placement_group=None).remote()
actor = NestedActor.options(
scheduling_strategy=PlacementGroupSchedulingStrategy(
placement_group=None
)
).remote()
ray.get(actor.ready.remote())
self.actors.append(actor)

Expand Down
10 changes: 0 additions & 10 deletions python/ray/tests/test_scheduling_2.py
Original file line number Diff line number Diff line change
Expand Up @@ -244,16 +244,6 @@ def func():

func.options(scheduling_strategy="XXX").remote()

with pytest.raises(ValueError):

@ray.remote
def func():
return 0

func.options(
scheduling_strategy=PlacementGroupSchedulingStrategy(placement_group=None)
).remote()


@pytest.mark.parametrize("connect_to_client", [True, False])
def test_node_affinity_scheduling_strategy(
Expand Down
5 changes: 0 additions & 5 deletions python/ray/util/scheduling_strategies.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,11 +30,6 @@ def __init__(
placement_group_bundle_index: int = -1,
placement_group_capture_child_tasks: Optional[bool] = None,
):
if placement_group is None:
raise ValueError(
"placement_group needs to be an instance of PlacementGroup"
)

self.placement_group = placement_group
self.placement_group_bundle_index = placement_group_bundle_index
self.placement_group_capture_child_tasks = placement_group_capture_child_tasks
Expand Down

0 comments on commit fea2595

Please sign in to comment.