[Datasets] Fix max number of actors for default actor pool strategy #26266

c21 · 2022-07-02T01:08:30Z

Why are these changes needed?

This PR is to fix the maximal number of actors used in datasets tranformation, when using default actor pool strategy (i.e. ActorPoolStrategy()). It should not exceed 1.25 * num_cpus set in the cluster.

Related issue number

Closes #26221

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

c21 · 2022-07-02T01:10:59Z

python/ray/data/_internal/compute.py

@@ -295,7 +297,8 @@ def map_block_nosplit(
                if not ready:
                    if (
                        len(workers) < self.max_size
-                        and len(ready_workers) / len(workers) > 0.8
+                        and len(ready_workers) / len(workers)
+                        > self.ready_to_total_workers_ratio


This looks weird format to me, but this is the result after running scripts/format.sh.

c21 · 2022-07-02T01:11:24Z

python/ray/data/tests/test_dataset.py

+def test_actorpoolstrategy_default_max_actors(shutdown_only):
+    def f(x):
+        import time
+


ditto, scripts/format.sh decides to add a new line here.

Yep our formatting script will add a newline after imports, even dynamic imports within functions.

c21 · 2022-07-02T01:13:09Z

python/ray/data/tests/test_dataset.py

@@ -4233,7 +4233,7 @@ def f(should_import_polars):
        ctx.use_polars = original_use_polars


-def test_actorpoolstrategy_apply_interrupt():
+def test_actorpoolstrategy_apply_interrupt(shutdown_only):


shutdown_only has to add here, o.w. the newly added unit test test_actorpoolstrategy_default_max_actors below cannot call ray.init again.

Yep this is to be expected for tests that start a Ray cluster! If the cluster is shareable (e.g. same cluster config), the ray_start_regular_shared fixture takes care of sharing a common cluster across tests. But since this test needs a special cluster config, the shutdown_only fixture is the way to go.

Supernit: test_actor_pool_strategy_apply_interrupt and test_actor_pool_strategy_default_max_actors may be a bit more readable, snake-casing the camel-cased class name instead of actorpoolstrategy. This is up to personal preference I think.

@clarkzinzow - agree, updated.

ericl · 2022-07-02T01:53:10Z

python/ray/data/tests/test_dataset.py

+        num_cpus * (1 / compute_strategy.ready_to_total_workers_ratio)
+    )
+    assert (
+        compute_strategy.num_workers <= expected_max_num_workers


Shall we also assert that it scaled up to at least 5 workers?

@ericl - makes sense, added.

ericl

One minor comment.

ericl

LGTM, assuming you've checked the test fails before the fix.

c21 · 2022-07-02T02:15:10Z

LGTM, assuming you've checked the test fails before the fix.

@ericl - yes, checked before creating the PR, thanks for review.

clarkzinzow

LGTM!

clarkzinzow · 2022-07-02T02:13:43Z

python/ray/data/tests/test_dataset.py

@@ -4233,7 +4233,7 @@ def f(should_import_polars):
        ctx.use_polars = original_use_polars


-def test_actorpoolstrategy_apply_interrupt():
+def test_actorpoolstrategy_apply_interrupt(shutdown_only):


Yep this is to be expected for tests that start a Ray cluster! If the cluster is shareable (e.g. same cluster config), the ray_start_regular_shared fixture takes care of sharing a common cluster across tests. But since this test needs a special cluster config, the shutdown_only fixture is the way to go.

clarkzinzow · 2022-07-02T02:14:33Z

python/ray/data/tests/test_dataset.py

@@ -4233,7 +4233,7 @@ def f(should_import_polars):
        ctx.use_polars = original_use_polars


-def test_actorpoolstrategy_apply_interrupt():
+def test_actorpoolstrategy_apply_interrupt(shutdown_only):


Supernit: test_actor_pool_strategy_apply_interrupt and test_actor_pool_strategy_default_max_actors may be a bit more readable, snake-casing the camel-cased class name instead of actorpoolstrategy. This is up to personal preference I think.

clarkzinzow · 2022-07-02T02:14:59Z

python/ray/data/tests/test_dataset.py

+def test_actorpoolstrategy_default_max_actors(shutdown_only):
+    def f(x):
+        import time
+


Yep our formatting script will add a newline after imports, even dynamic imports within functions.

ericl · 2022-07-03T21:40:28Z

Merged, thanks!

c21 · 2022-07-04T02:46:48Z

Thank you @ericl and @clarkzinzow for review!

* master: (104 commits) [Serve] Java Client API and End to End Tests (ray-project#22726) [Docs] Small fix to AIR examples descriptions (ray-project#26227) [Deployment Graph] Move `Deployment` creation outside to build function (ray-project#26129) [K8s][Ray Operator] Ignore resource requests when detected container resources. (ray-project#26234) Revert "[Core] Add retry exception allowlist for user-defined filteri… (ray-project#26289) [ci] pin gpustat (ray-project#26311) [tune] fix `set_tune_experiment` (ray-project#26298) Revert "Revert "[AIR][Serve] Rename ModelWrapperDeployment -> PredictorDeployment"" (ray-project#26231) [Release] Use nightly base images for release tests (ray-project#25373) Revert "[Core] fix gRPC handlers' unlimited active calls configuration (ray-project#25626)" (ray-project#26202) [RLlib] Some Docs fixes (2). (ray-project#26265) [C++ worker] Refine worker context and more (ray-project#26281) Fix file_system_monitor.cc message (ray-project#26143) [Java] Make Java test more stable (ray-project#26282) [air] Do not warn of `checkpoint_dir` if it's coming from us (base_trainer). (ray-project#26259) [Datasets] Support drop_columns API (ray-project#26200) [Datasets] Fix max number of actors for default actor pool strategy (ray-project#26266) [ci] Stop syncer staging tests (ray-project#26273) [core][gcs] Add storage namespace to redis storage in GCS. (ray-project#25994) [workflow] Deprecate workflow.create (ray-project#26106) ...

Fix max number of actors for default actor pool strategy

dcec706

c21 requested review from ericl, scv119, clarkzinzow, jjyao and jianoaix as code owners July 2, 2022 01:08

c21 changed the title ~~Fix max number of actors for default actor pool strategy~~ [Datasets] Fix max number of actors for default actor pool strategy Jul 2, 2022

c21 assigned ericl, clarkzinzow and jianoaix Jul 2, 2022

c21 commented Jul 2, 2022

View reviewed changes

ericl reviewed Jul 2, 2022

View reviewed changes

ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jul 2, 2022

Address comment to add check for minimal bound

90ec1a5

c21 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jul 2, 2022

ericl approved these changes Jul 2, 2022

View reviewed changes

clarkzinzow approved these changes Jul 2, 2022

View reviewed changes

Change test naming

cd6347b

ericl merged commit 7360452 into ray-project:master Jul 3, 2022

c21 deleted the max-actors branch July 4, 2022 07:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Datasets] Fix max number of actors for default actor pool strategy #26266

[Datasets] Fix max number of actors for default actor pool strategy #26266

c21 commented Jul 2, 2022

c21 Jul 2, 2022

c21 Jul 2, 2022

clarkzinzow Jul 2, 2022

c21 Jul 2, 2022

clarkzinzow Jul 2, 2022

clarkzinzow Jul 2, 2022

c21 Jul 2, 2022

ericl Jul 2, 2022

c21 Jul 2, 2022

ericl left a comment

ericl left a comment

c21 commented Jul 2, 2022

clarkzinzow left a comment

clarkzinzow Jul 2, 2022

clarkzinzow Jul 2, 2022

clarkzinzow Jul 2, 2022

ericl commented Jul 3, 2022

c21 commented Jul 4, 2022

[Datasets] Fix max number of actors for default actor pool strategy #26266

[Datasets] Fix max number of actors for default actor pool strategy #26266

Conversation

c21 commented Jul 2, 2022

Why are these changes needed?

Related issue number

Checks

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ericl left a comment

Choose a reason for hiding this comment

ericl left a comment

Choose a reason for hiding this comment

c21 commented Jul 2, 2022

clarkzinzow left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ericl commented Jul 3, 2022

c21 commented Jul 4, 2022