-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Datasets] Fix max number of actors for default actor pool strategy #26266
Conversation
@@ -295,7 +297,8 @@ def map_block_nosplit( | |||
if not ready: | |||
if ( | |||
len(workers) < self.max_size | |||
and len(ready_workers) / len(workers) > 0.8 | |||
and len(ready_workers) / len(workers) | |||
> self.ready_to_total_workers_ratio |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks weird format to me, but this is the result after running scripts/format.sh
.
def test_actorpoolstrategy_default_max_actors(shutdown_only): | ||
def f(x): | ||
import time | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, scripts/format.sh
decides to add a new line here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep our formatting script will add a newline after imports, even dynamic imports within functions.
@@ -4233,7 +4233,7 @@ def f(should_import_polars): | |||
ctx.use_polars = original_use_polars | |||
|
|||
|
|||
def test_actorpoolstrategy_apply_interrupt(): | |||
def test_actorpoolstrategy_apply_interrupt(shutdown_only): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shutdown_only
has to add here, o.w. the newly added unit test test_actorpoolstrategy_default_max_actors
below cannot call ray.init
again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this is to be expected for tests that start a Ray cluster! If the cluster is shareable (e.g. same cluster config), the ray_start_regular_shared
fixture takes care of sharing a common cluster across tests. But since this test needs a special cluster config, the shutdown_only
fixture is the way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supernit: test_actor_pool_strategy_apply_interrupt
and test_actor_pool_strategy_default_max_actors
may be a bit more readable, snake-casing the camel-cased class name instead of actorpoolstrategy
. This is up to personal preference I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clarkzinzow - agree, updated.
num_cpus * (1 / compute_strategy.ready_to_total_workers_ratio) | ||
) | ||
assert ( | ||
compute_strategy.num_workers <= expected_max_num_workers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we also assert that it scaled up to at least 5 workers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericl - makes sense, added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, assuming you've checked the test fails before the fix.
@ericl - yes, checked before creating the PR, thanks for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -4233,7 +4233,7 @@ def f(should_import_polars): | |||
ctx.use_polars = original_use_polars | |||
|
|||
|
|||
def test_actorpoolstrategy_apply_interrupt(): | |||
def test_actorpoolstrategy_apply_interrupt(shutdown_only): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this is to be expected for tests that start a Ray cluster! If the cluster is shareable (e.g. same cluster config), the ray_start_regular_shared
fixture takes care of sharing a common cluster across tests. But since this test needs a special cluster config, the shutdown_only
fixture is the way to go.
@@ -4233,7 +4233,7 @@ def f(should_import_polars): | |||
ctx.use_polars = original_use_polars | |||
|
|||
|
|||
def test_actorpoolstrategy_apply_interrupt(): | |||
def test_actorpoolstrategy_apply_interrupt(shutdown_only): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supernit: test_actor_pool_strategy_apply_interrupt
and test_actor_pool_strategy_default_max_actors
may be a bit more readable, snake-casing the camel-cased class name instead of actorpoolstrategy
. This is up to personal preference I think.
def test_actorpoolstrategy_default_max_actors(shutdown_only): | ||
def f(x): | ||
import time | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep our formatting script will add a newline after imports, even dynamic imports within functions.
Merged, thanks! |
Thank you @ericl and @clarkzinzow for review! |
* master: (104 commits) [Serve] Java Client API and End to End Tests (ray-project#22726) [Docs] Small fix to AIR examples descriptions (ray-project#26227) [Deployment Graph] Move `Deployment` creation outside to build function (ray-project#26129) [K8s][Ray Operator] Ignore resource requests when detected container resources. (ray-project#26234) Revert "[Core] Add retry exception allowlist for user-defined filteri… (ray-project#26289) [ci] pin gpustat (ray-project#26311) [tune] fix `set_tune_experiment` (ray-project#26298) Revert "Revert "[AIR][Serve] Rename ModelWrapperDeployment -> PredictorDeployment"" (ray-project#26231) [Release] Use nightly base images for release tests (ray-project#25373) Revert "[Core] fix gRPC handlers' unlimited active calls configuration (ray-project#25626)" (ray-project#26202) [RLlib] Some Docs fixes (2). (ray-project#26265) [C++ worker] Refine worker context and more (ray-project#26281) Fix file_system_monitor.cc message (ray-project#26143) [Java] Make Java test more stable (ray-project#26282) [air] Do not warn of `checkpoint_dir` if it's coming from us (base_trainer). (ray-project#26259) [Datasets] Support drop_columns API (ray-project#26200) [Datasets] Fix max number of actors for default actor pool strategy (ray-project#26266) [ci] Stop syncer staging tests (ray-project#26273) [core][gcs] Add storage namespace to redis storage in GCS. (ray-project#25994) [workflow] Deprecate workflow.create (ray-project#26106) ...
Why are these changes needed?
This PR is to fix the maximal number of actors used in datasets tranformation, when using default actor pool strategy (i.e.
ActorPoolStrategy()
). It should not exceed1.25 * num_cpus
set in the cluster.Related issue number
Closes #26221
Checks
scripts/format.sh
to lint the changes in this PR.