-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Bulk executor initial implementation #30903
Conversation
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
Some progress: now passing a good portion of dataset tests:
|
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
Signed-off-by: Eric Liang <[email protected]>
@@ -28,8 +28,6 @@ def __init__( | |||
ray_remote_args: Remote arguments for the Ray actors to be created. | |||
pool_size: The size of the actor pool. | |||
""" | |||
if "num_cpus" not in ray_remote_args: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericl I reverted this, since it looks not compatible with the requirement that "num_gpus and num_cpus cannot be both specified" (user may want to run actors on gpus and then they won't be able tp specify cpus).
All CI tests are passing now. However, we seem to have a release test failure that may be relevant: https://buildkite.com/ray-project/release-tests-pr/builds/26265#0185e158-7979-4c94-be61-22da62d208ca |
It seems likely due to the lack of autoscaling for actor pool, as it had just one actor through the entire run, @clarkzinzow And since it's about the same issue as TODO in unit test, shall we note this as a debt to fix soon and merge this PR? @ericl
|
Alright. Let's leave the TODO to fix this test (or we can fix this test by increasing the min pool size). |
…-project#31283) Add a utility class for tracing object allocation / freeing. This makes it a lot easier to debug memory allocation / freeing issues. This is split out from ray-project#30903 Signed-off-by: tmynn <[email protected]>
…oject#31305) Add the initial operator implementations. This is split out from ray-project#30903 Signed-off-by: tmynn <[email protected]>
…oject#31443) Add the basic bulk executor. This is split out from ray-project#30903 Signed-off-by: tmynn <[email protected]>
Why are these changes needed?
Initial implementation of ray-project/enhancements#18
Original prototype: https://github.com/ray-project/ray/pull/30222/files
TODO:
Merge plan: