[Data] Support using concurrent actors for `ActorPool` #34253

amogkam · 2023-04-11T00:59:42Z

Support using concurrent actors for ActorPool. We do this by gating the user UDF in a separate threadpool of max size 1.

Why are these changes needed?

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: amogkam <[email protected]>

ericl

A couple questions:

1. Should we always use this thread-pool code path regardless of the setting?
2. Do we have any segfault issues right now?
3. How does this relate to the thread-based pipelining code you have for accelerated prefetch? It seems like we could also use that pipeline verbatim, with each actor thread feeding into the pipeline and awaiting the output at the end.

amogkam · 2023-04-11T02:09:04Z

1. We can. Are you suggesting this for code simplicity?
2. This approach doesn't run into segfaults.
3. Hmm, I don't think that can easily be plugged in as is. I think the pattern is different, right? For prefetching, there is a single producer with multiple threads used for the computation. Here, there are multiple producers with a single thread used for the computation.
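To make the difference between the two patterns concrete, here is a minimal sketch using only the standard library. The function names and the round-robin split of work are assumptions for illustration; this is not the prefetching code or the code in this PR.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def single_producer_many_workers(items, fn, num_workers=4):
    """Prefetch-style: one caller fans work out to several compute threads."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(fn, items))


def many_producers_single_worker(items, fn, num_producers=4):
    """Actor-style: several callers funnel work into one compute thread."""
    results = [None] * len(items)
    with ThreadPoolExecutor(max_workers=1) as compute:

        def producer(start):
            # Each producer handles every num_producers-th item (illustrative split).
            for i in range(start, len(items), num_producers):
                results[i] = compute.submit(fn, items[i]).result()

        producers = [
            threading.Thread(target=producer, args=(s,))
            for s in range(num_producers)
        ]
        for t in producers:
            t.start()
        for t in producers:
            t.join()
    return results
```

In the second function, `fn` only ever executes on the executor's single worker thread, no matter how many producers submit to it.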

amogkam · 2023-04-11T19:12:24Z

Made the change for point 1

amogkam · 2023-04-11T19:12:54Z

python/ray/data/_internal/execution/util.py

+
+    class _Wrapper(callable_cls):
+        def __init__(self, *args, **kwargs):
+            self.thread_pool_executor = ThreadPoolExecutor(max_workers=1)


@ericl we have max_workers=1 here so the user's UDF will always be run in a single thread.
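The diff excerpt above shows only the constructor. As a rough sketch of how such a wrapper could gate calls, assuming the UDF is a callable class: the `make_single_threaded` helper, the `__call__` body, and the `Doubler` example are hypothetical and not taken from the PR.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def make_single_threaded(callable_cls):
    """Return a subclass whose __call__ always runs on one dedicated thread.

    Hypothetical helper; only `_Wrapper`, `callable_cls`, and
    `thread_pool_executor` appear in the diff excerpt.
    """

    class _Wrapper(callable_cls):
        def __init__(self, *args, **kwargs):
            # One worker: submitted calls queue up and run one at a time.
            self.thread_pool_executor = ThreadPoolExecutor(max_workers=1)
            super().__init__(*args, **kwargs)

        def __call__(self, *args, **kwargs):
            # Assumed body: hand the call to the executor and block on the result.
            future = self.thread_pool_executor.submit(
                super().__call__, *args, **kwargs
            )
            return future.result()

    return _Wrapper


class Doubler:
    """Illustrative UDF that reports which thread executed it."""

    def __call__(self, batch):
        return threading.get_ident(), [x * 2 for x in batch]
```

Even if several threads call an instance of `make_single_threaded(Doubler)` at once, each call to `Doubler.__call__` runs on the executor's single worker thread.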

amogkam · 2023-04-11T19:13:09Z

python/ray/data/tests/test_dataset_map.py

+    # Make sure user's UDF is not running concurrently.
+    assert len(set(thread_ids)) == 1


test here @ericl
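For readers without the test file at hand, the idea behind the assertion can be sketched in plain Python. This is an assumption-laden illustration, not the actual test in test_dataset_map.py: `collect_thread_ids` and its parameters are hypothetical, and no Ray APIs are used.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def collect_thread_ids(gated, num_callers=4):
    """Call a UDF from several threads and record where it actually ran."""
    thread_ids = []
    executor = ThreadPoolExecutor(max_workers=1)
    # The barrier keeps all caller threads alive at once before any call is made.
    barrier = threading.Barrier(num_callers)

    def udf():
        thread_ids.append(threading.get_ident())

    def caller():
        barrier.wait()
        if gated:
            # Route the call through the single-worker executor.
            executor.submit(udf).result()
        else:
            # Run the UDF directly on the calling thread.
            udf()

    callers = [threading.Thread(target=caller) for _ in range(num_callers)]
    for t in callers:
        t.start()
    for t in callers:
        t.join()
    executor.shutdown()
    return thread_ids
```

With `gated=True`, `len(set(thread_ids)) == 1` holds because every call lands on the same worker thread; with `gated=False`, each caller records its own thread ID.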

c21

Should we update the documentation as well, so users are aware that they can use concurrent actors by setting max_concurrency in map_batches?

update

b5ca090

Signed-off-by: amogkam <[email protected]>

amogkam requested review from ericl, scv119, clarkzinzow, jjyao, jianoaix and c21 as code owners April 11, 2023 00:59

amogkam assigned ericl and c21 Apr 11, 2023

ericl reviewed Apr 11, 2023

amogkam added 3 commits April 10, 2023 19:21

fix

4949314

Signed-off-by: amogkam <[email protected]>

Merge branch 'master' of github.com:ray-project/ray into data-actor-p…

86f01fd

…ool-concurrent-actor

update

1c4ecf7

Signed-off-by: amogkam <[email protected]>

ericl approved these changes Apr 11, 2023

c21 approved these changes Apr 11, 2023

amogkam merged commit c8a4b98 into ray-project:master Apr 11, 2023

amogkam deleted the data-actor-pool-concurrent-actor branch April 11, 2023 21:49

amogkam mentioned this pull request Apr 12, 2023

[data] [streaming] Support async/thread-pool batch generation for actor pool map and iter_batches() #31911

Closed
