[Data] Support using concurrent actors for ActorPool
#34253
Signed-off-by: amogkam <[email protected]>
A couple questions:
- Should we always use this thread-pool code path regardless of the setting?
- Do we have any segfault issues right now?
- How does this relate to the thread-based pipelining code you have for accelerated prefetch? It seems like we could also use that pipeline verbatim, with each actor thread feeding into the pipeline and awaiting the output at the end.
…ool-concurrent-actor
Made the change for point 1.
class _Wrapper(callable_cls):
    def __init__(self, *args, **kwargs):
        self.thread_pool_executor = ThreadPoolExecutor(max_workers=1)
@ericl we have max_workers=1 here so the user's UDF will always be run in a single thread.
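A minimal sketch of the gating idea described in this comment, written without Ray so it can run standalone. The helper name make_single_threaded, the RecordThread UDF, and the four-thread caller pool that stands in for a concurrent actor are all hypothetical and not taken from the PR; only the _Wrapper class name and the ThreadPoolExecutor(max_workers=1) attribute come from the diff.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def make_single_threaded(callable_cls):
    """Hypothetical helper: wrap a callable class so __call__ runs on one thread."""

    class _Wrapper(callable_cls):
        def __init__(self, *args, **kwargs):
            # A single worker serializes every UDF call onto one dedicated
            # thread, even when several caller threads invoke it at once.
            self.thread_pool_executor = ThreadPoolExecutor(max_workers=1)
            super().__init__(*args, **kwargs)

        def __call__(self, *args, **kwargs):
            # Hand the call to the single-worker pool and block on the result.
            future = self.thread_pool_executor.submit(
                super().__call__, *args, **kwargs
            )
            return future.result()

    return _Wrapper


class RecordThread:
    """Hypothetical UDF that records which thread executed each call."""

    def __init__(self):
        self.thread_ids = []

    def __call__(self, batch):
        self.thread_ids.append(threading.get_ident())
        return [x * 2 for x in batch]


udf = make_single_threaded(RecordThread)()

# Stand-in for a concurrent actor: four caller threads invoke the UDF.
with ThreadPoolExecutor(max_workers=4) as callers:
    results = list(callers.map(udf, [[i] for i in range(8)]))

# The UDF body only ever ran on the wrapper's single worker thread.
assert len(set(udf.thread_ids)) == 1
```

The callers can overlap freely, but each one blocks on future.result(), so the UDF itself never runs concurrently and does not need to be thread-safe.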
# Make sure user's UDF is not running concurrently.
assert len(set(thread_ids)) == 1
test here @ericl
Should we update the documentation as well, so users are aware that they can use concurrent actors by setting max_concurrency in map_batches?
…4253) Support using concurrent actors for ActorPool. We do this by gating the user UDF in a separate threadpool of max size 1. --------- Signed-off-by: amogkam <[email protected]> Signed-off-by: elliottower <[email protected]>
…4253) Support using concurrent actors for ActorPool. We do this by gating the user UDF in a separate threadpool of max size 1. --------- Signed-off-by: amogkam <[email protected]> Signed-off-by: Jack He <[email protected]>
Why are these changes needed?
Support using concurrent actors for ActorPool. We do this by gating the user UDF in a separate thread pool of max size 1.
Related issue number
Checks
- …git commit -s) in this PR.
- …scripts/format.sh to lint the changes in this PR.
- …method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.