[data] [streaming] Support async/thread-pool batch generation for actor pool map and iter_batches() #31911

Closed
ericl opened this issue Jan 24, 2023 · 1 comment
Labels: data (Ray Data-related issues), enhancement (Request for new feature and/or capability), P1 (Issue that should be fixed within a few weeks), Ray 2.4

ericl commented Jan 24, 2023

#31576 is only implemented in the old backend.

For the new backend, we should pipeline batch computation within the actor workers by computing the batches asynchronously in a separate thread.
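As a rough illustration of the idea (not Ray's actual implementation), a background thread can compute batches ahead of the consumer and hand them over through a bounded queue. The names `async_batches`, `make_batch`, and `prefetch` are hypothetical and use only the Python standard library:

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def async_batches(
    blocks: Iterable[T],
    make_batch: Callable[[T], U],
    prefetch: int = 2,
) -> Iterator[U]:
    """Yield make_batch(block) in order, computing ahead in a background thread.

    Hypothetical helper for illustration only; not part of Ray.
    """
    # Bounded queue: the producer runs at most `prefetch` batches ahead.
    out: queue.Queue = queue.Queue(maxsize=prefetch)

    def producer() -> None:
        try:
            for block in blocks:
                out.put(("ok", make_batch(block)))
        except Exception as exc:
            # Forward the failure so it is re-raised in the consumer.
            out.put(("err", exc))
        else:
            out.put(("done", None))

    # Daemon thread so an abandoned iterator does not block interpreter exit.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        kind, value = out.get()
        if kind == "done":
            return
        if kind == "err":
            raise value
        yield value


if __name__ == "__main__":
    for batch in async_batches(range(5), lambda x: x * 2):
        print(batch)
```

The bounded queue is what gives the pipelining: while the consumer works on one batch, the next ones are already being produced, and memory use stays capped at `prefetch` pending batches.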

We should also enable this optimization for iter_batches(), and in particular use a thread-pool to accelerate batch conversions with additional parallelism.
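A minimal sketch of the thread-pool variant, assuming the conversion function releases the GIL or is I/O-bound (as many NumPy and Arrow operations are), so that threads provide real parallelism. `pooled_batches`, `convert`, `num_threads`, and `max_in_flight` are hypothetical names, and this is not how Ray implements it:

```python
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Deque, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def pooled_batches(
    blocks: Iterable[T],
    convert: Callable[[T], U],
    num_threads: int = 4,
    max_in_flight: int = 8,
) -> Iterator[U]:
    """Convert blocks on a thread pool, yielding results in input order.

    Hypothetical helper for illustration only; not part of Ray.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        pending: Deque[Future] = deque()
        for block in blocks:
            pending.append(pool.submit(convert, block))
            # Cap the number of outstanding conversions to bound memory use.
            if len(pending) >= max_in_flight:
                yield pending.popleft().result()
        # Drain whatever is still running once the input is exhausted.
        while pending:
            yield pending.popleft().result()


if __name__ == "__main__":
    for batch in pooled_batches(range(10), lambda x: x + 1, num_threads=3):
        print(batch)
```

Popping futures from the left of the deque keeps the output in the same order as the input, at the cost of waiting on the oldest conversion even when later ones finish first.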

cc @amogkam

ericl added the enhancement (Request for new feature and/or capability), P2 (Important issue, but not time-critical), data (Ray Data-related issues), and Ray 2.4 labels on Jan 24, 2023
ericl changed the title from "[data] [streaming] Support async batch generation for actor pool map operators" to "[data] [streaming] Support async/thread-pool batch generation for actor pool map operators" on Jan 30, 2023
ericl changed the title from "[data] [streaming] Support async/thread-pool batch generation for actor pool map operators" to "[data] [streaming] Support async/thread-pool batch generation for actor pool map and iter_batches()" on Jan 30, 2023
ericl added the P1 (Issue that should be fixed within a few weeks) label and removed the P2 (Important issue, but not time-critical) label on Feb 6, 2023
amogkam commented Apr 12, 2023

Supported in map_batches by #34253
Supported in iter_batches by #33575, #33605, and #33620

amogkam closed this as completed on Apr 12, 2023