[Data] collate_fn in iter_torch_batches could be a bottleneck #33508
Labels
data
Ray Data-related issues
enhancement
Request for new feature and/or capability
P1
Issue that should be fixed within a few weeks
Description
The current implementation applies the custom batching sequentially. When the user-defined `collate_fn` is expensive, it becomes the bottleneck and the overall dataset pipeline stays slow even if the rest of the pipeline is fast. In contrast, the `collate_fn` in the PyTorch `DataLoader` executes in parallel across its worker processes (see the sketch below).
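For illustration, a minimal sketch of the pattern being described (the timings and dataset are hypothetical; it only simulates an expensive user-defined `collate_fn` to show that its cost is paid serially in the iterating process):

```python
import time

import ray
import torch


def slow_collate_fn(batch):
    # Stand-in for an expensive user-defined transform. The batch passed in
    # is typically a Dict[str, np.ndarray]; the exact format depends on the
    # Ray version.
    time.sleep(0.05)
    return torch.as_tensor(batch["id"], dtype=torch.float32)


ds = ray.data.range(10_000)

start = time.perf_counter()
for tensor_batch in ds.iter_torch_batches(batch_size=100, collate_fn=slow_collate_fn):
    pass  # training step would go here
print(f"epoch time: {time.perf_counter() - start:.2f}s")
# With 100 batches, roughly 100 * 0.05 s = 5 s is spent in collate_fn alone,
# since every call runs sequentially in this one process.
```

By contrast, `torch.utils.data.DataLoader` with `num_workers > 0` runs `collate_fn` inside its worker processes, so the same per-batch cost overlaps with the training loop.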
Use case

No response