[Data] collate_fn in iter_torch_batches could be a bottleneck #33508

llan-ml · 2023-03-21T03:38:16Z

Description

The current implementation performs custom batching sequentially. When the user-defined collate_fn is slow, it becomes the bottleneck and the whole dataset pipeline stays slow. In contrast, the collate_fn in the PyTorch DataLoader executes in parallel.
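To make the report concrete, here is a minimal stand-in sketch of the pattern described. It is not Ray's actual implementation: `slow_collate`, `raw_batches`, and `iter_collated_sequential` are hypothetical names, and the sleep only simulates an expensive collate step. Because collation runs inline in the consumer's thread, every batch pays the full collate cost before the training loop sees it.

```python
import time


def slow_collate(batch):
    # Hypothetical user-defined collate_fn; the sleep simulates expensive work.
    time.sleep(0.01)
    return [x * 2 for x in batch]


def raw_batches(num_batches, batch_size):
    # Hypothetical stand-in for the uncollated batches produced by the dataset.
    for i in range(num_batches):
        yield list(range(i * batch_size, (i + 1) * batch_size))


def iter_collated_sequential(batches, collate_fn):
    # Collation happens one batch at a time, in the consumer's own thread.
    for batch in batches:
        yield collate_fn(batch)


start = time.perf_counter()
sequential_out = list(iter_collated_sequential(raw_batches(8, 4), slow_collate))
sequential_seconds = time.perf_counter() - start
print(f"{len(sequential_out)} batches collated in {sequential_seconds:.3f}s")
```

With 8 batches and a 10 ms collate step, the loop takes at least roughly 80 ms no matter how quickly the batches themselves are produced.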

Use case

No response


amogkam · 2023-03-21T06:06:52Z

this will be fixed by #33510!

the _collate_fn will run in a threadpool
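As a rough illustration of the thread pool idea (a sketch under stated assumptions, not the code from #33510), the collate calls can be submitted to a `concurrent.futures.ThreadPoolExecutor` and yielded back in order. The names `iter_collated_threaded`, `slow_collate`, `num_workers`, and `prefetch` are hypothetical and chosen only for this example.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def slow_collate(batch):
    # Hypothetical user-defined collate_fn; the sleep simulates expensive work.
    time.sleep(0.01)
    return [x * 2 for x in batch]


def iter_collated_threaded(batches, collate_fn, num_workers=4, prefetch=8):
    # Run collate_fn in worker threads, keep at most `prefetch` calls in
    # flight, and yield results in the original batch order.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        pending = deque()
        for batch in batches:
            pending.append(pool.submit(collate_fn, batch))
            if len(pending) >= prefetch:
                yield pending.popleft().result()
        # Drain whatever is still in flight once the input is exhausted.
        while pending:
            yield pending.popleft().result()


raw = [list(range(i * 4, (i + 1) * 4)) for i in range(8)]
threaded_out = list(iter_collated_threaded(raw, slow_collate))
print(threaded_out[:2])
```

Threads only help when the collate function spends its time in code that releases the GIL (I/O, or NumPy and torch operations); a pure-Python, CPU-bound collate function would need processes instead.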

raulchen · 2023-05-25T18:50:02Z

@amogkam #33510 was closed. Anything else we can do for this issue?

genesis-jamin · 2023-11-20T01:54:35Z

Was this feature ever implemented? #33510 seems to have been closed without merging, and from viewing the flame graph I'm seeing significant time spent in my custom collate function.

justinvyu · 2023-11-20T17:47:03Z

@genesis-jamin The linked PR was a prototype that got split into multiple PRs that have already been merged.

llan-ml added the enhancement and triage labels Mar 21, 2023

Yard1 mentioned this issue Mar 22, 2023

[Data] Async iter_batches #33510 (Closed, 10 tasks)

scottjlee added the P1 and data labels and removed the triage label Mar 24, 2023

anyscalesam closed this as completed Nov 8, 2023
