[Datasets] Add a new `"zero_copy"` batch format #32662

amogkam · 2023-02-17T20:38:40Z

Currently, predictors and preprocessors manually call `dataset.dataset_format()` to determine which batch format results in zero copy. This lets them delegate to format-specific optimized implementations of their transformation functions.

However, this triggers execution of the dataset. Instead, we should introduce a `"zero_copy"` batch format in Datasets that picks the best batch format at runtime.

This will also allow us to deprecate dataset_format.
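To illustrate the idea, here is a rough sketch of how a `"zero_copy"` format could be resolved per block at runtime instead of by inspecting the whole dataset up front. Everything in it is an assumption made for illustration: `ArrowBlock`, `PandasBlock`, `native_format`, `resolve_batch_format`, and `apply_scale` are hypothetical stand-ins, not Ray APIs, and the real block types would be Arrow tables and pandas DataFrames.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class ArrowBlock:
    """Hypothetical stand-in for a block stored as an Arrow table."""
    columns: Dict[str, List[float]]


@dataclass
class PandasBlock:
    """Hypothetical stand-in for a block stored as a pandas DataFrame."""
    columns: Dict[str, List[float]]


Block = Union[ArrowBlock, PandasBlock]


def native_format(block: Block) -> str:
    """Return the batch format that needs no conversion for this block."""
    if isinstance(block, ArrowBlock):
        return "pyarrow"
    if isinstance(block, PandasBlock):
        return "pandas"
    raise TypeError(f"Unsupported block type: {type(block).__name__}")


def resolve_batch_format(requested: str, block: Block) -> str:
    """Resolve "zero_copy" to the block's native format; pass others through."""
    return native_format(block) if requested == "zero_copy" else requested


def _scale_arrow(block: Block) -> Dict[str, List[float]]:
    # Placeholder for an Arrow-optimized implementation.
    return {name: [v * 2 for v in col] for name, col in block.columns.items()}


def _scale_pandas(block: Block) -> Dict[str, List[float]]:
    # Placeholder for a pandas-optimized implementation.
    return {name: [v * 2 for v in col] for name, col in block.columns.items()}


# Format-specific implementations a preprocessor could delegate to.
_IMPLS: Dict[str, Callable[[Block], Dict[str, List[float]]]] = {
    "pyarrow": _scale_arrow,
    "pandas": _scale_pandas,
}


def apply_scale(
    block: Block, batch_format: str = "zero_copy"
) -> Tuple[str, Dict[str, List[float]]]:
    """Pick the implementation per block, only once the block is in hand."""
    fmt = resolve_batch_format(batch_format, block)
    return fmt, _IMPLS[fmt](block)
```

Because the format is decided when each block is processed, nothing in this sketch needs to execute the dataset ahead of time just to learn its format.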

amogkam · 2023-02-17T20:40:34Z

cc @clarkzinzow

c21 · 2023-03-16T05:51:17Z

Let's also deprecate Dataset.dataset_format.

ericl · 2023-03-25T00:15:55Z

Closed in #33562

amogkam mentioned this issue Feb 17, 2023

[AIR] Success criteria for numpy narrow waist #28346

Closed

21 tasks

amogkam changed the title ~~Add a new "zero_copy" batch format that allows preprocessors/predictors to delegate to format-specific optimized implementations of their transformation functions~~ Add a new "zero_copy" batch format Feb 17, 2023

amogkam added the data Ray Data-related issues label Feb 17, 2023

amogkam added this to the Dataset Streaming Execution milestone Feb 17, 2023

clarkzinzow changed the title ~~Add a new "zero_copy" batch format~~ [Datasets] Add a new "zero_copy" batch format Feb 17, 2023

c21 assigned amogkam Feb 22, 2023

ericl modified the milestone: Dataset Streaming Execution Feb 23, 2023

c21 added P1 Issue that should be fixed within a few weeks Ray 2.4 labels Mar 16, 2023

ericl added Ray 2.5 and removed Ray 2.4 labels Mar 20, 2023

ericl closed this as completed Mar 25, 2023
