Make a pass fixing Dataset API issues #22886

ericl · 2022-03-07T23:30:01Z

Why are these changes needed?

This fixes the following issues:

should force compute="actors" when passing a stateful callable to map
remove _spread_resource_prefix
remove _move
remove .pipeline
promote to PublicAPI: fully_executed, stats
DatasetPipeline: underscores for constants
DatasetPipeline: add fetch_if_missing to schema
DatasetPipeline: move stats to public api
DatasetPipeline: remove foreach_dataset
read_api: remove _spread_resource_prefix
read_api: make tensor_column_schema supported
compute: make ComputeStrategy DeveloperAPI
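
The first item above (stateful callables require `compute="actors"`) can be sketched as a small validation step. This is a minimal illustration, not the code in this PR: the helper name `check_compute_for_fn`, the error message, and the use of `inspect.isclass` to detect a stateful callable are all assumptions made for the example.

```python
import inspect
from typing import Any, Callable, Optional, Union


def check_compute_for_fn(
    fn: Union[Callable[..., Any], type], compute: Optional[str]
) -> str:
    """Hypothetical helper: return the compute strategy to use for `fn`.

    Assumption: a "stateful callable" is a class whose instances are
    callable. Such a class has to be constructed once per worker and then
    reused, which only an actor-based strategy can do.
    """
    if inspect.isclass(fn):
        if compute != "actors":
            raise ValueError(
                'compute="actors" must be specified when passing a '
                f"callable class to map (got compute={compute!r})"
            )
        return "actors"
    # Plain functions are stateless, so tasks are a reasonable default here.
    return compute or "tasks"


def double(x: int) -> int:
    # Stateless function: works with either strategy.
    return x * 2


class Model:
    # Stateful callable: setup happens once in __init__.
    def __init__(self) -> None:
        self.offset = 1

    def __call__(self, x: int) -> int:
        return x + self.offset
```

With this sketch, `check_compute_for_fn(double, None)` falls back to tasks, while `check_compute_for_fn(Model, None)` raises instead of silently constructing `Model` for every call.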

jjyao · 2022-03-08T00:22:52Z

python/ray/data/dataset.py

@@ -507,8 +507,6 @@ def random_shuffle(
        *,
        seed: Optional[int] = None,
        num_blocks: Optional[int] = None,
-        _spread_resource_prefix: Optional[str] = None,


@scv119 Have we confirmed with Uber that they no longer need _spread_resource_prefix?

AFAIK we have not.

python/ray/data/tests/test_dataset.py

jjyao

LGTM. I'll let @clarkzinzow take another look.

python/ray/data/tests/test_dataset.py

clarkzinzow

Nice! So happy to see _spread_resource_prefix go away. 🎉

clarkzinzow · 2022-03-08T20:41:12Z

python/ray/data/dataset.py

+            A list of references to this dataset's blocks.
+        """
+        return self._plan.execute().get_blocks()
+
    def _experimental_lazy(self) -> "Dataset[T]":


Should we de-experimentalize lazy evaluation before GA?

Good question. I'm leaning towards no, given that we have read task fusion enabled at least, and most of the critical use cases involve pipelines. But we can revisit based on user needs.

ericl added 5 commits March 7, 2022 15:15

remove iter

c916673

update

ae12a4c

wip

b90ea59

wip

b6afe1b

update

3cbe9ec

ericl requested review from scv119, clarkzinzow and jjyao as code owners March 7, 2022 23:30

ericl assigned jjyao and clarkzinzow Mar 7, 2022

jjyao reviewed Mar 8, 2022

ericl added 3 commits March 7, 2022 18:02

fix

023e445

promote from beta

7c87a40

fix

b788e81

jjyao reviewed Mar 8, 2022

remove

1e4467f

ericl added the tests-ok label Mar 8, 2022

clarkzinzow approved these changes Mar 8, 2022

ericl merged commit 52491c8 into ray-project:master Mar 8, 2022
