Make a pass fixing Dataset API issues #22886

Merged 9 commits on Mar 8, 2022

@ericl ericl commented Mar 7, 2022

Why are these changes needed?

This fixes the following issues:

  • should force compute="actors" when passing a stateful callable to map
  • remove _spread_resource_prefix
  • remove _move
  • remove .pipeline
  • promote to PublicAPI: fully_executed, stats
  • DatasetPipeline: underscores for constants
  • DatasetPipeline: add fetch_if_missing to schema
  • DatasetPipeline: move stats to public api
  • DatasetPipeline: remove foreach_dataset
  • read_api: remove _spread_resource_prefix
  • read_api: make tensor_column_schema supported
  • compute: make ComputeStrategy DeveloperAPI
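
As an illustration of the first item, here is a minimal sketch of how a map-style call could require compute="actors" for a stateful callable. It is not the code from this PR: `resolve_compute` and `AddOffset` are hypothetical names, and it assumes "stateful callable" means a class whose instances are callable, and that "force" means rejecting other strategies rather than silently switching.

```python
import inspect
from typing import Any, Optional


def resolve_compute(fn: Any, compute: Optional[str] = None) -> str:
    """Hypothetical helper: choose the compute strategy for a map-style call."""
    if inspect.isclass(fn):
        # Assumption: a class is a stateful callable. It has to be constructed
        # once and then reused, which only long-lived actors can do.
        if compute != "actors":
            raise ValueError(
                'compute="actors" is required when passing a callable class'
            )
        return "actors"
    # Plain functions carry no state, so stateless tasks are a fine default.
    return compute or "tasks"


class AddOffset:
    """Example stateful callable: the offset is set up once in __init__."""

    def __init__(self) -> None:
        self.offset = 100

    def __call__(self, row: int) -> int:
        return row + self.offset
```

With this sketch, `resolve_compute(AddOffset)` raises `ValueError`, while `resolve_compute(AddOffset, "actors")` and `resolve_compute(lambda x: x)` are accepted.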

@@ -507,8 +507,6 @@ def random_shuffle(
*,
seed: Optional[int] = None,
num_blocks: Optional[int] = None,
_spread_resource_prefix: Optional[str] = None,
Collaborator

@scv119 Have we confirmed with Uber that they no longer need _spread_resource_prefix?

Contributor

AFAIK we have not.

python/ray/data/tests/test_dataset.py (outdated, resolved)
Collaborator

@jjyao jjyao left a comment

LGTM. I'll let @clarkzinzow take another look.

python/ray/data/tests/test_dataset.py (outdated, resolved)
@ericl ericl added the tests-ok label ("The tagger certifies test failures are unrelated and assumes personal liability.") Mar 8, 2022
Contributor

@clarkzinzow clarkzinzow left a comment

Nice! So happy to see _spread_resource_prefix go away. 🎉

A list of references to this dataset's blocks.
"""
return self._plan.execute().get_blocks()

def _experimental_lazy(self) -> "Dataset[T]":
Contributor

Should we de-experimentalize lazy evaluation before GA?

Contributor Author

Good question. I'm leaning towards no, given that we have read task fusion enabled at least, and most of the critical use cases involve pipelines. But we can revisit based on user needs.
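
For readers unfamiliar with the terms, the toy sketch below shows the general idea of lazy evaluation with a read stage fused into the map stages that follow it. It is not Ray's implementation: `LazyPlan`, `read_rows`, and `read_calls` are hypothetical names, and it assumes fusion simply means running the read and the maps in a single pass when the plan is executed.

```python
from typing import Callable, Iterable, Iterator, List


class LazyPlan:
    """Hypothetical stand-in for a lazily evaluated dataset plan."""

    def __init__(self, read_fn: Callable[[], Iterable[int]]) -> None:
        self._read_fn = read_fn
        self._stages: List[Callable[[int], int]] = []

    def map(self, fn: Callable[[int], int]) -> "LazyPlan":
        # Only record the stage; nothing is read or computed yet.
        self._stages.append(fn)
        return self

    def execute(self) -> List[int]:
        def fused_read_and_map() -> Iterator[int]:
            # The read and every recorded map run in one pass over the rows,
            # so no intermediate dataset is materialized between stages.
            for row in self._read_fn():
                for fn in self._stages:
                    row = fn(row)
                yield row

        return list(fused_read_and_map())


read_calls: List[str] = []


def read_rows() -> Iterable[int]:
    read_calls.append("read")
    return range(3)


plan = LazyPlan(read_rows).map(lambda x: x + 1).map(lambda x: x * 10)
calls_before_execute = list(read_calls)  # still empty: the plan is lazy
result = plan.execute()  # the read happens here, fused with both maps
```

Building the plan triggers no read; the single read happens inside `execute()`.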

@ericl ericl merged commit 52491c8 into ray-project:master Mar 8, 2022