Make a pass fixing Dataset API issues #22886
@@ -507,8 +507,6 @@ def random_shuffle(
*,
seed: Optional[int] = None,
num_blocks: Optional[int] = None,
_spread_resource_prefix: Optional[str] = None,
@scv119 Have we confirmed with Uber that they no longer need `_spread_resource_prefix`?
AFAIK we have not.
LGTM. I'll let @clarkzinzow take another look.
Nice! So happy to see `_spread_resource_prefix` go away. 🎉
A list of references to this dataset's blocks.
"""
return self._plan.execute().get_blocks()

def _experimental_lazy(self) -> "Dataset[T]":
Should we de-experimentalize lazy evaluation before GA?
Good question. I'm leaning towards no, given that we at least have read task fusion enabled and most of the critical use cases involve pipelines. But we can revisit based on user needs.
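For context on the trade-off in this thread, here is a rough sketch of what lazy evaluation with stage fusion means in general. `ToyLazyDataset` and its methods are hypothetical names made up for illustration; this is plain Python and not Ray's actual `Dataset` or `_experimental_lazy` implementation.

```python
from typing import Any, Callable, List, Optional


class ToyLazyDataset:
    """Hypothetical toy class (not Ray's Dataset) that defers all work."""

    def __init__(
        self,
        read_fn: Callable[[], List[Any]],
        stages: Optional[List[Callable[[Any], Any]]] = None,
    ) -> None:
        self._read_fn = read_fn
        self._stages = list(stages or [])

    def map(self, fn: Callable[[Any], Any]) -> "ToyLazyDataset":
        # Lazy: only record the stage and return a new dataset; no data is read.
        return ToyLazyDataset(self._read_fn, self._stages + [fn])

    def execute(self) -> List[Any]:
        # "Fused" execution: the read and every recorded stage run in a
        # single pass, without materializing intermediate results per stage.
        out = []
        for item in self._read_fn():
            for fn in self._stages:
                item = fn(item)
            out.append(item)
        return out
```

In this toy model, chaining `map` calls costs nothing until `execute()` is called, at which point the read runs once and all stages are applied together.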
Why are these changes needed?
This fixes the following issues: