[data] Add streaming execution documentation #33941
…project#32493)" (ray-project#33485)" This reverts commit 5c79954.
@@ -66,6 +66,8 @@ Execution

This section covers the Datasets execution model and performance considerations.

.. _datasets_lazy_execution:
I think we should add/edit this page instead of adding a new page.
Done.
The DatasetPipeline is expected to be deprecated in Ray 2.5. If your use case doesn't
need per-window shuffle, we recommend using the streaming execution of Dataset. By
setting the resource limits, you can cap the resource usage to run the operations
as the DatasetPipeline window, and achieve even better performance and stability.
Resource limits are too much detail to mention here.
Makes sense, dropped.
I think you also need to edit the key concepts page to mention streaming.
ctx.execution_options.preserve_order = True

To enable deterministic execution, set the above to True. This may decrease performance, but will ensure block ordering is preserved through execution. This flag defaults to True in Ray 2.3 but False in
We can probably simplify to say it defaults to False, since streaming isn't enabled in 2.3 by default.
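To make the trade-off behind `preserve_order` concrete, here is a minimal standard-library sketch. It is not Ray's implementation: `process_block` and `run_blocks` are hypothetical names invented for illustration, and threads stand in for Ray tasks. It only shows why returning results in submission order is deterministic, while returning them as they finish depends on timing.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_block(block_id, delay):
    """Hypothetical stand-in for a task that transforms one block."""
    time.sleep(delay)
    return block_id


def run_blocks(delays, preserve_order):
    """Run one task per block and collect the resulting block ids.

    preserve_order=True: results come back in submission order, so the
    output is deterministic, but a slow early block holds up later ones.
    preserve_order=False: results come back as tasks finish, so the
    output order depends on timing.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(delays))) as pool:
        futures = [
            pool.submit(process_block, i, d) for i, d in enumerate(delays)
        ]
        if preserve_order:
            return [f.result() for f in futures]
        return [f.result() for f in as_completed(futures)]
```

With `preserve_order=True` the ids always come back in submission order regardless of the delays. With `preserve_order=False` the same ids come back, but in whatever order the tasks happen to complete.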
.. code-block::

    ctx.execution_options.actor_locality_enabled = True
Can note this is enabled by default.
Approved for cherry picking into 2.4.0
The test failure (https://buildkite.com/ray-project/oss-ci-build-pr/builds/17601#01875443-53d7-48eb-bfcd-f7a48e01bbfd) is not relevant and is already fixed (#34115).
* Revert "[Datasets] Revert "Enable streaming executor by default (ray-project#32493)" (ray-project#33485)" This reverts commit 5c79954. * Add streaming execution documentation * fix * feedback * remove new file * fix * fix * key concept * fix * fix * fix * wording * feedback
* Revert "[Datasets] Revert "Enable streaming executor by default (ray-project#32493)" (ray-project#33485)" This reverts commit 5c79954. * Add streaming execution documentation * fix * feedback * remove new file * fix * fix * key concept * fix * fix * fix * wording * feedback Signed-off-by: elliottower <[email protected]>
* Revert "[Datasets] Revert "Enable streaming executor by default (ray-project#32493)" (ray-project#33485)" This reverts commit 5c79954. * Add streaming execution documentation * fix * feedback * remove new file * fix * fix * key concept * fix * fix * fix * wording * feedback Signed-off-by: Jack He <[email protected]>
Why are these changes needed?
Streaming execution will be enabled in 2.4 so we need to have documentation coverage to guide users.
Related issue number
Checks
I've signed off every commit (git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
If I've added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.