[data] [docs] Datastream docs rename [5/n] #34512

Merged (63 commits) on Apr 20, 2023

@ericl ericl commented Apr 18, 2023

Why are these changes needed?

Part 5 of #34235

Signed-off-by: Eric Liang <[email protected]>

@c21 c21 left a comment

Ray Data simplifies general purpose parallel GPU and CPU compute in Ray through its
powerful :ref:`Datastream <datastream_concept>` primitive. Datastreams enables workloads such as
:ref:`GPU batch inference <ref-use-cases-batch-infer>` efficiently on large datasets. Ray Data manages
Contributor

Should we change "large datasets" to "large input data"?

Contributor Author

I think this is fine. Lower-case datasets just refers to datasets in general, not the Dataset class.

:align: center

..
https://docs.google.com/drawings/d/132jhE3KXZsf29ho1yUdPrCHB9uheHBWHJhDQMXqIVPA/edit

Datasets can shuffle hundreds of terabytes of data. For an in-depth guide on shuffle performance, read :ref:`Performance Tips and Tuning <shuffle_performance_tips>`.
Datastream can shuffle multi-terabyte datasets, leveraging the Ray object store for disk spilling. For an in-depth guide on shuffle performance, read :ref:`Performance Tips and Tuning <shuffle_performance_tips>`.
Contributor

Why is "datasets" here better than "data" before?

Contributor Author

Lower case 'datasets' is fine to use; this was just an incidental fix to improve the wording here.

@ericl ericl merged commit c0dff99 into ray-project:master Apr 20, 2023
@jjyao jjyao mentioned this pull request Apr 21, 2023
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
angelinalg added a commit that referenced this pull request Apr 24, 2023
#34512 changed the link to `data/datastream.html`, which doesn't exist.

Signed-off-by: angelinalg <[email protected]>
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
architkulkarni pushed a commit to architkulkarni/ray that referenced this pull request May 16, 2023
Labels
do-not-merge Do not merge this PR!
9 participants