[data] [docs] Datastream docs rename [5/n] #34512
Signed-off-by: Eric Liang <[email protected]>
Still seeing several `Dataset` in https://anyscale-ray--34512.com.readthedocs.build/en/34512/data/examples/nyc_taxi_basic_processing.html and https://anyscale-ray--34512.com.readthedocs.build/en/34512/data/examples/ocr_example.html. Are we planning to change them in a followup PR?
Ray Data simplifies general purpose parallel GPU and CPU compute in Ray through its
powerful :ref:`Datastream <datastream_concept>` primitive. Datastreams enables workloads such as
:ref:`GPU batch inference <ref-use-cases-batch-infer>` efficiently on large datasets. Ray Data manages
Should we change `large datasets` to `large input data`?
I think this is fine. Lower-case datasets just refers to datasets in general, not the Dataset class.
:align: center

..
https://docs.google.com/drawings/d/132jhE3KXZsf29ho1yUdPrCHB9uheHBWHJhDQMXqIVPA/edit

Datasets can shuffle hundreds of terabytes of data. For an in-depth guide on shuffle performance, read :ref:`Performance Tips and Tuning <shuffle_performance_tips>`.
Datastream can shuffle multi-terabyte datasets, leveraging the Ray object store for disk spilling. For an in-depth guide on shuffle performance, read :ref:`Performance Tips and Tuning <shuffle_performance_tips>`.
Why is `datasets` here better than `data` before?
Lower case 'datasets' is fine to use; this was just an incidental fix to improve the wording here.
Part 5 of ray-project#34235 Signed-off-by: elliottower <[email protected]>
#34512 changed the link to `data/datastream.html`, which doesn't exist. Signed-off-by: angelinalg <[email protected]>
Part 5 of ray-project#34235 Signed-off-by: Jack He <[email protected]>
Why are these changes needed?
Part 5 of #34235