
[docs] fix build #34265

Merged
merged 9 commits on Apr 12, 2023
14 changes: 14 additions & 0 deletions doc/source/data/api/dataset.rst
@@ -149,3 +149,17 @@ Serialization
Dataset.has_serializable_lineage
Dataset.serialize_lineage
Dataset.deserialize_lineage


Contributor Author

We need these to be explicitly in a TOC somewhere. We can argue about the exact position after fixing the build.

Internals
---------

.. autosummary::
:toctree: doc/
Contributor

Oh cool, this deprecates most of my fix PR here: #34228

Btw, is there a way to hide this internals section by default? Some of these are legacy backwards compatibility aliases that we don't want to expose.

Contributor Author

@ericl yes, toctrees can be :hidden:, if this is what we want.
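For reference, a minimal sketch of what a hidden Sphinx toctree looks like (a standalone illustration, not part of this diff; the entry name is hypothetical):

    .. toctree::
       :hidden:

       internals

Sphinx still generates and cross-links the listed pages, but the :hidden: flag keeps them out of the rendered navigation, which would match the goal of not advertising the legacy aliases.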


Dataset.__init__
Dataset.dataset_format
Dataset.fully_executed
Dataset.is_fully_executed
Dataset.lazy
Dataset.write_webdataset
Contributor

This one should be added to "I/O and Conversion" list in dataset.rst
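A sketch of what that entry could look like in dataset.rst, assuming the "I/O and Conversion" section uses the same autosummary pattern as the Internals section added above (exact placement in the list is a guess):

    I/O and Conversion
    ------------------

    .. autosummary::
       :toctree: doc/

       Dataset.write_webdataset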

7 changes: 4 additions & 3 deletions doc/source/data/getting-started.rst
@@ -52,7 +52,7 @@ transform datasets. Ray executes transformations in parallel for performance at

import pandas as pd

# Find rows with spepal length < 5.5 and petal length > 3.5.
# Find rows with sepal length < 5.5 and petal length > 3.5.
def transform_batch(df: pd.DataFrame) -> pd.DataFrame:
return df[(df["sepal length (cm)"] < 5.5) & (df["petal length (cm)"] > 3.5)]

@@ -62,8 +62,8 @@ transform datasets. Ray executes transformations in parallel for performance at
.. testoutput::

MapBatches(transform_batch)
+- Dataset(
num_blocks=...,
+- Datastream(
num_blocks=1,
num_rows=150,
schema={
sepal length (cm): double,
@@ -74,6 +74,7 @@ transform datasets. Ray executes transformations in parallel for performance at
}
)


To learn more about transforming datasets, read
:ref:`Transforming datasets <transforming_datasets>`.

6 changes: 3 additions & 3 deletions doc/source/data/glossary.rst
@@ -107,7 +107,7 @@ Ray Datasets Glossary

>>> import ray
>>> ray.data.from_items(["spam", "ham", "eggs"])
Dataset(num_blocks=3, num_rows=3, schema=<class 'str'>)
MaterializedDatastream(num_blocks=3, num_rows=3, schema=<class 'str'>)

Tensor Dataset
A Dataset that represents a collection of ndarrays.
@@ -119,7 +119,7 @@ Ray Datasets Glossary
>>> import numpy as np
>>> import ray
>>> ray.data.from_numpy(np.zeros((100, 32, 32, 3)))
Dataset(
MaterializedDatastream(
num_blocks=1,
num_rows=100,
schema={__value__: ArrowTensorType(shape=(32, 32, 3), dtype=double)}
@@ -132,7 +132,7 @@ Ray Datasets Glossary

>>> import ray
>>> ray.data.read_csv("s3://anonymous@air-example-data/iris.csv")
Dataset(
Datastream(
num_blocks=1,
num_rows=150,
schema={
2 changes: 1 addition & 1 deletion doc/source/rllib/package_ref/rl_modules.rst
@@ -114,7 +114,7 @@ Constructor
:toctree: doc/

MultiAgentRLModule
MultiAgentRLModule.build
MultiAgentRLModule.setup()
MultiAgentRLModule.as_multi_agent

Modifying the underlying RL modules
16 changes: 0 additions & 16 deletions python/ray/data/_internal/execution/interfaces.py
@@ -191,22 +191,6 @@ class ExecutionOptions:
"""Common options for execution.

Some options may not be supported on all executors (e.g., resource limits).

Contributor

Contributor Author

cool, thanks

Contributor

@pcmoritz you can undo this particular diff now (fixed in master).

Contributor

Thanks, will do!

Attributes:
resource_limits: Set a soft limit on the resource usage during execution.
This is not supported in bulk execution mode. Autodetected by default.
locality_with_output: Set this to prefer running tasks on the same node as the
output node (node driving the execution). It can also be set to a list of
node ids to spread the outputs across those nodes. Off by default.
preserve_order: Set this to preserve the ordering between blocks processed by
operators under the streaming executor. The bulk executor always preserves
order. Off by default.
actor_locality_enabled: Whether to enable locality-aware task dispatch to
actors (on by default). This applies to both ActorPoolStrategy map and
streaming_split operations.
verbose_progress: Whether to report progress individually per operator. By
default, only AllToAll operators and global progress is reported. This
option is useful for performance debugging. Off by default.
"""

resource_limits: ExecutionResources = ExecutionResources()
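For context, the attributes documented in the removed block are dataclass fields, so execution behavior is tuned by constructing ExecutionOptions with keyword arguments. A minimal sketch under that assumption (the import path is internal Ray API and may change between versions):

    from ray.data._internal.execution.interfaces import (
        ExecutionOptions,
        ExecutionResources,
    )

    # Soft-limit CPU usage and preserve block ordering under the
    # streaming executor; the other fields keep their defaults.
    options = ExecutionOptions(
        resource_limits=ExecutionResources(cpu=4),
        preserve_order=True,
    )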
8 changes: 6 additions & 2 deletions python/ray/data/dataset.py
@@ -440,8 +440,12 @@ def map_batches(
... "age": [4, 14, 9]
... })
>>> ds = ray.data.from_pandas(df)
>>> ds
Datastream(num_blocks=1, num_rows=3, schema={name: object, age: int64})
>>> ds # doctest: +SKIP
MaterializedDatastream(
num_blocks=1,
num_rows=3,
schema={name: object, age: int64}
)

Call :meth:`.default_batch_format` to determine the default batch
type.
8 changes: 4 additions & 4 deletions python/ray/data/dataset_iterator.py
@@ -49,9 +49,9 @@ class DatasetIterator(abc.ABC):
>>> import ray
>>> ds = ray.data.range(5)
>>> ds
Dataset(num_blocks=5, num_rows=5, schema=<class 'int'>)
Datastream(num_blocks=5, num_rows=5, schema=<class 'int'>)
>>> ds.iterator()
DatasetIterator(Dataset(num_blocks=5, num_rows=5, schema=<class 'int'>))
DatasetIterator(Datastream(num_blocks=5, num_rows=5, schema=<class 'int'>))
>>> ds = ds.repeat(); ds
DatasetPipeline(num_windows=inf, num_stages=2)
>>> ds.iterator()
@@ -641,7 +641,7 @@ def to_tf(
... "s3://anonymous@air-example-data/iris.csv"
... )
>>> it = ds.iterator(); it
DatasetIterator(Dataset(
DatasetIterator(Datastream(
num_blocks=1,
num_rows=150,
schema={
@@ -672,7 +672,7 @@ def to_tf(
>>> it = preprocessor.transform(ds).iterator()
>>> it
DatasetIterator(Concatenator
+- Dataset(
+- Datastream(
num_blocks=1,
num_rows=150,
schema={
2 changes: 1 addition & 1 deletion python/ray/train/torch/torch_trainer.py
@@ -227,7 +227,7 @@ def train_loop_per_worker():
best_checkpoint_loss = result.metrics['loss']

# Assert loss is less than 0.09
assert best_checkpoint_loss <= 0.09
assert best_checkpoint_loss <= 0.09 # doctest: +SKIP

.. testoutput::
:hide:
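Both this hunk and the map_batches one above rely on standard doctest directive syntax. A standalone illustration (not from this PR) of how # doctest: +SKIP suppresses an example whose output is unstable:

    import time

    def halve(x: float) -> float:
        """Halve a number.

        >>> halve(4.0)
        2.0
        >>> time.time()  # doctest: +SKIP
        1681300000.0
        """
        return x / 2

The skipped example is still shown to readers but never executed, so output that varies from run to run (or, as in this PR, between Ray versions) cannot break the doc build.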