[docs] fix build #34265

Merged: 9 commits, Apr 12, 2023

Changes from 6 commits
7 changes: 4 additions & 3 deletions doc/source/data/getting-started.rst
@@ -52,7 +52,7 @@ transform datasets. Ray executes transformations in parallel for performance at

     import pandas as pd

-    # Find rows with spepal length < 5.5 and petal length > 3.5.
+    # Find rows with sepal length < 5.5 and petal length > 3.5.
     def transform_batch(df: pd.DataFrame) -> pd.DataFrame:
         return df[(df["sepal length (cm)"] < 5.5) & (df["petal length (cm)"] > 3.5)]

@@ -62,8 +62,8 @@ transform datasets. Ray executes transformations in parallel for performance at

 .. testoutput::

     MapBatches(transform_batch)
-    +- Dataset(
-       num_blocks=...,
+    +- Datastream(
+       num_blocks=1,
        num_rows=150,
        schema={
           sepal length (cm): double,

@@ -74,6 +74,7 @@ transform datasets. Ray executes transformations in parallel for performance at
        }
     )

+
 To learn more about transforming datasets, read
 :ref:`Transforming datasets <transforming_datasets>`.
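
The hunks above show the filter function and the expected repr, but the map_batches call that connects them sits in elided context. A minimal sketch of how the pieces fit together, assuming the dataset is the public iris CSV read elsewhere in this PR (the elided context may differ):

    import pandas as pd
    import ray

    # Filter function from the hunk above.
    def transform_batch(df: pd.DataFrame) -> pd.DataFrame:
        return df[(df["sepal length (cm)"] < 5.5) & (df["petal length (cm)"] > 3.5)]

    # Assumed source; the getting-started page's actual loading step is not shown here.
    ds = ray.data.read_csv("s3://anonymous@air-example-data/iris.csv")
    transformed = ds.map_batches(transform_batch, batch_format="pandas")

    # The lazy repr should start with MapBatches(transform_batch), matching the
    # expected testoutput in the second hunk.
    print(transformed)
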
6 changes: 3 additions & 3 deletions doc/source/data/glossary.rst
@@ -107,7 +107,7 @@ Ray Datasets Glossary

         >>> import ray
         >>> ray.data.from_items(["spam", "ham", "eggs"])
-        Dataset(num_blocks=3, num_rows=3, schema=<class 'str'>)
+        MaterializedDatastream(num_blocks=3, num_rows=3, schema=<class 'str'>)

     Tensor Dataset
         A Dataset that represents a collection of ndarrays.

@@ -119,7 +119,7 @@ Ray Datasets Glossary

         >>> import numpy as np
         >>> import ray
         >>> ray.data.from_numpy(np.zeros((100, 32, 32, 3)))
-        Dataset(
+        MaterializedDatastream(
            num_blocks=1,
            num_rows=100,
            schema={__value__: ArrowTensorType(shape=(32, 32, 3), dtype=double)}

@@ -132,7 +132,7 @@ Ray Datasets Glossary

         >>> import ray
         >>> ray.data.read_csv("s3://anonymous@air-example-data/iris.csv")
-        Dataset(
+        Datastream(
            num_blocks=1,
            num_rows=150,
            schema={
1 change: 1 addition & 0 deletions doc/source/rllib/package_ref/rl_modules.rst
@@ -115,6 +115,7 @@ Constructor

     MultiAgentRLModule
+    MultiAgentRLModule.setup

     MultiAgentRLModule.as_multi_agent

 Modifying the underlying RL modules
16 changes: 0 additions & 16 deletions python/ray/data/_internal/execution/interfaces.py
@@ -191,22 +191,6 @@ class ExecutionOptions:
     """Common options for execution.

     Some options may not be supported on all executors (e.g., resource limits).
-
-    Attributes:
-        resource_limits: Set a soft limit on the resource usage during execution.
-            This is not supported in bulk execution mode. Autodetected by default.
-        locality_with_output: Set this to prefer running tasks on the same node as the
-            output node (node driving the execution). It can also be set to a list of
-            node ids to spread the outputs across those nodes. Off by default.
-        preserve_order: Set this to preserve the ordering between blocks processed by
-            operators under the streaming executor. The bulk executor always preserves
-            order. Off by default.
-        actor_locality_enabled: Whether to enable locality-aware task dispatch to
-            actors (on by default). This applies to both ActorPoolStrategy map and
-            streaming_split operations.
-        verbose_progress: Whether to report progress individually per operator. By
-            default, only AllToAll operators and global progress is reported. This
-            option is useful for performance debugging. Off by default.
     """

     resource_limits: ExecutionResources = ExecutionResources()

Review thread on this hunk:

Contributor (author): cool, thanks
Contributor: @pcmoritz you can undo this particular diff now (fixed in master).
Contributor: Thanks, will do!
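
For reference, a minimal sketch of setting the options described in the deleted docstring, assuming ExecutionOptions and ExecutionResources behave as dataclasses with the fields named there (the module is internal, so the import path and fields may shift between Ray versions):

    from ray.data._internal.execution.interfaces import (
        ExecutionOptions,
        ExecutionResources,
    )

    options = ExecutionOptions(
        # Soft cap on resources used during execution; autodetected by default.
        resource_limits=ExecutionResources(cpu=8, object_store_memory=2 * 1024**3),
        # Prefer placing outputs on the node driving the execution.
        locality_with_output=True,
        # Preserve block ordering under the streaming executor.
        preserve_order=True,
        # Report progress per operator, useful for performance debugging.
        verbose_progress=True,
    )
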
8 changes: 6 additions & 2 deletions python/ray/data/dataset.py
@@ -440,8 +440,12 @@ def map_batches(
             ...     "age": [4, 14, 9]
             ... })
             >>> ds = ray.data.from_pandas(df)
-            >>> ds
-            Datastream(num_blocks=1, num_rows=3, schema={name: object, age: int64})
+            >>> ds # doctest: +SKIP
+            MaterializedDatastream(
+                num_blocks=1,
+                num_rows=3,
+                schema={name: object, age: int64}
+            )

             Call :meth:`.default_batch_format` to determine the default batch
             type.
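
For context, a minimal sketch of using the datastream built in the doctest above with default_batch_format and map_batches; the "name" values are placeholders, since the doctest elides them:

    import pandas as pd
    import ray

    # Placeholder names; the doctest above only shows the "age" values.
    df = pd.DataFrame({"name": ["a", "b", "c"], "age": [4, 14, 9]})
    ds = ray.data.from_pandas(df)

    # For a pandas-backed datastream this is expected to report a DataFrame type.
    print(ds.default_batch_format())

    def add_one(batch: pd.DataFrame) -> pd.DataFrame:
        batch["age"] = batch["age"] + 1
        return batch

    # Apply the function per batch, requesting pandas batches explicitly.
    print(ds.map_batches(add_one, batch_format="pandas").take(3))
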
10 changes: 5 additions & 5 deletions python/ray/data/dataset_iterator.py
@@ -56,9 +56,9 @@ class DataIterator(abc.ABC):

         >>> import ray
         >>> ds = ray.data.range(5)
         >>> ds
-        Dataset(num_blocks=5, num_rows=5, schema=<class 'int'>)
+        Datastream(num_blocks=5, num_rows=5, schema=<class 'int'>)
         >>> ds.iterator()
-        DataIterator(Dataset(num_blocks=5, num_rows=5, schema=<class 'int'>))
+        DatasetIterator(Datastream(num_blocks=5, num_rows=5, schema=<class 'int'>))
         >>> ds = ds.repeat(); ds
         DatasetPipeline(num_windows=inf, num_stages=2)
         >>> ds.iterator()

@@ -648,7 +648,7 @@ def to_tf(

         ...     "s3://anonymous@air-example-data/iris.csv"
         ... )
         >>> it = ds.iterator(); it
-        DataIterator(Dataset(
+        DatasetIterator(Datastream(
            num_blocks=1,
            num_rows=150,
            schema={

@@ -678,8 +678,8 @@

         >>> preprocessor = Concatenator(output_column_name="features", exclude="target")
         >>> it = preprocessor.transform(ds).iterator()
         >>> it
-        DataIterator(Concatenator
-        +- Dataset(
+        DatasetIterator(Concatenator
+        +- Datastream(
            num_blocks=1,
            num_rows=150,
            schema={
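
A minimal sketch of consuming the iterator from the to_tf doctest above, assuming the public iris CSV, the Concatenator import path used in the surrounding docstring, and a local TensorFlow installation:

    import ray
    from ray.data.preprocessors import Concatenator

    ds = ray.data.read_csv("s3://anonymous@air-example-data/iris.csv")
    preprocessor = Concatenator(output_column_name="features", exclude="target")
    it = preprocessor.transform(ds).iterator()

    # Build a tf.data.Dataset from the iterator; the column names follow the
    # Concatenator example above.
    tf_ds = it.to_tf(feature_columns="features", label_columns="target")
    for features, labels in tf_ds.take(1):
        print(features.shape, labels.shape)
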
2 changes: 1 addition & 1 deletion python/ray/train/torch/torch_trainer.py
@@ -227,7 +227,7 @@ def train_loop_per_worker():

             best_checkpoint_loss = result.metrics['loss']

             # Assert loss is less 0.09
-            assert best_checkpoint_loss <= 0.09
+            assert best_checkpoint_loss <= 0.09 # doctest: +SKIP

         .. testoutput::
             :hide:
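
The added directive is the standard doctest SKIP option: a flagged example is neither executed nor checked, which keeps a flaky assertion from failing a doctest run. A minimal standard-library sketch of the behavior, independent of this repo's Sphinx setup:

    import doctest

    snippet = """
    >>> 1 / 0  # doctest: +SKIP
    >>> 2 + 2
    4
    """

    # The first example is skipped, so no ZeroDivisionError is raised; only the
    # second example runs and is checked.
    doctest.run_docstring_examples(snippet, {}, verbose=True)
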