Commit

[Data] [Docs] Standardize API Refs for Input/Output (#37017)
Standardize API refs for some of the I/O APIs.

---------

Signed-off-by: amogkam <[email protected]>
Signed-off-by: Amog Kamsetty <[email protected]>
Co-authored-by: angelinalg <[email protected]>
amogkam and angelinalg authored Jul 7, 2023
1 parent dfee3f0 commit 957f9b7
Showing 11 changed files with 1,128 additions and 643 deletions.
16 changes: 16 additions & 0 deletions doc/source/data/api/input_output.rst
@@ -227,6 +227,9 @@ Partitioning API
datasource.PathPartitionEncoder
datasource.PathPartitionParser
datasource.PathPartitionFilter
datasource.FileExtensionFilter

.. _metadata_provider:

MetadataProvider API
--------------------
@@ -240,3 +243,16 @@ MetadataProvider API
datasource.DefaultFileMetadataProvider
datasource.DefaultParquetMetadataProvider
datasource.FastFileMetadataProvider


.. _block_write_path_provider:

BlockWritePathProvider API
--------------------------

.. autosummary::
:toctree: doc/

datasource.BlockWritePathProvider
datasource.DefaultBlockWritePathProvider

2 changes: 2 additions & 0 deletions doc/source/data/performance-tips.rst
@@ -18,6 +18,8 @@ If your transformation isn't vectorized, there's no performance benefit.
Optimizing reads
----------------

.. _read_parallelism:

Tuning read parallelism
~~~~~~~~~~~~~~~~~~~~~~~

651 changes: 394 additions & 257 deletions python/ray/data/dataset.py

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions python/ray/data/examples/data/different-extensions/data.csv
@@ -0,0 +1,2 @@
a,b
0,1
Empty file.
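
For context, this commit adds `datasource.FileExtensionFilter` to the Partitioning API reference and a `different-extensions` example directory. The sketch below only illustrates the general idea of selecting files by extension with the standard library. It is not Ray's implementation, `filter_by_extension` is a hypothetical helper, and every path other than `data.csv` is made up for illustration.

```python
from pathlib import PurePosixPath


def filter_by_extension(paths, extensions):
    """Keep only paths whose final suffix matches one of `extensions`.

    Hypothetical helper for illustration; matching is case-insensitive and
    accepts extensions with or without a leading dot.
    """
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    return [
        p for p in paths
        if PurePosixPath(p).suffix.lower().lstrip(".") in wanted
    ]


# Assumed listing: only data.csv is named in this commit, the rest is invented.
paths = [
    "different-extensions/data.csv",
    "different-extensions/data.json",
    "different-extensions/notes.txt",
]

csv_only = filter_by_extension(paths, ["csv"])
print(csv_only)
```

A filter of this kind lets a reader point at a mixed directory and pick up only the files in the format it understands.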
Binary file added python/ray/data/examples/data/iris.tfrecords.gz
Binary file not shown.
2 changes: 2 additions & 0 deletions python/ray/data/examples/data/year=2022/month=09/sales.csv
@@ -0,0 +1,2 @@
order_number,quantity
10107,30
4 changes: 4 additions & 0 deletions python/ray/data/examples/data/year=2022/month=09/sales.json
@@ -0,0 +1,4 @@
{
"order_number": 10107,
"quantity": 30
}
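
The `year=2022/month=09/` directories in the two example files above follow the Hive-style `key=value` partition layout. As a rough sketch of how such a path can be decoded into partition fields, the snippet below uses only the standard library. It is not Ray's `PathPartitionParser`, and `parse_hive_partitions` is a hypothetical name introduced here.

```python
from pathlib import PurePosixPath


def parse_hive_partitions(path):
    """Return {key: value} for each `key=value` directory in `path`.

    Hypothetical helper for illustration. Only directory components are
    inspected; the file name itself is skipped.
    """
    partitions = {}
    for part in PurePosixPath(path).parent.parts:
        key, sep, value = part.partition("=")
        if sep and key:
            partitions[key] = value
    return partitions


path = "python/ray/data/examples/data/year=2022/month=09/sales.csv"
fields = parse_hive_partitions(path)
print(fields)  # {'year': '2022', 'month': '09'}
```

Values are kept as strings in this sketch, so the leading zero in `09` survives; converting them to integers would be a separate, explicit step.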
1,087 changes: 707 additions & 380 deletions python/ray/data/read_api.py

Large diffs are not rendered by default.

3 changes: 0 additions & 3 deletions python/ray/data/tests/test_consumption.py
@@ -494,9 +494,6 @@ def test_convert_types(ray_start_regular_shared):
def test_from_items(ray_start_regular_shared):
    ds = ray.data.from_items(["hello", "world"])
    assert extract_values("item", ds.take()) == ["hello", "world"]

    ds = ray.data.from_items([{"hello": "world"}], output_arrow_format=True)
    assert ds.take() == [{"hello": "world"}]
    assert isinstance(next(ds.iter_batches(batch_format=None)), pa.Table)


4 changes: 1 addition & 3 deletions rllib/offline/dataset_writer.py
@@ -71,9 +71,7 @@ def write(self, sample_batch: SampleBatchType):
         # Todo: We should flush at the end of sampling even if this
         # condition was not reached.
         if len(self.samples) >= self.max_num_samples_per_file:
-            ds = data.from_items(self.samples, output_arrow_format=True).repartition(
-                num_blocks=1, shuffle=False
-            )
+            ds = data.from_items(self.samples).repartition(num_blocks=1, shuffle=False)
             if self.format == "json":
                 ds.write_json(self.path, try_create_dir=True)
             elif self.format == "parquet":
