[Data][Doc] Add tip about how to understand map_batches format (#47394)

## Why are these changes needed?

Add a note to help users understand the format inside map_batches.

## Related issue number


## Checks

- [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a method in Tune, I've added it in `doc/source/tune/api/` under the
corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Philipp Moritz <[email protected]>
Co-authored-by: Balaji Veeramani <[email protected]>
pcmoritz and bveeramani authored Oct 4, 2024
1 parent 5a84f14 commit 48a0444
Showing 1 changed file with 9 additions and 0 deletions.
9 changes: 9 additions & 0 deletions python/ray/data/dataset.py
@@ -409,6 +409,10 @@ def map_batches(
stateful Ray actors. For more information, see
:ref:`Stateful Transforms <stateful_transforms>`.
.. tip::
To understand the format of the input to ``fn``, call :meth:`~Dataset.take_batch`
on the dataset to get a batch in the same format as will be passed to ``fn``.
.. tip::
If ``fn`` doesn't mutate its input, set ``zero_copy_batch=True`` to improve
performance and decrease memory utilization.
@@ -562,6 +566,11 @@ def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
:meth:`~Dataset.iter_batches`
Call this function to iterate over batches of data.
:meth:`~Dataset.take_batch`
Call this function to get a batch of data from the dataset
in the same format as will be passed to the `fn` function of
:meth:`~Dataset.map_batches`.
:meth:`~Dataset.flat_map`
Call this method to create new records from existing ones. Unlike
:meth:`~Dataset.map`, a function passed to :meth:`~Dataset.flat_map`