Add objects GC in dataset iterator #34030

jianoaix · 2023-04-03T23:39:53Z

Why are these changes needed?

The DatasetIterator doesn't eagerly garbage-collect (GC) objects, which caused consumer nodes to run out of memory (OOM). The new consumer nodes that were brought up were not in sync with the other, healthy consumer nodes. Because the DatasetPipeline requires all consumers to read windows in sync, the pipeline hung and then failed with a timeout.
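
To illustrate the memory effect described above, here is a minimal, self-contained sketch. It does not use Ray: `ToyObjectStore`, `iter_blocks`, and `consume` are hypothetical names invented for this example, and the dictionary-backed store is only a stand-in for a real object store. It shows how releasing each block as soon as the consumer has read it keeps the number of live objects shrinking, whereas an iterator that never frees keeps every block alive until the end.

```python
from typing import Dict, Iterator, List


class ToyObjectStore:
    """Hypothetical in-memory stand-in for a shared object store."""

    def __init__(self) -> None:
        self._objects: Dict[int, List[int]] = {}
        self._next_id = 0

    def put(self, block: List[int]) -> int:
        """Store a block and return a reference (here, just an int id)."""
        ref = self._next_id
        self._next_id += 1
        self._objects[ref] = block
        return ref

    def get(self, ref: int) -> List[int]:
        return self._objects[ref]

    def free(self, ref: int) -> None:
        """Drop the store's copy of the block."""
        self._objects.pop(ref, None)

    def live(self) -> int:
        """Number of blocks still held by the store."""
        return len(self._objects)


def iter_blocks(
    store: ToyObjectStore, refs: List[int], eager_free: bool
) -> Iterator[List[int]]:
    """Yield blocks one by one, optionally freeing each right after reading."""
    for ref in refs:
        block = store.get(ref)
        if eager_free:
            # Assumption: the consumer owns the block, so nobody else needs it.
            store.free(ref)
        yield block


def consume(eager_free: bool) -> List[int]:
    """Return the live-object count observed after each block is read."""
    store = ToyObjectStore()
    refs = [store.put([i] * 4) for i in range(5)]
    live_counts = []
    for _ in iter_blocks(store, refs, eager_free):
        live_counts.append(store.live())
    return live_counts
```

With `eager_free=True` the live count drops by one per block read; with `eager_free=False` all five blocks stay resident for the whole iteration, which is the pattern that can build up memory pressure on a consumer.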

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

jianoaix

Tested: https://buildkite.com/ray-project/release-tests-pr/builds/33701#0187499d-df5a-4e4b-adac-fdb0bc32dca9

jianoaix · 2023-04-04T05:19:41Z

python/ray/data/tests/test_object_gc.py

@@ -24,6 +35,8 @@ def test_iter_batches_no_spilling_upon_no_transformation(shutdown_only):

    check_no_spill(ctx, ds.repeat())
    check_no_spill(ctx, ds.window(blocks_per_window=20))
+    check_to_torch_no_spill(ctx, ds.repeat())


Without this PR, this check fails (i.e., spilling occurs).

ericl · 2023-04-04T05:25:55Z

Nice find!

c21 · 2023-04-05T21:19:58Z

python/ray/data/_internal/dataset_iterator/dataset_iterator_impl.py

@@ -30,7 +30,7 @@ def _to_block_iterator(
        ds = self._base_dataset
        block_iterator, stats, executor = ds._plan.execute_to_iterator()
        ds._current_executor = executor
-        return block_iterator, stats
+        return block_iterator, stats, False


Shall we update the type hint at line 28 as well?
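
As a rough illustration of the kind of annotation change being asked for, the sketch below shows a function whose return type is widened from a 2-tuple to a 3-tuple with a trailing `bool`. The names `Block`, `ToyStats`, and `to_block_iterator` are hypothetical placeholders for this example; the actual types used in `dataset_iterator_impl.py` are not shown in this conversation and are not reproduced here.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical stand-in; the real block type lives in Ray.
Block = List[int]


class ToyStats:
    """Placeholder for the stats object returned alongside the iterator."""


def to_block_iterator(
    blocks: List[Block],
) -> Tuple[Iterator[Block], Optional[ToyStats], bool]:
    # The third element says whether the consumer owns the blocks.
    # The diff above returns False in this position.
    return iter(blocks), ToyStats(), False
```

Callers that previously unpacked two values would need to unpack three, for example `block_iter, stats, owned = to_block_iterator(blocks)`.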

c21 · 2023-04-05T21:20:47Z

python/ray/data/_internal/dataset_iterator/pipelined_dataset_iterator.py

+        if epoch_pipeline._first_dataset is not None:
+            blocks_owned_by_consumer = (
+                epoch_pipeline._first_dataset._plan.execute()._owned_by_consumer
+            )
+        else:
+            blocks_owned_by_consumer = (
+                epoch_pipeline._peek()._plan.execute()._owned_by_consumer
+            )


Could you add a code comment explaining why we need to do this?
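
One plausible reading of the branch above, sketched without Ray: the ownership flag is taken from the first window of the pipeline, and if that window has not been fetched yet it is peeked at first. `ToyDataset`, `ToyPipeline`, and `blocks_owned_by_consumer` below are hypothetical names for this example, and the assumption that peeking caches the first window without consuming it is mine, not something stated in this conversation.

```python
from typing import Iterator, List, Optional


class ToyDataset:
    """Hypothetical window whose blocks may or may not be consumer-owned."""

    def __init__(self, owned_by_consumer: bool) -> None:
        self.owned_by_consumer = owned_by_consumer


class ToyPipeline:
    """Hypothetical pipeline that yields windows lazily."""

    def __init__(self, datasets: List[ToyDataset]) -> None:
        self._remaining: Iterator[ToyDataset] = iter(datasets)
        self.first_dataset: Optional[ToyDataset] = None

    def peek(self) -> ToyDataset:
        # Assumption: peeking fetches the first window and caches it,
        # so a later read still sees the same window.
        if self.first_dataset is None:
            self.first_dataset = next(self._remaining)
        return self.first_dataset


def blocks_owned_by_consumer(pipeline: ToyPipeline) -> bool:
    """Read the ownership flag from the first window, peeking if needed."""
    first = pipeline.first_dataset
    if first is None:
        first = pipeline.peek()
    return first.owned_by_consumer
```

The flag has to be known before iteration starts, since it decides whether blocks can be freed as soon as they are read.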

jianoaix

The pipelined_ingestion_1500_gb test has been passing consistently: https://buildkite.com/ray-project/release-tests-pr/builds?branch=jianoaix%3Aiteratorgcblocks
I will wait for CI to pass and then merge.

jianoaix · 2023-04-06T01:55:41Z

There is a failure in python/ray/data/tests/test_dataset_consumption.py::test_dataset_lineage_serialization_unsupported, but it is unrelated to this PR.

* Revert "[Datasets] Revert "Enable streaming executor by default (ray-project#32493)" (ray-project#33485)"
  This reverts commit 5c79954.
* Add objects GC in dataset iterator
* test it
* more tests
* fix comment
* add a little more memory as it's close to the limit and may make test flaky
* feedback

jianoaix added 8 commits March 22, 2023 20:33

Revert "[Datasets] Revert "Enable streaming executor by default (ray-project#32493)" (ray-project#33485)" This reverts commit 5c79954.

925a247

Merge branch 'master' of https://github.com/ray-project/ray

b33ae23

Merge branch 'master' of https://github.com/ray-project/ray

4ef5d35

Merge branch 'master' of https://github.com/ray-project/ray

e6dcd6e

Merge branch 'master' of https://github.com/ray-project/ray

482e9dc

Merge branch 'master' of https://github.com/ray-project/ray

3e2d393

Merge branch 'master' of https://github.com/ray-project/ray

cb0840c

Add objects GC in dataset iterator

1696afb

jianoaix requested review from ericl, scv119, clarkzinzow, jjyao and c21 as code owners April 3, 2023 23:39

test it

cec613c

jianoaix changed the title ~~[WIP] Add objects GC in dataset iterator~~ Add objects GC in dataset iterator Apr 4, 2023

jianoaix added 2 commits April 4, 2023 16:36

more tests

bfc5f39

fix comment

ae6a6b8

jianoaix assigned ericl and c21 Apr 4, 2023

ericl approved these changes Apr 5, 2023

jianoaix added 3 commits April 5, 2023 19:35

Merge branch 'master' of https://github.com/ray-project/ray into iteratorgcblocks

abb9551

add a little more memory as it's close to the limit and may make test flaky

017f14f

Merge branch 'master' of https://github.com/ray-project/ray into iteratorgcblocks

b6cb319

c21 approved these changes Apr 5, 2023

ericl added the @author-action-required label Apr 5, 2023

feedback

30eb6f5

Merge branch 'master' of https://github.com/ray-project/ray into iteratorgcblocks

5445241

jianoaix merged commit 1999c9d into ray-project:master Apr 6, 2023
