[data] Streaming executor fixes #2 #32759

jianoaix · 2023-02-22T23:48:57Z

Why are these changes needed?

Because .dataset_format() is hardcoded to default in streaming execution, numpy is treated as a block type, which is not supported.
Issues caused by partial execution of ds.take() in bulk execution (vs. streaming, where this is not handled by LazyBlocklist).
Issues with object ownership handling, caused by incorrect resolution when converting a bundle to a blocklist.

#32132

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: jianoaix <[email protected]>

ericl · 2023-02-23T00:58:01Z

python/ray/data/tests/test_dataset.py

+        if ray.data.context.DatasetContext.get_current().use_streaming_executor:
+            # In streaming execution of ds.iter_batches(), there is no partial
+            # execution so _num_computed() in LazyBlocklist is 0.
+            assert ds._plan.execute()._num_computed() == 0


This is basically disabling this test. How about we just force it to use bulk executor instead?

This assertion still has value as it checks that streaming execution didn't run LazyBlocklist.
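
For reference, the alternative raised in this thread (forcing the bulk executor for a single test) could be sketched roughly as below. This is only an illustration under stated assumptions: FakeContext is a hypothetical stand-in so the snippet runs without Ray, and override_flag is a made-up helper, not a Ray API. In a real test the object passed in would presumably be the one returned by ray.data.context.DatasetContext.get_current().

```python
from contextlib import contextmanager


class FakeContext:
    """Hypothetical stand-in for Ray's DatasetContext (not the real class)."""

    def __init__(self):
        self.use_streaming_executor = True


@contextmanager
def override_flag(ctx, name, value):
    """Temporarily set ctx.<name> to value and restore the old value on exit."""
    old = getattr(ctx, name)
    setattr(ctx, name, value)
    try:
        yield ctx
    finally:
        # Restore even if the test body raises.
        setattr(ctx, name, old)


ctx = FakeContext()
with override_flag(ctx, "use_streaming_executor", False):
    # A bulk-only assertion (for example on _num_computed()) would go here.
    flag_inside = ctx.use_streaming_executor
flag_after = ctx.use_streaming_executor
```

Restoring the flag in a finally clause keeps one test's override from leaking into later tests that expect the streaming executor.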

jianoaix added 30 commits December 8, 2022 23:20

Fix read_tfrecords_benchmark nightly test

edc51bd

Signed-off-by: jianoaix <[email protected]>

Merge branch 'master' of https://github.com/ray-project/ray

61f4d6d

Merge branch 'master' of https://github.com/ray-project/ray

a33a943

Merge branch 'master' of https://github.com/ray-project/ray

36ebe52

Merge branch 'master' of https://github.com/ray-project/ray

ce6763e

Merge branch 'master' of https://github.com/ray-project/ray

0e2c29e

Merge branch 'master' of https://github.com/ray-project/ray

f2b6ed0

Merge branch 'master' of https://github.com/ray-project/ray

bb6c5c4

Merge branch 'master' of https://github.com/ray-project/ray

540fe79

Merge branch 'master' of https://github.com/ray-project/ray

edad7d0

Merge branch 'master' of https://github.com/ray-project/ray

60cc079

Merge branch 'master' of https://github.com/ray-project/ray

a3d3980

Merge branch 'master' of https://github.com/ray-project/ray

001579c

Merge branch 'master' of https://github.com/ray-project/ray

8aeed6c

Merge branch 'master' of https://github.com/ray-project/ray

7a9a49b

Merge branch 'master' of https://github.com/ray-project/ray

ef97167

Merge branch 'master' of https://github.com/ray-project/ray

6f0563c

Merge branch 'master' of https://github.com/ray-project/ray

bcec4d6

Merge branch 'master' of https://github.com/ray-project/ray

ddef4e5

Merge branch 'master' of https://github.com/ray-project/ray

fc9a175

Merge branch 'master' of https://github.com/ray-project/ray

f0e90b7

Merge branch 'master' of https://github.com/ray-project/ray

999d1de

Merge branch 'master' of https://github.com/ray-project/ray

d8159e3

Merge branch 'master' of https://github.com/ray-project/ray

d81cd02

Merge branch 'master' of https://github.com/ray-project/ray

bc831bb

Merge branch 'master' of https://github.com/ray-project/ray

c444395

Merge branch 'master' of https://github.com/ray-project/ray

642da6f

Merge branch 'master' of https://github.com/ray-project/ray

f713f2f

Merge branch 'master' of https://github.com/ray-project/ray

d416a73

Merge branch 'master' of https://github.com/ray-project/ray

da5acee

jianoaix added 9 commits February 16, 2023 18:28

current resource usage

e1d0df5

parquet test: num computed

94a88ee

resource limits

c670a3b

Merge branch 'master' of https://github.com/ray-project/ray into enab…

2c608f4

…lestreamingexec

merge

c5a6aa9

fix treating numpy as block format

6c82ad2

fix issues related to default dataset_format and partial execution in…

4cbcbae

… LazyBlocklist

fix incorrect ownership resolution from bundle iter to blocklist

82bc7a7

unset the streaming flag

7e31d52

jianoaix requested review from ericl, scv119, clarkzinzow, jjyao and c21 as code owners February 22, 2023 23:48

cleanup

c4b7f42

jianoaix assigned ericl, c21 and clarkzinzow Feb 22, 2023

ericl reviewed Feb 23, 2023

ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Feb 23, 2023

ericl approved these changes Feb 23, 2023

ericl merged commit 4c6d75b into ray-project:master Feb 23, 2023

jianoaix removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Feb 23, 2023

edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023

[data] Streaming executor fixes #2 (ray-project#32759)

23f66c4

Signed-off-by: Edward Oakes <[email protected]>

peytondmurray pushed a commit to peytondmurray/ray that referenced this pull request Mar 22, 2023

[data] Streaming executor fixes #2 (ray-project#32759)

d54006d

elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023

[data] Streaming executor fixes ray-project#2 (ray-project#32759)

ee96d00

Signed-off-by: elliottower <[email protected]>
