Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Bulk executor initial implementation #30903

Merged
merged 152 commits into from
Jan 25, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
152 commits
Select commit Hold shift + click to select a range
9e4451e
copy prototype
ericl Dec 5, 2022
8924a89
cleanup
ericl Dec 5, 2022
44578ce
wip compatibility
ericl Dec 5, 2022
e0a346a
add basic wiring
ericl Dec 5, 2022
22504c0
works
ericl Dec 6, 2022
0b26570
fix up split handling
ericl Dec 6, 2022
3f0e0cb
refactor legacy compat package
ericl Dec 6, 2022
eaa46b0
todo move operators fully
ericl Dec 7, 2022
3162f44
reorganize opeators
ericl Dec 7, 2022
2136170
stub out actors impl
ericl Dec 7, 2022
38ae324
improve legacy integration
ericl Dec 7, 2022
9f24555
add str
ericl Dec 7, 2022
f33c772
add own block propagation
ericl Dec 7, 2022
bf5288f
rename to tasks
ericl Dec 7, 2022
f5efe2c
add basic stats
ericl Dec 13, 2022
e5790dc
implement alltoall
ericl Dec 13, 2022
5c7e490
revert format change
ericl Dec 14, 2022
d6bee3c
Merge remote-tracking branch 'upstream/master' into bulk-executor
ericl Dec 14, 2022
1eb5519
fixme
ericl Dec 14, 2022
ec66fd0
fix
ericl Dec 14, 2022
5aa082b
fix own propagation
ericl Dec 14, 2022
c8f8c79
add debug mem metrics
ericl Dec 14, 2022
5b2f7ec
fix block clearing for datasetpipeline
ericl Dec 15, 2022
00025f5
add config
ericl Dec 15, 2022
f8570ee
misc test fixes
ericl Dec 15, 2022
edba805
fix split memory free
ericl Dec 15, 2022
683f4a1
workaround segfault
ericl Dec 16, 2022
a9c0bdf
wip
ericl Dec 16, 2022
db332e1
wip towards stats passing
ericl Dec 16, 2022
07c0c69
improve logs
ericl Dec 16, 2022
e78e800
use bulk wait for performance
ericl Dec 16, 2022
0fa159e
add ctrl-c support
ericl Dec 16, 2022
2a9e0a5
rename
ericl Dec 16, 2022
7573f99
rename node to op
ericl Dec 16, 2022
c00f867
wip
ericl Dec 18, 2022
0ae94c7
Support block bundling
jianoaix Dec 19, 2022
8c29abf
Block bundling: polish
jianoaix Dec 19, 2022
e6da60e
add mem tracing module
ericl Dec 19, 2022
a4faedc
flag protect tracing
ericl Dec 19, 2022
6b9105a
Merge branch 'bulk-executor' of github.com:ericl/ray into bulk-executor
ericl Dec 20, 2022
a676598
Merge remote-tracking branch 'upstream/master' into bulk-executor
ericl Dec 20, 2022
3427f90
add interfaces
ericl Dec 20, 2022
224854c
Merge branch 'master' of https://github.com/ray-project/ray into bulk…
jianoaix Dec 20, 2022
ebf21e3
remove meta
ericl Dec 20, 2022
b5074e0
Merge branch 'master' of https://github.com/ray-project/ray into bulk…
jianoaix Dec 21, 2022
51ceb36
add docstrings
ericl Dec 21, 2022
4dad697
Merge branch 'interfaces-1' into bulk-executor
ericl Dec 21, 2022
d0769e3
remove input metadata
ericl Dec 21, 2022
9695a27
remove hanging
ericl Dec 21, 2022
aab996e
fix gc failures
ericl Dec 21, 2022
92f8b61
Merge remote-tracking branch 'upstream/master' into bulk-executor
ericl Dec 21, 2022
af3308b
fix size est tests
ericl Dec 21, 2022
a983e52
fix stats uuid handling
ericl Dec 21, 2022
da1ab6a
Merge branch 'bulk-executor' of github.com:ericl/ray into bulkexecuto…
jianoaix Dec 21, 2022
01d4b2c
Block bundling: add more tests
jianoaix Dec 21, 2022
808c82f
fix handling of randomize block stage ownership
ericl Dec 22, 2022
fa7e3ec
Merge branch 'bulk-executor' of github.com:ericl/ray into bulk-executor
ericl Dec 22, 2022
44fb0f7
handle zero
ericl Dec 22, 2022
964aaeb
Merge remote-tracking branch 'upstream/master' into bulk-executor
ericl Dec 22, 2022
4567839
wip
ericl Dec 22, 2022
23eea81
completion guarantee comments
ericl Dec 22, 2022
beba2a6
add assert too
ericl Dec 22, 2022
887e4b3
Merge remote-tracking branch 'upstream/master' into bulk-executor
ericl Dec 23, 2022
f83edd9
add operators
ericl Dec 23, 2022
bc8f342
add test execution
ericl Dec 23, 2022
50b456a
wip
ericl Dec 23, 2022
bdfef58
wip
ericl Dec 23, 2022
d810c61
add test todos
ericl Dec 23, 2022
91b2848
add data stats todo
ericl Dec 23, 2022
9e706ad
Merge remote-tracking branch 'upstream/master' into operators
ericl Dec 23, 2022
d3e370a
Merge remote-tracking branch 'upstream/master' into bulk-executor
ericl Dec 23, 2022
d4f514a
add basic tests
ericl Dec 23, 2022
cde12ec
add note
ericl Dec 23, 2022
b95a356
typo
ericl Jan 3, 2023
1f15cd9
Merge branch 'operators' into bulk-executor
ericl Jan 3, 2023
6129d66
fix tests
ericl Jan 3, 2023
ea62366
optimize function arg passing
ericl Jan 3, 2023
ab4e5d7
Merge remote-tracking branch 'upstream/master' into operators
ericl Jan 3, 2023
510e748
Merge branch 'operators' into bulk-executor
ericl Jan 3, 2023
a6e8a18
comments
ericl Jan 3, 2023
7eec78a
Merge branch 'operators' into bulk-executor
ericl Jan 3, 2023
bc021c9
comments 2
ericl Jan 3, 2023
cd0a902
Merge branch 'operators' into bulk-executor
ericl Jan 3, 2023
718a32e
cleanup hierarchy
ericl Jan 4, 2023
f3d8a50
or zero
ericl Jan 4, 2023
3228401
Apply suggestions from code review
ericl Jan 4, 2023
d1a98d6
Merge branch 'operators' of github.com:ericl/ray into operators
ericl Jan 4, 2023
1a8dc02
min rows per bundle
ericl Jan 4, 2023
203720e
fix tests
ericl Jan 4, 2023
f9850b4
Merge branch 'operators' into bulk-executor
ericl Jan 4, 2023
e1d2e89
last comment
ericl Jan 4, 2023
690cb1d
Merge branch 'operators' into bulk-executor
ericl Jan 4, 2023
bf4ef1d
add min rows
ericl Jan 4, 2023
0807aa9
Merge branch 'operators' into bulk-executor
ericl Jan 4, 2023
f7cd953
fix tests
ericl Jan 4, 2023
1314dfb
Merge branch 'operators' into bulk-executor
ericl Jan 4, 2023
4d94aed
Merge remote-tracking branch 'upstream/master' into operators
ericl Jan 4, 2023
30c4486
Merge branch 'operators' into bulk-executor
ericl Jan 4, 2023
1c83066
add exec impl
ericl Jan 4, 2023
7cbfea4
lint
ericl Jan 4, 2023
f55101d
fix tests
ericl Jan 4, 2023
3410619
lint
ericl Jan 4, 2023
9f57758
check extra metrics
ericl Jan 4, 2023
f20fdc6
pull in optimization
ericl Jan 4, 2023
0830f1e
add all to all test
ericl Jan 5, 2023
a607a3a
Merge branch 'part-4' into bulk-executor
ericl Jan 5, 2023
e3a6dd7
Merge remote-tracking branch 'upstream/master' into bulk-executor
ericl Jan 5, 2023
19f0664
legacy compat
ericl Jan 5, 2023
be8b0d5
add split
ericl Jan 5, 2023
341acb9
off by default
ericl Jan 5, 2023
8dacdef
sanity test
ericl Jan 5, 2023
6ea1cb8
update
ericl Jan 5, 2023
9ac348b
Merge branch 'legacy-compat' into bulk-executor
ericl Jan 5, 2023
597614a
wip port the old streaming prototype
ericl Jan 6, 2023
dbc2ebd
fix comments
ericl Jan 6, 2023
458552f
add assert
ericl Jan 9, 2023
53eb19d
Merge branch 'legacy-compat' into bulk-executor
ericl Jan 9, 2023
a16e2dc
Apply suggestions from code review
ericl Jan 10, 2023
64849be
fix type
ericl Jan 10, 2023
88cfd35
Merge remote-tracking branch 'upstream/master' into legacy-compat
ericl Jan 10, 2023
965c0de
fix test
ericl Jan 10, 2023
1dfe172
revert
ericl Jan 10, 2023
7d8c2c9
Merge branch 'legacy-compat' into bulk-executor
ericl Jan 10, 2023
25d0bb2
Merge remote-tracking branch 'upstream/master' into bulk-executor
ericl Jan 10, 2023
d789c9c
flip on
ericl Jan 10, 2023
6cbbe8a
remove
ericl Jan 10, 2023
a119fc4
Merge remote-tracking branch 'upstream/master' into bulk-executor
ericl Jan 10, 2023
d4d2d0a
try removing buffer change
ericl Jan 10, 2023
4462055
remove streaming executor
ericl Jan 10, 2023
723241c
Merge branch 'master' of https://github.com/ray-project/ray into bulk…
jianoaix Jan 13, 2023
64a1453
Merge branch 'master' of https://github.com/ray-project/ray into bulk…
jianoaix Jan 17, 2023
2d554c5
Merge branch 'master' of https://github.com/ray-project/ray into bulk…
jianoaix Jan 18, 2023
13852ec
Merge branch 'master' of https://github.com/ray-project/ray into bulk…
jianoaix Jan 18, 2023
12c9eef
Merge branch 'master' of https://github.com/ray-project/ray into bulk…
jianoaix Jan 18, 2023
aef0530
Merge branch 'master' of https://github.com/ray-project/ray into bulk…
jianoaix Jan 18, 2023
4242d72
extra metric
jianoaix Jan 18, 2023
6540381
add __init__.py to operator packkage
jianoaix Jan 19, 2023
c446c58
Merge branch 'master' of https://github.com/ray-project/ray into bulk…
jianoaix Jan 19, 2023
7c28ae3
Merge branch 'master' of https://github.com/ray-project/ray into bulk…
jianoaix Jan 19, 2023
3467714
ray client block splitting
jianoaix Jan 19, 2023
a3bfe5b
Merge branch 'master' of https://github.com/ray-project/ray into bulk…
jianoaix Jan 20, 2023
99e54da
Merge branch 'master' of https://github.com/ray-project/ray into bulk…
jianoaix Jan 20, 2023
49691ae
Merge branch 'master' of https://github.com/ray-project/ray into bulk…
jianoaix Jan 23, 2023
9358e1b
fix
jianoaix Jan 23, 2023
199fe0e
fix stats
jianoaix Jan 23, 2023
c6e6a63
fix actorpool requiring num_cpus
jianoaix Jan 23, 2023
06b1ad7
fix bazel test
jianoaix Jan 24, 2023
3867061
Merge branch 'master' of https://github.com/ray-project/ray into bulk…
jianoaix Jan 24, 2023
7097973
Merge branch 'master' of https://github.com/ray-project/ray into bulk…
jianoaix Jan 24, 2023
0b74edf
minimize dif
jianoaix Jan 24, 2023
a9a66ab
less diff
jianoaix Jan 24, 2023
a265437
disable incremental take test
jianoaix Jan 24, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion python/ray/data/context.py
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,7 @@

# Whether to use the new executor backend.
DEFAULT_NEW_EXECUTION_BACKEND = bool(
int(os.environ.get("RAY_DATASET_NEW_EXECUTION_BACKEND", "0"))
int(os.environ.get("RAY_DATASET_NEW_EXECUTION_BACKEND", "1"))
)

# Whether to use the streaming executor. This only has an effect if the new execution
Expand Down
14 changes: 5 additions & 9 deletions python/ray/data/tests/test_object_gc.py
Original file line number Diff line number Diff line change
Expand Up @@ -126,26 +126,22 @@ def test_iter_batches_no_spilling_upon_shuffle(shutdown_only):

def test_pipeline_splitting_has_no_spilling(shutdown_only):
# The object store is about 800MiB.
ctx = ray.init(num_cpus=1, object_store_memory=800e6)
ctx = ray.init(num_cpus=1, object_store_memory=1200e6)
# The size of dataset is 50000*(80*80*4)*8B, about 10GiB, 50MiB/block.
ds = ray.data.range_tensor(50000, shape=(80, 80, 4), parallelism=200)
ds = ray.data.range_tensor(5000, shape=(80, 80, 4), parallelism=20)

# 2 blocks/window.
p = ds.window(bytes_per_window=100 * 1024 * 1024).repeat()
p = ds.window(bytes_per_window=100 * 1024 * 1024).repeat(2)
p1, p2 = p.split(2)

@ray.remote
def consume(p):
for batch in p.iter_batches(batch_size=None):
pass
print(p.stats())

tasks = [consume.remote(p1), consume.remote(p2)]
try:
# Run it for 20 seconds.
ray.get(tasks, timeout=20)
except Exception:
for t in tasks:
ray.cancel(t, force=True)
ray.get(tasks)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianoaix FYI I found these tests can sometimes spuriously pass with the timeout, if the first iteration never starts. I think it's safer to force a fixed number of iterations (e.g., 2 here), which would be enough to trigger spilling if GC wasn't working correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

meminfo = memory_summary(ctx.address_info["address"], stats_only=True)
assert "Spilled" not in meminfo, meminfo

Expand Down
6 changes: 6 additions & 0 deletions python/ray/data/tests/test_pipeline_incremental_take.py
Original file line number Diff line number Diff line change
@@ -1,11 +1,17 @@
import time
import pytest
import ray
from ray.data.context import DatasetContext

from ray.tests.conftest import * # noqa


def test_incremental_take(shutdown_only):
# TODO(https://github.com/ray-project/ray/issues/31145): re-enable
# after the segfault bug is fixed.
if DatasetContext.get_current().new_execution_backend:
return

ray.init(num_cpus=2)

# Can read incrementally even if future results are delayed.
Expand Down