[Datasets] Provide more efficient + intuitive block clearing semantics for different execution modes #24127
Conversation
python/ray/data/impl/plan.py
```diff
-        self, clear_input_blocks: bool = True, force_read: bool = False
+        self,
+        clear_input_blocks: bool = True,
+        destroy_unrecoverable_input_blocks: bool = False,
```
I like the name of this argument!
LGTM at a high level. @jianoaix can you review?
What's fan-out?

@jjyao By fan-out, we mean more than one dataset operation applied to a single base dataset, e.g. two different map operations applied to the same base ds:

```python
ds = ray.data.range(10).map(lambda x: x+1)
ds1 = ds.map(lambda x: 2*x)
ds2 = ds.map(lambda x: 3*x)
```

I'll add this to the PR description!
Actually, with fan-out, I think there is an issue with the execution plan, which will be a DAG rather than a chain. For example, continuing the code snippet, we can have a fan-in: ds3 = ds1.union(ds2). Now the lineage of ds3 is a DAG, and users can continue this pattern to construct complex DAGs.
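For concreteness, here is the combined snippet from this thread, showing how the lineage becomes a DAG (a minimal sketch, using only operations mentioned above):

```python
import ray

ds = ray.data.range(10).map(lambda x: x + 1)  # shared source

# Fan-out: two independent transforms of the same base dataset.
ds1 = ds.map(lambda x: 2 * x)
ds2 = ds.map(lambda x: 3 * x)

# Fan-in: union of the two branches. The lineage of ds3 is now a DAG
# (two chains meeting at a shared source), not a simple chain.
ds3 = ds1.union(ds2)
```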
@jianoaix Agreed that an explicit DAG representation would be more ideal (that's the future plan), but what do you see as the concrete current issue with fan-out here? Fan-out is supported in eager mode via reusing the base dataset's snapshot blocks, and is supported in lazy mode via recomputation from source, so fan-out should be covered. I.e., the stage DAG for fan-out of lazy datasets is currently supported by independent chains for each dataset sink, with the chain going back to the shared source dataset.
Right now, lazy fan-in isn't supported: all fan-in operations trigger computation for lazy datasets. One thing that's currently untested is lazy fan-out after a union. Upon a quick eye-test, it looks like it might not work if one or more of the unioned datasets are created from non-lazy datasources, since recomputing from the base won't work on the dummy read tasks.
With the lineage/execution plan being used for both eager and lazy datasets (e.g. the other PR claimed the eager dataset can also serialize the lineage), isn't using a linear data structure for a DAG already a problem for both cases?
Synced offline; opened this PR to make it explicit that lineage-based serialization of unions/zips is not supported.
```diff
 meminfo = memory_summary(info["address"], stats_only=True)
-# TODO(ekl) we can add this back once we clear inputs on eager exec as well.
-# assert "Spilled" not in meminfo, meminfo
+assert "Spilled" not in meminfo, meminfo
```
This test might prove to be flaky due to the memory reclaiming issues within Ray, so we may need to put this in a retry loop similar to test_memory_release_lazy_shuffle.
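Something along these lines could work (a hypothetical sketch of such a retry loop; the helper name and retry parameters are assumptions, not the actual test code):

```python
import time

from ray.internal.internal_api import memory_summary  # same helper the test uses


def assert_no_spill(address: str, retries: int = 5, delay_s: float = 2.0):
    # Hypothetical retry helper: object reclamation in Ray is asynchronous,
    # so re-check the memory summary a few times before declaring failure.
    meminfo = ""
    for _ in range(retries):
        meminfo = memory_summary(address, stats_only=True)
        if "Spilled" not in meminfo:
            return
        time.sleep(delay_s)
    raise AssertionError(meminfo)
```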
Ack. Even 2s not enough to escape the reclaiming uncertainty?
1s was a bit flaky; I didn't see 2s fail while testing locally, but who knows in CI. For context, we use an "OOM grace period" of 2 seconds in core Ray, where we let reference counting + garbage collection and object spilling free up space, so I'd expect 2s to be ~enough.
LGTM, just a few minor comments
"""Force full evaluation of the blocks of this dataset. | ||
|
||
This can be used to read all blocks into memory. By default, Datasets | ||
doesn't read blocks from the datasource until the first transform. | ||
|
||
Args: | ||
preserve_original: Whether the original unexecuted dataset should be |
What is "original dataset" for a dataset? (I think users may have same question if they read this API)
Ah, I thought that when calling d2 = ds.fully_executed(preserve_original=False), it would be obvious that original refers to ds. 🤔
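For reference, a minimal usage sketch of the API under discussion (based on this thread; preserve_original is the new flag from this PR, and the described behavior is an assumption drawn from the conversation):

```python
import ray

ds = ray.data.range(10)  # blocks aren't read from the datasource yet

# Force full evaluation. With preserve_original=False, the original `ds`
# may have its blocks cleared, so only `d2` should be used afterwards;
# with the default preserve_original=True, `ds` remains usable.
d2 = ds.fully_executed(preserve_original=False)
```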
I see. I had a chain of operations in mind; in that case it's unclear which of them should be thought of as "original" (maybe the first one? maybe each intermediate one? something else?).
```python
import time

# TODO(Clark): Remove this sleep once we have fixed memory pressure handling.
time.sleep(2)
```
File a bug to track this issue?
Yep can do!
TL;DR: Don't clear for eager, clear all but non-lazy input blocks if lazy, clear everything if pipelining.
This PR provides more efficient and intuitive block clearing semantics for eager mode, lazy mode, and pipelining, while still supporting multiple operations applied to the same base dataset, i.e. fan-out. For example, two different map operations are applied to the same base ds in this example:
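```python
ds = ray.data.range(10).map(lambda x: x+1)
ds1 = ds.map(lambda x: 2*x)
ds2 = ds.map(lambda x: 3*x)
```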
If we naively clear ds's blocks when executing the map that produces ds1, the map producing ds2 will fail.

Desired Semantics

- Eager mode: don't clear input blocks.
- Lazy mode: clear all input blocks except non-lazy (i.e. unrecoverable) inputs.
- Pipelining: clear everything.
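A hypothetical sketch of that policy (the helper name, mode strings, and flag are illustrative assumptions, not the actual code in python/ray/data/impl/plan.py):

```python
def should_clear_input_blocks(mode: str, input_is_lazy: bool) -> bool:
    # Illustrative decision logic for the semantics listed above.
    if mode == "eager":
        # Keep inputs so fan-out can reuse the base dataset's blocks.
        return False
    if mode == "lazy":
        # Lazy inputs can be recomputed from lineage; non-lazy inputs
        # have no read tasks to replay, so they are unrecoverable.
        return input_is_lazy
    if mode == "pipelined":
        # Pipeline windows are consumed once; everything can be cleared.
        return True
    raise ValueError(f"Unknown execution mode: {mode}")
```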
Related issue number
Closes #18791
Checks

- [x] I've run scripts/format.sh to lint the changes in this PR.