-
-
Notifications
You must be signed in to change notification settings - Fork 637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for reusing Graph node values if their inputs haven't changed #6059
Conversation
1f2363c
to
8e0e2b7
Compare
c1cafb1
to
1494abf
Compare
### Problem Overriding `__eq__` on `datatype` violates structural equality, and the assumptions that people have about `datatype` instances. Moreover, it is very error prone in the presence of #6059, which will be using `__eq__` to determine whether objects have changed. ### Solution Make it impossible to override `__eq__` accidentally on `datatype` by attaching a canary to the `__eq__` method definition on the generated class. Remove a few `__eq__` implementations that violated structural equality and were thus causing issues in #6059. ### Result `datatypes` behave as expected more frequently, and bugs are avoided. There is no noticeable impact on performance in my testing (likely these were overridden back when _every_ object ended up memoized, rather than just `Key` instances).
Now just depends on #6013: the reviewable portion begins after |
1494abf
to
23b4051
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very nice!
/// generation is recorded on the consuming edge, and can later used to determine whether the | ||
/// inputs to a node have changed. | ||
/// | ||
/// Unlike the RunToken (which is incremented whenever a node re-runs), the Generation is only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be tempted to name these RunGeneration
and OutputGeneration
or something similarly contrasting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They have very different uses, and aren't really related to one another at that level... so would rather keep the existing separation.
NotStarted { | ||
run_token: RunToken, | ||
generation: Generation, | ||
previous_result: Option<Result<N::Item, N::Error>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a shared-across-instances Arc<Mutex<Vec<Result<N::Item, N::Error>>
to get more than n-1
re-use of generations, but unclear what usage patterns are actually going to be to know which is going to be more efficient. (If people flip up and back a lot, a Vec
saves a lot of computation; if they tend to make linear changes, the Mutex
and linear scan may be more costly)
(An Arc<Mutex<HashMap<Vec<Generation>, Result<N::Item, N::Error>>>>
would come to mind, except for how we learnt about the minimum sizes of HashMap
s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in Slack, I'd like to hold off on storing multiple values in this first edition. We can easily add it later if we're not seeing any unexpected memory pressure due to just holding one.
src/rust/engine/graph/src/lib.rs
Outdated
let run_token = run_token.next(); | ||
match entry_key { | ||
&EntryKey::Valid(ref n) => { | ||
let context2 = context.clone_for(entry_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've generally been using context2
to mean "clone of context
which I was required to make because of annoying move semantics"; possibly call this context_for_entry_id
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
src/rust/engine/graph/src/lib.rs
Outdated
// NB: The unwrap here avoids a clone and a comparison: see the method docs. | ||
( | ||
generation, | ||
previous_result.unwrap_or_else(|| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previous_result.expect("A node cannot be marked clean without a previous result.")
?
@@ -152,7 +160,7 @@ impl fmt::Debug for Value { | |||
} | |||
} | |||
|
|||
#[derive(Debug, Clone)] | |||
#[derive(Clone, Debug, Eq, PartialEq)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can also be Copy
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throw
contains a Value
with a python exception in it, so it's good to be explicit about copies.
@@ -487,6 +508,7 @@ impl<N: Node> InnerGraph<N> { | |||
self.nodes.get(node) | |||
} | |||
|
|||
// TODO: Now that we never delete Entries, we should consider making this infalliable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're probably going to want to start deleting entries again at some point for memory reasons, just in a more principled/less aggressive way
src/rust/engine/graph/src/lib.rs
Outdated
run_token: RunToken, | ||
generation: Generation, | ||
start_time: Instant, | ||
waiters: Vec<oneshot::Sender<Result<(N::Item, Generation), N::Error>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this signature suggest that Error
s are always a new generation, even if they're the same? Possibly this should be a oneshot::Sender<(Result<N::Item, N::Error>, Generation)>
so that identical errors don't trigger downstream invalidations?
src/rust/engine/graph/src/lib.rs
Outdated
future::ok(()).to_boxed() | ||
} else { | ||
// The Node needs to (re-)run! | ||
let context = context2.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely don't re-use the name context
here, as it's actually different from context
in the enclosing scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
src/rust/engine/graph/src/lib.rs
Outdated
.map(move |res| (res, generation)) | ||
.to_boxed(); | ||
} | ||
_ => {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment in here that we're falling through to the second match?
src/rust/engine/graph/src/lib.rs
Outdated
enum EntryState<N: Node> { | ||
NotStarted(RunToken), | ||
// A node that has either been explicitly cleared, or has not yet started Running for the first | ||
// time. In this state there is no need for a dirty bit because the generation is either in its |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: the Generation
is not incremented when a node is cleared: only the RunToken
.
23b4051
to
73d017c
Compare
…a Node's result to a stored `previous_result`.
… Nodes ran, and having Nodes include a Context.id in their output.
… after we dirty a Node. TODO: Storing these in a Vec on the state is potentially less efficient than storing them on the edges in the graph, since that avoids a bunch of extra allocation.
… to newly computed generations.
73d017c
to
c42e376
Compare
This no longer has dependencies, and is definitely reviewable. Thanks! |
Note to self: two bugs to fix here.
|
…gerly prune edges for cleared entries, and lazily prune edges for dirtied entries.
f2f0a76
to
9d090ac
Compare
…duler already accounts for this, and it seems reasonable to not bake retry into the Graph.
Problem
As described in #4558, we currently completely delete
Node
s from theGraph
when their inputs have changed.One concrete case where this is problematic is that all
Snapshots
in the graph end up with a dependency on thescandir
outputs of all of their parent directories, because we need to expand symlinks recursively from the root when consuming aPath
(in order to see whether any path component on the way down is a symlink). This means that changes anywhere above aSnapshot
invalidate thatSnapshot
, and changes at the root of the repo invalidate allSnapshots
(although 99% of the syscalls they depend on are not invalidated, having no dependencies of their own).But this case is just one of many cases affected by the current implementation: there are many other times where we re-compute more than we should due to the current
Node
invalidation strategy.Solution
Implement node "dirtying", as described on #4558.
There are a few components to this work:
Entry::clear
ed (which will force aNode
to re-run), aNode
may beEntry::dirty
ed. A "dirty"Node
is eligible to be "cleaned" if its dependencies have not changed since it was dirtied.Node
records aGeneration
value that acts as proxy for "my output has changed". TheNode
locally maintains this value, and when a Node re-runs for any reason (either due to beingdirtied
orcleared
), it compares its new output value to its old output value to determine whether to increment theGeneration
.Node
records theGeneration
values of the dependencies that it used to run, at the point when it runs. When a dirtiedNode
is deciding whether to re-run, it compares the previous generation values of its dependencies to their current dependency values: if they are equal, then theNode
can be "cleaned": ie, its previous value can be used without re-running it.This patch also expands the testing of
Graph
to differentiate dirtying aNode
from clearing it, and confirms that the correctNodes
re-run in each of those cases.Result
Cleaning all
Nodes
involved in./pants list ::
after touchingpants.ini
completes 6 times faster than recomputing them from scratch (56 seconds vs 336 seconds in our repository). More gains are likely possible by implementing the performance improvement(s) described on #6013.Fixes #4558 and fixes #4394.