Skip to content

Commit

Permalink
Add support for reusing Graph node values if their inputs haven't cha…
Browse files Browse the repository at this point in the history
…nged (#6059)

### Problem

As described in #4558, we currently completely delete `Node`s from the `Graph` when their inputs have changed.

One concrete case where this is problematic is that all `Snapshots` in the graph end up with a dependency on the `scandir` outputs of all of their parent directories, because we need to expand symlinks recursively from the root when consuming a `Path` (in order to see whether any path component on the way down is a symlink). This means that changes anywhere above a `Snapshot` invalidate that `Snapshot`, and changes at the root of the repo invalidate _all_ `Snapshots` (although 99% of the syscalls they depend on are not invalidated, having no dependencies of their own).

But this case is just one of many cases affected by the current implementation: there are many other times where we re-compute more than we should due to the current `Node` invalidation strategy.

### Solution

Implement node "dirtying", as described on #4558.

There are a few components to this work:

* In addition to being `Entry::clear`ed (which will force a `Node` to re-run), a `Node` may be `Entry::dirty`ed. A "dirty" `Node` is eligible to be "cleaned" if its dependencies have not changed since it was dirtied.
* Each `Node` records a `Generation` value that acts as proxy for "my output has changed". The `Node` locally maintains this value, and when a Node re-runs for any reason (either due to being `dirtied` or `cleared`), it compares its new output value to its old output value to determine whether to increment the `Generation`.
* Each `Node` records the `Generation` values of the dependencies that it used to run, at the point when it runs. When a dirtied `Node` is deciding whether to re-run, it compares the previous generation values of its dependencies to their current dependency values: if they are equal, then the `Node` can be "cleaned": ie, its previous value can be used without re-running it.

This patch also expands the testing of `Graph` to differentiate dirtying a `Node` from clearing it, and confirms that the correct `Nodes` re-run in each of those cases.

### Result

Cleaning all `Nodes` involved in `./pants list ::` after touching `pants.ini` completes 6 times faster than recomputing them from scratch (56 seconds vs 336 seconds in our repository). More gains are likely possible by implementing the performance improvement(s) described on #6013. 

Fixes #4558 and fixes #4394.
  • Loading branch information
Stu Hood authored Jul 6, 2018
1 parent f783edc commit 14a3f94
Show file tree
Hide file tree
Showing 11 changed files with 832 additions and 138 deletions.
1 change: 1 addition & 0 deletions src/python/pants/engine/native.py
Original file line number Diff line number Diff line change
Expand Up @@ -224,6 +224,7 @@
uint64_t graph_len(Scheduler*);
uint64_t graph_invalidate(Scheduler*, BufferBuffer);
uint64_t graph_invalidate_all_paths(Scheduler*);
PyResult graph_visualize(Scheduler*, Session*, char*);
void graph_trace(Scheduler*, ExecutionRequest*, char*);
Expand Down
13 changes: 12 additions & 1 deletion src/python/pants/engine/scheduler.py
Original file line number Diff line number Diff line change
Expand Up @@ -296,6 +296,11 @@ def invalidate_files(self, direct_filenames):
logger.info('invalidated %d nodes for: %s', invalidated, filenames)
return invalidated

def invalidate_all_files(self):
invalidated = self._native.lib.graph_invalidate_all_paths(self._scheduler)
logger.info('invalidated all %d nodes', invalidated)
return invalidated

def graph_len(self):
return self._native.lib.graph_len(self._scheduler)

Expand Down Expand Up @@ -453,11 +458,17 @@ def execution_request(self, products, subjects):
return self.execution_request_literal(roots)

def invalidate_files(self, direct_filenames):
"""Calls `Graph.invalidate_files()` against an internal product Graph instance."""
"""Invalidates the given filenames in an internal product Graph instance."""
invalidated = self._scheduler.invalidate_files(direct_filenames)
self._maybe_visualize()
return invalidated

def invalidate_all_files(self):
"""Invalidates all filenames in an internal product Graph instance."""
invalidated = self._scheduler.invalidate_all_files()
self._maybe_visualize()
return invalidated

def node_count(self):
return self._scheduler.graph_len()

Expand Down
29 changes: 29 additions & 0 deletions src/rust/engine/Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions src/rust/engine/graph/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -9,3 +9,6 @@ fnv = "1.0.5"
futures = "^0.1.16"
hashing = { path = "../hashing" }
petgraph = "0.4.5"

[dev-dependencies]
rand = "0.5.3"
Loading

0 comments on commit 14a3f94

Please sign in to comment.