Commit
Add support for reusing Graph node values if their inputs haven't changed (#6059)

### Problem

As described in #4558, we currently completely delete `Node`s from the `Graph` when their inputs have changed.

One concrete case where this is problematic is that all `Snapshots` in the graph end up with a dependency on the `scandir` outputs of all of their parent directories, because we need to expand symlinks recursively from the root when consuming a `Path` (in order to see whether any path component on the way down is a symlink). This means that changes anywhere above a `Snapshot` invalidate that `Snapshot`, and changes at the root of the repo invalidate _all_ `Snapshots` (although 99% of the syscalls they depend on are not invalidated, having no dependencies of their own).

But this is just one of many cases affected by the current implementation: there are many other times where we re-compute more than we should due to the current `Node` invalidation strategy.

### Solution

Implement node "dirtying", as described on #4558.

There are a few components to this work:

* In addition to being `Entry::clear`ed (which will force a `Node` to re-run), a `Node` may be `Entry::dirty`ed. A "dirty" `Node` is eligible to be "cleaned" if its dependencies have not changed since it was dirtied.
* Each `Node` records a `Generation` value that acts as a proxy for "my output has changed". The `Node` locally maintains this value, and when a `Node` re-runs for any reason (either due to being `dirtied` or `cleared`), it compares its new output value to its old output value to determine whether to increment the `Generation`.
* Each `Node` records the `Generation` values of the dependencies that it used to run, at the point when it runs. When a dirtied `Node` is deciding whether to re-run, it compares the previously recorded `Generation` values of its dependencies to their current `Generation` values: if they are equal, then the `Node` can be "cleaned", i.e., its previous value can be used without re-running it.

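The three components above can be illustrated with a toy, single-threaded sketch. It is not the code in this patch: the names `Status`, `Graph::get`, `set_input`, and `runs`, the integer node values, and the example functions `identity`, `parity`, and `plus_hundred` are all hypothetical and chosen only to keep the example small.

```rust
type NodeId = usize;
type Generation = u64;
// Hypothetical node body: takes the node's own external input and its dependencies' values.
type Compute = fn(i64, &[i64]) -> i64;

#[derive(Clone, Copy, PartialEq, Debug)]
enum Status { Cleared, Dirty, Clean }

struct Entry {
    input: i64,                       // stands in for outside state, e.g. a directory listing
    deps: Vec<NodeId>,
    compute: Compute,
    status: Status,
    value: Option<i64>,               // kept across clears so outputs can be compared
    generation: Generation,           // bumped only when the output actually changes
    dep_generations: Vec<Generation>, // generations of deps observed on the last run
    runs: usize,
}

#[derive(Default)]
struct Graph { entries: Vec<Entry> }

impl Graph {
    fn add(&mut self, input: i64, deps: Vec<NodeId>, compute: Compute) -> NodeId {
        self.entries.push(Entry {
            input, deps, compute, status: Status::Cleared, value: None,
            generation: 0, dep_generations: Vec::new(), runs: 0,
        });
        self.entries.len() - 1
    }

    /// Force `id` to re-run, and mark everything downstream of it as dirty.
    fn clear(&mut self, id: NodeId) {
        self.entries[id].status = Status::Cleared;
        let mut stack = vec![id];
        while let Some(changed) = stack.pop() {
            for i in 0..self.entries.len() {
                let e = &mut self.entries[i];
                if e.status == Status::Clean && e.deps.contains(&changed) {
                    e.status = Status::Dirty;
                    stack.push(i);
                }
            }
        }
    }

    /// Change a node's external input; the node itself must re-run.
    fn set_input(&mut self, id: NodeId, input: i64) {
        self.entries[id].input = input;
        self.clear(id);
    }

    fn get(&mut self, id: NodeId) -> i64 {
        if let (Status::Clean, Some(v)) = (self.entries[id].status, self.entries[id].value) {
            return v;
        }
        // Bring dependencies up to date, then read their current generations.
        let deps = self.entries[id].deps.clone();
        let dep_values: Vec<i64> = deps.iter().map(|&d| self.get(d)).collect();
        let dep_gens: Vec<Generation> =
            deps.iter().map(|&d| self.entries[d].generation).collect();
        let entry = &mut self.entries[id];
        // A dirty node is "cleaned" if no dependency generation moved since it last ran.
        if let (Status::Dirty, Some(v)) = (entry.status, entry.value) {
            if entry.dep_generations == dep_gens {
                entry.status = Status::Clean;
                return v;
            }
        }
        // Otherwise re-run, and bump the generation only if the output differs.
        let new_value = (entry.compute)(entry.input, &dep_values);
        entry.runs += 1;
        if entry.value != Some(new_value) {
            entry.generation += 1;
        }
        entry.value = Some(new_value);
        entry.dep_generations = dep_gens;
        entry.status = Status::Clean;
        new_value
    }

    fn runs(&self, id: NodeId) -> usize { self.entries[id].runs }
}

// Example node bodies (hypothetical).
fn identity(input: i64, _deps: &[i64]) -> i64 { input }
fn parity(_input: i64, deps: &[i64]) -> i64 { deps[0] % 2 }
fn plus_hundred(_input: i64, deps: &[i64]) -> i64 { deps[0] + 100 }
```

With a chain `a -> b -> c` built from `identity`, `parity`, and `plus_hundred`, changing the input of `a` from 4 to 6 re-runs `a` and `b`, but `b` produces the same output, so its generation does not move and `c` is cleaned without re-running. Changing the input to 7 changes the output of `b`, so `c` re-runs. Calling `clear` on `c` directly forces it to re-run even though none of its dependencies changed.
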
This patch also expands the testing of `Graph` to differentiate dirtying a `Node` from clearing it, and confirms that the correct `Nodes` re-run in each of those cases.

### Result

Cleaning all `Nodes` involved in `./pants list ::` after touching `pants.ini` completes 6 times faster than recomputing them from scratch (56 seconds vs 336 seconds in our repository). More gains are likely possible by implementing the performance improvement(s) described on #6013.

Fixes #4558 and fixes #4394.