-
-
Notifications
You must be signed in to change notification settings - Fork 637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The trace for an error in a context with a lot of dependents is redundant #3695
Comments
I think that's a good idea. Then it becomes a sorting problem to present the most useful error paths. |
This is now related to #4358... because |
Looked into this a bit with @caveness , and determined that since switching to Cancelled nodes have no |
Problem #3695 initially described a solution to a problem experienced in the original "slow-failing" implementation of the engine: if there were multiple paths between a root and a failure, we would render all of them, which was hugely redundant. But since that ticket was opened, we switched to a fast-failing implementation via the switch to Futures. When one dependency of a Node fails, the rest are effectively cancelled, since Futures are pull based. This (should) have the effect of rendering exactly one failure path per root: the first failure that is encountered while computing that root. But it was still the case that we were printing redundant/useless information: cancelled dependencies were being rendered, which in the case of things like ./pants list :: resulted in a lot of noise. Solution Since a cancelled node is never the source of a failure, don't render them. Result The added test no longer renders a <None> entry, like: Computing Select(<pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =A) Computing Task(<class 'pants_test.engine.test_engine.A'>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =A) Computing Task(<class 'pants_test.engine.test_engine.D'>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =D) <None> Computing Task(<function nested_raise at 0xEEEEEEEEE>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =C) Throw(An exception for B) Traceback (most recent call last): File LOCATION-INFO, in extern_invoke_runnable val = runnable(*args) File LOCATION-INFO, in nested_raise fn_raises(x) File LOCATION-INFO, in fn_raises raise Exception('An exception for {}'.format(type(x).__name__)) Exception: An exception for B Fixes #3695.
Problem pantsbuild#3695 initially described a solution to a problem experienced in the original "slow-failing" implementation of the engine: if there were multiple paths between a root and a failure, we would render all of them, which was hugely redundant. But since that ticket was opened, we switched to a fast-failing implementation via the switch to Futures. When one dependency of a Node fails, the rest are effectively cancelled, since Futures are pull based. This (should) have the effect of rendering exactly one failure path per root: the first failure that is encountered while computing that root. But it was still the case that we were printing redundant/useless information: cancelled dependencies were being rendered, which in the case of things like ./pants list :: resulted in a lot of noise. Solution Since a cancelled node is never the source of a failure, don't render them. Result The added test no longer renders a <None> entry, like: Computing Select(<pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =A) Computing Task(<class 'pants_test.engine.test_engine.A'>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =A) Computing Task(<class 'pants_test.engine.test_engine.D'>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =D) <None> Computing Task(<function nested_raise at 0xEEEEEEEEE>, <pants_test.engine.test_engine.B object at 0xEEEEEEEEE>, =C) Throw(An exception for B) Traceback (most recent call last): File LOCATION-INFO, in extern_invoke_runnable val = runnable(*args) File LOCATION-INFO, in nested_raise fn_raises(x) File LOCATION-INFO, in fn_raises raise Exception('An exception for {}'.format(type(x).__name__)) Exception: An exception for B Fixes pantsbuild#3695.
### Problem While chasing performance in one area for #4349 we added `SelectTransitive`. This decreased performance in another significant area: executions involving multiple transitive roots can't structure-share at all when expanding dependencies, leading to many redundant walks (#4533). Additionally, usability regressed in a few ways: the product `Graph` could not implement cycle detection for dependencies expanded via `SelectTransitive` (#4358), and in the case of a missing or broken dependency, the referring context was lost (#4515). ### Solution * Remove usage of `SelectTransitive` for `TransitiveHydratedTargets` to reintroduce structure sharing and improve usability (fixes #4358 and fixes #4515). * Replace `@rule`s that operate on single-`Spec`s with batch forms that operate on a `Specs` collection (fixes #4533). ### Result Cycles should be detected much earlier with: ``` Exception message: Build graph construction failed: ExecutionError Received unexpected Throw state(s): Computing Select(Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets) Computing Task(transitive_hydrated_targets, Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets) Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle1:cycle1, =TransitiveHydratedTarget) Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle2:cycle2, =TransitiveHydratedTarget) Throw(No source of required dependency: Dep graph contained a cycle.) Traceback (no traceback): <pants native internals> Exception: No source of required dependency: Dep graph contained a cycle. ``` _(more polish needed here: re-opened #3695 to clean up `trace`)_ And time taken is proportional to the total number of matched targets, rather than to the sum of the closure sizes of the roots: ``` # before: $ time ./pants --target-spec-file=targets.txt list | wc -l 1500 real 0m15.297s user 0m14.603s sys 0m1.625s # after: $ time ./pants --target-spec-file=targets.txt list | wc -l 1500 real 0m5.989s user 0m5.261s sys 0m1.310s ``` Runtimes for things like `./pants list ::` are unaffected, although memory usage increases by about 5%, likely due to the branchier resulting `Graph`.
While chasing performance in one area for #4349 we added `SelectTransitive`. This decreased performance in another significant area: executions involving multiple transitive roots can't structure-share at all when expanding dependencies, leading to many redundant walks (#4533). Additionally, usability regressed in a few ways: the product `Graph` could not implement cycle detection for dependencies expanded via `SelectTransitive` (#4358), and in the case of a missing or broken dependency, the referring context was lost (#4515). * Remove usage of `SelectTransitive` for `TransitiveHydratedTargets` to reintroduce structure sharing and improve usability (fixes #4358 and fixes #4515). * Replace `@rule`s that operate on single-`Spec`s with batch forms that operate on a `Specs` collection (fixes #4533). Cycles should be detected much earlier with: ``` Exception message: Build graph construction failed: ExecutionError Received unexpected Throw state(s): Computing Select(Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets) Computing Task(transitive_hydrated_targets, Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets) Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle1:cycle1, =TransitiveHydratedTarget) Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle2:cycle2, =TransitiveHydratedTarget) Throw(No source of required dependency: Dep graph contained a cycle.) Traceback (no traceback): <pants native internals> Exception: No source of required dependency: Dep graph contained a cycle. ``` _(more polish needed here: re-opened #3695 to clean up `trace`)_ And time taken is proportional to the total number of matched targets, rather than to the sum of the closure sizes of the roots: ``` $ time ./pants --target-spec-file=targets.txt list | wc -l 1500 real 0m15.297s user 0m14.603s sys 0m1.625s $ time ./pants --target-spec-file=targets.txt list | wc -l 1500 real 0m5.989s user 0m5.261s sys 0m1.310s ``` Runtimes for things like `./pants list ::` are unaffected, although memory usage increases by about 5%, likely due to the branchier resulting `Graph`.
…y root per distinct failure. Fixes pantsbuild#3695.
### Problem #4558 introduces a new state for a `Node`: it may be `dirty`, which requires that we remember the previous value of a `Node` while simultaneously computing its next value. I began implementing this state machine with our existing `future::Shared` usage, but ran into the need to either hide values in the closure of the running future, or store values atomically outside of the future's value. This complexity represents a classic case for a state machine. Additionally, we have seen that `future::Shared` has a relatively high cost in terms of complexity and memory (see #5990). ### Solution In order to make #4558 easier to implement correctly (and to reduce memory usage), implement the storage of a `Node`'s value in the `Graph` via a state machine. One of the largest complexities of `Shared` is that it needs to elect one of the waiters of the `Shared` future as the one who will actually do the work of `poll`ing the wrapped future. To avoid that complexity, we introduce usage of the `tokio` `Runtime` for _all_ `Node` executions. This allows waiters to stay very simple, and receive the result value via a `oneshot::channel`. ### Result #4558 should be much easier to implement. I'll likely wait to land this until #4558 is further along. This change reduces memory usage about 200MB further than the fix from #5990 (down to about 2.5GB for `./pants list ::` in Twitter's repo). Unfortunately, in its current form it also represents a performance regression of about 10% for that same usecase. Although I believe I have a handle on why, I'd like to defer fixing that regression until after #4558 is implemented. This also fixes #3695, as since we now fail slowly, we are able to render multiple distinct failures at once.
The
Graph.trace
for a cycle detected in a context where the failed/cyclic Node has a lot of dependents is very redundant. See @ericzundel 's response to my question on https://rbcommons.com/s/twitter/r/4087/ for an example.To repro, introduce a failure inside of a graph, and then list a target spec that covers a bunch of things that depend on it... ie: break guava in 3rdparty, and then attempt to run
./pants list ::
.Most likely what we want to do is invert the trace: rather than rendering downward from the roots, we should discover all failed nodes, and then, for each node, choose some number of relevant paths (or just one: the shortest...?) to the root to render.
The text was updated successfully, but these errors were encountered: