Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store workunits as a DAG rather than a tree #14856

Merged
merged 3 commits into from
Mar 28, 2022

Conversation

stuhood
Copy link
Member

@stuhood stuhood commented Mar 20, 2022

As described in #14680: due to memoization, workunits are most accurately represented as a DAG, rather than as a tree.

This PR changes the storage of workunits from a tree to a DAG by giving workunits multiple parents. But it does not yet actually add multiple parents to a workunit, which will be accomplished in a followup change. This portion is worth landing separately because it eases the fix for #14867, while the actual addition of multiple parents is not necessary for that fix.

@stuhood stuhood force-pushed the stuhood/workunit-dag branch 2 times, most recently from 96c13dc to 144ee5c Compare March 23, 2022 22:55
@stuhood stuhood added the category:internal CI, fixes for not-yet-released features, etc. label Mar 23, 2022
@thejcannon
Copy link
Member

I'm gonna keep an eye on this one.
I think my ideal end-state for the workunits is to have enough data to accurately show what's happening under-the-hood in browser like https://ui.perfetto.dev/ or using the older Chrome Trace JSON.

That way users can capture all the info, and then use existing tools to explore (and they provide filtering, highlighting, etc...)

@stuhood stuhood changed the title [internal] Store workunits as a DAG rather than a tree Store workunits as a DAG rather than a tree Mar 28, 2022
@stuhood stuhood marked this pull request as ready for review March 28, 2022 19:32
@stuhood
Copy link
Member Author

stuhood commented Mar 28, 2022

Commits are useful to review independently.

Although this would seem like it might negatively impact performance, benchmarking shows it as neutral to positive (possibly because heavy_hitters becomes more efficient in how it filters the parents to render).

src/python/pants/init/BUILD Outdated Show resolved Hide resolved
}
dict_entries.push((
externs::store_utf8(py, "parent_ids"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooooh. Fun!
Is there any order to the parent IDs? E.g. does the first ID have any semantic meaning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be implicitly, since I believe that the order of addition of parents will be preserved, but I hadn't thought about trying to make any guarantees there. What did you have in mind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking about how previously the parent was the first parent to trigger this rule, and wondering if that was still the case with the first parent ID. It's likely not relevant enough to make a strong guarantee.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Copy link
Contributor

@tdyas tdyas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@stuhood stuhood merged commit 07d69ed into pantsbuild:main Mar 28, 2022
@stuhood stuhood deleted the stuhood/workunit-dag branch March 28, 2022 21:58
.collect::<Vec<_>>();

if has_parent_ids {
// TODO: Remove the single-valued `parent_id` field around version 2.16.0.dev0.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @asherf

Copy link
Contributor

@tdyas tdyas Mar 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside, the streaming workunit handler support has a "version" attribute that could be incremented to make it easier to account for changes such as this.

@stuhood stuhood added this to the 2.11.x milestone Mar 28, 2022
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!

Comment on lines -748 to +779
let store = WorkunitStore::new(false, Level::Debug);
let store = WorkunitStore::new(false, Level::Trace);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this change: https://github.com/pantsbuild/pants/pull/14856/files#diff-cb899b178835b1e2138b163355f0e525762e4104d1240e98894172300b87ce8eL858 ... would otherwise cause workunits which are consumed by some tests to be disabled. I thought about adjusting the tests instead, but that felt like it could wait for an actual need.

stuhood added a commit to stuhood/pants that referenced this pull request Mar 29, 2022
As described in pantsbuild#14680: due to memoization, workunits are most accurately represented as a DAG, rather than as a tree.

This PR changes the _storage_ of workunits from a tree to a DAG by giving workunits multiple parents. But it does not yet actually add multiple parents to a workunit, which will be accomplished in a followup change. This portion is worth landing separately because it eases the fix for pantsbuild#14867, while the actual addition of multiple parents is not necessary for that fix.
# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
stuhood added a commit that referenced this pull request Mar 29, 2022
…e themselves (cherrypick of #14854, #14856, #14934) (#14942)

Fix missing `check` output by allowing disabled workunits to re-enable themselves (#14934)

#13483 broke the rendering of `EngineAwareReturnType` implementations which relied on starting a workunit at `Level::TRACE`, and then escalating its level to something visible (usually `INFO` or greater) when it completed. `check` outputs for the JVM were strongly affected (see #14867), since they relied on the fact that `FallibleClasspathEntry` escalates to `ERROR` to render compile errors.

To resolve this, we roll back a portion of #13483. Rather than not recording the workunit at all, we instead record it in the `WorkunitStore` as "disabled", which is signaled by a workunit not having any `WorkunitMetadata`. This has some of the efficiency benefits of #13483 (because we continue to skip heap allocating the metadata's fields), but allows a workunit to escalate itself from disabled to enabled as it completes by specifying a non-disabled level in `RunningWorkunit::update_metadata`.

The recording of "disabled" workunits is additionally necessary for #14680, because otherwise workunits which were not actually recorded would break the tracking of multiple parents: when adding a new parent to a workunit, you need an existing `SpanId` corresponding to the work that you are adding a parent to (or else you might accidentally depend on a parent arbitrarily far up the stack).

Fixes #14867.

[ci skip-build-wheels]
stuhood added a commit that referenced this pull request Apr 11, 2022
…e children (#15088)

#15080 was caused by two factors:
1. #14541 made counters global via a new API, and attached them (as deprecated) to "the root workunit" (with "has no parent id" as the heuristic that something was the root workunit).
2. #14856 moved to calculating the parent(s) of a node based on a running graph of workunits (to allow #14680 to eventually add multiple parents), which meant that when nodes completed out of order, we might not have any parents for them.

Together: this meant that when workunits completed asynchronously, we might not have parents for them, and because of the deprecation, we would attach the counters to multiple workunits. A consumer which was aggregating the counters would end up with an inaccurate total.

Fixes #15080.
stuhood added a commit to stuhood/pants that referenced this pull request Apr 11, 2022
…e children (pantsbuild#15088)

1. pantsbuild#14541 made counters global via a new API, and attached them (as deprecated) to "the root workunit" (with "has no parent id" as the heuristic that something was the root workunit).
2. pantsbuild#14856 moved to calculating the parent(s) of a node based on a running graph of workunits (to allow pantsbuild#14680 to eventually add multiple parents), which meant that when nodes completed out of order, we might not have any parents for them.

Together: this meant that when workunits completed asynchronously, we might not have parents for them, and because of the deprecation, we would attach the counters to multiple workunits. A consumer which was aggregating the counters would end up with an inaccurate total.

Fixes pantsbuild#15080.
stuhood added a commit that referenced this pull request Apr 11, 2022
…e children (cherrypick of #15088) (#15103)

#15080 was caused by two factors:

1. #14541 made counters global via a new API, and attached them (as deprecated) to "the root workunit" (with "has no parent id" as the heuristic that something was the root workunit).
2. #14856 moved to calculating the parent(s) of a node based on a running graph of workunits (to allow #14680 to eventually add multiple parents), which meant that when nodes completed out of order, we might not have any parents for them.

Together: this meant that when workunits completed asynchronously, we might not have parents for them, and because of the deprecation, we would attach the counters to multiple workunits. A consumer which was aggregating the counters would end up with an inaccurate total.

Fixes #15080.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:internal CI, fixes for not-yet-released features, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants