Merkle tree cache becomes pathologically slow for deep depsets #20862
CC: @tjgq (author), @meteorcloudy (reviewer), and @lberki (merged conflicting code since)
I'll assign this to @tjgq because he was the original author of this change, but I'm deeply interested in how this could possibly be. I'd also like to take you up on your offer of a reproduction: if we have one, it's much faster to figure out whether whatever fix we come up with actually fixes the issue, and to gather profiles ourselves instead of asking you to. (@tjgq might know off the bat what went wrong, but I sure don't.)
@bazel-io fork 7.0.1
The only theory I've got is that delaying the flattening somehow moves it into the critical path, whereas before it had a chance to run in parallel with something else. Once a repro is available I can do some profiling and find out.
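A toy illustration of that theory (purely illustrative, not Bazel code; `flatten`, `other_prep_work`, and `spawn` are hypothetical stand-ins): the same 0.5s of flattening work is cheap when it overlaps with other preparation, but doubles the wall time once it is deferred onto the critical path.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def flatten(depset_root):
    """Stand-in for NestedSet flattening; pretend it takes 0.5s."""
    time.sleep(0.5)
    return ["file%d" % i for i in range(1000)]

def other_prep_work():
    """Stand-in for unrelated work that precedes the spawn anyway."""
    time.sleep(0.5)

def spawn(inputs):
    """Stand-in for executing the action."""

root = object()

# Eager flattening: kicked off early, it overlaps with the other prep
# work, so the critical path is max(0.5, 0.5) = 0.5s.
start = time.time()
with ThreadPoolExecutor(max_workers=1) as pool:
    pending = pool.submit(flatten, root)
    other_prep_work()
    spawn(pending.result())
print("eager:    %.1fs" % (time.time() - start))

# Deferred flattening: it only starts once the spawn needs the inputs,
# so it lands on the critical path: 0.5 + 0.5 = 1.0s.
start = time.time()
other_prep_work()
spawn(flatten(root))
print("deferred: %.1fs" % (time.time() - start))
```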
I'm making this a soft release blocker for 7.0.1, which means it won't block 7.0.1 if other things clear out.
My first attempts at a repro were unsuccessful. Does anyone know of an "easy" way to get Bazel to output the shape of a NestedSet, so that I can capture the shape from our main repo and replicate it in the repro case?
If you can plumb it to an output group, the structure of
If you can't reproduce it, you can try uploading a profile taken by
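As far as I know, Bazel has no built-in way to dump a NestedSet's internal structure, so one option is to log it from a patched binary and summarize it with a throwaway script. Here is a minimal Python sketch of such a summary, modeling a depset node as a (direct, transitive) pair deduplicated by object identity; the model and names are illustrative, not Bazel's API.

```python
def shape(root):
    """Return (node_count, edge_count, depth) for a depset-like DAG.

    A node is a (direct_elements, transitive_children) tuple; shared
    children are deduplicated by id(), mirroring NestedSet reference
    sharing. Iterative so deep depsets don't hit the recursion limit.
    """
    seen = set()
    depth = {}   # id(node) -> depth of the sub-DAG rooted at node
    edges = 0
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        _, children = node
        if children_done:
            depth[id(node)] = 1 + max(
                (depth[id(c)] for c in children), default=0)
        elif id(node) not in seen:
            seen.add(id(node))
            edges += len(children)
            stack.append((node, True))       # revisit after the children
            stack.extend((c, False) for c in children)
    return len(seen), edges, depth[id(root)]

# Example: a chain of depth 3 with one shared leaf -- 3 nodes, 3 edges.
leaf = (["a.h"], [])
mid = (["b.h"], [leaf])
top = (["c.h"], [mid, leaf])
print(shape(top))  # (3, 3, 3)
```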
Relates to bazelbuild/bazel#20862
Thanks. It seems that this repro never finishes without disabling the Merkle tree cache.
Thanks! From a cursory glance, it looks like you indeed replicated the problematic nested set structure verbatim :) @tjgq @meteorcloudy do you think it's worth holding Bazel 7.0.1 until we know the root cause?
7.0.1 has been dragging on for too long. I would prefer not to block it any longer.
I agree. It's been too long indeed.
@DavidANeil Thanks for the very nice repro! I want to highlight that the Merkle tree cache is experimental and disabled by default, so only builds that have opted into it are affected.

One thing I'd like to know is: before Bazel 7, by how much did enabling the Merkle tree cache actually speed up your builds?

As to why this issue was introduced by 67021ff: before that change, the eager depset flattening meant that, for most actions (roughly everything except C++ and Java), the depset of inputs as observed by the Merkle tree cache had a single node (!), so the only opportunity for reusing cached Merkle trees was in the presence of tree artifacts, which are specially handled by the cache. The cache was otherwise pure overhead (since approximately no two actions have the exact same set of inputs), but the single node could be computed very quickly.

After that change, the Merkle tree cache is able to traverse the original depset structure, compute subtrees from individual nodes, and merge them. Unfortunately, what that reveals is that a lot of time is spent merging a large number of very small nodes. Merging is not an efficient operation, because the Merkle tree structure is at odds with the depset structure: Merkle tree nodes must correspond to points in the directory hierarchy, whereas depset nodes don't correspond to anything; they're just arbitrary sets of files.

I believe the example you provided is pathological because of the large depset depth, but I can definitely see the overhead in other situations, though not by nearly as much. (In non-pathological multi-action builds the overhead might, of course, be paid off by the ability to reuse previously computed nodes; in a single-action build nothing can be reused, so the cache is always pure overhead.)

Given the above, and because deferring depset flattening is worth it for other reasons (reduced memory pressure, the ability for downstream logic to exploit the depset structure), I'm not convinced that reverting 67021ff is the right path forward. While there might be some gains to be had in optimizing the Merkle tree merge operation, it might be unsalvageable for very deep depsets. I think we need to rethink/redesign the entire way the cache works so that it actually provides a benefit in most cases without failing catastrophically for the rest (for example, by selectively applying it only to subtrees that can be merged cheaply, such as tree artifacts and runfiles).

Another angle we can explore is optimizing the (uncached) computation of the Merkle tree so that the cache becomes unnecessary. (This is something I've suspected we can optimize for a while; I just haven't had the time to look into it properly.)

One thing I still don't understand is why the run time explodes when the cache size is reduced; I agree that this is likely the same issue as observed in #18686. I didn't have access to a good repro at that time, but now that I do, I intend to look into it further.
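To make the merge-cost argument concrete, here is a toy Python model (not Bazel's actual MerkleTree implementation): because depset nodes slice files arbitrarily across the directory hierarchy, every directory on an overlapping path has to be rebuilt and re-hashed at each merge, so nothing computed for either input subtree can be reused.

```python
import hashlib

# Toy Merkle tree: a directory is a dict of name -> (subdict or file digest).

def merge(a, b):
    """Merge two directory trees; shared directories must be rebuilt."""
    out = dict(a)
    for name, sub in b.items():
        if isinstance(out.get(name), dict) and isinstance(sub, dict):
            out[name] = merge(out[name], sub)  # recurse into shared directory
        else:
            out[name] = sub
    return out

def digest(tree):
    """Hash a directory node from its sorted children, bottom-up."""
    h = hashlib.sha256()
    for name in sorted(tree):
        child = tree[name]
        h.update(name.encode())
        h.update(digest(child) if isinstance(child, dict) else child)
    return h.digest()

# Two tiny depset nodes whose files interleave under the same directories:
# "src" and "src/a" must be re-hashed from scratch after every merge that
# touches them, so merging many small nodes repeats this work over and over.
t1 = {"src": {"a": {"1.h": b"\x01"}}}
t2 = {"src": {"a": {"3.h": b"\x03"}, "b": {"2.h": b"\x02"}}}
print(digest(merge(t1, t2)).hex())
```

This mismatch is also why subtrees that do align with the directory hierarchy, such as tree artifacts and runfiles, are the cheapest candidates for selective caching.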
In that case, I feel like I should ask for one further clarification: is the pathological action (that the repro was distilled from) a Starlark action, or a native (C++/Java) one? My analysis above assumed a Starlark action because that's what I see in the repro, but judging from the discussion in #10875, it seems that the cache was originally introduced with the objective of speeding up C++ actions (which would also explain why its ineffectiveness for Starlark actions wasn't noticed at the time).
It is a Starlark action using a
Thanks for confirming. Regarding the effect of the cache size: the depset in this repro has 7383 nodes and 953936 edges (it's a DAG, not a tree). The graph search relies on the cache to avoid visiting the same node multiple times. If the cache size is smaller than 7383, some nodes can be visited multiple times because they fall out of the cache before a subsequent visit. As the cache size gets smaller and smaller, the number of steps approaches the number of edges, leading to a catastrophic blowup in run time.

I think the fix is to either (1) ensure the cache size is large enough to fit the entire depset during a search (trivial for a single action, less so when multiple arbitrary actions can run in parallel), or (2) keep a separate visited set for each individual search. I'd lean towards (2), but before we invest further in the cache implementation we really need some hard evidence that it benefits anyone in its current form. (I'll also comment on #18686 to that effect.)
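A small Python sketch of that failure mode (hypothetical; Bazel's cache is keyed and structured differently, but the eviction dynamics are the same in spirit): a DFS that deduplicates visits only through a bounded LRU cache re-expands nodes once the cache is smaller than the graph, whereas an effectively unbounded cache, or the per-search visited set of option (2), expands each node exactly once.

```python
from collections import OrderedDict

def count_expansions(root, children, cache_size):
    """DFS over a DAG, deduplicating only through a bounded LRU cache.

    Once cache_size drops below the number of distinct nodes, evicted
    nodes are re-expanded every time another edge reaches them, and the
    work climbs from O(nodes) toward O(edges).
    """
    cache = OrderedDict()
    expansions = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if node in cache:
            cache.move_to_end(node)  # cache hit: skip the re-expansion
            continue
        cache[node] = True
        if len(cache) > cache_size:
            cache.popitem(last=False)  # evict LRU; may be expanded again
        expansions += 1
        stack.extend(children(node))
    return expansions

# A layered DAG: every node points at all 4 nodes one layer down
# (33 distinct nodes, 4 + 7*4*4 = 116 edges).
WIDTH, DEPTH = 4, 8

def children(node):
    d, _ = node
    return [(d + 1, i) for i in range(WIDTH)] if d < DEPTH else []

print(count_expansions((0, 0), children, cache_size=10**6))  # 33: once per node
print(count_expansions((0, 0), children, cache_size=6))      # far more
```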
I'll consider this to be a duplicate of #18686, since the issue is not the depset size per se, but the fact that the cache is too small for it to fit.
This one reproduces with the cache enabled with unlimited size, which is a bit different than #18686 (which was opened before the commit that caused this regression), but I agree that we can consolidate the discussion.
Description of the bug:
67021ff (#20358) causes the spawning of certain actions with large numbers of inputs to slow down from less than 1s to more than 20s.
They display as [Prepa] in the UI during this time. I can work on creating a repro if necessary.
Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.
67021ff