Merkle tree cache becomes pathologically slow for deep depsets #20862

DavidANeil · 2024-01-12T00:04:04Z

Description of the bug:

67021ff (#20358) causes spawning certain actions with large numbers of inputs to increase from less than 1s to more than 20s.
They display as [Prepa] in the UI during this time.

I can work on creating a repro if necessary.

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

67021ff

The text was updated successfully, but these errors were encountered:

DavidANeil · 2024-01-12T00:05:26Z

CC: @tjgq (author), @meteorcloudy (reviewer), and @lberki (merged conflicting code since)

lberki · 2024-01-12T07:18:47Z

I'll assign this to @tjgq because he was the original author of this change, but I'm deeply interested in how chis could possibly be.

I'd like to take up your offer on a reproduction; if we have one, it's much faster to figure out if whatever fix we come up with actually fixes the issue and to gather profiles ourselves instead of asking you to (@tjgq might know off the bat what went wrong, but I sure don't)

fmeum · 2024-01-12T08:16:14Z

@bazel-io fork 7.0.1

tjgq · 2024-01-12T13:42:57Z

The only theory I've got is that delaying the flattening somehow moves it into the critical path, whereas before it had a chance to run in parallel with something else. Once a repro is available I can do some profiling and find out.

meteorcloudy · 2024-01-16T09:46:01Z

I'm making this a soft release blocker for 7.0.1, which means it won't block 7.0.1 if other things clear out.

DavidANeil · 2024-01-16T18:44:45Z

My first attempts at a repro were unsuccessful. Does anyone know of an "easy" way to get Bazel to output the shape of a NestedSet so that I can just reproduce that shape from our main repo for the repro case?

lberki · 2024-01-16T18:56:54Z

If you can plumb it to an output group, the structure of NamedSetOfFiles in the build event protocol in OutputGroup in the TargetComplete event corresponds exactly to the structure of the nested set. Dunno if it counts as "easy".

lberki · 2024-01-16T19:05:58Z

If you can't reproduce it, you can try uploading a profile taken by --experimental_command_profile (it'll dump a file called $(bazel info output_base)/profile.jfr). That's better than nothing, but considerably worse for us than a clean repro.

Relates to bazelbuild/bazel#20862

DavidANeil · 2024-01-16T21:59:04Z

Thanks.
I have published a reproduction repo here: https://github.com/DavidANeil/repro-bazel-nested-set

It seems that this repro never finishes without experimental_remote_merkle_tree_cache_size=0. With that flag it takes about 80s on my machine, so it is possible that this is related to #18686.

lberki · 2024-01-17T09:38:13Z

Thanks! From cursory glance, it looks like you indeed replicated the problematic nested set structure verbatim :)

@tjgq @meteorcloudy do you think it's worth waiting with Bazel 7.0.1 until we know the root cause?

meteorcloudy · 2024-01-17T10:26:11Z

7.0.1 has been lasting for too long. I would prefer to not block 7.0.1 anytime longer.

lberki · 2024-01-17T10:28:57Z

I agree. It's been too long indeed.

tjgq · 2024-01-17T13:20:04Z

@DavidANeil Thanks for the very nice repro! I want to highlight the fact that --experimental_remote_merkle_tree_cache is required to reproduce the pathological build time, so it can (hopefully) be worked around by disabling it. (Please let me know if this is incorrect.)

One thing I'd like to know is: before Bazel 7, by how much did --experimental_remote_merkle_tree_cache speed up the end-to-end build time (I mean a real full build, not this repro nor the single isolated action it was distilled from)? I ask because, in my own experience, the Merkle tree cache is often not an optimization at all; most depset topologies aren't able to leverage it effectively.

As to why this issue was introduced by 67021ff:

Before that change, the eager depset flattening meant that, for most actions (roughly everything except C++ and Java), the depset of inputs as observed by the Merkle tree cache had a single node (!) so the only opportunity for reusing cached Merkle trees was in the presence of tree artifacts, which are specially handled by the cache; the cache was otherwise pure overhead (since approximately no two actions have the exact same set of inputs) but the single node could be computed very quickly.

After that change, the Merkle tree cache is able to traverse the original depset structure, compute subtrees from individual nodes, and merge them. Unfortunately, what that reveals is that a lot of time is spent merging a large number of very small nodes. Merging is not an efficient operation because the Merkle tree structure is at odds with the depset structure: Merkle tree nodes must correspond to points in the directory hierarchy, whereas depset nodes don't correspond to anything; they're just arbitrary sets of files. I believe the example you provided is pathological because of the large depset depth, but I can definitely see the overhead in other situations, though not by nearly as much. (In non-pathological multi-action builds the overhead might, of course, be paid off by the ability to reuse previously computed nodes; in a single-action build nothing can be reused, so the cache is always pure overhead.)

Given the above, and because deferring depset flattening is worth it for other reasons (reduced memory pressure, ability for other downstream logic to exploit the depset structure), I’m not convinced that reverting 67021ff is the right path forward.

While there might be some gains to be had in optimizing the Merkle tree merge operation, it might be unsalvageable for very deep depsets. I think we need to rethink/redesign the entire way the cache works so that it actually provides a benefit in most cases without failing catastrophically for the rest (for example, by selectively applying it only to subtrees that can be merged cheaply, such as tree artifacts and runfiles).

Another angle we can explore is optimizing the (uncached) computation of the Merkle tree so that the cache becomes unnecessary. (This is something I've suspected we can optimize for a while; I just haven't had the time to look into it properly.)

One thing I still don’t understand is why the run time explodes when the cache size is reduced; I agree that this is likely to be the same issue as observed in #18686. I didn’t have access to a good repro at that time, but now that I do, I intend to look into it further.

DavidANeil · 2024-01-17T18:27:43Z

--experimental_remote_merkle_tree_cache was created, in part, because of/for us (see #10875, filed by my coworker). It was enabled too long ago for us to still have the build time metrics from when we enabled it, but I remember it was noticeable.
Though I have started an internal discussion around disabling it now, since this change has probably made it an overall regression.

tjgq · 2024-01-17T19:37:13Z

In that case, I feel like I should ask for one further clarification: is the pathological action (that the repro was distilled from) a Starlark action, or a native (C++/Java) one?

My analysis above assumed a Starlark action because that's what I see in the repro, but judging from the discussion in #10875, it seems that the cache was originally introduced with the objective of speeding up C++ actions (which would also explain why its ineffectiveness for Starlark actions wasn't noticed at the time).

DavidANeil · 2024-01-17T19:41:11Z

It is a Starlark action using a nodejs_binary as the tool.

tjgq · 2024-01-18T13:35:46Z

Thanks for confirming.

Regarding the effect of the cache size: the depset in this repro has 7383 nodes and 953936 edges (it's a dag, not a tree). The graph search relies on the cache to avoid visiting the same node multiple times. If the cache size is smaller than 7383, some of the nodes can be visited multiple times because they fall out of the cache before a subsequent visit. As the cache size gets smaller and smaller, the number of steps approaches the number of edges, leading to a catastrophic blowup in run time.

I think the fix is to either (1) ensure the cache size is large enough to fit the entire depset during a search (trivial for a single action, less so when multiple arbitrary actions can run in parallel), or (2) keep a separate visited set for each individual search. I'd lean towards (2), but before we invest further into the cache implementation we really need some hard evidence that it benefits anyone in its current form. (I'll also comment on #18686 to that effect.)

tjgq · 2024-02-16T16:47:24Z

I'll consider this to be a duplicate of #18686, since the issue is not the depset size per se, but the fact that the cache is too small for it to fit.

DavidANeil · 2024-02-21T06:25:36Z

This one reproduces with the cache enabled with unlimited size, which is a bit different than #18686 (which was opened before the commit that caused this regression), but I agree that we can consolidate the discussion.

DavidANeil added type: bug untriaged labels Jan 12, 2024

DavidANeil assigned iancha1992 and rakshitasingh05 Jan 12, 2024

DavidANeil mentioned this issue Jan 12, 2024

Revert "Prevent nested set flattening in ActionSpawn if there are no … #20863

Closed

lberki assigned tjgq and unassigned iancha1992 and rakshitasingh05 Jan 12, 2024

lberki added P2 We'll consider working on this in future. (Assignee optional) team-Performance Issues for Performance teams and removed untriaged labels Jan 12, 2024

bazel-io mentioned this issue Jan 12, 2024

[7.1.0] Performance regression caused by not flattening NestedSets #20867

Closed

tjgq added more data needed awaiting-user-response Awaiting a response from the author labels Jan 15, 2024

meisterT assigned DavidANeil Jan 16, 2024

DavidANeil added a commit to DavidANeil/repro-bazel-nested-set that referenced this issue Jan 16, 2024

Update readme

4d8270b

Relates to bazelbuild/bazel#20862

tjgq removed the more data needed label Jan 17, 2024

tjgq removed the awaiting-user-response Awaiting a response from the author label Jan 17, 2024

tjgq changed the title ~~Performance regression caused by not flattening NestedSets~~ Merkle tree cache becomes pathologically slow for deep depsets Jan 17, 2024

tjgq unassigned DavidANeil Jan 17, 2024

tjgq mentioned this issue Jan 18, 2024

Insufficient remote Merkle tree size causes slow builds #18686

Open

tjgq closed this as not planned Won't fix, can't repro, duplicate, stale Feb 16, 2024

cjhopman mentioned this issue Aug 7, 2024

Check performance of tset artifactvalues cache facebook/buck2#735

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merkle tree cache becomes pathologically slow for deep depsets #20862

Merkle tree cache becomes pathologically slow for deep depsets #20862

DavidANeil commented Jan 12, 2024 •

edited

Loading

DavidANeil commented Jan 12, 2024

lberki commented Jan 12, 2024

fmeum commented Jan 12, 2024

tjgq commented Jan 12, 2024

meteorcloudy commented Jan 16, 2024

DavidANeil commented Jan 16, 2024

lberki commented Jan 16, 2024

lberki commented Jan 16, 2024

DavidANeil commented Jan 16, 2024

lberki commented Jan 17, 2024

meteorcloudy commented Jan 17, 2024

lberki commented Jan 17, 2024

tjgq commented Jan 17, 2024 •

edited

Loading

DavidANeil commented Jan 17, 2024

tjgq commented Jan 17, 2024

DavidANeil commented Jan 17, 2024

tjgq commented Jan 18, 2024

tjgq commented Feb 16, 2024

DavidANeil commented Feb 21, 2024

Merkle tree cache becomes pathologically slow for deep depsets #20862

Merkle tree cache becomes pathologically slow for deep depsets #20862

Comments

DavidANeil commented Jan 12, 2024 • edited Loading

Description of the bug:

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

DavidANeil commented Jan 12, 2024

lberki commented Jan 12, 2024

fmeum commented Jan 12, 2024

tjgq commented Jan 12, 2024

meteorcloudy commented Jan 16, 2024

DavidANeil commented Jan 16, 2024

lberki commented Jan 16, 2024

lberki commented Jan 16, 2024

DavidANeil commented Jan 16, 2024

lberki commented Jan 17, 2024

meteorcloudy commented Jan 17, 2024

lberki commented Jan 17, 2024

tjgq commented Jan 17, 2024 • edited Loading

DavidANeil commented Jan 17, 2024

tjgq commented Jan 17, 2024

DavidANeil commented Jan 17, 2024

tjgq commented Jan 18, 2024

tjgq commented Feb 16, 2024

DavidANeil commented Feb 21, 2024

DavidANeil commented Jan 12, 2024 •

edited

Loading

tjgq commented Jan 17, 2024 •

edited

Loading