
TrieLogPruner.preloadQueue Performance Issue #7322

Closed · siladu opened this issue Jul 15, 2024 · 5 comments
Labels: bug, performance
siladu commented Jul 15, 2024

We were first alerted when Discord user ktmkancha mentioned they had upgraded to Besu 24.6.0 (which enabled bonsai-limit-trie-log-enabled by default) and TrieLogPruner was taking 45+ minutes to load. It is suspected this user hadn't pruned their trie log backlog, and it is not known how much trie log data they have.

A second instance of this was reported to me by @gfukushima when upgrading his node, dev-clc-bu-tk-mainnet-mainnet-archive-bkp, which contained 112 GiB of trie logs (47 GiB of SST files and 64 GiB of BlobDB files):

| Column Family                  | Keys            | Total Size  | SST Files Size  | Blob Files Size  |
|--------------------------------|-----------------|-------------|-----------------|------------------|
| TRIE_LOG_STORAGE               | 1564331         | 112 GiB     | 47 GiB          | 64 GiB           |

The time taken was ~45 mins...

{"@timestamp":"2024-07-11T23:45:46,224","level":"INFO","thread":"main","class":"TrieLogPruner","message":"Loading first 30000 trie logs from database...","throwable":""}
{"@timestamp":"2024-07-12T00:32:30,543","level":"INFO","thread":"main","class":"TrieLogPruner","message":"Loaded 30000 trie logs from database","throwable":""}

Subsequent restarts of this node took ~2 mins to load...

{"@timestamp":"2024-07-13T03:16:19,846","level":"INFO","thread":"main","class":"TrieLogPruner","message":"Loading first 30000 trie logs from database...","throwable":""}
{"@timestamp":"2024-07-13T03:18:20,768","level":"INFO","thread":"main","class":"TrieLogPruner","message":"Loaded 30000 trie logs from database","throwable":""}

Note that the first restart pruned ~30,000 trie logs, so the total to load was reduced by only ~2% (from 1,564,331 keys to ~1,534,331).

I did some further testing of another canary, prd-elc-besu-lighthouse-mainnet-nightly-bonsai-snap, which contained 70 GiB of trie logs (the majority of which is BlobDB files)...

| Column Family                  | Keys            | Total Size  | SST Files Size  | Blob Files Size  |
|--------------------------------|-----------------|-------------|-----------------|------------------|
| TRIE_LOG_STORAGE               | 954612          | 70 GiB      | 42 MiB          | 70 GiB           |

This yielded loading times of only 20 seconds...

"@timestamp":"2024-07-12T03:36:29,189","level":"INFO","thread":"main","class":"TrieLogPruner","message":"Loading first 30000 trie logs from database...","throwable":""}
{"@timestamp":"2024-07-12T03:36:47,824","level":"INFO","thread":"main","class":"TrieLogPruner","message":"Loaded 30000 trie logs from database","throwable":""}

The default loading window size is 30_000. This was based on testing which showed the load taking between 4 and 7 seconds for a node with 37 GiB of TRIE_LOG_STORAGE (which predates the storage of trie logs as BlobDB, so it would have been all SST files): #6026 (comment)


Since the tests were only done on 37 GiB of data, it seems plausible that performance degrades in a non-linear fashion, potentially quite rapidly once the database grows large.

The difference between the canaries could be explained by the ratio of SST to BlobDB files. A large number of SST files would be present on older nodes, since BlobDB was only enabled in 24.3.0 and applies from the time of upgrade onwards; it is not retrospectively applied to old SST files.

An interesting observation is that for gfukushima's node, dev-clc-bu-tk-mainnet-mainnet-archive-bkp, the loading time dropped from 45 mins to 2 mins with a difference of only 30_000 trie logs. This perhaps suggests there is something else going on beyond simply the size of the TRIE_LOG_STORAGE column family.

Another potentially significant factor is that both the Discord user's node and gfukushima's node were in the middle of a Besu version upgrade when the 45-minute loading event occurred. In gfukushima's case, the upgrade was from 24.3.0 to 24.6.0.

siladu added the bug and performance labels on Jul 15, 2024
siladu commented Jul 15, 2024

I took some JFR recordings of both canary nodes, running JFR only during the loading window (for < 10 seconds)...
7322-jfr.tar.gz

dev-clc-bu-tk-mainnet-mainnet-archive-bkp: trie log pruner load time was ~2 mins
prd-elc-besu-lighthouse-mainnet-nightly-bonsai-snap: trie log pruner load time was ~20 seconds

siladu commented Jul 15, 2024

I also did some wall clock profiling with e.g.:
asprof -e wall -t -i 5ms -f dev-clc-bu-tk-mainnet-mainnet-archive-bkp_10secs.html -d 10 237462

I captured a 10-second profile during the loading window of both nodes, and a 60-second profile covering the 2-minute window of dev-clc-bu-tk-mainnet-mainnet-archive-bkp...
7322-async-wallclock.tar.gz


I suspect the problem lies in the forEach that processes the results, rather than the database loading itself: https://github.com/hyperledger/besu/blob/main/ethereum/core/src/main/java/org/hyperledger/besu/ethereum/trie/diffbased/common/trielog/TrieLogPruner.java#L77-L92
Perhaps parallelising this processing, or utilising the RocksDB stream somehow, may help here.
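
As a rough, untested illustration of the parallelisation idea, here is a minimal sketch in plain Java. TrieLogEntry and blockNumberOf are hypothetical stand-ins for the real TrieLogPruner internals, not Besu APIs; the real key/value decoding will differ.

```java
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

public class PreloadSketch {

  // Hypothetical stand-in for one loaded trie log row (key/value bytes from RocksDB).
  record TrieLogEntry(byte[] key, byte[] value) {}

  // Hypothetical: derive the block number associated with a trie log entry.
  static long blockNumberOf(final TrieLogEntry entry) {
    return java.nio.ByteBuffer.wrap(entry.key(), 0, Long.BYTES).getLong();
  }

  static ConcurrentSkipListMap<Long, byte[]> buildQueue(final List<TrieLogEntry> loaded) {
    final ConcurrentSkipListMap<Long, byte[]> queue = new ConcurrentSkipListMap<>();
    // parallelStream() spreads the per-entry decoding across the common ForkJoinPool;
    // the concurrent sorted map lets workers insert without extra locking while still
    // yielding a block-number-ordered queue.
    loaded.parallelStream().forEach(e -> queue.put(blockNumberOf(e), e.value()));
    return queue;
  }
}
```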

siladu commented Jul 15, 2024

I have an untested stop-gap solution here that reduces the amount we attempt to prune from 30_000 to 5,000 and warns the user to prune.

#7321
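
For illustration only, a stop-gap along those lines might look roughly like the sketch below; the window size constant and the warning wording are hypothetical and not taken from #7321.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PreloadStopGap {

  private static final Logger LOG = LoggerFactory.getLogger(PreloadStopGap.class);

  // Illustrative value only; the actual window chosen in the PR may differ.
  private static final int LOADING_WINDOW = 5_000;

  static long windowToLoad(final long totalTrieLogKeys) {
    if (totalTrieLogKeys > LOADING_WINDOW) {
      // Surface the backlog to the operator instead of silently spending minutes at startup.
      LOG.warn(
          "Trie log backlog of {} entries exceeds the loading window of {}; "
              + "consider pruning the trie log backlog offline",
          totalTrieLogKeys,
          LOADING_WINDOW);
    }
    return Math.min(totalTrieLogKeys, LOADING_WINDOW);
  }
}
```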

siladu commented Jul 16, 2024

Possible improvements after discussing the wall clock profiling with @ahamlat:

  • Disable RocksDB checksum verification, as it is not needed for reading (see the sketch after this list)
  • Try a parallel stream on the forEach, or use an executor
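
For the first suggestion, RocksJava exposes checksum verification via ReadOptions. A minimal standalone sketch, assuming a direct read-only open of the trie log database; Besu actually goes through its own RocksDB storage abstraction, so the path and open call here are placeholders.

```java
import org.rocksdb.ReadOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.RocksIterator;

public class ChecksumFreeScan {

  public static void main(final String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    // "/path/to/trie-log-db" is a placeholder, not a real Besu path.
    try (final ReadOptions readOptions =
            new ReadOptions()
                .setVerifyChecksums(false) // skip block checksum verification for this scan
                .setFillCache(false); // don't pollute the block cache with a one-off startup scan
        final RocksDB db = RocksDB.openReadOnly("/path/to/trie-log-db");
        final RocksIterator it = db.newIterator(readOptions)) {
      int loaded = 0;
      for (it.seekToFirst(); it.isValid() && loaded < 30_000; it.next()) {
        // hand it.key() / it.value() off to the pruner's preload queue here
        loaded++;
      }
    }
  }
}
```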

siladu commented Jul 16, 2024

Another possibility is to find a way to preload the queue asynchronously.
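
A minimal sketch of that idea, assuming the rest of startup can proceed with an initially empty queue; the Runnable here wraps TrieLogPruner's existing loading logic rather than matching any existing method signature.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncPreload {

  private final ExecutorService preloadExecutor =
      Executors.newSingleThreadExecutor(
          runnable -> {
            final Thread thread = new Thread(runnable, "trie-log-preload");
            thread.setDaemon(true); // don't hold up shutdown if loading is still in progress
            return thread;
          });

  // preloadQueue wraps the existing loading logic (e.g. TrieLogPruner.preloadQueue).
  public CompletableFuture<Void> startPreload(final Runnable preloadQueue) {
    // Kick the load off in the background so startup can continue; callers may
    // join() the future later if they need the queue fully populated before pruning.
    return CompletableFuture.runAsync(preloadQueue, preloadExecutor);
  }
}
```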
