
Rocks DB: io.vertx.core.VertxException: Thread blocked (--Xplugin-rocksdb-high-spec-enabled) #4443

Open
risasoft opened this issue Sep 26, 2022 · 22 comments


@risasoft

My node failed to attest and needed a restart to recover. It was running v22.7.3 with Bonsai tries and the new high-spec-enabled flag.

=ETH,NET,WEB3,ADMIN --static-nodes-file=/secrets/static-nodes.json --Xplugin-rocksdb-high-spec-enabled

Frequency:
Happened once, stopped attesting completely. Had to restart service to recover.

Versions (Add all that apply)

besu/v22.7.3/linux-x86_64/openjdk-java-17
2022-09-26 16:59:13.153+00:00 | main | INFO  | Besu | Using jemalloc
openjdk 17.0.4 2022-07-19
OpenJDK Runtime Environment (build 17.0.4+8-Ubuntu-120.04)
OpenJDK 64-Bit Server VM (build 17.0.4+8-Ubuntu-120.04, mixed mode, sharing)
DISTRIB_DESCRIPTION="Ubuntu 20.04.5 LTS"
NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"
Linux b1666277e36c 5.15.0-48-generic #54-Ubuntu SMP Fri Aug 26 13:26:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Additional Information (Add any of the following or anything else that may be relevant)

Server: 26 GB RAM, 18 cores, AMD

Logs:


eth1_1           | 2022-09-26 16:52:45.661+00:00 | vert.x-eventloop-thread-3 | ERROR | RoutingContext | Unhandled exception in router
eth1_1           | 2022-09-26 16:52:45.662+00:00 | vert.x-eventloop-thread-3 | ERROR | RoutingContext | Unhandled exception in router
eth1_1           | 2022-09-26 16:52:56.409+00:00 | vertx-blocked-thread-checker | WARN  | BlockedThreadChecker | Thread Thread[vert.x-worker-thread-0,5,main] has been blocked for 1640884 ms, time limit is 60000 ms
eth1_1           | io.vertx.core.VertxException: Thread blocked
eth1_1           |      at app//org.rocksdb.RocksDB.get(Native Method)
eth1_1           |      at app//org.rocksdb.RocksDB.get(RocksDB.java:2084)
eth1_1           |      at app//org.hyperledger.besu.plugin.services.storage.rocksdb.segmented.RocksDBColumnarKeyValueStorage.get(RocksDBColumnarKeyValueStorage.java:209)
eth1_1           |      at app//org.hyperledger.besu.plugin.services.storage.rocksdb.segmented.RocksDBColumnarKeyValueStorage.get(RocksDBColumnarKeyValueStorage.java:64)
eth1_1           |      at app//org.hyperledger.besu.services.kvstore.SegmentedKeyValueStorageAdapter.get(SegmentedKeyValueStorageAdapter.java:50)
eth1_1           |      at app//org.hyperledger.besu.ethereum.bonsai.BonsaiWorldStateKeyValueStorage.getAccountStateTrieNode(BonsaiWorldStateKeyValueStorage.java:128)
eth1_1           |      at app//org.hyperledger.besu.ethereum.bonsai.BonsaiPersistedWorldState.getAccountStateTrieNode(BonsaiPersistedWorldState.java:346)
eth1_1           |      at app//org.hyperledger.besu.ethereum.bonsai.BonsaiPersistedWorldState$$Lambda$1563/0x000000080126c548.getNode(Unknown Source)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.StoredNodeFactory.retrieve(StoredNodeFactory.java:96)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.StoredNode.load(StoredNode.java:130)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.StoredNode.accept(StoredNode.java:63)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.PutVisitor.visit(PutVisitor.java:68)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.BranchNode.accept(BranchNode.java:80)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.StoredNode.accept(StoredNode.java:64)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.PutVisitor.visit(PutVisitor.java:68)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.BranchNode.accept(BranchNode.java:80)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.StoredNode.accept(StoredNode.java:64)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.PutVisitor.visit(PutVisitor.java:68)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.BranchNode.accept(BranchNode.java:80)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.StoredNode.accept(StoredNode.java:64)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.PutVisitor.visit(PutVisitor.java:68)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.BranchNode.accept(BranchNode.java:80)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.PutVisitor.visit(PutVisitor.java:68)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.BranchNode.accept(BranchNode.java:80)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.PutVisitor.visit(PutVisitor.java:68)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.BranchNode.accept(BranchNode.java:80)
eth1_1           |      at app//org.hyperledger.besu.ethereum.trie.StoredMerklePatriciaTrie.put(StoredMerklePatriciaTrie.java:142)
eth1_1           |      at app//org.hyperledger.besu.ethereum.bonsai.BonsaiPersistedWorldState.calculateRootHash(BonsaiPersistedWorldState.java:224)
eth1_1           |      at app//org.hyperledger.besu.ethereum.bonsai.BonsaiInMemoryWorldState.rootHash(BonsaiInMemoryWorldState.java:43)
eth1_1           |      at app//org.hyperledger.besu.ethereum.bonsai.BonsaiInMemoryWorldState.persist(BonsaiInMemoryWorldState.java:53)
eth1_1           |      at app//org.hyperledger.besu.ethereum.mainnet.AbstractBlockProcessor.processBlock(AbstractBlockProcessor.java:220)
eth1_1           |      at app//org.hyperledger.besu.ethereum.mainnet.AbstractBlockProcessor.processBlock(AbstractBlockProcessor.java:43)
eth1_1           |      at app//org.hyperledger.besu.ethereum.mainnet.BlockProcessor.processBlock(BlockProcessor.java:80)
eth1_1           |      at app//org.hyperledger.besu.ethereum.MainnetBlockValidator.processBlock(MainnetBlockValidator.java:175)
eth1_1           |      at app//org.hyperledger.besu.ethereum.MainnetBlockValidator.validateAnd
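
The "Thread blocked" warning in the log above is emitted by Vert.x's BlockedThreadChecker, which periodically compares how long each worker thread has been running its current task against a configured limit (60000 ms here) and logs the thread's stack trace when the limit is exceeded. A simplified pure-Java sketch of that mechanism (an illustration only, not the actual Vert.x implementation):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BlockedThreadChecker {
    // Start time of the task each registered thread is currently executing.
    private final Map<Thread, Long> taskStartTimes = new ConcurrentHashMap<>();
    private final long maxExecuteTimeMs;

    public BlockedThreadChecker(long maxExecuteTimeMs) {
        this.maxExecuteTimeMs = maxExecuteTimeMs;
    }

    // Called by a worker thread when it starts / finishes a task.
    public void taskStarted()  { taskStartTimes.put(Thread.currentThread(), System.currentTimeMillis()); }
    public void taskFinished() { taskStartTimes.remove(Thread.currentThread()); }

    // Called periodically by a watchdog thread; returns how many threads are
    // currently over the limit (the real checker logs a warning with a stack trace).
    public int check() {
        long now = System.currentTimeMillis();
        int blocked = 0;
        for (Map.Entry<Thread, Long> e : taskStartTimes.entrySet()) {
            long elapsed = now - e.getValue();
            if (elapsed > maxExecuteTimeMs) {
                blocked++;
                System.out.println("Thread " + e.getKey().getName()
                        + " has been blocked for " + elapsed
                        + " ms, time limit is " + maxExecuteTimeMs + " ms");
            }
        }
        return blocked;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockedThreadChecker checker = new BlockedThreadChecker(50);
        checker.taskStarted();   // simulate a worker stuck in a long RocksDB.get
        Thread.sleep(100);       // task runs past the 50 ms limit
        System.out.println("blocked threads: " + checker.check());
    }
}
```

In the stack trace above the worker is stuck inside the native RocksDB.get call, so the checker keeps firing until the call returns or the process is restarted.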
@risasoft
Author

It may be unrelated, but also seeing this error in logs after restarting:


eth1_1           | 2022-09-26 17:16:15.942+00:00 | vert.x-worker-thread-9 | ERROR | MainnetTransactionProcessor | Critical Exception Processing Transaction
eth1_1           | org.hyperledger.besu.ethereum.trie.MerkleTrieException: Unable to load trie node value for hash 0x83e8c020d72ea2f47cac91c1948a380e8ca2d937b744af66e523e01911f2ea9e location 0x
eth1_1           |      at org.hyperledger.besu.ethereum.trie.StoredNode.lambda$load$0(StoredNode.java:133)
eth1_1           |      at java.base/java.util.Optional.orElseThrow(Optional.java:403)
eth1_1           |      at org.hyperledger.besu.ethereum.trie.StoredNode.load(StoredNode.java:131)
eth1_1           |      at org.hyperledger.besu.ethereum.trie.StoredNode.accept(StoredNode.java:63)
eth1_1           |      at org.hyperledger.besu.ethereum.trie.StoredMerklePatriciaTrie.get(StoredMerklePatriciaTrie.java:119)
eth1_1           |      at org.hyperledger.besu.ethereum.bonsai.BonsaiWorldStateKeyValueStorage.getStorageValueBySlotHash(BonsaiWorldStateKeyValueStorage.java:200)
eth1_1           |      at org.hyperledger.besu.ethereum.bonsai.BonsaiPersistedWorldState.getStorageValueBySlotHash(BonsaiPersistedWorldState.java:381)
eth1_1           |      at org.hyperledger.besu.ethereum.bonsai.BonsaiLayeredWorldState.getStorageValueBySlotHash(BonsaiLayeredWorldState.java:154)
eth1_1           |      at org.hyperledger.besu.ethereum.bonsai.BonsaiWorldStateUpdater.getStorageValueBySlotHash(BonsaiWorldStateUpdater.java:320)
eth1_1           |      at org.hyperledger.besu.ethereum.bonsai.BonsaiWorldStateUpdater.getStorageValue(BonsaiWorldStateUpdater.java:307)
eth1_1           |      at org.hyperledger.besu.ethereum.bonsai.BonsaiAccount.getStorageValue(BonsaiAccount.java:208)
eth1_1           |      at org.hyperledger.besu.evm.worldstate.UpdateTrackingAccount.getStorageValue(UpdateTrackingAccount.java:213)
eth1_1           |      at org.hyperledger.besu.evm.worldstate.UpdateTrackingAccount.getStorageValue(UpdateTrackingAccount.java:213)
eth1_1           |      at org.hyperledger.besu.evm.worldstate.UpdateTrackingAccount.getStorageValue(UpdateTrackingAccount.java:213)
eth1_1           |      at org.hyperledger.besu.evm.worldstate.UpdateTrackingAccount.getStorageValue(UpdateTrackingAccount.java:213)
eth1_1           |      at org.hyperledger.besu.evm.worldstate.UpdateTrackingAccount.getStorageValue(UpdateTrackingAccount.java:213)

@ahamlat
Contributor

ahamlat commented Sep 26, 2022

This seems to be related to the memory leak issue fixed in version 22.7.4, released today. Some threads may be blocked for some time by GC threads trying to free up memory.
Could you update your Besu to version 22.7.4 and let us know whether you still see these issues?

@risasoft
Author

risasoft commented Sep 26, 2022 via email

@non-fungible-nelson
Contributor

Closing for now - please re-open if we see this behavior later on.

@Ayo91

Ayo91 commented Oct 15, 2022

I still have the issue with version 22.7.4:

eth1_1           | 2022-10-15 22:06:24.965+00:00 | vertx-blocked-thread-checker | WARN  | BlockedThreadChecker | Thread Thread[vert.x-worker-thread-0,5,main] has been blocked for 64058 ms, time limit is 60000 ms
eth1_1           | io.vertx.core.VertxException: Thread blocked
eth1_1           | 	at [email protected]/sun.nio.cs.StreamEncoder.write(StreamEncoder.java:118)
eth1_1           | 	at [email protected]/java.io.OutputStreamWriter.write(OutputStreamWriter.java:187)
eth1_1           | 	at [email protected]/java.io.Writer.append(Writer.java:389)
eth1_1           | 	at app//io.prometheus.client.exporter.common.TextFormat.writeEscapedLabelValue(TextFormat.java:88)
eth1_1           | 	at app//io.prometheus.client.exporter.common.TextFormat.write004(TextFormat.java:42)
eth1_1           | 	at app//org.hyperledger.besu.metrics.prometheus.MetricsHttpService.lambda$metricsRequest$4(MetricsHttpService.java:205)
eth1_1           | 	at app//org.hyperledger.besu.metrics.prometheus.MetricsHttpService$$Lambda$1353/0x0000000801216c40.handle(Unknown Source)
eth1_1           | 	at app//io.vertx.core.impl.ContextImpl.lambda$null$0(ContextImpl.java:159)
eth1_1           | 	at app//io.vertx.core.impl.ContextImpl$$Lambda$1006/0x000000080114df10.handle(Unknown Source)
eth1_1           | 	at app//io.vertx.core.impl.AbstractContext.dispatch(AbstractContext.java:100)
eth1_1           | 	at app//io.vertx.core.impl.ContextImpl.lambda$executeBlocking$1(ContextImpl.java:157)
eth1_1           | 	at app//io.vertx.core.impl.ContextImpl$$Lambda$1005/0x000000080114dce8.run(Unknown Source)
eth1_1           | 	at [email protected]/java.util.concurrent.Thread

Any ideas?

@non-fungible-nelson
Contributor

Reports of this on 22.7.7 as well… reopening for now.

@non-fungible-nelson
Contributor

Observed on 22.7.6 and .7 without high-spec mode enabled. Potentially a different underlying cause?

@ahamlat
Contributor

ahamlat commented Oct 25, 2022

We need to investigate. Do you have any feedback from users regarding these two versions?
Ideally, share a heap dump so we can troubleshoot the root cause.

@non-fungible-nelson
Contributor

Asked @benjaminion for a thread dump.

@ahamlat
Contributor

ahamlat commented Oct 25, 2022

I was looking for a heap dump, or at least a histogram.

  • Create a heap dump (the heap dump file may reach the size of the heap):

jcmd [PID] GC.heap_dump /[path-to-heapdump]/heapdump.hprof

  • Create a histogram:

jcmd [PID] GC.class_histogram > histogram.txt
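
As an alternative to jcmd, a heap dump can also be captured from inside the JVM via the standard HotSpot diagnostic MXBean, which is roughly equivalent to jcmd's GC.heap_dump (this is a generic JDK facility, not Besu-specific code):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDump {
    // Writes a heap dump to the given .hprof path; live=true dumps only
    // reachable objects. Returns false if the dump could not be written.
    public static boolean dump(String path, boolean live) {
        try {
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(path, live);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        java.io.File out = new java.io.File("heapdump.hprof");
        if (out.exists()) out.delete();  // dumpHeap refuses to overwrite
        if (dump(out.getPath(), true)) {
            System.out.println("heap dump written: " + out.length() + " bytes");
        }
    }
}
```

This is handy when jcmd is unavailable inside a container, as in the docker-compose setups shown in the logs above.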

@benjaminion

This is occurring again, right after I tried to connect a dApp to Besu via Metamask. I've taken dumps as requested and will figure out a way to get them to @ahamlat.

@ahamlat
Contributor

ahamlat commented Oct 25, 2022

Interesting, it seems to be related to the eth_getLogs JSON-RPC call. I can see 7 executing eth_getLogs calls, and 5 of them are executing the RocksDB.get method. Do you have more information on these eth_getLogs calls? We can proxy them if there are no personal addresses. The goal is to reproduce the issue and understand why these calls block on RocksDB.get.
And in your case, judging from the shared heap dump, there is no memory issue.

@benjaminion

My best guess is that connecting Metamask to this dApp, with my local Besu set as the network provider, is the trigger. It seems to be repeatable, but I am reluctant to experiment as this is my production staking node. I hadn't previously noticed any issues using this Besu instance as my network provider in Metamask. For now I have pointed Metamask back to Infura 😞

@ahamlat
Contributor

ahamlat commented Oct 26, 2022

Thanks Ben. I found this part of the code in their GitHub repository, so I guess they are fetching logs for large ranges of blocks, and it seems that Besu places no limit on this range.

@non-fungible-nelson I think it is important to investigate the performance and reliability of the eth_getLogs JSON-RPC call. We already have some folks working on other JSON-RPC calls; I don't know whether they are also looking into eth_getLogs. Maybe try to reproduce on a testnet with the same dApp and proxy the calls to get the ranges.

I also found this old issue on eth_getLogs.

@ahamlat
Contributor

ahamlat commented Oct 26, 2022

I just found the fromBlock and toBlock parameters of the eth_getLogs call in the heap dump: the range runs from 0x0 (0 in decimal) to 0xf180de (15827166 in decimal), i.e. every block from genesis to yesterday. There is also a topic parameter to filter on, but I guess filtering is far less efficient over that many blocks.

[Screenshot 2022-10-26 at 14:14:22: heap dump inspector showing the eth_getLogs fromBlock/toBlock parameters]
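
Decoding those hex quantities confirms the size of the request (a small sketch; BlockRange and decode are hypothetical helpers for illustration, not Besu code):

```java
public class BlockRange {
    // eth_getLogs block parameters are "0x"-prefixed hex quantities.
    public static long decode(String hexQuantity) {
        return Long.decode(hexQuantity);
    }

    public static void main(String[] args) {
        long from = decode("0x0");       // genesis
        long to   = decode("0xf180de");  // 15827166
        // Every block from genesis to near the chain head must be scanned.
        System.out.println("blocks to scan: " + (to - from + 1));
    }
}
```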

@non-fungible-nelson
Contributor

non-fungible-nelson commented Oct 26, 2022

Hmm, interesting @ahamlat. Should we limit by default, or choose another behavior? Tagging @mark-terry to look at this as part of his RPC investigation.

@mark-terry
Contributor

The default when fromBlock isn't specified is to search from genesis to toBlock, which defaults to latest. #4513 doesn't change that.

#4123 suggested adding a configurable maximum search range. This should address what @ahamlat has noted if the call was executed without a fromBlock value.

It'd be interesting to know whether the Metamask issue is caused by eth_getLogs and, if yes, what parameters they're sending.

@mark-terry
Contributor

mark-terry commented Oct 26, 2022

Added a ticket for a configurable range: #4563
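
A configurable cap of the kind tracked in #4563 could look roughly like this (a sketch with hypothetical names, not Besu's actual implementation):

```java
public class LogRangeLimiter {
    // Rejects eth_getLogs requests whose block range exceeds a configured
    // maximum, instead of silently scanning all the way from genesis.
    public static void checkRange(long fromBlock, long toBlock, long maxRange) {
        long span = toBlock - fromBlock + 1;
        if (span > maxRange) {
            throw new IllegalArgumentException(
                "requested range " + span + " exceeds maximum " + maxRange);
        }
    }

    public static void main(String[] args) {
        checkRange(15_000_000L, 15_001_000L, 10_000L);   // within the cap: allowed
        try {
            checkRange(0L, 15_827_166L, 10_000L);        // genesis-to-head: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast like this turns an hours-long worker-thread stall into an immediate error the client can react to, e.g. by splitting the query into smaller ranges.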

@xrchz

xrchz commented Dec 21, 2022

Now that #4563 is merged can this be closed?

@benjaminion

benjaminion commented Dec 21, 2022

I can confirm that this no longer occurs with 22.10.4 when I visit the dApp that used to reliably trigger it. (Besu is now so slow with that site that it's unusable, but that's a separate issue from this one.) I can't say for sure that the underlying issue is resolved, but it looks promising.

@gooooer

gooooer commented Jan 26, 2023

The issue still happens for me on 22.10.4. If anything, it has become even worse: more spam in the logs due to this error.

@non-fungible-nelson
Contributor

non-fungible-nelson commented Jan 26, 2023

@ahamlat - I think this warrants more investigation. Users are seeing many threads blocked on 22.10.3/.4 without using the RPC.


8 participants