
nimbus spontaneously crashes with "database disk image is malformed" #6425

Open
marmarek opened this issue Jul 15, 2024 · 8 comments

Comments

@marmarek

Describe the bug

After about a month of uptime, the Nimbus beacon node crashed and now refuses to start, complaining that the database is malformed. This happened on two separate hosts, about 1h apart.

To Reproduce
Steps to reproduce the behavior:

  1. Platform details (OS, architecture): Linux amd64, Debian 12, but with vanilla kernel 6.6.31
  2. Branch/commit used: one instance was 24.5.1 (running for a long time before), the other one was 24.6.0 (first start after update)
  3. Commands being executed: nothing on the beacon node, but it was shortly after restarting the validator client (a separate process, not sharing the datadir)
  4. Relevant log lines:
crash message from 24.5.1

[2024-07-15 00:10:38] [2895575.688327] nimbus_beacon_node[816]: INF 2024-07-15 00:10:38.170+00:00 Database checkpointed                      topics="beacnde" dur=5s30ms452us652ns
[2024-07-15 00:10:38] [2895575.688629] nimbus_beacon_node[816]: INF 2024-07-15 00:10:38.170+00:00 Slot end                                   topics="beacnde" slot=9514850 nextActionWait= nextAttestationSlot=9514851 nextProposalSlot=-1 syncCommitteeDuties=none head=d2399b5f:9514850
[2024-07-15 00:10:38] [2895575.692065] nimbus_beacon_node[816]: INF 2024-07-15 00:10:38.174+00:00 Missed multiple heartbeats                 topics="libp2p gossipsub" heartbeat=GossipSub delay=4s102ms91us48ns hinterval=700ms
[2024-07-15 00:10:38] [2895575.706602] nimbus_beacon_node[816]: INF 2024-07-15 00:10:38.188+00:00 Slot start                                 topics="beacnde" head=d2399b5f:9514850 delay=3s188ms688us570ns finalized=297337:f5b6ecaa peers=159 slot=9514851 sync=synced epoch=297339
[2024-07-15 00:10:42] [2895579.923160] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-15 00:10:42] [2895579.923291] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2393) _ZN18nimbus_beacon_node4mainE
[2024-07-15 00:10:42] [2895579.923363] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2316) _ZN18nimbus_beacon_node16handleStartUpCmdE3varIN4conf14BeaconNodeConfEE
[2024-07-15 00:10:42] [2895579.923435] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2217) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgContextEE
[2024-07-15 00:10:42] [2895579.923508] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1979) _ZN18nimbus_beacon_node5startE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-15 00:10:42] [2895579.923579] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1926) _ZN18nimbus_beacon_node3runE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-15 00:10:42] [2895579.923649] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncengine.nim(150) _ZN11asyncengine4pollE
[2024-07-15 00:10:42] [2895579.923720] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim(371) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-15 00:10:42] [2895579.923789] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/gossip_processing/block_processor.nim(593) _ZN10storeBlock55storeBlock
[2024-07-15 00:10:42] [2895579.923854] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/consensus_object_pools/block_clearance.nim(269) _ZN22addHeadBlockWithParent22addHeadBlockWithParentE3refIN17block_pools_types27ChainDAGRefcolonObjectType_EE3varIN16signatures_batch13BatchVerifierEEN5deneb17SignedBeaconBlockE3refIN9block_dag24BlockRefcolonObjectType_EE4bool4procI3refIN9block_dag24BlockRefcolonObjectType_EEN5deneb24TrustedSignedBeaconBlockE3refIN17block_pools_types24EpochRefcolonObjectType_EEN7helpers19FinalityCheckpointsEE.constprop.0
[2024-07-15 00:10:42] [2895579.923950] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/consensus_object_pools/blockchain_dag.nim(1800) _ZN14blockchain_dag11updateStateE3refIN17block_pools_types27ChainDAGRefcolonObjectType_EE3varIN5forks23ForkedHashedBeaconStateEEN8block_id11BlockSlotIdE4bool3varIN4base10StateCacheEE
[2024-07-15 00:10:42] [2895579.924049] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/consensus_object_pools/blockchain_dag.nim(726) _ZN14blockchain_dag8getStateE3refIN15beacon_chain_db29BeaconChainDBcolonObjectType_EEN7presets13RuntimeConfigE7MDigestI6staticI3intEEN9constants4SlotE3varIN5forks23ForkedHashedBeaconStateEE4procIE
[2024-07-15 00:10:42] [2895579.924127] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/beacon_chain_db.nim(1303) _ZN15beacon_chain_db8getStateE3refIN15beacon_chain_db29BeaconChainDBcolonObjectType_EEN5forks13ConsensusForkE7MDigestI6staticI3intEE3varIN5forks23ForkedHashedBeaconStateEE4procIE
[2024-07-15 00:10:42] [2895579.924205] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/beacon_chain_db.nim(1291) _ZN8getState8getStateE3refIN15beacon_chain_db29BeaconChainDBcolonObjectType_EE7MDigestI6staticI3intEE3varIN5deneb11BeaconStateEE4procIE
[2024-07-15 00:10:42] [2895579.924276] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/beacon_chain_db.nim(1208) _ZN29getStateOnlyMutableValidators29getStateOnlyMutableValidatorsE9openArrayIN4base23ImmutableValidatorData2EE3refIN7kvstore26KvStoreRefcolonObjectType_EE9openArrayI5uInt8E3varIN5deneb11BeaconStateEE4procIE
[2024-07-15 00:10:42] [2895579.924352] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/beacon_chain_db.nim(743) _ZN8getSZSSZ8getSZSSZE3refIN7kvstore26KvStoreRefcolonObjectType_EE9openArrayI5uInt8E3varIN25beacon_chain_db_immutable37DenebBeaconStateNoImmutableValidatorsEE
[2024-07-15 00:10:42] [2895579.924430] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-results/results.nim(872) _ZN6expect6expectE6ResultI4bool6stringE6string.constprop.0
[2024-07-15 00:10:42] [2895579.924506] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-results/results.nim(376) _ZN17raiseResultDefect17raiseResultDefectE6string6string
[2024-07-15 00:10:42] [2895579.924579] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(329) _ZN6system18rawWriteStackTraceE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-15 00:10:42] [2895579.924657] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-15 00:10:42] [2895579.924727] nimbus_beacon_node[816]: [[reraised from:
[2024-07-15 00:10:42] [2895579.924800] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-15 00:10:42] [2895579.924868] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2393) _ZN18nimbus_beacon_node4mainE
[2024-07-15 00:10:42] [2895579.924930] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2316) _ZN18nimbus_beacon_node16handleStartUpCmdE3varIN4conf14BeaconNodeConfEE
[2024-07-15 00:10:42] [2895579.924992] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2217) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgContextEE
[2024-07-15 00:10:42] [2895579.925072] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1979) _ZN18nimbus_beacon_node5startE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-15 00:10:42] [2895579.925135] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1926) _ZN18nimbus_beacon_node3runE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-15 00:10:42] [2895579.925196] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncengine.nim(150) _ZN11asyncengine4pollE
[2024-07-15 00:10:42] [2895579.925260] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim(371) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-15 00:10:42] [2895579.925321] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/gossip_processing/block_processor.nim(416) _ZN10storeBlock55storeBlock
[2024-07-15 00:10:42] [2895579.925391] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-15 00:10:42] [2895579.925453] nimbus_beacon_node[816]: ]]
[2024-07-15 00:10:42] [2895579.925529] nimbus_beacon_node[816]: [[reraised from:
[2024-07-15 00:10:42] [2895579.925591] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-15 00:10:42] [2895579.925652] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2393) _ZN18nimbus_beacon_node4mainE
[2024-07-15 00:10:43] [2895580.536214] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2316) _ZN18nimbus_beacon_node16handleStartUpCmdE3varIN4conf14BeaconNodeConfEE
[2024-07-15 00:10:43] [2895580.536521] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2217) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgContextEE
[2024-07-15 00:10:43] [2895580.536686] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1979) _ZN18nimbus_beacon_node5startE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-15 00:10:43] [2895580.536873] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1926) _ZN18nimbus_beacon_node3runE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-15 00:10:43] [2895580.537089] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncengine.nim(150) _ZN11asyncengine4pollE
[2024-07-15 00:10:43] [2895580.537240] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim(371) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-15 00:10:43] [2895580.537577] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/gossip_processing/block_processor.nim(416) _ZN10storeBlock55storeBlock
[2024-07-15 00:10:43] [2895580.537750] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-15 00:10:43] [2895580.537943] nimbus_beacon_node[816]: ]]
[2024-07-15 00:10:43] [2895580.538129] nimbus_beacon_node[816]: Error: unhandled exception: working database (disk broken/full?): database disk image is malformed [ResultDefect]
[2024-07-15 00:10:43] [2895581.018401] systemd[1]: nimbus_beacon_node.service: Main process exited, code=exited, status=1/FAILURE
[2024-07-15 00:10:43] [2895581.018822] systemd[1]: nimbus_beacon_node.service: Failed with result 'exit-code'.
[2024-07-15 00:10:43] [2895581.019117] systemd[1]: nimbus_beacon_node.service: Consumed 1w 16h 56min 45.200s CPU time.
[2024-07-15 00:10:43] [2895581.324030] systemd[1]: nimbus_beacon_node.service: Scheduled restart job, restart counter is at 1.
[2024-07-15 00:10:43] [2895581.324302] systemd[1]: Stopped nimbus_beacon_node.service - Nimbus Beacon Node (Ethereum consensus client).
[2024-07-15 00:10:43] [2895581.324681] systemd[1]: nimbus_beacon_node.service: Consumed 1w 16h 56min 45.200s CPU time.

logs from 24.6.0

[2024-07-14 23:07:48] [   23.735591] nimbus_beacon_node[1115]: INF 2024-07-14 23:07:48.265+00:00 Loading block DAG from database            topics="beacnde" path=/home/nimbus/shared_mainnet_0/db
[   25.489865] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-14 23:07:50] [   25.490017] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2430) _ZN18nimbus_beacon_node4mainE
[2024-07-14 23:07:50] [   25.490132] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2353) _ZN18nimbus_beacon_node16handleStartUpCmdE3varIN4conf14BeaconNodeConfEE
[2024-07-14 23:07:50] [   25.490227] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2241) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgContextEE
[2024-07-14 23:07:50] [   25.490352] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(552) _ZN4init4initE8typeDescI3refIN11beacon_node26BeaconNodecolonObjectType_EEE3refIN12bearssl_rand15HmacDrbgContextEEN4conf14BeaconNodeConfEN16network_metadata19Eth2NetworkMetadataE
[2024-07-14 23:07:50] [   25.490473] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim(371) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-14 23:07:50] [   25.490597] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(754) _ZN4init4initE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-14 23:07:50] [   25.490725] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(233) _ZN18nimbus_beacon_node12loadChainDagEN4conf14BeaconNodeConfEN7presets13RuntimeConfigE3refIN15beacon_chain_db29BeaconChainDBcolonObjectType_EEN11beacon_node8EventBusE3refIN17validator_monitor16ValidatorMonitorEE3OptI7MDigestI6staticI3intEEE
[2024-07-14 23:07:50] [   25.490829] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/consensus_object_pools/blockchain_dag.nim(1104) _ZN4init4initE8typeDescI3refIN17block_pools_types27ChainDAGRefcolonObjectType_EEEN7presets13RuntimeConfigE3refIN15beacon_chain_db29BeaconChainDBcolonObjectType_EE3refIN17validator_monitor16ValidatorMonitorEE3setIN6extras10UpdateFlagEE6string4procIN5forks30ForkedTrustedSignedBeaconBlockEE4procIN17block_pools_types20HeadChangeInfoObjectEE4procIN17block_pools_types15ReorgInfoObjectEE4procI3refIN17block_pools_types27ChainDAGRefcolonObjectType_EEN17block_pools_types22FinalizationInfoObjectEEN11vanity_logs10VanityLogsEN30block_pools_types_light_client21LightClientDataConfigE
[2024-07-14 23:07:50] [   25.490943] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/consensus_object_pools/blockchain_dag.nim(752) _ZN14blockchain_dag8getStateE3refIN15beacon_chain_db29BeaconChainDBcolonObjectType_EEN7presets13RuntimeConfigE7MDigestI6staticI3intEE5SliceIN9constants4SlotEE3varIN5forks23ForkedHashedBeaconStateEE4procIE
[2024-07-14 23:07:50] [   25.491059] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/beacon_chain_db.nim(1303) _ZN15beacon_chain_db8getStateE3refIN15beacon_chain_db29BeaconChainDBcolonObjectType_EEN5forks13ConsensusForkE7MDigestI6staticI3intEE3varIN5forks23ForkedHashedBeaconStateEE4procIE
[2024-07-14 23:07:50] [   25.491138] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/beacon_chain_db.nim(1291) _ZN8getState8getStateE3refIN15beacon_chain_db29BeaconChainDBcolonObjectType_EE7MDigestI6staticI3intEE3varIN5deneb11BeaconStateEE4procIE
[2024-07-14 23:07:50] [   25.491240] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/beacon_chain_db.nim(1208) _ZN29getStateOnlyMutableValidators29getStateOnlyMutableValidatorsE9openArrayIN4base23ImmutableValidatorData2EE3refIN7kvstore26KvStoreRefcolonObjectType_EE9openArrayI5uInt8E3varIN5deneb11BeaconStateEE4procIE
[2024-07-14 23:07:50] [   25.491320] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/beacon_chain_db.nim(743) _ZN8getSZSSZ8getSZSSZE3refIN7kvstore26KvStoreRefcolonObjectType_EE9openArrayI5uInt8E3varIN25beacon_chain_db_immutable37DenebBeaconStateNoImmutableValidatorsEE
[2024-07-14 23:07:50] [   25.491400] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nim-results/results.nim(872) _ZN6expect6expectE6ResultI4bool6stringE6string.constprop.0
[2024-07-14 23:07:50] [   25.491477] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nim-results/results.nim(376) _ZN17raiseResultDefect17raiseResultDefectE6string6string
[2024-07-14 23:07:50] [   25.491558] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(329) _ZN6system18rawWriteStackTraceE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-14 23:07:50] [   25.491636] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-14 23:07:50] [   25.491713] nimbus_beacon_node[1115]: [[reraised from:
[2024-07-14 23:07:50] [   25.491793] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-14 23:07:50] [   25.491866] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2430) _ZN18nimbus_beacon_node4mainE
[2024-07-14 23:07:50] [   25.491936] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2353) _ZN18nimbus_beacon_node16handleStartUpCmdE3varIN4conf14BeaconNodeConfEE
[2024-07-14 23:07:50] [   25.492030] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2241) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgContextEE
[2024-07-14 23:07:50] [   25.492156] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(552) _ZN4init4initE8typeDescI3refIN11beacon_node26BeaconNodecolonObjectType_EEE3refIN12bearssl_rand15HmacDrbgContextEEN4conf14BeaconNodeConfEN16network_metadata19Eth2NetworkMetadataE
[2024-07-14 23:07:50] [   25.492261] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim(371) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-14 23:07:50] [   25.492330] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(896) _ZN4init4initE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-14 23:07:50] [   25.492433] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-14 23:07:50] [   25.492528] nimbus_beacon_node[1115]: ]]
[2024-07-14 23:07:50] [   25.492600] nimbus_beacon_node[1115]: [[reraised from:
[2024-07-14 23:07:50] [   25.492666] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-14 23:07:50] [   25.492748] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2430) _ZN18nimbus_beacon_node4mainE
[2024-07-14 23:07:50] [   25.492812] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2353) _ZN18nimbus_beacon_node16handleStartUpCmdE3varIN4conf14BeaconNodeConfEE
[2024-07-14 23:07:50] [   25.492910] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2241) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgContextEE
[2024-07-14 23:07:50] [   25.493010] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(552) _ZN4init4initE8typeDescI3refIN11beacon_node26BeaconNodecolonObjectType_EEE3refIN12bearssl_rand15HmacDrbgContextEEN4conf14BeaconNodeConfEN16network_metadata19Eth2NetworkMetadataE
[2024-07-14 23:07:50] [   25.493074] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim(371) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-14 23:07:50] [   25.493132] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(896) _ZN4init4initE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-14 23:07:50] [   25.493211] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-14 23:07:50] [   25.493267] nimbus_beacon_node[1115]: ]]
[2024-07-14 23:07:50] [   25.493323] nimbus_beacon_node[1115]: Error: unhandled exception: working database (disk broken/full?): database disk image is malformed [ResultDefect]
[2024-07-14 23:07:50] [   25.494075] systemd[1]: nimbus_beacon_node.service: Main process exited, code=exited, status=1/FAILURE
[2024-07-14 23:07:50] [   25.494211] systemd[1]: nimbus_beacon_node.service: Failed with result 'exit-code'.
[2024-07-14 23:07:50] [   25.494368] systemd[1]: nimbus_beacon_node.service: Consumed 5.659s CPU time.
[2024-07-14 23:07:50] [   25.844828] systemd[1]: nimbus_beacon_node.service: Scheduled restart job, restart counter is at 1.
[2024-07-14 23:07:50] [   25.845029] systemd[1]: Stopped nimbus_beacon_node.service - Nimbus Beacon Node (Ethereum consensus client).


Additional context
I'm not really sure whether those two incidents are related, but since Nimbus was running flawlessly before, and it happened at a similar time on two separate hosts (even in separate physical locations), I suspect they might be related.
A few other hosts running Nimbus 24.6.0 and 24.5.1 are not affected.

@cheatfate
Contributor

Sorry, this information isn't in the report, so I will ask: do you have enough free space on the disk where the database is stored?
Could you please also check whether the disk used by the nimbus-eth2 database is OK?
I ask because, as I understand it, the first crash happened while a block was being stored, and the second crash happened when you tried to start beacon_node again.
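For the free-space question, a minimal Python sketch (standard library only) that reports the size and free space of the filesystem holding a given directory. The default path is an assumption taken from the `path=/home/nimbus/shared_mainnet_0/db` log line above; the check says nothing about disk health, which needs separate tooling such as SMART data.

```python
import os
import shutil

GIB = 2**30


def describe_disk(path: str) -> str:
    """Summarise size and free space of the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    pct_free = 100.0 * usage.free / usage.total if usage.total else 0.0
    return (
        f"{os.path.abspath(path)}: total={usage.total / GIB:.1f} GiB, "
        f"free={usage.free / GIB:.1f} GiB ({pct_free:.1f}% free)"
    )


if __name__ == "__main__":
    # Assumed data directory (from the log above); falls back to "." if it does not exist here.
    db_dir = "/home/nimbus/shared_mainnet_0/db"
    print(describe_disk(db_dir if os.path.isdir(db_dir) else "."))
```

On POSIX systems the `free` value is the space available to unprivileged users, so it can be somewhat lower than what `df` shows as raw free blocks for root.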

@marmarek
Author

Sorry, this information isn't in the report, so I will ask: do you have enough free space on the disk where the database is stored?

Yes, there is more than enough space in both cases (over 200GB free on both hosts).

Could you please also check whether the disk used by the nimbus-eth2 database is OK?

No disk/filesystem errors as far as I can see.

I ask because, as I understand it, the first crash happened while a block was being stored, and the second crash happened when you tried to start beacon_node again.

Yes, but note that these are two separate hosts: on one it failed spontaneously, and on the other it didn't start anymore after the update (there were no issues before the update). I suspect it might be related to something happening on the network at that time, but I'm not sure...

@cheatfate
Contributor

This error message comes from the SQLite3 code that nimbus-eth2 uses. In the first case it happened during a write operation; in the second case it happened while the database file was being opened.
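To illustrate that the message text originates in SQLite: "database disk image is malformed" is SQLite's standard message for the `SQLITE_CORRUPT` result code, while the "working database (disk broken/full?):" prefix appears to be added by Nimbus. The minimal Python sketch below damages a throwaway scratch database (unrelated to Nimbus's real schema; table name and sizes are arbitrary) and shows SQLite raising the same text on an ordinary read once a b-tree page is invalid.

```python
import os
import sqlite3
import tempfile

PAGE_SIZE = 4096


def reproduce_malformed_error() -> str:
    """Create a scratch DB, overwrite one b-tree page with junk, return SQLite's error text."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scratch.sqlite3")

        con = sqlite3.connect(path)
        con.execute(f"PRAGMA page_size={PAGE_SIZE}")  # must run before the first table exists
        con.execute("CREATE TABLE t(x BLOB)")  # page 1 holds the schema, so t's root is page 2
        con.executemany("INSERT INTO t VALUES (?)", [(b"x" * 100,) for _ in range(1000)])
        con.commit()
        con.close()

        # Page 2 occupies bytes [PAGE_SIZE, 2 * PAGE_SIZE); 0xFF is not a valid page-type flag.
        with open(path, "r+b") as f:
            f.seek(PAGE_SIZE)
            f.write(b"\xff" * PAGE_SIZE)

        con = sqlite3.connect(path)
        try:
            con.execute("SELECT count(*) FROM t").fetchone()
            return "no error"
        except sqlite3.DatabaseError as exc:
            # Python >= 3.11 also exposes exc.sqlite_errorname (e.g. "SQLITE_CORRUPT").
            return str(exc)
        finally:
            con.close()


if __name__ == "__main__":
    print(reproduce_malformed_error())
```

The point of the sketch is only that SQLite reports corruption from whichever operation first touches the damaged page; it does not show what caused the damage on the affected hosts.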

@marmarek
Author

SQLite3, you say? So I ran this (on the host that failed during a write operation):

sqlite3 -cmd 'pragma integrity_check' shared_mainnet_0/db/nbc.sqlite3

and got:

*** in database main ***  
On tree page 10662511 cell 1: invalid page number 235267435
Page 2680842 is never used
Page 2680843 is never used                  
Page 2680844 is never used
Page 2680845 is never used
Page 2680846 is never used
Page 2680847 is never used
Page 2680848 is never used
Page 2680849 is never used
Page 2680850 is never used
Page 2680851 is never used
Page 2680852 is never used
Page 2680853 is never used
Page 2680854 is never used
Page 2680855 is never used
Page 2680856 is never used
Page 2680857 is never used
Page 2680858 is never used
Page 2680859 is never used
...

I'm not sure how helpful that is...
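To make output like the above easier to read, here is a minimal Python sketch that runs `PRAGMA integrity_check` over a read-only connection and buckets the messages. The interpretation is hedged: "invalid page number" on a tree page means a b-tree cell points at a page that cannot be valid, and "Page N is never used" means the page is referenced by neither a b-tree nor the freelist, which is what one would expect if a broken reference cut part of a tree off. On a mainnet-sized database the check can take a long time; `PRAGMA quick_check` is a faster, less thorough variant, and running either against a copy of the file avoids touching the original. The function names are illustrative.

```python
from __future__ import annotations

import re
import sqlite3
from collections import Counter
from pathlib import Path

_NEVER_USED = re.compile(r"^Page \d+ is never used$")
_TREE_ERROR = re.compile(r"^On tree page \d+ cell \d+: ")


def integrity_messages(db_path: str, max_errors: int = 100) -> list[str]:
    """Run PRAGMA integrity_check read-only and return its messages, one per line."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"  # read-only: never modify the damaged file
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute(f"PRAGMA integrity_check({int(max_errors)})").fetchall()
    finally:
        con.close()
    # A single row may hold several lines (e.g. the "*** in database main ***" header).
    return [line.strip() for (text,) in rows for line in text.splitlines() if line.strip()]


def summarize(messages: list[str]) -> Counter:
    """Bucket integrity_check messages into coarse categories."""
    counts: Counter = Counter()
    for msg in messages:
        if msg.startswith("***"):
            continue  # per-database header, not an error
        if msg == "ok":
            counts["ok"] += 1
        elif _NEVER_USED.match(msg):
            counts["page never used"] += 1
        elif _TREE_ERROR.match(msg):
            counts["b-tree cell error"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # First lines of the output posted above.
    sample = [
        "*** in database main ***",
        "On tree page 10662511 cell 1: invalid page number 235267435",
        "Page 2680842 is never used",
        "Page 2680843 is never used",
    ]
    print(summarize(sample))
```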

@tersec
Contributor

tersec commented Jul 16, 2024

We've never seen this particular error, and it appears to be something happening in the SQLite library itself, given the

On tree page 10662511 cell 1: invalid page number 235267435

Nimbus does not seem to use SQLite3 in a fine-grained enough way to trigger such an issue, unless random memory corruption or similar problems are occurring.

It's perhaps worth checking whether the nodes and hosts in question:

  • have functioning RAM (e.g., verified with memtest);
  • use an unusual filesystem or set of mount options;
  • go through Docker or some other virtualization layer where one might see oddities around disk and filesystem access.
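For the filesystem and mount-option question, a minimal Linux-only Python sketch that finds which mount a path lives on by parsing `/proc/mounts` (device, mount point, filesystem type, options, ...) and picking the longest matching mount point. It does not cover the RAM question; for virtualization, `systemd-detect-virt` on systemd hosts reports whether the machine is a VM. The function does not resolve symlinks itself, so pass it an `os.path.realpath()` result; `mount_for` is an illustrative name.

```python
from __future__ import annotations

import os


def mount_for(path: str, mounts_text: str) -> tuple[str, str, list[str]]:
    """Return (mount point, fs type, options) for the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts; `path` should already be absolute
    and symlink-free (e.g. the result of os.path.realpath).
    """
    best: tuple[str, str, list[str]] | None = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mountpoint = fields[1].replace("\\040", " ")  # /proc/mounts encodes spaces as \040
        inside = path == mountpoint or path.startswith(mountpoint.rstrip("/") + "/")
        # ">=" so that a later mount on the same mount point (which shadows earlier ones) wins.
        if inside and (best is None or len(mountpoint) >= len(best[0])):
            best = (mountpoint, fields[2], fields[3].split(","))
    if best is None:
        raise ValueError(f"no mount found for {path}")
    return best


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        print(mount_for(os.path.realpath("."), f.read()))
```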

Should one be given to understand that

Platform details (OS, architecture): Linux amd64, Debian 12, but with vanilla kernel 6.6.31

it's otherwise all defaults, bare metal, ext4, default filesystem mount options?

@marmarek
Author

Platform details (OS, architecture): Linux amd64, Debian 12, but with vanilla kernel 6.6.31

it's otherwise all defaults, bare metal, ext4, default filesystem mount options?

It is a VM (on Xen), but otherwise plain ext4, and with nosuid,nodev,discard mount options (on this partition).

I don't see anything unusual in the monitoring data for that time (temperature, I/O rates, RAID state, etc. were all normal).

BTW, yesterday two more hosts behaved in an unusual but different way: the OOM killer killed the nimbus process after its memory usage quickly grew past 16GB (it normally sits at around 4GB). This never happened before.
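Since the OOM kills followed memory growing from the usual ~4GB to over 16GB, polling the process's resident set size could show when the growth starts. A minimal Linux-only Python sketch that reads `VmRSS` from `/proc/<pid>/status`; the 8 GiB alert threshold is an arbitrary, hypothetical value, and the PID to watch would be that of the beacon node (for example from `systemctl show -p MainPID nimbus_beacon_node.service`).

```python
import os
import time

ALERT_MIB = 8 * 1024  # hypothetical threshold: roughly twice the usual ~4GB footprint


def rss_mib(status_text: str) -> float:
    """Return VmRSS (resident set size) in MiB from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            value, unit = line.split()[1:3]  # e.g. "VmRSS:   4194304 kB"
            if unit != "kB":
                raise ValueError(f"unexpected unit {unit!r}")
            return int(value) / 1024
    raise ValueError("no VmRSS line (kernel thread, or process already gone?)")


def watch(pid: int, interval_s: float = 60.0, rounds: int = 10) -> None:
    """Print the RSS of `pid` `rounds` times, `interval_s` seconds apart."""
    for i in range(rounds):
        with open(f"/proc/{pid}/status") as f:
            rss = rss_mib(f.read())
        note = "  <-- above alert threshold" if rss > ALERT_MIB else ""
        print(f"pid {pid}: RSS {rss:.0f} MiB{note}")
        if i + 1 < rounds:
            time.sleep(interval_s)


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    watch(os.getpid(), interval_s=0, rounds=1)  # demo on this script's own process
```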

@marmarek
Author

In the meantime, the database crash happened two more times (on yet other hosts), but interestingly, after an automatic service restart (via systemd), the node continued normally.

Here is one of the crashes:


[2024-07-24 00:32:06] [258380.167614] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-24 00:32:06] [258380.167916] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2430) _ZN18nimbus_beacon_node4mainE
[2024-07-24 00:32:06] [258380.167998] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2353) _ZN18nimbus_beacon_node16handleStartUpCmdE3varIN4conf14BeaconNodeConfEE
[2024-07-24 00:32:06] [258380.168094] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2254) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgContextEE
[2024-07-24 00:32:06] [258380.168168] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2016) _ZN18nimbus_beacon_node5startE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-24 00:32:06] [258380.168239] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1963) _ZN18nimbus_beacon_node3runE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-24 00:32:06] [258380.168309] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncengine.nim(150) _ZN11asyncengine4pollE
[2024-07-24 00:32:06] [258380.168369] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim(371) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-24 00:32:06] [258380.168436] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/sync/sync_protocol.nim(344) _ZN30blobSidecarsByRangeUserHandler30blobSidecarsByRangeUserHandlerE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-24 00:32:06] [258380.168497] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/beacon_chain_db.nim(1068) _ZN15beacon_chain_db16getBlobSidecarSZE3refIN15beacon_chain_db29BeaconChainDBcolonObjectType_EE7MDigestI6staticI3intEE6uInt643varI3seqI5uInt8EE
[2024-07-24 00:32:06] [258380.168564] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-results/results.nim(872) _ZN6expect6expectE6ResultI4bool6stringE6string.constprop.0
[2024-07-24 00:32:06] [258380.168630] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-results/results.nim(376) _ZN17raiseResultDefect17raiseResultDefectE6string6string
[2024-07-24 00:32:06] [258380.168700] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(329) _ZN6system18rawWriteStackTraceE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-24 00:32:06] [258380.168766] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-24 00:32:06] [258380.168835] nimbus_beacon_node[836]: [[reraised from:
[2024-07-24 00:32:06] [258380.168910] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-24 00:32:06] [258380.168972] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2430) _ZN18nimbus_beacon_node4mainE
[2024-07-24 00:32:06] [258380.169125] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2353) _ZN18nimbus_beacon_node16handleStartUpCmdE3varIN4conf14BeaconNodeConfEE
[2024-07-24 00:32:06] [258380.169192] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2254) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgContextEE
[2024-07-24 00:32:06] [258380.169254] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2016) _ZN18nimbus_beacon_node5startE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-24 00:32:06] [258380.169316] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1963) _ZN18nimbus_beacon_node3runE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-24 00:32:06] [258380.169378] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncengine.nim(150) _ZN11asyncengine4pollE
[2024-07-24 00:32:06] [258380.169437] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim(371) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-24 00:32:06] [258380.169496] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/sync/sync_protocol.nim(369) _ZN30blobSidecarsByRangeUserHandler30blobSidecarsByRangeUserHandlerE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-24 00:32:06] [258380.169572] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-24 00:32:06] [258380.169632] nimbus_beacon_node[836]: ]]
[2024-07-24 00:32:06] [258380.169696] nimbus_beacon_node[836]: [[reraised from:
[2024-07-24 00:32:06] [258380.169755] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-24 00:32:06] [258380.169815] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2430) _ZN18nimbus_beacon_node4mainE
[2024-07-24 00:32:06] [258380.169874] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2353) _ZN18nimbus_beacon_node16handleStartUpCmdE3varIN4conf14BeaconNodeConfEE
[2024-07-24 00:32:06] [258380.169933] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2254) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgContextEE
[2024-07-24 00:32:06] [258380.169992] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2016) _ZN18nimbus_beacon_node5startE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-24 00:32:06] [258380.170089] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1963) _ZN18nimbus_beacon_node3runE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-24 00:32:06] [258380.367968] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncengine.nim(150) _ZN11asyncengine4pollE
[2024-07-24 00:32:06] [258380.368090] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim(371) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-24 00:32:06] [258380.368152] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/sync/sync_protocol.nim(369) _ZN30blobSidecarsByRangeUserHandler30blobSidecarsByRangeUserHandlerE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-24 00:32:06] [258380.368216] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-24 00:32:06] [258380.368319] nimbus_beacon_node[836]: ]]
[2024-07-24 00:32:06] [258380.368380] nimbus_beacon_node[836]: Error: unhandled exception: working database (disk broken/full?): database disk image is malformed [ResultDefect]
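The `database disk image is malformed` text is SQLite's standard message for the `SQLITE_CORRUPT` result code, so the damage can be confirmed independently of Nimbus. Below is a minimal sketch, assuming the beacon node's database under `db/` is a regular SQLite file (its exact file name is not given here). The helper `check_sqlite_integrity` is hypothetical, not part of Nimbus. It opens the file read-only and runs `PRAGMA quick_check`, or the slower `PRAGMA integrity_check` when `full=True`. It is best run against a copy taken while the node is stopped, copying any `-wal` file alongside it.

```python
import sqlite3
import sys
from pathlib import Path


def check_sqlite_integrity(db_path: str, full: bool = False) -> list[str]:
    """Hypothetical helper: run SQLite's built-in consistency check.

    Returns the rows reported by the PRAGMA; ["ok"] means no problems
    were found. Any SQLite error (e.g. "database disk image is malformed"
    or "file is not a database") is returned as a single message.
    """
    pragma = "integrity_check" if full else "quick_check"
    # Open via a URI with mode=ro so the check itself cannot write to the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.DatabaseError as exc:
        return [str(exc)]
    try:
        return [row[0] for row in conn.execute(f"PRAGMA {pragma}")]
    except sqlite3.DatabaseError as exc:
        # Severe corruption can make the check itself fail outright.
        return [str(exc)]
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    results = check_sqlite_integrity(sys.argv[1], full="--full" in sys.argv[2:])
    print("\n".join(results))
    sys.exit(0 if results == ["ok"] else 1)
```

`quick_check` skips some index-consistency checks that `integrity_check` performs, so it finishes much faster on a database of this size. Because it reports "ok" only when no problems were found, any other output confirms that the file itself is damaged.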

And the startup log:

[2024-07-24 00:32:07] [258381.230402] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:08.211+00:00 Launching beacon node                      topics="beacnde" version=v24.6.0-7d0078-stateofus ... (redacted)
[2024-07-24 00:32:07] [258381.350230] nimbus_beacon_node[382823]: NTC 2024-07-24 00:32:08.211+00:00 Starting metrics HTTP server               topics="beacnde" url=http://127.0.0.1:8008/metrics
[2024-07-24 00:32:07] [258381.350454] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:08.285+00:00 Threadpool started                         topics="beacnde" numThreads=16
[2024-07-24 00:32:17] [258390.890452] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:17.871+00:00 Loading block DAG from database            topics="beacnde" path=/home/nimbus/shared_mainnet_0/db
[2024-07-24 00:32:19] [258393.053200] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:20.034+00:00 Block DAG initialized                      head=87fb0013:9579758 finalizedHead=b1b64ef8:9579680 tail=8584f5a5:8522751 backfill="(0, \"00000000\")" loadDur=7ms244us7ns summariesDur=1s116ms129us938ns finalizedDur=1s39ms290us768ns frontfillDur=30ns keysDur=52us50ns
[2024-07-24 00:32:20] [258394.028505] nimbus_beacon_node[382823]: NTC 2024-07-24 00:32:21.009+00:00 Starting REST HTTP server                  topics="beacnde" url=http://127.0.0.1:5052
[2024-07-24 00:32:20] [258394.028625] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:21.010+00:00 Generating new networking key              topics="networking" network_public_key=...
[2024-07-24 00:32:20] [258394.030049] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:21.011+00:00 Discovery ENR initialized                  topics="eth p2p discv5" enrAutoUpdate=true seqNum=1 ...
[2024-07-24 00:32:20] [258394.030180] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:21.011+00:00 Loading slashing protection database (v2)  topics="beacnde" path=/home/nimbus/shared_mainnet_0/validators
[2024-07-24 00:32:20] [258394.052039] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:21.033+00:00 Using external payload builder             topics="beacnde" payloadBuilderUrl=http://localhost:18550
[2024-07-24 00:32:21] [258394.528259] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:21.509+00:00 Initializing fork choice                   topics="beacnde" unfinalized_blocks=78
[2024-07-24 00:32:23] [258396.741980] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:23.722+00:00 State replayed                             topics="chaindag" blocks=32 slots=32 current=481ae9a5:9579679@9579680 ancestor=481ae9a5:9579679@9579680 target=1c8cf120:9579711@9579712 ancestorStateRoot=a0bc9ae4 targetStateRoot=6e2f3a04 found=true assignDur=125us121ns replayDur=1s432ms389us425ns
[2024-07-24 00:32:24] [258397.821519] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:24.802+00:00 Fork choice initialized                    topics="beacnde" justified=299366:654331c4 finalized=299365:b1b64ef8
[2024-07-24 00:32:24] [258397.829446] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:24.811+00:00 Loading validators                         topics="beacval" validatorsDir=/home/nimbus/shared_mainnet_0/validators keystore_cache_available=true
[2024-07-24 00:32:25] [258398.795235] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:25.776+00:00 State replayed                             topics="chaindag" blocks=1 slots=0 current=87fb0013:9579758 ancestor=7e28565d:9576447@9576448 target=26d03af3:9576448 ancestorStateRoot=2e072c04 targetStateRoot=73fa4983 found=false assignDur=318ms217us100ns replayDur=556ms908us272ns
[2024-07-24 00:32:25] [258398.795761] nimbus_beacon_node[382823]: NTC 2024-07-24 00:32:25.777+00:00 Starting beacon node                       topics="beacnde" version=v24.6.0-7d0078-stateofus nimVersion=1.6.20 enr=...
[2024-07-24 00:32:25] [258398.798795] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:25.780+00:00 Listening to incoming network requests     topics="beacnde"
[2024-07-24 00:32:25] [258398.798883] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:25.780+00:00 Starting discovery node                    topics="eth p2p discv5" ...
[2024-07-24 00:32:25] [258398.799648] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:25.781+00:00 Starting execution layer deposit syncing   topics="elman" contract=0x00000000219ab540356cbb839cbe05303d7705fa
[2024-07-24 00:32:25] [258398.799838] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:25.781+00:00 Connection attempt started                 topics="elman"
[2024-07-24 00:32:25] [258398.801022] nimbus_beacon_node[382823]: NTC 2024-07-24 00:32:25.782+00:00 REST service started                       address=127.0.0.1:5052
[2024-07-24 00:32:25] [258398.801114] nimbus_beacon_node[382823]: NTC 2024-07-24 00:32:25.782+00:00 Starting light client                      topics="lightcl" trusted_block_root=none(Eth2Digest)
[2024-07-24 00:32:25] [258398.801190] nimbus_beacon_node[382823]: NTC 2024-07-24 00:32:25.782+00:00 Setting up doppelganger detection          topics="gossip_eth2" epoch=299367 broadcast_epoch=299368
[2024-07-24 00:32:25] [258399.055127] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:26.034+00:00 Scheduling first slot action               topics="beacnde" startTime=190w12h32m2s782ms634us512ns nextSlot=9579761 timeToNextSlot=9s217ms365us488ns

The database itself is about 158 GB, which I assume is the expected size, right?

@tersec
Contributor

tersec commented Oct 30, 2024

Yes, that's the expected size.


4 participants
@marmarek @cheatfate @tersec and others