
panic: failed to extend chain: leveldb/table: corruption on data-block (pos=369440): checksum mismatch, want=0x1ebba72e got=0x9a72ab32 [file=283423.ldb] #941

Closed
WesleyLiu0717 opened this issue Feb 13, 2020 · 7 comments

@WesleyLiu0717

System information

Geth version:
Geth
Version: 1.8.18-stable
Git Commit: 20c95e5
Quorum Version: 2.4.0
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1337
Go Version: go1.11.13
Operating System: linux

OS & Version: Linux (Ubuntu 18.04)

Expected behaviour

The node can continuously confirm transactions at a 300 tx/s input rate.
If the node crashes, it can be recovered by restarting it.

Actual behaviour

After 11.5 hours, the node crashed with:
panic: failed to extend chain: leveldb/table: corruption on data-block (pos=369440): checksum mismatch, want=0x1ebba72e got=0x9a72ab32 [file=283423.ldb]

INFO [02-12|23:30:19.918] Imported new chain segment               blocks=1 txs=183 mgas=47.965 elapsed=58.772ms     mgasps=816.109  number=120558 hash=d617cf…a07633 cache=267.86mB
INFO [02-12|23:30:19.918] QUORUM-CHECKPOINT                        name=BLOCK-CREATED block=d617cfc0bf2c0bc06d297d2addf2ef71336c9fcf1ea02f8d60e910215aa07633
INFO [02-12|23:30:19.918] persisted the latest applied index       index=120562
INFO [02-12|23:30:20.051] QUORUM-CHECKPOINT                        name=TX-ACCEPTED   tx=0x7e502c8e6e0792f7449721027ebd922ba23f2c96aefd1f3bb6ac3f937a7783c2
... 
INFO [02-12|23:30:20.053] QUORUM-CHECKPOINT                        name=TX-ACCEPTED   tx=0xf7adf094a9a6b0b470c2f931e7a4f72b80a80099ba06cd719bc29fddceefd8e9
INFO [02-12|23:30:20.068] Not minting a new block since there are no pending transactions 
panic: failed to extend chain: leveldb/table: corruption on data-block (pos=369440): checksum mismatch, want=0x1ebba72e got=0x9a72ab32 [file=283423.ldb]

goroutine 277 [running]:
github.com/ethereum/go-ethereum/raft.(*ProtocolManager).applyNewChainHead(0xc000209500, 0xc0a00e4360, 0xfaa5c91be3eda87b)
	/home/travis/gopath/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/raft/handler.go:1047 +0x98b
github.com/ethereum/go-ethereum/raft.(*ProtocolManager).eventLoop(0xc000209500)
	/home/travis/gopath/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/raft/handler.go:894 +0x9f8
created by github.com/ethereum/go-ethereum/raft.(*ProtocolManager).startRaft
	/home/travis/gopath/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/raft/handler.go:596 +0x969

Restarting the node crashes it again with the same corruption error:

INFO [02-13|10:04:51.761] Maximum peer count                       ETH=25 LES=0 total=25
INFO [02-13|10:04:51.762] Starting peer-to-peer node               instance=Geth/v1.8.18-stable-20c95e5d(quorum-v2.4.0)/linux-amd64/go1.11.13
INFO [02-13|10:04:51.762] Allocated cache and file handles         database=*** cache=768 handles=524288
INFO [02-13|10:04:54.106] Initialised chain configuration          config="{ChainID: 10 Homestead: 0 DAO: <nil> DAOSupport: false EIP150: 0 EIP155: 0 EIP158: 0 Byzantium: 0 IsQuorum: true Constantinople: <nil> TransactionSizeLimit: 64 MaxCodeSize: 24 Engine: unknown}"
WARN [02-13|10:04:54.106] Ethash used in full fake mode 
INFO [02-13|10:04:54.106] Initialising Ethereum protocol           versions="[63 62]" network=10
WARN [02-13|10:04:54.107] Head state missing, repairing chain      number=120558 hash=d617cf…a07633
INFO [02-13|10:05:15.666] Rewound blockchain to past state         number=0      hash=811f00…5de05d
CRIT [02-13|10:05:15.666] Failed to store last header's hash       err="leveldb/table: corruption on data-block (pos=369440): checksum mismatch, want=0x1ebba72e got=0x9a72ab32 [file=283423.ldb]"
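
For reference, when chaindata is damaged like this, one option short of a full resync is goleveldb's offline recovery, which rebuilds the store from the table files it can still read and drops unreadable records. A minimal sketch, assuming geth is stopped and using an illustrative datadir path (the recovered chain may still need to be rewound or resynced afterwards):

```go
// recover_chaindata.go - offline attempt to repair a corrupted goleveldb store.
// Run only while geth is stopped; the path below is illustrative.
package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
)

func main() {
	path := "node3/data/geth/chaindata" // hypothetical datadir layout; adjust as needed

	// RecoverFile rebuilds the manifest from the .ldb table files it can read;
	// records in corrupted blocks (such as the one in 283423.ldb) are discarded.
	db, err := leveldb.RecoverFile(path, nil)
	if err != nil {
		log.Fatalf("recovery failed: %v", err)
	}
	defer db.Close()

	log.Println("recovery finished; restart geth and verify the chain head")
}
```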

Steps to reproduce the behaviour

Setting: 3 nodes using raft consensus
I use JSON-RPC to send 100 transactions per second to each node (300 tx/s in total).
This problem has occurred twice in my leader-change experiment.

P.S. The other two nodes continuously generate blocks.
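
For context, the load generator is essentially a rate-limited JSON-RPC transaction sender per node. A minimal sketch of that idea using go-ethereum's ethclient (the endpoint, key handling, and self-transfer are illustrative assumptions, not the actual test harness):

```go
// stress.go - send zero-value transactions to one node at roughly 100 tx/s.
package main

import (
	"context"
	"log"
	"math/big"
	"time"

	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/crypto"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	// Hypothetical RPC endpoint for one of the nodes.
	client, err := ethclient.Dial("http://127.0.0.1:8545")
	if err != nil {
		log.Fatal(err)
	}

	// For the sketch we generate a throwaway key; with gasPrice 0 on Quorum an
	// unfunded account can still send zero-value transactions.
	key, err := crypto.GenerateKey()
	if err != nil {
		log.Fatal(err)
	}
	from := crypto.PubkeyToAddress(key.PublicKey)

	nonce, err := client.PendingNonceAt(context.Background(), from)
	if err != nil {
		log.Fatal(err)
	}

	signer := types.NewEIP155Signer(big.NewInt(10)) // ChainID 10 from the config above

	ticker := time.NewTicker(10 * time.Millisecond) // ~100 tx/s per node
	defer ticker.Stop()

	for range ticker.C {
		tx := types.NewTransaction(nonce, from, big.NewInt(0), 21000, big.NewInt(0), nil)
		signed, err := types.SignTx(tx, signer, key)
		if err != nil {
			log.Fatal(err)
		}
		if err := client.SendTransaction(context.Background(), signed); err != nil {
			log.Printf("send failed at nonce %d: %v", nonce, err)
		}
		nonce++
	}
}
```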

@vsmk98
Contributor

vsmk98 commented Feb 13, 2020

Hi @WesleyLiu0717 - I wanted a couple of clarifications:

  • You mentioned that you are performing a leader-change experiment. How are you executing this? Is it by killing the current leader node in the network?
  • Can you please share the geth start command?
  • The two nodes which continue to generate blocks - I am assuming they are in sync. Please confirm.

Thanks

@WesleyLiu0717
Author

Hi @vsmk98

  1. The experiment is the same as Quorum-Raft fall into leader change loop in stress test  #927. It is a stress test, not a leader-change test; sorry for the mistake. The geth start command is:
PRIVATE_CONFIG=ignore nohup ../geth --datadir node3/data --txpool.accountslots 1800 --txpool.globalslots  100000 --txpool.accountqueue 1800 --txpool.globalqueue 100000 --miner.gastarget 700000000 --miner.gaslimit 700000000 --nodiscover --verbosity 3 --networkid 10 --raft --raftport 50003 --raftblocktime 250 --rpc --rpcaddr 0.0.0.0 --emitcheckpoints --port 30303 --ws --wsorigins "*" --wsaddr 0.0.0.0 >> log 2>&1 &
  2. Yes, the other two nodes are in sync now.

@vsmk98
Contributor

vsmk98 commented Feb 13, 2020

Thanks @WesleyLiu0717. Analysing the same; will get back to you.

@amalrajmani
Contributor

Hi @WesleyLiu0717
This issue has been reported in go-ethereum before, and it is usually caused by one of the following. Can you check whether any of these apply to your environment?

  • Disk out of space
  • Disk has bad sectors
  • Disk corruption
  • Machine/process crash

You can refer to the issues below that were raised in go-ethereum for the same problem:
ethereum/go-ethereum#14555
ethereum/go-ethereum#2568
ethereum/go-ethereum#16785
Thanks
Amal
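
For the first item, free space on the datadir filesystem is easy to confirm (e.g. `df -h` on the host); a minimal Go sketch of the same check, assuming Linux and an illustrative datadir path:

```go
// diskfree.go - report free space on the filesystem holding the datadir (Linux).
package main

import (
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	path := "node3/data" // illustrative datadir path

	var st unix.Statfs_t
	if err := unix.Statfs(path, &st); err != nil {
		log.Fatal(err)
	}

	free := st.Bavail * uint64(st.Bsize)
	log.Printf("free space on %s: %.2f GiB", path, float64(free)/(1<<30))
}
```

Bad sectors or other media errors usually show up in `dmesg` or in the disk's SMART data.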

@WesleyLiu0717
Author

Hi @amalrajmani
I'm not sure how to check these items; I only know that the disk did not run out of space.
We are running the experiment on Google Cloud Platform now.
If the same issue occurs again on Google Cloud Platform, I'll update here.
Thanks!

@jpmsam
Contributor

jpmsam commented Apr 2, 2020

Please re-open with the details if the issue wasn't solved.

@jpmsam jpmsam closed this as completed Apr 2, 2020
@jayboy-mabushi

@WesleyLiu0717 did you resolve the issue?
