Failed handling transactions when using multiple sentries on BSC #3799

Closed
limhyesook opened this issue Mar 31, 2022 · 9 comments · Fixed by #6045

Comments

@limhyesook

System information

Erigon version: 2022.99.99-dev-4e23187c

OS & Version: Ubuntu

Commit hash: 930d662

Expected behaviour

Erigon runs with multiple sentries without errors.

Actual behaviour

When launched with remote sentries (tested with 4 sentries on remote servers), errors about handling transactions are intermittently reported in the master's logs. Example:
[WARN] [03-31|06:36:02.301] [txpool.fetch] Handling incoming message msg=TRANSACTIONS_66 err="runtime error: invalid memory address or nil pointer dereference, trace: [fetch.go:206 panic.go:1047 panic.go:221 signal_unix.go:735 cursor.go:67 cursor.go:67 txn.go:611 kv_mdbx.go:1042 kv_mdbx.go:1029 kv_mdbx.go:1033 kv_mdbx.go:894 kv_mdbx.go:929 pool.go:592 fetch.go:298 types.go:391 packets.go:179 fetch.go:315 fetch.go:88 fetch.go:314 fetch.go:185 fetch.go:144 fetch.go:101 asm_amd64.s:1581], rlp: f903a2f9018e820211850165a0bc008303a01d9410ed43c718714eb63d5aa57b78b54704e256024e80b9012438ed1739000000000000000000000000000000000000000000000000d02ab486cedc0000000000000000000000000000000000000000000000000000023d11eabe65aaf800000000000000000000000000000000000000000000000000000000000000a0000000000000000000000000eecc51225befcfa56387091c80bf517391d4721b00000000000000000000000000000000000000000000000000000000624534500000000000000000000000000000000000000000000000000000000000000003000000000000000000000000e9e7cea3dedca5984780bafc599bd69add087d5600000000000000000000000055d398326f99059ff775485246999027b3197955000000000000000000000000f2572fdacf09bfae08ff7d35423870b5a8ac26b78194a0ae29ed213e2a70b30c49f4ffd25a16f442f0e69dfcbed2242249796e97fcc6a3a0786e95c72d6446b64afa43dffb42ec615628bbe8e276eed66fa36fa4873bd8e8f9020e820b7c850165a0bc008327798a94468fc54b82b590d1698251d2bce1a9f91c43f04280b901a4d36785290000000000000000000000000000000000000000000000000002cb4178000000000000000000000000000000000000000000000000000000000000000000004000000000000000000000000000000000000000000000000000000000000000050000000000000000000026f716b9a82891338f9ba80e2d6970fdda79d1eb0dae00000000000000000000271055d398326f99059ff775485246999027b31979550000000000000000000026f2126e06540456d60a200b28181581057a01a561a5000000000000000000002710f2572fdacf09bfae08ff7d35423870b5a8ac26b70000000000000000000026f72f82c93b5eb21f57e46734f798a582594b09a71900000000000000000000271055d398326f99059ff775485246999027b31979550000000000000000000026f7e60f88
96d30031cd8aec3badd6ecb1dcb6bdf64b0000000000000000000027101977aee8ee48244ad473815920bcdfe3295c11d10000000000000000000026f70ec3033ef0193fb8922a100c35e57a2b98c175c1000000000000000000002710bb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c8193a029395ec73d650569a571338ccdca2156f7dcd9ab8280b13e424032d087ad1c55a04852341aaf7056870b9880f825ae1339dddebef15bf2b8f15a5cdc86ef57cc09"
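
The trace in the log line above begins at fetch.go:206 and is followed by panic.go frames, and the error is logged as a warning rather than crashing the process. That pattern is consistent with a deferred `recover` that turns a panic into an ordinary error. A minimal sketch of that pattern follows, assuming a handler that takes raw message bytes; `safeHandle` and its signature are hypothetical and are not Erigon's actual code, and the stack-trace part of the message is left out for brevity.

```go
package main

import "fmt"

// safeHandle runs handler and converts any panic into an error, so a single
// bad message cannot take down the calling goroutine.
// Hypothetical helper for illustration only; not Erigon's API.
func safeHandle(handler func(msg []byte) error, msg []byte) (err error) {
	defer func() {
		if r := recover(); r != nil {
			// Include the raw payload in hex, similar to the "rlp: ..." suffix in the log.
			err = fmt.Errorf("%v, rlp: %x", r, msg)
		}
	}()
	return handler(msg)
}

// pool stands in for some shared state the handler touches (hypothetical).
type pool struct{ seen map[string]bool }

func main() {
	var p *pool // nil on purpose: dereferencing it panics inside the handler

	err := safeHandle(func(msg []byte) error {
		p.seen[string(msg)] = true // nil pointer dereference
		return nil
	}, []byte{0xf9, 0x03})

	fmt.Println("recovered:", err)
}
```

A recover of this kind keeps the node running, but it only hides the symptom: whatever state was being modified when the panic fired may be left half-updated.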

I have tested this with different numbers of sentries (1 to 8, with 100 peers each), and there seems to be a correlation between the number of sentries and the frequency of errors. With 8 sentries I also experienced a node crash.

Steps to reproduce the behaviour

Run remote sentries on 4-8 remote servers and run Erigon connecting to them.

Backtrace

[backtrace]
@limhyesook
Author

I ran a 1-hour test with 8 sentries and no errors were logged.

@AskAlexSharov
Collaborator

thank you

@limhyesook
Author

After leaving the node for some time with the same setup I have experienced a new crash:

panic: interface conversion: interface {} is nil, not *simplelru.entry

goroutine 206 [running]:
github.com/hashicorp/golang-lru/simplelru.(*LRU).removeElement(0x16d2640, 0xc002f9b648)
github.com/hashicorp/[email protected]/simplelru/lru.go:172 +0x107
github.com/hashicorp/golang-lru/simplelru.(*LRU).removeOldest(0x15bf900)
github.com/hashicorp/[email protected]/simplelru/lru.go:165 +0x34
github.com/hashicorp/golang-lru/simplelru.(*LRU).Add(0xc002f13880, {0x1550ea0, 0xc024724340}, {0x15968c0, 0x3435870})
github.com/hashicorp/[email protected]/simplelru/lru.go:67 +0x394
github.com/ledgerwatch/erigon-lib/txpool.(*TxPool).discardLocked(0xc000700180, 0xc01e7e5630, 0xe)
github.com/ledgerwatch/[email protected]/txpool/pool.go:1063 +0x15a
github.com/ledgerwatch/erigon-lib/txpool.promote(0xc0002c4480, 0xc0002c44c0, 0xc0002c4500, 0x3b9aca00, 0xc002f9bbe0)
github.com/ledgerwatch/[email protected]/txpool/pool.go:1294 +0x394
github.com/ledgerwatch/erigon-lib/txpool.addTxs(0xfc79d8, {0x265b5c8, 0xc01cb269f0}, 0x486d85, {{0xc01cb34400, 0x52, 0x80}, {0xc01a57f000, 0x668, 0x800}, ...}, ...)
github.com/ledgerwatch/[email protected]/txpool/pool.go:925 +0x787
github.com/ledgerwatch/erigon-lib/txpool.(*TxPool).processRemoteTxs(0xc000700180, {0x26782b8, 0xc0014a25c0})
github.com/ledgerwatch/[email protected]/txpool/pool.go:507 +0x4bc
github.com/ledgerwatch/erigon-lib/txpool.MainLoop({0x26782b8, 0xc0014a25c0}, {0x2681850, 0xc00027eb40}, {0xc001b7ed00, 0xc001a28a00}, 0xc000700180, 0xc000356840, 0xc0002c4540, 0xc00149e108, ...)
github.com/ledgerwatch/[email protected]/txpool/pool.go:1327 +0x565
created by github.com/ledgerwatch/erigon/eth.New
github.com/ledgerwatch/erigon/eth/backend.go:451 +0x3028

Should I open a new Issue?

@limhyesook
Author

Once again there is a similar panic after running the node for 2-3 hours.

panic: interface conversion: interface {} is nil, not *simplelru.entry

goroutine 209 [running]:
github.com/hashicorp/golang-lru/simplelru.(*LRU).removeElement(0x16d2640, 0xc003d73758)
github.com/hashicorp/[email protected]/simplelru/lru.go:172 +0x107
github.com/hashicorp/golang-lru/simplelru.(*LRU).removeOldest(0x15bf900)
github.com/hashicorp/[email protected]/simplelru/lru.go:165 +0x34
github.com/hashicorp/golang-lru/simplelru.(*LRU).Add(0xc002e67880, {0x1550ea0, 0xc018265160}, {0x15968c0, 0x3436878})
github.com/hashicorp/[email protected]/simplelru/lru.go:67 +0x394
github.com/ledgerwatch/erigon-lib/txpool.(*TxPool).discardLocked(0xc00017e180, 0xc016679090, 0xb)
github.com/ledgerwatch/[email protected]/txpool/pool.go:1063 +0x15a
github.com/ledgerwatch/erigon-lib/txpool.(*TxPool).punishSpammer(0xc00017e180, 0xc006ede180)
github.com/ledgerwatch/[email protected]/txpool/pool.go:768 +0x119
github.com/ledgerwatch/erigon-lib/txpool.(*TxPool).validateTxs(0xc0014dc0d8, 0xc0000b3360, {0x265b688, 0xc01831cea0})
github.com/ledgerwatch/[email protected]/txpool/pool.go:738 +0x26f
github.com/ledgerwatch/erigon-lib/txpool.(*TxPool).processRemoteTxs(0xc00017e180, {0x2678378, 0xc0014a7d80})
github.com/ledgerwatch/[email protected]/txpool/pool.go:500 +0x306
github.com/ledgerwatch/erigon-lib/txpool.MainLoop({0x2678378, 0xc0014a7d80}, {0x2681910, 0xc00026e8c0}, {0x0, 0x0}, 0xc00017e180, 0xc0015ce480, 0xc0016164c0, 0xc0014dc120, ...)
github.com/ledgerwatch/[email protected]/txpool/pool.go:1327 +0x565
created by github.com/ledgerwatch/erigon/eth.New
github.com/ledgerwatch/erigon/eth/backend.go:451 +0x3028

@limhyesook
Author

8 more hours of testing, no errors. It seems that this error is not that easy to reproduce.

@AskAlexSharov
Collaborator

don't know the reason yet

@limhyesook
Author

One more (no pressure, I just want to provide as much info for debugging as possible):

panic: interface conversion: interface {} is nil, not *simplelru.entry

goroutine 229 [running]:
github.com/hashicorp/golang-lru/simplelru.(*LRU).removeElement(0x16d2640, 0xc001999560)
github.com/hashicorp/[email protected]/simplelru/lru.go:172 +0x107
github.com/hashicorp/golang-lru/simplelru.(*LRU).removeOldest(0x15bf900)
github.com/hashicorp/[email protected]/simplelru/lru.go:165 +0x34
github.com/hashicorp/golang-lru/simplelru.(*LRU).Add(0xc002fede60, {0x1550ea0, 0xc02249d130}, {0x15968c0, 0x34368b0})
github.com/hashicorp/[email protected]/simplelru/lru.go:67 +0x394
github.com/ledgerwatch/erigon-lib/txpool.(*TxPool).discardLocked(0xc000a62180, 0xc027f689b0, 0x12)
github.com/ledgerwatch/[email protected]/txpool/pool.go:1063 +0x15a
github.com/ledgerwatch/erigon-lib/txpool.onSenderStateChange(0x100, 0x265b688, {0x1e092d25261f3e, 0x0, 0x0, 0x0}, 0xc0224bf218, 0xc00555ac00, 0xc01bb9bc00, 0x0, ...)
github.com/ledgerwatch/[email protected]/txpool/pool.go:1236 +0x177
github.com/ledgerwatch/erigon-lib/txpool.addTxs(0xfd1fb2, {0x265b688, 0xc0224bf218}, 0xa34700, {{0xc0227e0000, 0x93, 0x100}, {0xc00239c380, 0xb7c, 0xd80}, ...}, ...)
github.com/ledgerwatch/[email protected]/txpool/pool.go:921 +0x6ac
github.com/ledgerwatch/erigon-lib/txpool.(*TxPool).processRemoteTxs(0xc000a62180, {0x2678378, 0xc0014d6080})
github.com/ledgerwatch/[email protected]/txpool/pool.go:507 +0x4bc
github.com/ledgerwatch/erigon-lib/txpool.MainLoop({0x2678378, 0xc0014d6080}, {0x2681910, 0xc00146c140}, {0x1ad4ccb667b585ab, 0x13748a941b4512d3}, 0xc000a62180, 0xc0015a0ae0, 0xc00159a4c0, 0xc00159e168, ...)
github.com/ledgerwatch/[email protected]/txpool/pool.go:1327 +0x565
created by github.com/ledgerwatch/erigon/eth.New
github.com/ledgerwatch/erigon/eth/backend.go:451 +0x3028
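
All three panics above end in the same frames (`discardLocked` → `LRU.Add` → `removeOldest` → `removeElement`), reached from three different callers (`promote`, `punishSpammer`, `onSenderStateChange`). The `simplelru` package is documented as not thread-safe, so one possible cause, not confirmed anywhere in this thread, is that the cache is occasionally touched from more than one goroutine without a common lock. The sketch below shows the usual remedy under that assumption: every access goes through a single mutex. The `lru` type here is a small stand-in written for this example, not the hashicorp code, and all names are hypothetical.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// entry is the payload stored in each list element.
type entry struct {
	key   string
	value int
}

// lru is a minimal fixed-size LRU with no internal locking
// (a stand-in for illustration; NOT the simplelru implementation).
type lru struct {
	size  int
	order *list.List // front = most recently used
	items map[string]*list.Element
}

func newLRU(size int) *lru {
	return &lru{size: size, order: list.New(), items: make(map[string]*list.Element)}
}

// add inserts or refreshes a key and evicts the oldest entry when over capacity.
func (c *lru) add(key string, value int) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		el.Value.(*entry).value = value
		return
	}
	c.items[key] = c.order.PushFront(&entry{key, value})
	if c.order.Len() > c.size {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// lockedLRU serializes every access to the underlying cache through one mutex.
type lockedLRU struct {
	mu sync.Mutex
	c  *lru
}

func (l *lockedLRU) Add(key string, value int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.c.add(key, value)
}

func (l *lockedLRU) Len() int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.c.order.Len()
}

func main() {
	cache := &lockedLRU{c: newLRU(128)}

	// Eight writers, loosely mirroring eight sentries feeding one pool.
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			for i := 0; i < 10000; i++ {
				cache.Add(fmt.Sprintf("tx-%d-%d", g, i), i)
			}
		}(g)
	}
	wg.Wait()

	fmt.Println("entries:", cache.Len())
}
```

Calling `add` on the bare `lru` from several goroutines instead is a data race, and running such a variant with `go run -race` is one way to check whether a given code path reaches the cache without holding the lock.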

@AskAlexSharov
Collaborator

@limhyesook do you run the pool externally or inside Erigon?

@limhyesook
Author

Everything is inside except rpcdaemon and sentries.
