Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash on shutdown #27237

Closed
holiman opened this issue May 10, 2023 · 3 comments · Fixed by #27238
Closed

Crash on shutdown #27237

holiman opened this issue May 10, 2023 · 3 comments · Fixed by #27238
Labels

Comments

@holiman
Copy link
Contributor

holiman commented May 10, 2023

Seen on a benchmarker during shutdown:

INFO [05-10|05:39:10.603] Writing clean trie cache to disk path=/datadir/geth/geth/triecache threads=16
INFO [05-10|05:39:10.825] Persisted the clean trie cache path=/datadir/geth/geth/triecache elapsed=221.964ms
INFO [05-10|05:39:10.825] Blockchain stopped
panic: pebble: closed
goroutine 179923979 [running]:
github.com/cockroachdb/pebble.(*DB).getInternal(0xc000600000?, {0xc04bd1a060?, 0x0?, 0xc0124e8618?}, 0x40cb8a?, 0x413304?)
github.com/cockroachdb/[email protected]/db.go:518 +0x445
github.com/cockroachdb/pebble.(*DB).Get(...)
github.com/cockroachdb/[email protected]/db.go:501
github.com/ethereum/go-ethereum/ethdb/pebble.(*Database).Get(0x0?, {0xc04bd1a060?, 0xc0124e8678?, 0x5ae62c?})
github.com/ethereum/go-ethereum/ethdb/pebble/pebble.go:253 +0x2b
github.com/ethereum/go-ethereum/core/rawdb.ReadLegacyTrieNode({0x7fd576b0b5e0, 0xc0008462d0}, {0x58, 0xca, 0xbd, 0x5c, 0x84, 0xe9, 0x19, 0x29, ...})
github.com/ethereum/go-ethereum/core/rawdb/accessors_trie.go:145 +0x59
github.com/ethereum/go-ethereum/trie.(*Database).Node(0xc0004fe4e0, {0x58, 0xca, 0xbd, 0x5c, 0x84, 0xe9, 0x19, 0x29, 0xaf, ...})
github.com/ethereum/go-ethereum/trie/database.go:222 +0x196
github.com/ethereum/go-ethereum/trie.(*hashReader).Node(0x40?, {0xd, 0xb4, 0x9b, 0x2f, 0xec, 0x5a, 0x0, 0x61, 0xf2, ...}, ...)
github.com/ethereum/go-ethereum/trie/database.go:669 +0x2f
github.com/ethereum/go-ethereum/trie.(*trieReader).node(0xc02c46a380, {0xc0479caca0, 0x3, 0x8}, {0x58, 0xca, 0xbd, 0x5c, 0x84, 0xe9, ...})
github.com/ethereum/go-ethereum/trie/trie_reader.go:76 +0xe7
github.com/ethereum/go-ethereum/trie.(*Trie).Prove(0xc01446dd60, {0xc0329fdde0, 0x20, 0xc0124e8aa8?}, 0x0, {0x7fd55bdbfb28, 0xc0236d01b0})
github.com/ethereum/go-ethereum/trie/proof.go:67 +0x6b3
github.com/ethereum/go-ethereum/core/state/snapshot.(*diskLayer).proveRange(0xc02400eab0, 0xc00f19ab90, 0xc013babda0, {0xc00b561860, 0x21, 0x30}, {0x1918bc1, 0x7}, {0xc0278007c0, 0x20, ...}, ...)
github.com/ethereum/go-ethereum/core/state/snapshot/generate.go:270 +0xe3b

So the generator is running, and we shut down the blockchain (and thus the db too). I tried modifying db get so that it closes the db before doing the Get.
With leveldb:

$  go run ./cmd/geth --datadir /tmp/ldb db get 0x0102
INFO [05-10|09:04:06.770] Maximum peer count                       ETH=50 LES=0 total=50
INFO [05-10|09:04:06.783] Set global gas cap                       cap=50,000,000
INFO [05-10|09:04:06.784] Using leveldb as the backing database 
INFO [05-10|09:04:06.784] Allocated cache and file handles         database=/tmp/ldb/geth/chaindata cache=512.00MiB handles=262,144 readonly=true
INFO [05-10|09:04:06.790] Using LevelDB as the backing database 
INFO [05-10|09:04:06.790] Opened ancient database                  database=/tmp/ldb/geth/chaindata/ancient/chain readonly=true
INFO [05-10|09:04:06.790] Get operation failed                     key=0x0102 error="leveldb: closed"
leveldb: closed
exit status 1

WIth pebble:

$  go run ./cmd/geth --datadir /tmp/foo db get 0x0102
INFO [05-10|09:03:35.661] Maximum peer count                       ETH=50 LES=0 total=50
INFO [05-10|09:03:35.671] Set global gas cap                       cap=50,000,000
INFO [05-10|09:03:35.671] Using pebble as the backing database 
INFO [05-10|09:03:35.671] Allocated cache and file handles         database=/tmp/foo/geth/chaindata cache=512.00MiB handles=262,144
INFO [05-10|09:03:35.673] Opened ancient database                  database=/tmp/foo/geth/chaindata/ancient/chain readonly=true
panic: pebble: closed

goroutine 1 [running]:
github.com/cockroachdb/pebble.(*DB).getInternal(0xc0000e27c8?, {0xc000737a40?, 0xc000154898?, 0xbb36ca?}, 0xc0000e26e0?, 0xc000154930?)
        /home/user/go/pkg/mod/github.com/cockroachdb/[email protected]/db.go:518 +0x445

So we previously were fine: doing a get on closed db returned an error, and we can handle that. Pebble throws a panic instead, which, although it cannot cause corruption of "main" database, it's obviously not great.

@holiman
Copy link
Contributor Author

holiman commented May 10, 2023

It is kind of hard to work around this.

The naive way would be to just check db.IsClosed as the first thing in func (d *Database) Get(key []byte) ([]byte, error), but that is racy, and there's no IsClosed method exposed.

The more robust approach is to fix the generate procedure to use a different shutdown-signal (other than "oh the db errors out"), and wait for that during blockchain shutdown. I actually think the existing method of "run until it errors due to closed db" is pretty elegant, and anything else we can come up with will be more complex.

The third approach would be to check if upstream would be amenable to changing the behaviour.

@holiman
Copy link
Contributor Author

holiman commented May 10, 2023

cockroachdb/pebble#998

@holiman
Copy link
Contributor Author

holiman commented May 10, 2023

It is kind of hard to work around this.

Ok, it wasn't that hard: #27238, I forgot that we already have a centralized accessor. Might have a bit of overhead though.

holiman added a commit that referenced this issue May 19, 2023
One difference between pebble and leveldb is that the latter returns error when performing Get on a closed database, the former does a panic. This may be triggered during shutdown (see #27237)

This PR changes the pebble driver so we check that the db is not closed already, for several operations. It also adds tests to the db test-suite, so the previously implicit assumption of "not panic:ing at ops on closed database" is covered by tests.
devopsbo3 pushed a commit to HorizenOfficial/go-ethereum that referenced this issue Nov 10, 2023
One difference between pebble and leveldb is that the latter returns error when performing Get on a closed database, the former does a panic. This may be triggered during shutdown (see ethereum#27237)

This PR changes the pebble driver so we check that the db is not closed already, for several operations. It also adds tests to the db test-suite, so the previously implicit assumption of "not panic:ing at ops on closed database" is covered by tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant