Discovery: throttle node DB commits (#3581) #3656

battlmonstr · 2022-03-07T19:24:09Z

UpdateFindFails/UpdateLastPingReceived/UpdateLastPongReceived events
are causing bursty DB commits (100 per minute).

This optimization throttles the commits to happen at most once in a few seconds,
because this info doesn't need to be persisted immediately.

AskAlexSharov

Please, Create one more PR from stable branch with cherry-pick of this pr. Thank you

battlmonstr · 2022-03-08T10:46:05Z

👌 some tests fail, I'm gonna check them first.

battlmonstr · 2022-03-08T14:03:28Z

The current solution doesn't work well, because there can be only one pending RW transaction globally. Having a separate open transaction for pings makes other writes impossible. For example, updating the local node info blocks forever.

I'd either need to channel all node DB access to this event loop, or make an alternative solution. I'm considering trying something like MDBX_SAFE_NOSYNC instead.

AskAlexSharov · 2022-03-09T01:42:50Z

@battlmonstr MDBX_SAFE_NOSYNC doesn't conceptually change anything - still can't open parallel write txs.

FYI, TxPool works next way:

updates are happening in RAM (in btree, aka cache)
1 goroutine with infinity loop periodically takes lock on btree and flushing it to db and cleaning btree. https://github.com/ledgerwatch/erigon-lib/blob/a2cbecf3730e369022806f5ae451c710826b4473/txpool/pool.go#L1333
read requests - do read from cache with fallback to read tx on db.

Can you explain one more time - what is current problem with having cache in nodedb? (kick me in Discord if you need)
"I'd either need to channel all node DB access to this event loop" - see also bbold.Batch

AskAlexSharov

rejecting until some discussion

go.mod

battlmonstr · 2022-03-09T13:31:01Z

@AskAlexSharov the parallel writes are not needed anymore, I have removed that. It was a problem for the previous solution attempt with a goroutine, but I think that SafeNoSync alone solves the issue by and large.

The issue was reported in discord a while ago that the initial node sync performance is not great when using a HDD.
Alexey did some testing and felt that one of the reason was frequent commits to the nodedb.
Although it is 2 different database files, his hypothesis was that the sequential writes to the chain db are affected by the random writes to the nodedb. At the very least, it was unfortunate that nodedb did so many fsync commits while the info there is not needed to be 100% durable.

SafeNoSync

AFAIK, mdbx commit has 2 steps:

write data: after this the data is accepted by the OS, and will be written to disk eventually
flush OS buffers to disk (let's call it fsync): after this the data is fully persisted.

By default, mdbx tries to be 100% durable, and each Commit() does both write and fsync.
nodedb uses db.Update() calls, which are causing a Commit().
After this change, Update() calls cause fsync at most once in 2 seconds. mdbx has an internal timestamp of the last fsync, and it checks if the time since then is greater than the SyncPeriod.

P.S. If there are no new updates for a long time we risk to lose 2 seconds worth of updates. Here this doesn't seem like a big deal.

AskAlexSharov

agree with everything

UpdateFindFails/UpdateLastPingReceived/UpdateLastPongReceived events are causing bursty DB commits (100 per minute). This optimization throttles the disk writes to happen at most once in a few seconds, because this info doesn't need to be persisted immediately. This helps on HDD drives.

* Update to erigon-lib stable * Discovery: throttle node DB commits (#3581) (#3656) UpdateFindFails/UpdateLastPingReceived/UpdateLastPongReceived events are causing bursty DB commits (100 per minute). This optimization throttles the disk writes to happen at most once in a few seconds, because this info doesn't need to be persisted immediately. This helps on HDD drives. * Update erigon-lib * Discovery: split node records to a sepatate DB table (#3581) (#3667) Problem: QuerySeeds will poke 150 random entries in the whole node DB and ignore hitting "field" entries. In a bootstrap scenario it might hit hundreds of :lastping :lastpong entries, and very few true "node record" entries. After running for 15 minutes I've got totalEntryCount=1508 nodeRecordCount=114 entries. There's a 1/16 chance of hitting a "node record" entry. It means finding just about 10 nodes of 114 total on average from 150 attempts. Solution: Split "node record" entries to a separate table such that QuerySeeds doesn't do idle cycle hits. * Discovery: add Context to Listen. (#3577) Add explicit Context to ListenV4 and ListenV5. This makes it possible to stop listening by an external signal. * Discovery: refactor public key to node ID conversions. (#3634) Encode and hash logic was duplicated in multiple places. * Move encoding to p2p/discover/v4wire * Move hashing to p2p/enode/idscheme * Change newRandomLookup to create a proper random key on a curve. * Discovery: speed up lookup tests (#3677) * Update erigon-lib Co-authored-by: Alexey Sharp <[email protected]> Co-authored-by: battlmonstr <[email protected]>

* block by timestamp for stable (erigontech#3617) * Add timings of forward stages to logs (erigontech#3621) * save * save * deleted bor and starknet from doc (erigontech#3627) * add nosqlite tag (erigontech#3653) * add nosqlite tag * save * save (erigontech#3665) * save (erigontech#3663) * linter up (erigontech#3672) (erigontech#3673) * linter up (erigontech#3672) * save * save * Revert node DB cache (erigontech#3581) (erigontech#3674) (erigontech#3675) Revert "Prevent frequent commits to the node DB in sentries (erigontech#2505)". This reverts commit 65a9a26. * [stable] Fixes to discovery nodedb (erigontech#3691) * Update to erigon-lib stable * Discovery: throttle node DB commits (erigontech#3581) (erigontech#3656) UpdateFindFails/UpdateLastPingReceived/UpdateLastPongReceived events are causing bursty DB commits (100 per minute). This optimization throttles the disk writes to happen at most once in a few seconds, because this info doesn't need to be persisted immediately. This helps on HDD drives. * Update erigon-lib * Discovery: split node records to a sepatate DB table (erigontech#3581) (erigontech#3667) Problem: QuerySeeds will poke 150 random entries in the whole node DB and ignore hitting "field" entries. In a bootstrap scenario it might hit hundreds of :lastping :lastpong entries, and very few true "node record" entries. After running for 15 minutes I've got totalEntryCount=1508 nodeRecordCount=114 entries. There's a 1/16 chance of hitting a "node record" entry. It means finding just about 10 nodes of 114 total on average from 150 attempts. Solution: Split "node record" entries to a separate table such that QuerySeeds doesn't do idle cycle hits. * Discovery: add Context to Listen. (erigontech#3577) Add explicit Context to ListenV4 and ListenV5. This makes it possible to stop listening by an external signal. * Discovery: refactor public key to node ID conversions. (erigontech#3634) Encode and hash logic was duplicated in multiple places. * Move encoding to p2p/discover/v4wire * Move hashing to p2p/enode/idscheme * Change newRandomLookup to create a proper random key on a curve. * Discovery: speed up lookup tests (erigontech#3677) * Update erigon-lib Co-authored-by: Alexey Sharp <[email protected]> Co-authored-by: battlmonstr <[email protected]> * [stable] Fixes for state overrides in RPC (erigontech#3693) * State override support (erigontech#3628) * added stateOverride type * solved import cycle * refactoring * imported wrong package * fixed Call arguments * typo * override for traceCall * Fix eth call (erigontech#3618) * added isFake * using isFake instead of checkNonce * Revert "using isFake instead of checkNonce" This reverts commit 6a202bb. * Revert "added isFake" This reverts commit 2c48024. * only checking EOA if we are checking for Nonce Co-authored-by: Enrique Jose Avila Asapche <[email protected]> * new bootnodes (erigontech#3591) (erigontech#3695) Co-authored-by: Enrique Jose Avila Asapche <[email protected]> * Update skip analysis and preverified hashes (erigontech#3700) (erigontech#3704) Co-authored-by: Alexey Sharp <[email protected]> Co-authored-by: Alexey Sharp <[email protected]> * Update version.go (erigontech#3701) * simulate future blocks: timestamp and block number incremented by 1 for calls to trace_call(Many), debug_traceCall, eth_createAccessList * expose UsedGas through trace_call and trace_callMany * expose accessList via trace_call(Many) * plumb error into trace_call(many) outer json response Co-authored-by: Enrique Jose Avila Asapche <[email protected]> Co-authored-by: Alex Sharov <[email protected]> Co-authored-by: battlmonstr <[email protected]> Co-authored-by: ledgerwatch <[email protected]> Co-authored-by: Alexey Sharp <[email protected]>

* block by timestamp for stable (erigontech#3617) * Add timings of forward stages to logs (erigontech#3621) * save * save * deleted bor and starknet from doc (erigontech#3627) * add nosqlite tag (erigontech#3653) * add nosqlite tag * save * save (erigontech#3665) * save (erigontech#3663) * linter up (erigontech#3672) (erigontech#3673) * linter up (erigontech#3672) * save * save * Revert node DB cache (erigontech#3581) (erigontech#3674) (erigontech#3675) Revert "Prevent frequent commits to the node DB in sentries (erigontech#2505)". This reverts commit 65a9a26. * [stable] Fixes to discovery nodedb (erigontech#3691) * Update to erigon-lib stable * Discovery: throttle node DB commits (erigontech#3581) (erigontech#3656) UpdateFindFails/UpdateLastPingReceived/UpdateLastPongReceived events are causing bursty DB commits (100 per minute). This optimization throttles the disk writes to happen at most once in a few seconds, because this info doesn't need to be persisted immediately. This helps on HDD drives. * Update erigon-lib * Discovery: split node records to a sepatate DB table (erigontech#3581) (erigontech#3667) Problem: QuerySeeds will poke 150 random entries in the whole node DB and ignore hitting "field" entries. In a bootstrap scenario it might hit hundreds of :lastping :lastpong entries, and very few true "node record" entries. After running for 15 minutes I've got totalEntryCount=1508 nodeRecordCount=114 entries. There's a 1/16 chance of hitting a "node record" entry. It means finding just about 10 nodes of 114 total on average from 150 attempts. Solution: Split "node record" entries to a separate table such that QuerySeeds doesn't do idle cycle hits. * Discovery: add Context to Listen. (erigontech#3577) Add explicit Context to ListenV4 and ListenV5. This makes it possible to stop listening by an external signal. * Discovery: refactor public key to node ID conversions. (erigontech#3634) Encode and hash logic was duplicated in multiple places. * Move encoding to p2p/discover/v4wire * Move hashing to p2p/enode/idscheme * Change newRandomLookup to create a proper random key on a curve. * Discovery: speed up lookup tests (erigontech#3677) * Update erigon-lib Co-authored-by: Alexey Sharp <[email protected]> Co-authored-by: battlmonstr <[email protected]> * [stable] Fixes for state overrides in RPC (erigontech#3693) * State override support (erigontech#3628) * added stateOverride type * solved import cycle * refactoring * imported wrong package * fixed Call arguments * typo * override for traceCall * Fix eth call (erigontech#3618) * added isFake * using isFake instead of checkNonce * Revert "using isFake instead of checkNonce" This reverts commit 6a202bb. * Revert "added isFake" This reverts commit 2c48024. * only checking EOA if we are checking for Nonce Co-authored-by: Enrique Jose Avila Asapche <[email protected]> * new bootnodes (erigontech#3591) (erigontech#3695) Co-authored-by: Enrique Jose Avila Asapche <[email protected]> * Update skip analysis and preverified hashes (erigontech#3700) (erigontech#3704) Co-authored-by: Alexey Sharp <[email protected]> Co-authored-by: Alexey Sharp <[email protected]> * Update version.go (erigontech#3701) * rpcdaemon: fix TxContext in traceBlock (erigontech#3716) Previously `txCtx` is not updated for every tx, which leads to wrong tracing results. * Mdbx: WriteMap fallback on error (erigontech#3714) * save * save * Pool cost fix (erigontech#3725) * save * save * Update to erigon-lib stable Co-authored-by: Alex Sharp <[email protected]> * mdbx v0.11.6 (erigontech#3771) * mdbx fix after v0.11.6 (erigontech#3775) * save * save * save * [stable] Event log subscription (erigontech#3773) * Logs sub (erigontech#3666) * save * Add onLogs * Fix lint * Add proper logs * Update go.mod * goimports * Add unwind * feat/rpcadaemon_logs_sub (erigontech#3751) * Fixes to subscribe logs (erigontech#3769) * Fixes to subscribe logs * Add criteria to logs subscription * Skeleton of RPC daemon event log distribution * Simplify * Send aggregated filter to Erigon * Change API * Print * Fixes * Fix topics filtering * Fill txHash and blockHash * Timing logs, fill tx index * Print * More print * Print * Asynchronous sending of log events to RPC daemon * Remove prints * Only extract logs if there are subscribers * Check empty when RPC daemon is removed Co-authored-by: Alex Sharp <[email protected]> Co-authored-by: Alexey Sharp <[email protected]> * Fix up * Update to erigon-lib stable * Update to erigon-lib stable Co-authored-by: primal_concrete_sledge <[email protected]> Co-authored-by: Alex Sharp <[email protected]> Co-authored-by: Alexey Sharp <[email protected]> * Update version.go (erigontech#3776) * Update Skip analysis and preverified hashes (erigontech#3777) (erigontech#3778) * Update skip analysis * Add preverified hashes for mainnet and ropsten * preverified hashes and bootnode for sepolia Co-authored-by: Alexey Sharp <[email protected]> Co-authored-by: Alexey Sharp <[email protected]> * Integration: reset StageFinish also (erigontech#3783) * docker hub - fetch git tags before build erigontech#3781 * fix nil pointer in fetch.go (erigontech#3802) * Update preverified hashes and skip analysis (erigontech#3831) (erigontech#3832) * Update skip_analysis * Preverified hashes Co-authored-by: Alexey Sharp <[email protected]> Co-authored-by: Alexey Sharp <[email protected]> * Fix 'all defaults' case for eth_estimateGas (erigontech#3790) (erigontech#3824) * Fix 'all defaults' case for eth_estimateGas * fix tests Co-authored-by: Igor Mandrigin <[email protected]> Co-authored-by: Igor Mandrigin <[email protected]> Co-authored-by: Igor Mandrigin <[email protected]> * Update version.go (erigontech#3829) * Change libmdbx submodule origin (erigontech#3894) * save * Restore testdata Co-authored-by: Alexey Sharp <[email protected]> * Update to erigon-lib stable (erigontech#3895) Co-authored-by: Alexey Sharp <[email protected]> * Update version.go (erigontech#3896) * Update skip_analysis.go (erigontech#3897) (erigontech#3898) * save (erigontech#3904) * [stable] Fixes for header download (erigontech#3911) * Rollback preverified hashes for mainnet * Not remove header * Set verified = true * Fix verified extendUp and connect * Skip already persisted links * Prevent rewriting historical headers * Not load links after highestInDb * Restore preverified * Fix tests * Fix error handling Co-authored-by: Alexey Sharp <[email protected]> * save (erigontech#3916) * Update libmdbx source (erigontech#3974) Same change as already merged in `devel` * Makefile (erigontech#3779): pass docker build arguments (erigontech#4239) Dockerfile requires some --build-arg options. Fix "docker" target to pass them. Fix GIT_TAG to reflect the most recent tag related to HEAD, instead of an unrelated most recent tag. Use it as the image VERSION. Image tags need to be passed explicitly if needed: DOCKER_FLAGS='-t erigon:latest' make docker * save (erigontech#4346) * Gray Glacier bomb delay (erigontech#4444) * Update version.go on stable branch (erigontech#4447) * Update version.go * Fix lint Co-authored-by: Alexey Sharp <[email protected]> * Clean up * in transaction execution, subtract from account balance only after enough gaspool is ensured (erigontech#4450) - noticed the difference when executing testdata#10 in go-ethereum and erigon * Update skip_analysis.go (erigontech#4452) * Adjust version Co-authored-by: Enrique Jose Avila Asapche <[email protected]> Co-authored-by: Alex Sharov <[email protected]> Co-authored-by: battlmonstr <[email protected]> Co-authored-by: ledgerwatch <[email protected]> Co-authored-by: Alexey Sharp <[email protected]> Co-authored-by: can <[email protected]> Co-authored-by: Alex Sharp <[email protected]> Co-authored-by: primal_concrete_sledge <[email protected]> Co-authored-by: Igor Mandrigin <[email protected]> Co-authored-by: Igor Mandrigin <[email protected]> Co-authored-by: Andrea Lanfranchi <[email protected]> Co-authored-by: Andrew Ashikhmin <[email protected]> Co-authored-by: sudeep <[email protected]>

battlmonstr requested review from AlexeyAkhunov and AskAlexSharov March 7, 2022 19:24

AskAlexSharov approved these changes Mar 8, 2022

View reviewed changes

battlmonstr marked this pull request as draft March 8, 2022 11:46

battlmonstr force-pushed the nodedb branch from a4ba877 to 48df958 Compare March 8, 2022 18:22

battlmonstr mentioned this pull request Mar 8, 2022

kv: expose mdbx SafeNoSync and OptSyncPeriod options ledgerwatch/erigon-lib#356

Merged

battlmonstr marked this pull request as ready for review March 8, 2022 18:37

battlmonstr force-pushed the nodedb branch from 48df958 to 37daa7c Compare March 8, 2022 18:39

battlmonstr requested a review from AskAlexSharov March 8, 2022 18:47

AskAlexSharov requested changes Mar 9, 2022

View reviewed changes

AskAlexSharov reviewed Mar 9, 2022

View reviewed changes

go.mod Outdated Show resolved Hide resolved

battlmonstr force-pushed the nodedb branch from 37daa7c to ecceee5 Compare March 9, 2022 12:40

battlmonstr force-pushed the nodedb branch from ecceee5 to ce91c0a Compare March 9, 2022 16:13

battlmonstr requested a review from AskAlexSharov March 9, 2022 16:31

AskAlexSharov approved these changes Mar 10, 2022

View reviewed changes

battlmonstr force-pushed the nodedb branch 2 times, most recently from a6b4e55 to 9b56cdb Compare March 10, 2022 12:06

battlmonstr force-pushed the nodedb branch from 9b56cdb to cfe5fa4 Compare March 10, 2022 12:44

battlmonstr merged commit ca97325 into devel Mar 10, 2022

battlmonstr deleted the nodedb branch March 10, 2022 13:02

AlexeyAkhunov added the beta label Mar 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Discovery: throttle node DB commits (#3581) #3656

Discovery: throttle node DB commits (#3581) #3656

battlmonstr commented Mar 7, 2022

AskAlexSharov left a comment

battlmonstr commented Mar 8, 2022 •

edited

Loading

battlmonstr commented Mar 8, 2022 •

edited

Loading

AskAlexSharov commented Mar 9, 2022

AskAlexSharov left a comment

battlmonstr commented Mar 9, 2022 •

edited

Loading

AskAlexSharov left a comment

Discovery: throttle node DB commits (#3581) #3656

Discovery: throttle node DB commits (#3581) #3656

Conversation

battlmonstr commented Mar 7, 2022

AskAlexSharov left a comment

Choose a reason for hiding this comment

battlmonstr commented Mar 8, 2022 • edited Loading

battlmonstr commented Mar 8, 2022 • edited Loading

AskAlexSharov commented Mar 9, 2022

AskAlexSharov left a comment

Choose a reason for hiding this comment

battlmonstr commented Mar 9, 2022 • edited Loading

SafeNoSync

AskAlexSharov left a comment

Choose a reason for hiding this comment

battlmonstr commented Mar 8, 2022 •

edited

Loading

battlmonstr commented Mar 8, 2022 •

edited

Loading

battlmonstr commented Mar 9, 2022 •

edited

Loading