Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discovery: throttle node DB commits (#3581) #3656

Merged
merged 1 commit into from
Mar 10, 2022
Merged

Discovery: throttle node DB commits (#3581) #3656

merged 1 commit into from
Mar 10, 2022

Conversation

battlmonstr
Copy link
Contributor

UpdateFindFails/UpdateLastPingReceived/UpdateLastPongReceived events
are causing bursty DB commits (100 per minute).

This optimization throttles the commits to happen at most once in a few seconds,
because this info doesn't need to be persisted immediately.

Copy link
Collaborator

@AskAlexSharov AskAlexSharov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, Create one more PR from stable branch with cherry-pick of this pr. Thank you

@battlmonstr
Copy link
Contributor Author

battlmonstr commented Mar 8, 2022

👌 some tests fail, I'm gonna check them first.

@battlmonstr battlmonstr marked this pull request as draft March 8, 2022 11:46
@battlmonstr
Copy link
Contributor Author

battlmonstr commented Mar 8, 2022

The current solution doesn't work well, because there can be only one pending RW transaction globally. Having a separate open transaction for pings makes other writes impossible. For example, updating the local node info blocks forever.

I'd either need to channel all node DB access to this event loop, or make an alternative solution. I'm considering trying something like MDBX_SAFE_NOSYNC instead.

@AskAlexSharov
Copy link
Collaborator

@battlmonstr MDBX_SAFE_NOSYNC doesn't conceptually change anything - still can't open parallel write txs.

FYI, TxPool works next way:

Can you explain one more time - what is current problem with having cache in nodedb? (kick me in Discord if you need)
"I'd either need to channel all node DB access to this event loop" - see also bbold.Batch

Copy link
Collaborator

@AskAlexSharov AskAlexSharov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rejecting until some discussion

go.mod Outdated Show resolved Hide resolved
@battlmonstr
Copy link
Contributor Author

battlmonstr commented Mar 9, 2022

@AskAlexSharov the parallel writes are not needed anymore, I have removed that. It was a problem for the previous solution attempt with a goroutine, but I think that SafeNoSync alone solves the issue by and large.

The issue was reported in discord a while ago that the initial node sync performance is not great when using a HDD.
Alexey did some testing and felt that one of the reason was frequent commits to the nodedb.
Although it is 2 different database files, his hypothesis was that the sequential writes to the chain db are affected by the random writes to the nodedb. At the very least, it was unfortunate that nodedb did so many fsync commits while the info there is not needed to be 100% durable.

SafeNoSync

AFAIK, mdbx commit has 2 steps:

  1. write data: after this the data is accepted by the OS, and will be written to disk eventually
  2. flush OS buffers to disk (let's call it fsync): after this the data is fully persisted.

By default, mdbx tries to be 100% durable, and each Commit() does both write and fsync.
nodedb uses db.Update() calls, which are causing a Commit().
After this change, Update() calls cause fsync at most once in 2 seconds. mdbx has an internal timestamp of the last fsync, and it checks if the time since then is greater than the SyncPeriod.

P.S. If there are no new updates for a long time we risk to lose 2 seconds worth of updates. Here this doesn't seem like a big deal.

Copy link
Collaborator

@AskAlexSharov AskAlexSharov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with everything

@battlmonstr battlmonstr force-pushed the nodedb branch 2 times, most recently from a6b4e55 to 9b56cdb Compare March 10, 2022 12:06
UpdateFindFails/UpdateLastPingReceived/UpdateLastPongReceived events
are causing bursty DB commits (100 per minute).

This optimization throttles the disk writes to happen at most once in a few seconds,
because this info doesn't need to be persisted immediately.

This helps on HDD drives.
@battlmonstr battlmonstr merged commit ca97325 into devel Mar 10, 2022
@battlmonstr battlmonstr deleted the nodedb branch March 10, 2022 13:02
AlexeyAkhunov pushed a commit that referenced this pull request Mar 14, 2022
UpdateFindFails/UpdateLastPingReceived/UpdateLastPongReceived events
are causing bursty DB commits (100 per minute).

This optimization throttles the disk writes to happen at most once in a few seconds,
because this info doesn't need to be persisted immediately.

This helps on HDD drives.
AlexeyAkhunov added a commit that referenced this pull request Mar 14, 2022
* Update to erigon-lib stable

* Discovery: throttle node DB commits (#3581) (#3656)

UpdateFindFails/UpdateLastPingReceived/UpdateLastPongReceived events
are causing bursty DB commits (100 per minute).

This optimization throttles the disk writes to happen at most once in a few seconds,
because this info doesn't need to be persisted immediately.

This helps on HDD drives.

* Update erigon-lib

* Discovery: split node records to a sepatate DB table (#3581) (#3667)

Problem:
QuerySeeds will poke 150 random entries in the whole node DB and ignore hitting "field" entries.
In a bootstrap scenario it might hit hundreds of :lastping :lastpong entries,
and very few true "node record" entries.
After running for 15 minutes I've got totalEntryCount=1508 nodeRecordCount=114 entries.
There's a 1/16 chance of hitting a "node record" entry.
It means finding just about 10 nodes of 114 total on average from 150 attempts.

Solution:
Split "node record" entries to a separate table such that QuerySeeds doesn't do idle cycle hits.

* Discovery: add Context to Listen. (#3577)

Add explicit Context to ListenV4 and ListenV5.
This makes it possible to stop listening by an external signal.

* Discovery: refactor public key to node ID conversions. (#3634)

Encode and hash logic was duplicated in multiple places.
* Move encoding to p2p/discover/v4wire
* Move hashing to p2p/enode/idscheme

* Change newRandomLookup to create a proper random key on a curve.

* Discovery: speed up lookup tests (#3677)

* Update erigon-lib

Co-authored-by: Alexey Sharp <[email protected]>
Co-authored-by: battlmonstr <[email protected]>
bgelb added a commit to bgelb/erigon that referenced this pull request Mar 19, 2022
* block by timestamp for stable (erigontech#3617)

* Add timings of forward stages to logs (erigontech#3621)

* save

* save

* deleted bor and starknet from doc (erigontech#3627)

* add nosqlite tag (erigontech#3653)

* add nosqlite tag

* save

* save (erigontech#3665)

* save (erigontech#3663)

* linter up (erigontech#3672) (erigontech#3673)

* linter up (erigontech#3672)

* save

* save

* Revert node DB cache (erigontech#3581) (erigontech#3674) (erigontech#3675)

Revert "Prevent frequent commits to the node DB in sentries (erigontech#2505)".
This reverts commit 65a9a26.

* [stable] Fixes to discovery nodedb (erigontech#3691)

* Update to erigon-lib stable

* Discovery: throttle node DB commits (erigontech#3581) (erigontech#3656)

UpdateFindFails/UpdateLastPingReceived/UpdateLastPongReceived events
are causing bursty DB commits (100 per minute).

This optimization throttles the disk writes to happen at most once in a few seconds,
because this info doesn't need to be persisted immediately.

This helps on HDD drives.

* Update erigon-lib

* Discovery: split node records to a sepatate DB table (erigontech#3581) (erigontech#3667)

Problem:
QuerySeeds will poke 150 random entries in the whole node DB and ignore hitting "field" entries.
In a bootstrap scenario it might hit hundreds of :lastping :lastpong entries,
and very few true "node record" entries.
After running for 15 minutes I've got totalEntryCount=1508 nodeRecordCount=114 entries.
There's a 1/16 chance of hitting a "node record" entry.
It means finding just about 10 nodes of 114 total on average from 150 attempts.

Solution:
Split "node record" entries to a separate table such that QuerySeeds doesn't do idle cycle hits.

* Discovery: add Context to Listen. (erigontech#3577)

Add explicit Context to ListenV4 and ListenV5.
This makes it possible to stop listening by an external signal.

* Discovery: refactor public key to node ID conversions. (erigontech#3634)

Encode and hash logic was duplicated in multiple places.
* Move encoding to p2p/discover/v4wire
* Move hashing to p2p/enode/idscheme

* Change newRandomLookup to create a proper random key on a curve.

* Discovery: speed up lookup tests (erigontech#3677)

* Update erigon-lib

Co-authored-by: Alexey Sharp <[email protected]>
Co-authored-by: battlmonstr <[email protected]>

* [stable] Fixes for state overrides in RPC (erigontech#3693)

* State override support (erigontech#3628)

* added stateOverride type

* solved import cycle

* refactoring

* imported wrong package

* fixed Call arguments

* typo

* override for traceCall

* Fix eth call (erigontech#3618)

* added isFake

* using isFake instead of checkNonce

* Revert "using isFake instead of checkNonce"

This reverts commit 6a202bb.

* Revert "added isFake"

This reverts commit 2c48024.

* only checking EOA if we are checking for Nonce

Co-authored-by: Enrique Jose  Avila Asapche <[email protected]>

* new bootnodes (erigontech#3591) (erigontech#3695)

Co-authored-by: Enrique Jose  Avila Asapche <[email protected]>

* Update skip analysis and preverified hashes (erigontech#3700) (erigontech#3704)

Co-authored-by: Alexey Sharp <[email protected]>

Co-authored-by: Alexey Sharp <[email protected]>

* Update version.go (erigontech#3701)

* simulate future blocks: timestamp and block number incremented by 1
for calls to trace_call(Many), debug_traceCall, eth_createAccessList

* expose UsedGas through trace_call and trace_callMany

* expose accessList via trace_call(Many)

* plumb error into trace_call(many) outer json response

Co-authored-by: Enrique Jose  Avila Asapche <[email protected]>
Co-authored-by: Alex Sharov <[email protected]>
Co-authored-by: battlmonstr <[email protected]>
Co-authored-by: ledgerwatch <[email protected]>
Co-authored-by: Alexey Sharp <[email protected]>
bgelb added a commit to bgelb/erigon that referenced this pull request Jun 16, 2022
* block by timestamp for stable (erigontech#3617)

* Add timings of forward stages to logs (erigontech#3621)

* save

* save

* deleted bor and starknet from doc (erigontech#3627)

* add nosqlite tag (erigontech#3653)

* add nosqlite tag

* save

* save (erigontech#3665)

* save (erigontech#3663)

* linter up (erigontech#3672) (erigontech#3673)

* linter up (erigontech#3672)

* save

* save

* Revert node DB cache (erigontech#3581) (erigontech#3674) (erigontech#3675)

Revert "Prevent frequent commits to the node DB in sentries (erigontech#2505)".
This reverts commit 65a9a26.

* [stable] Fixes to discovery nodedb (erigontech#3691)

* Update to erigon-lib stable

* Discovery: throttle node DB commits (erigontech#3581) (erigontech#3656)

UpdateFindFails/UpdateLastPingReceived/UpdateLastPongReceived events
are causing bursty DB commits (100 per minute).

This optimization throttles the disk writes to happen at most once in a few seconds,
because this info doesn't need to be persisted immediately.

This helps on HDD drives.

* Update erigon-lib

* Discovery: split node records to a sepatate DB table (erigontech#3581) (erigontech#3667)

Problem:
QuerySeeds will poke 150 random entries in the whole node DB and ignore hitting "field" entries.
In a bootstrap scenario it might hit hundreds of :lastping :lastpong entries,
and very few true "node record" entries.
After running for 15 minutes I've got totalEntryCount=1508 nodeRecordCount=114 entries.
There's a 1/16 chance of hitting a "node record" entry.
It means finding just about 10 nodes of 114 total on average from 150 attempts.

Solution:
Split "node record" entries to a separate table such that QuerySeeds doesn't do idle cycle hits.

* Discovery: add Context to Listen. (erigontech#3577)

Add explicit Context to ListenV4 and ListenV5.
This makes it possible to stop listening by an external signal.

* Discovery: refactor public key to node ID conversions. (erigontech#3634)

Encode and hash logic was duplicated in multiple places.
* Move encoding to p2p/discover/v4wire
* Move hashing to p2p/enode/idscheme

* Change newRandomLookup to create a proper random key on a curve.

* Discovery: speed up lookup tests (erigontech#3677)

* Update erigon-lib

Co-authored-by: Alexey Sharp <[email protected]>
Co-authored-by: battlmonstr <[email protected]>

* [stable] Fixes for state overrides in RPC (erigontech#3693)

* State override support (erigontech#3628)

* added stateOverride type

* solved import cycle

* refactoring

* imported wrong package

* fixed Call arguments

* typo

* override for traceCall

* Fix eth call (erigontech#3618)

* added isFake

* using isFake instead of checkNonce

* Revert "using isFake instead of checkNonce"

This reverts commit 6a202bb.

* Revert "added isFake"

This reverts commit 2c48024.

* only checking EOA if we are checking for Nonce

Co-authored-by: Enrique Jose  Avila Asapche <[email protected]>

* new bootnodes (erigontech#3591) (erigontech#3695)

Co-authored-by: Enrique Jose  Avila Asapche <[email protected]>

* Update skip analysis and preverified hashes (erigontech#3700) (erigontech#3704)

Co-authored-by: Alexey Sharp <[email protected]>

Co-authored-by: Alexey Sharp <[email protected]>

* Update version.go (erigontech#3701)

* rpcdaemon: fix TxContext in traceBlock (erigontech#3716)

Previously `txCtx` is not updated for every tx, which
leads to wrong tracing results.

* Mdbx: WriteMap fallback on error (erigontech#3714)

* save

* save

* Pool cost fix (erigontech#3725)

* save

* save

* Update to erigon-lib stable

Co-authored-by: Alex Sharp <[email protected]>

* mdbx v0.11.6 (erigontech#3771)

* mdbx fix after v0.11.6 (erigontech#3775)

* save

* save

* save

* [stable] Event log subscription (erigontech#3773)

* Logs sub (erigontech#3666)

* save

* Add onLogs

* Fix lint

* Add proper logs

* Update go.mod

* goimports

* Add unwind

* feat/rpcadaemon_logs_sub (erigontech#3751)

* Fixes to subscribe logs (erigontech#3769)

* Fixes to subscribe logs

* Add criteria to logs subscription

* Skeleton of RPC daemon event log distribution

* Simplify

* Send aggregated filter to Erigon

* Change API

* Print

* Fixes

* Fix topics filtering

* Fill txHash and blockHash

* Timing logs, fill tx index

* Print

* More print

* Print

* Asynchronous sending of log events to RPC daemon

* Remove prints

* Only extract logs if there are subscribers

* Check empty when RPC daemon is removed

Co-authored-by: Alex Sharp <[email protected]>
Co-authored-by: Alexey Sharp <[email protected]>

* Fix up

* Update to erigon-lib stable

* Update to erigon-lib stable

Co-authored-by: primal_concrete_sledge <[email protected]>
Co-authored-by: Alex Sharp <[email protected]>
Co-authored-by: Alexey Sharp <[email protected]>

* Update version.go (erigontech#3776)

* Update Skip analysis and preverified hashes (erigontech#3777) (erigontech#3778)

* Update skip analysis

* Add preverified hashes for mainnet and ropsten

* preverified hashes and bootnode for sepolia

Co-authored-by: Alexey Sharp <[email protected]>

Co-authored-by: Alexey Sharp <[email protected]>

* Integration: reset StageFinish also (erigontech#3783)

* docker hub - fetch git tags before build erigontech#3781

* fix nil pointer in fetch.go (erigontech#3802)

* Update preverified hashes and skip analysis (erigontech#3831) (erigontech#3832)

* Update skip_analysis

* Preverified hashes

Co-authored-by: Alexey Sharp <[email protected]>

Co-authored-by: Alexey Sharp <[email protected]>

* Fix 'all defaults' case for eth_estimateGas (erigontech#3790) (erigontech#3824)

* Fix 'all defaults' case for eth_estimateGas

* fix tests

Co-authored-by: Igor Mandrigin <[email protected]>

Co-authored-by: Igor Mandrigin <[email protected]>
Co-authored-by: Igor Mandrigin <[email protected]>

* Update version.go (erigontech#3829)

* Change libmdbx submodule origin  (erigontech#3894)

* save

* Restore testdata

Co-authored-by: Alexey Sharp <[email protected]>

* Update to erigon-lib stable (erigontech#3895)

Co-authored-by: Alexey Sharp <[email protected]>

* Update version.go (erigontech#3896)

* Update skip_analysis.go (erigontech#3897) (erigontech#3898)

* save (erigontech#3904)

* [stable] Fixes for header download (erigontech#3911)

* Rollback preverified hashes for mainnet

* Not remove header

* Set verified = true

* Fix verified extendUp and connect

* Skip already persisted links

* Prevent rewriting historical headers

* Not load links after highestInDb

* Restore preverified

* Fix tests

* Fix error handling

Co-authored-by: Alexey Sharp <[email protected]>

* save (erigontech#3916)

* Update libmdbx source (erigontech#3974)

Same change as already merged in `devel`

* Makefile (erigontech#3779): pass docker build arguments (erigontech#4239)

Dockerfile requires some --build-arg options.
Fix "docker" target to pass them.
Fix GIT_TAG to reflect the most recent tag related to HEAD, instead of an unrelated most recent tag.
Use it as the image VERSION.

Image tags need to be passed explicitly if needed:

    DOCKER_FLAGS='-t erigon:latest' make docker

* save (erigontech#4346)

* Gray Glacier bomb delay (erigontech#4444)

* Update version.go on stable branch (erigontech#4447)

* Update version.go

* Fix lint

Co-authored-by: Alexey Sharp <[email protected]>

* Clean up

* in transaction execution, subtract from account balance only after enough gaspool is ensured (erigontech#4450)

- noticed the difference when executing testdata#10 in go-ethereum and erigon

* Update skip_analysis.go (erigontech#4452)

* Adjust version

Co-authored-by: Enrique Jose  Avila Asapche <[email protected]>
Co-authored-by: Alex Sharov <[email protected]>
Co-authored-by: battlmonstr <[email protected]>
Co-authored-by: ledgerwatch <[email protected]>
Co-authored-by: Alexey Sharp <[email protected]>
Co-authored-by: can <[email protected]>
Co-authored-by: Alex Sharp <[email protected]>
Co-authored-by: primal_concrete_sledge <[email protected]>
Co-authored-by: Igor Mandrigin <[email protected]>
Co-authored-by: Igor Mandrigin <[email protected]>
Co-authored-by: Andrea Lanfranchi <[email protected]>
Co-authored-by: Andrew Ashikhmin <[email protected]>
Co-authored-by: sudeep <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants