
ledger: shorter deltas lookback implementation (320 rounds project) #4003

Merged
92 commits merged into master on Jul 6, 2022

Conversation

@algorandskiy (Contributor) commented May 17, 2022

Summary

Main

Reduce the deltas size from 320 to 8 rounds by introducing a new online accounts tracker that preserves the history of online state (online/offline status, stake, protocol versions) for at least MaxBalLookback = 2 x SeedRefreshInterval x SeedLookback rounds back from Latest. The new data are stored in new tables and are excluded from catchpoints.
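
As an aside, a minimal sketch of the round arithmetic behind these figures. The concrete parameter values below (SeedLookback = 2, SeedRefreshInterval = 80) are assumed current consensus values and are not restated in this description; MaxAcctLookback = 8 comes from the commit list further down, and only the relationship matters here.

package main

import "fmt"

func main() {
	// Assumed consensus values; the PR relies only on the relationship
	// MaxBalLookback = 2 x SeedRefreshInterval x SeedLookback.
	const (
		seedLookback        = 2  // SeedLookback
		seedRefreshInterval = 80 // SeedRefreshInterval
		maxAcctLookback     = 8  // MaxAcctLookback, the new in-memory deltas depth
	)
	maxBalLookback := 2 * seedRefreshInterval * seedLookback
	fmt.Println("online account history kept on disk:", maxBalLookback, "rounds") // 320
	fmt.Println("deltas kept in memory after this PR:", maxAcctLookback, "rounds") // 8
}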

Additional

  1. TxTail now stores its data in a database table as well, to avoid loading full blocks on startup.
  2. TxTail stores MaxTxnLife + DeeperBlockHeaderHistory blocks.
  3. TxTail caches up to MaxTxnLife + DeeperBlockHeaderHistory recent block headers.
  4. Catchpoint generation is done in two stages: a data file at round X-320 and the catchpoint itself at round X (see the sketch after this list).
  5. The voters subtracker can extend the history persisted in the DB beyond MaxBalLookback.
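
To make the retention windows in items 2 through 4 concrete, here is a small sketch of the arithmetic. MaxTxnLife = 1000 and DeeperBlockHeaderHistory = 1 are assumptions (consistent with the 1001 rounds mentioned in review below), not values restated in this description.

package main

import "fmt"

func main() {
	const (
		maxTxnLife               = 1000 // assumed MaxTxnLife
		deeperBlockHeaderHistory = 1    // assumed DeeperBlockHeaderHistory
		catchpointLookback       = 320  // CatchpointLookback introduced by this PR
	)
	latest := 10000 // example round X

	// Items 2 and 3: how many recent blocks / headers txtail retains.
	tailSize := maxTxnLife + deeperBlockHeaderHistory
	fmt.Printf("txtail retains rounds %d..%d (%d blocks)\n", latest-tailSize+1, latest, tailSize)

	// Item 4: two-stage catchpoint generation.
	fmt.Printf("data file written for round %d, catchpoint itself at round %d\n",
		latest-catchpointLookback, latest)
}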

Performance impact

  1. Regular nodes see roughly a 3x decrease in memory consumption under a high 6,000 TPS load.
  2. Archival nodes show nearly the same memory pressure around the catchpoint round as the master branch.
  3. More tests are in progress.

Test plan

  1. Unit tests plus a coverage check.
  2. Manual tests on private networks, BetaNet, TestNet, and MainNet: upgrades, catchup, fast catchup, incremental network update.
  3. Instrumented node runs for LookupAgreement/OnlineTotals and catchpoint verification on private networks, BetaNet, TestNet, and MainNet.
  4. Long-running BetaNet, TestNet, and MainNet node runs.

algorandskiy and others added 27 commits March 18, 2022 19:05
At the moment the new tracker has the same logic as account updates,
but has its own onlineacctbase round and its own committing logic.

Because of the trackers' initialization logic,
the new onlineacctbase and the old acctbase must be synchronized.
This is done at the end of onlineAccounts.produceCommittingTask.

Disclaimer: eventually most of this code will be removed,
but this PR allows independent work on removing the 320 rounds from account updates
and on implementing history storage in the online accounts tracker.
merge master into feature/320-rounds
Move the transaction tail storage into a tracker database table, txtail
* online accounts DB support
* tests
* Added LRU cache to track the most recent updates into onlineaccounts table
* Added online accounts tests for updates and expirations
* TODO: add another cache for lookups, since accounts can change
and we need answers for a specific round, not just the most recent write round
New table and data structure to track online totals and protocol version history
* Add CatchpointLookback=320 to agreement
* Add MaxAcctLookback=8 to options
* Updated tests
* Cache block headers in txtail
* Introduce DeeperBlockHeaderHistory consensus parameter
TestAcctOnlineRoundParamsCache was failing because
it randomly increments rewards over 640 rounds,
which caused an overflow in the reward calculation.
Fixed by limiting the number of rounds run in the test to 200
by changing the maxBalLookback consensus param.
Fixed t.lowWatermark value in loadFromDisk to match master as discovered by a test
Review thread on ledger/tracker.go (resolved).
Comment on lines +715 to +716
func (l *Ledger) BlockHdrCached(rnd basics.Round) (hdr bookkeeping.BlockHeader, err error) {
l.trackerMu.RLock()
Contributor:
I think this should just be BlockHdr, not BlockHdrCached. Just like with other lookup routines, if you ask for something with the rnd too early, it can fail. But we don't call them all lookupResourceCached and so on. Those fail if rnd is before the dbround; this one fails if asked for a round before the txtail length. But the caller probably thinks of them very similarly.

Contributor Author (@algorandskiy):
Sure, I tried this. See #3935 (comment).
TL;DR: replay deadlocks.

@jannotti (Contributor) left a review comment:

Block header cache questions

synchronousMode: db.SynchronousMode(cfg.LedgerSynchronousMode),
accountsRebuildSynchronousMode: db.SynchronousMode(cfg.AccountsRebuildSynchronousMode),
verifiedTxnCache: verify.MakeVerifiedTransactionCache(verifiedCacheSize),
cfg: cfg,
dbPathPrefix: dbPathPrefix,
}

l.headerCache.maxEntries = 10
Contributor:
Can we eliminate this cache? It seems to be a small heap that replicates a small portion of the block headers that we have to maintain for 1001 rounds anyway. The lookups in txtail can be constant time, so why bother with an LRU heap for 10 of them?

It seems like that's the only use of heapLRUCache, so we get to eliminate the runtime cost and the code.

I may be missing something, because I also don't understand the code for blockQueue.getBlockHdr. It seems to be doing a lot of work, but now the txtail offers a simple lookup, I think.

So I suppose this is actually a follow-up on why I was surprised that BlockHdrCached() was called that. It's because we already have BlockHdr(), but now my question is why BlockHdrCached() can't take the place of BlockHdr() so we can remove a lot of other code.

Contributor Author (@algorandskiy):
I had a plan to revisit this after the merge and eventually use only BlockHdr. This will be done as part of the locking light refactoring project.

Contributor:
State proofs also need a block header cache; the plan is to review this with @algoidan and coordinate efforts to make block headers available to their work without duplicate caches.

* Rename round -> rnd column in new table for consistency
* Other PR feedback
@jannotti (Contributor) left a review comment:

A few more questions before tomorrow's meeting.

Review threads on ledger/acctonline.go and ledger/accountdb.go (resolved).
roundHash = append(roundHash, crypto.Hash(data))
expectedRound--
}
// reverse the array ordering in-place so that it would be incremental order.
Contributor:
Why do we read them with ORDER BY rnd DESC and then reverse the results, rather than reading them in ascending order?

I think I see that it's so you can calculate the roundHashes. Are they explained somewhere? I'm really surprised we're hashing the msgpack form of the txtail record.
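
For illustration only, a hypothetical helper showing what the ascending-order read being asked about might look like; the table and column names (txtail, rnd, data) come from the surrounding discussion, the function itself is not part of the PR, and it assumes database/sql plus the repo's crypto package.

func loadTailHashesAscending(tx *sql.Tx, firstRound uint64) ([]crypto.Digest, error) {
	// Reading oldest-first removes the need to reverse the slice afterwards.
	rows, err := tx.Query("SELECT data FROM txtail WHERE rnd >= ? ORDER BY rnd ASC", firstRound)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var roundHash []crypto.Digest
	for rows.Next() {
		var data []byte
		if err := rows.Scan(&data); err != nil {
			return nil, err
		}
		roundHash = append(roundHash, crypto.Hash(data))
	}
	return roundHash, rows.Err()
}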

Contributor Author (@algorandskiy):
Yes, we could read in ascending order, but the idea is to go from the latest round back up to 1000 rounds and then check the expected size (the round number is checked outside).
I'd change it in a separate PR.

Comment on lines +68 to +72
roundTailHashes []crypto.Digest

// blockHeaderData contains the recent (MaxTxnLife + DeeperBlockHeaderHistory + len(deltas)) block header data.
// The oldest entry is lowestBlockHeaderRound = database round - (MaxTxnLife + DeeperBlockHeaderHistory) + 1
blockHeaderData map[basics.Round]bookkeeping.BlockHeader
Contributor:
I don't understand the hashes exactly yet, but it does seem like roundTailHashes and blockHeaderData ought to be managed the same way. They seem to hold the same amount of data, yet roundTailHashes is managed as a slice that keeps losing its front entries, while blockHeaderData is a map from which we delete the entries we no longer care about. Why not handle them the same way? For that matter, why not put them both in a struct and manage them together?

@jannotti (Contributor) commented Jul 6, 2022:
I wrote a little circular buffer called Recents. It does pretty well.

go test -run ^NOTHING -bench 'BenchmarkRecents|BenchmarkTable|BenchmarkSlice'
goos: darwin
goarch: amd64
pkg: github.com/algorand/go-algorand/util
cpu: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
BenchmarkRecents-8   	574202229	         2.036 ns/op	       0 B/op	       0 allocs/op
BenchmarkTable-8     	17191695	        68.92 ns/op	       1 B/op	       0 allocs/op
BenchmarkSlice-8     	186941323	         6.276 ns/op	      31 B/op	       0 allocs/op
PASS
ok  	github.com/algorand/go-algorand/util	4.680s

These are sort of dumb microbenchmarks that grow the buffer up to 1000, and then start adding and removing one item at a time. The "op" being benchmarked is one add/drop cycle. I got the Add() and Drop() methods small enough to be inlined, so it's pretty much just shuffling some offsets for each operation.
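
For reference, a minimal sketch of such a ring buffer, assuming only what the comment describes: a type named Recents with Add and Drop methods that avoid per-operation allocation. It uses Go generics for brevity, which the actual code may not, and the details are illustrative rather than the reviewer's implementation.

package main

import "fmt"

// Recents is a growable ring buffer that keeps items in insertion order.
// Add appends at the logical end; Drop discards the oldest entry without
// shifting the rest, so both are O(1) amortized with no per-op allocation.
type Recents[T any] struct {
	buf   []T
	start int // index of the oldest live element
	size  int // number of live elements
}

func (r *Recents[T]) Add(item T) {
	if r.size == len(r.buf) {
		// Full (or empty): move the live elements to the front of a bigger slice.
		grown := make([]T, 2*len(r.buf)+1)
		for i := 0; i < r.size; i++ {
			grown[i] = r.buf[(r.start+i)%len(r.buf)]
		}
		r.buf = grown
		r.start = 0
	}
	r.buf[(r.start+r.size)%len(r.buf)] = item
	r.size++
}

func (r *Recents[T]) Drop() {
	if r.size == 0 {
		return
	}
	var zero T
	r.buf[r.start] = zero // release the slot's reference for the GC
	r.start = (r.start + 1) % len(r.buf)
	r.size--
}

func (r *Recents[T]) Oldest() (T, bool) {
	if r.size == 0 {
		var zero T
		return zero, false
	}
	return r.buf[r.start], true
}

func main() {
	var r Recents[int]
	for i := 0; i < 1000; i++ {
		r.Add(i) // grow up to the window size, as in the benchmark
	}
	r.Add(1000) // steady state: one Add ...
	r.Drop()    // ... and one Drop per benchmarked "op"
	if v, ok := r.Oldest(); ok {
		fmt.Println("oldest:", v) // 1
	}
}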

Comment on lines +150 to +151
list = append(list, txTailRound.TxnIDs[i])
lastValid[txTailRound.LastValid[i]] = list
Contributor:
Maybe not worth doing in this PR, but it seems quite surprising to me that we go through so much trouble around the size of these lists: guessing they will start at 256 and managing the slice capacities ourselves. How about just using the size of the list from the previous block and letting Go do the management beyond that? It seems like a better guess, and simpler code.
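
A tiny hypothetical sketch of that suggestion; prevTxnIDs and currTxnIDs are made-up names standing for the previous and current block's transaction IDs.

// Seed the capacity from the previous block's count instead of a fixed 256 guess,
// then let append grow the backing array as needed beyond the hint.
list := make([]transactions.Txid, 0, len(prevTxnIDs))
for _, txid := range currTxnIDs {
	list = append(list, txid)
}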

Contributor Author (@algorandskiy):
I also don't quite remember the rationale. Maybe it was part of the effort to run on a Raspberry Pi 2 or something.

Review thread on ledger/accountdb.go (resolved).
* Move record.StateProofID setting to before online/offline switch
* Reuse account resource separation logic
* Remove baseAcctDataMigrate
* Update account hash for updated accounts

Co-authored-by: chris erway <[email protected]>
Co-authored-by: Pavel Zbitskiy <[email protected]>
@winder (Contributor) left a review comment:
Per discussion: things seem complete enough that we should merge this into master so that the entire engineering team can start exercising the new code. Review can continue here as long as it's useful even after the merge.

* seenAddr in onlineAccountsAll
* createNormalizedOnlineBalanceIndex
* comments
@algorandskiy merged commit 90b1c05 into master on Jul 6, 2022
cce added a commit to cce/go-algorand that referenced this pull request Jul 7, 2022
@@ -417,7 +452,7 @@ func (tr *trackerRegistry) commitSyncer(deferredCommits chan *deferredCommitCont
}

// commitRound commits the given deferredCommitContext via the trackers.
func (tr *trackerRegistry) commitRound(dcc *deferredCommitContext) {
func (tr *trackerRegistry) commitRound(dcc *deferredCommitContext) error {
Contributor:
We made commitRound return an error, but we do not check that error when it is called from commitSyncer.
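
A minimal sketch of the missing check, assuming commitSyncer has access to a logger as tr.log; this is not the actual code.

// inside commitSyncer, after receiving dcc from the deferredCommits channel:
if err := tr.commitRound(dcc); err != nil {
	tr.log.Warnf("commitSyncer: commitRound failed: %v", err)
}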

@algorandskiy (Contributor Author) commented Jul 7, 2022 via email.

@@ -55,6 +55,7 @@ func main() {
mu.Unlock()
// prevent requests for block #2 to go through.
if strings.HasSuffix(request.URL.String(), "/block/2") {
response.Write([]byte("webProxy prevents block 2 from serving"))
Contributor:
Calling Write before WriteHeader means an HTTP 200 status will be sent.
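
A small self-contained illustration of the point, not the test's actual handler; the route and the chosen status code are placeholders.

package main

import "net/http"

func main() {
	http.HandleFunc("/block/2", func(w http.ResponseWriter, r *http.Request) {
		// Write() sends an implicit 200 header first, so a later WriteHeader call
		// would be ignored. To return a non-200 status, set it before writing:
		w.WriteHeader(http.StatusBadRequest) // placeholder status for illustration
		w.Write([]byte("webProxy prevents block 2 from serving"))
	})
	http.ListenAndServe(":8080", nil)
}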

algorandskiy pushed a commit that referenced this pull request Jul 11, 2022
In #4003 some pointer receivers and pointer arguments were changed to value receivers
and pass-by-value arguments, which could lead to a performance regression.
This changes them back.
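
A generic illustration of why that matters, with an invented type rather than anything from the PR: a value receiver copies the whole struct on every call, while a pointer receiver does not.

package main

import "fmt"

// big stands in for a large struct such as a commit context; illustrative only.
type big struct {
	payload [4096]byte
	n       int
}

// byValue copies the entire 4 KB struct on every call.
func (b big) byValue() int { return b.n }

// byPointer passes only a pointer, avoiding the copy.
func (b *big) byPointer() int { return b.n }

func main() {
	b := big{n: 42}
	fmt.Println(b.byValue(), b.byPointer())
}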
@algojohnlee deleted the feature/320-rounds branch on November 2, 2022.