ledger: shorter deltas lookback implementation (320 rounds project) #4003
Conversation
At the moment the new tracker has the same logic as account updates, but it has its own onlineacctbase round and its own committing logic. Because of the trackers' initialization logic, the new onlineacctbase and the old acctbase must be synchronized; this is done at the end of onlineAccounts.produceCommittingTask. Disclaimer: eventually most of this code will be removed, but this PR allows independent work on removing 320 rounds from account updates and on implementing history storage in the online accounts tracker.
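For readers following the design, here is a minimal sketch of that synchronization point; the method shape follows the tracker commit flow described above, but the field names (deltas, accountsMu) and the exact clamping rule are assumptions, not the PR's code.

// Sketch: the online accounts tracker trims the proposed commit range so
// that onlineacctbase advances in lockstep with the old acctbase.
func (ao *onlineAccounts) produceCommittingTask(committedRound, dbRound basics.Round, dcr *deferredCommitRange) *deferredCommitRange {
	ao.accountsMu.RLock()
	defer ao.accountsMu.RUnlock()

	// Never commit past the in-memory deltas this tracker holds, so its
	// base round stays synchronized with the account updates tracker.
	if dcr.offset > uint64(len(ao.deltas)) {
		dcr.offset = uint64(len(ao.deltas))
	}
	return dcr
}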
merge master into feature/320-rounds
Move the transaction tail storage to be on a tracker database table txtail
* online accounts DB support
* tests
* Added LRU cache to track the most recent updates into the onlineaccounts table
* Added online accounts tests for updates and expirations
* TODO: add another cache for lookups, since accounts can change and we need answers about a specific round, not the most recent write round
New table and data structure to track online totals and protocol versions history
* Add CatchpointLookback=320 to agreement
* Add MaxAcctLookback=8 to options
* Updated tests
* Cache block headers in txtail
* Introduce DeeperBlockHeaderHistory consensus parameter
TestAcctOnlineRoundParamsCache was failing because it randomly increments rewards over 640 rounds, which caused an overflow in the reward calculation. Fixed by limiting the number of rounds run in the test to 200 by changing the maxBalLookback consensus param.
Fixed the t.lowWatermark value in loadFromDisk to match master, as discovered by a test.
func (l *Ledger) BlockHdrCached(rnd basics.Round) (hdr bookkeeping.BlockHeader, err error) {
	l.trackerMu.RLock()
I think this should just be BlockHdr, not BlockHdrCached. Just like with other lookup routines, if you ask for something with the rnd too early, it can fail. But we don't call them all lookupResourceCached and so on. Those fail if rnd is before the dbround; this one fails if asked before the txtail length. But the caller probably thinks of them very similarly.
Sure, I tried this. See #3935 (comment). TL;DR: replay deadlocks.
Block header cache questions
		synchronousMode:                db.SynchronousMode(cfg.LedgerSynchronousMode),
		accountsRebuildSynchronousMode: db.SynchronousMode(cfg.AccountsRebuildSynchronousMode),
		verifiedTxnCache:               verify.MakeVerifiedTransactionCache(verifiedCacheSize),
		cfg:                            cfg,
		dbPathPrefix:                   dbPathPrefix,
	}

	l.headerCache.maxEntries = 10
Can we eliminate this cache? It seems that it is a small heap that replicates a small portion of the blockheaders that we have to maintain for 1001 rounds anyway. The lookups in txtail can be constant time so why bother with an LRU heap for 10 of them?
It seems like that's the only use of heapLRUCache, so we get to eliminate the runtime cost and the code.
I may be missing something, because I also don't understand the code for blockQueue.getBlockHdr. It seems to be doing a lot of work, but now the txtail offers a simple lookup, I think.
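For concreteness, the constant-time lookup being suggested is just a map access on the header data txtail already maintains; a sketch, with the mutex name assumed:

// Sketch: with block headers retained in txtail's map, a cached-header
// lookup needs no LRU at all. tailMu is an assumed mutex name.
func (t *txTail) blockHdr(rnd basics.Round) (bookkeeping.BlockHeader, bool) {
	t.tailMu.RLock()
	defer t.tailMu.RUnlock()
	hdr, ok := t.blockHeaderData[rnd]
	return hdr, ok
}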
So I suppose this is actually a followup on why I was surprised that BlockHdrCached() was called that. It's because we already have BlockHdr(), but now my question is why the BlockHdrCached() method can't take the place of BlockHdr(), and we can remove a lot of other code.
I had a plan to revisit this after the merge and eventually use BlockHdr only. This will be done as part of the light refactoring project around locking.
State proofs also need a block header cache; the plan is to review with @algoidan and coordinate efforts to make block headers available to their work without duplicate caches.
* Rename round -> rnd column in new table for consistency
* Other PR feedback
A few more questions before tomorrow's meeting.
		roundHash = append(roundHash, crypto.Hash(data))
		expectedRound--
	}
	// reverse the array ordering in-place so that it would be incremental order.
Why do we read them with ORDER BY rnd DESC and then reverse the results, rather than reading them ASC?
I think I see that it's so you can calculate the roundHashes. Are they explained somewhere? I'm really surprised we're hashing the msgpack form of the txtail record.
Yes, we can read in the other order, but the idea is to go from latest back up to 1000 rounds and then check the expected size (the rnd number comes from outside). I'd change it in a separate PR.
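For reference, the reversal under discussion is the standard in-place swap over the DESC-ordered results; reading with ORDER BY rnd ASC would make it unnecessary, as suggested above.

// Sketch: reverse the DESC-ordered roundHash slice in place so it ends
// up in incremental round order.
for i, j := 0, len(roundHash)-1; i < j; i, j = i+1, j-1 {
	roundHash[i], roundHash[j] = roundHash[j], roundHash[i]
}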
	roundTailHashes []crypto.Digest

	// blockHeaderData contains the recent (MaxTxnLife + DeeperBlockHeaderHistory + len(deltas)) block header data.
	// The oldest entry is lowestBlockHeaderRound = database round - (MaxTxnLife + DeeperBlockHeaderHistory) + 1
	blockHeaderData map[basics.Round]bookkeeping.BlockHeader
I don't understand the hashes exactly yet, but it does seem like roundTailHashes and blockHeaderData ought to be managed the same way. They seem to have the same amount of data in them. But roundTailHashes is managed as a slice that keeps losing its front entries, while blockHeaderData is a map where we delete the ones we don't care about any more. Why not handle them the same way? For that matter, why not stick them both in a struct and manage them together?
I wrote a little circular buffer called Recents. It does pretty well.
go test -run ^NOTHING -bench 'BenchmarkRecents|BenchmarkTable|BenchmarkSlice'
goos: darwin
goarch: amd64
pkg: github.com/algorand/go-algorand/util
cpu: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
BenchmarkRecents-8 574202229 2.036 ns/op 0 B/op 0 allocs/op
BenchmarkTable-8 17191695 68.92 ns/op 1 B/op 0 allocs/op
BenchmarkSlice-8 186941323 6.276 ns/op 31 B/op 0 allocs/op
PASS
ok github.com/algorand/go-algorand/util 4.680s
These are sort of dumb microbenchmarks that grow the buffer up to 1000, and then start adding and removing one item at a time. The "op" being benchmarked is one add/drop cycle. I got the Add() and Drop() methods small enough to be inlined, so it's pretty much just shuffling some offsets for each operation.
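The Recents code itself is not shown in the thread; a minimal sketch of a circular buffer with that Add/Drop shape might look like the following (an illustrative reconstruction, not the benchmarked code).

// Recents-style circular buffer: once it reaches steady-state size,
// Add and Drop only adjust offsets, which matches the ~2 ns/op result above.
type Recents[T any] struct {
	buf   []T
	start int // index of the oldest element
	count int
}

// Add appends x at the logical end, growing the backing array when full.
func (r *Recents[T]) Add(x T) {
	if r.count == len(r.buf) {
		grown := make([]T, 2*len(r.buf)+1)
		for i := 0; i < r.count; i++ {
			grown[i] = r.buf[(r.start+i)%len(r.buf)]
		}
		r.buf, r.start = grown, 0
	}
	r.buf[(r.start+r.count)%len(r.buf)] = x
	r.count++
}

// Drop discards the oldest element; the caller ensures the buffer is non-empty.
func (r *Recents[T]) Drop() {
	var zero T
	r.buf[r.start] = zero // release the reference for the GC
	r.start = (r.start + 1) % len(r.buf)
	r.count--
}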
			list = append(list, txTailRound.TxnIDs[i])
			lastValid[txTailRound.LastValid[i]] = list
Maybe not worth doing in the PR, but it seems quite surprising to me that we go through so much trouble around the size of these lists - guessing they will start at 256, and managing the slice capacities ourselves. How about just using the size of the list for the previous block, and let Go do the management beyond that? It seems like a better guess, and simpler code.
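The suggestion amounts to something like the following sketch, where prevBlockTxnCount is a hypothetical name for the previous block's transaction count:

// Sketch: seed the capacity from the previous block's size and let
// append grow the slice beyond that guess if needed.
list := make([]transactions.Txid, 0, prevBlockTxnCount)
list = append(list, txTailRound.TxnIDs...)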
I also don't quite remember the rationale. Maybe it was part of the effort to run on a Raspberry Pi 2 or something.
* Move record.StateProofID setting to before online/offline switch
* Reuse account resource separation logic
* Remove baseAcctDataMigrate
* Update account hash for updated accounts

Co-authored-by: chris erway <[email protected]>
Co-authored-by: Pavel Zbitskiy <[email protected]>
Per discussion: things seem complete enough that we should merge this into master so that the entire engineering team can start exercising the new code. Review can continue here as long as it's useful even after the merge.
* seenAddr in onlineAccountsAll
* createNormalizedOnlineBalanceIndex
* comments
@@ -417,7 +452,7 @@ func (tr *trackerRegistry) commitSyncer(deferredCommits chan *deferredCommitCont
 }

 // commitRound commits the given deferredCommitContext via the trackers.
-func (tr *trackerRegistry) commitRound(dcc *deferredCommitContext) {
+func (tr *trackerRegistry) commitRound(dcc *deferredCommitContext) error {
we made commitRound return an error, but we do not check the error when it is called in commitSyncer
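A sketch of the follow-up being requested, with the loop body simplified and the logger call assumed; the real commitSyncer also handles shutdown:

// Sketch: consume commitRound's error instead of discarding it.
func (tr *trackerRegistry) commitSyncer(deferredCommits chan *deferredCommitContext) {
	for dcc := range deferredCommits {
		if err := tr.commitRound(dcc); err != nil {
			tr.log.Warnf("commitSyncer: commitRound failed: %v", err)
		}
	}
}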
@@ -55,6 +55,7 @@ func main() {
 	mu.Unlock()
 	// prevent requests for block #2 to go through.
 	if strings.HasSuffix(request.URL.String(), "/block/2") {
+		response.Write([]byte("webProxy prevents block 2 from serving"))
Calling Write before WriteHeader means an HTTP 200 will be sent.
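In net/http, the first call to Write implicitly calls WriteHeader(http.StatusOK), so the intended status must be set before the body. A sketch of the corrected ordering; the particular status code is an arbitrary choice for illustration:

// Set an explicit status first; otherwise Write sends 200 OK on our behalf.
response.WriteHeader(http.StatusServiceUnavailable)
response.Write([]byte("webProxy prevents block 2 from serving"))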
Summary

Main

Reduce the deltas size from 320 to 8 by introducing a new online accounts tracker that preserves the history of account state (online/offline, stake, protos) for at least MaxBalLookback = 2 x SeedRefreshInterval x SeedLookback rounds back from Latest. The new data are stored in new tables and are excluded from catchpoints.
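Assuming the current consensus values SeedRefreshInterval = 80 and SeedLookback = 2 (an assumption about the protocol parameters, not stated in this PR), the formula works out to the 320 rounds in the project's name:

// Assumed consensus values; the formula itself is quoted from the summary above.
const (
	seedRefreshInterval = 80
	seedLookback        = 2
	maxBalLookback      = 2 * seedRefreshInterval * seedLookback // = 320 rounds
)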
Additional

Performance impact

Compared against the master branch.

Test plan