Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incentives: cache top online accounts and use when building AbsentParticipationAccounts #6085

Merged
merged 26 commits into from
Oct 30, 2024

Conversation

cce
Copy link
Contributor

@cce cce commented Jul 26, 2024

Summary

In #5757 a mechanism was introduced to suspend "absentee" accounts that don't participate (by making a proposal, or heartbeat as in #5799), by adding a block header AbsentParticipationAccounts, similar to ExpiredParticipationAccounts.

Currently, the list is generated by considering any account touched by a transaction in the current block, since this data is readily available at endOfBlock(). This PR adds a periodically-updated cache of top online accounts to the ledger, to find additional online accounts not mentioned in the current block.

All of these tracked addresses will now be checked for absentee or expired status each round. To get a recent list of top online accounts, this PR uses recent work done by the votersTracker and state proof worker. (Every 256 rounds, the state proof system performs a TopOnlineAccounts query.) This adds access to the votersTracker to fetch the most recent list of top online addresses, and for each address looks up the latest round's data from the online account cache.

LastProposed and LastHeartbeat are added to the online accounts table's DB representation in this PR. This also fixes an issue introduced in #5965 where uses of ledgercore.OnlineAccountData (which didn't have LastHeartbeat/LastProposed fields) were replaced by basics.OnlineAccountData (which did) and ended up with those fields not being set in a couple of conversions from AccountData.

Test Plan

  • update TestAbsenteeChecks
  • update TestExpiredAccountGeneration
  • maybe update TestExpiredAccountGenerationWithDiskFailure?
  • update TestAbsentTracking
  • added new TestLatestCompletedVotersUpTo
  • update TestAbsenteeChallenges
  • update TestEvalFunctionForExpiredAccounts
  • disable Payouts for state proof E2E test TestTotalWeightChanges (TODO return later after auto-heartbeating added, to ensure the accounts aren't suspended)
  • update test/e2e-go/features/incentives/suspension_test.go (TODO return later after heartbeats)

@cce cce force-pushed the track-incentive-candidates branch from 975ddb4 to 21db44d Compare July 26, 2024 17:35
ledger/ledgercore/onlineacct.go Outdated Show resolved Hide resolved
Copy link
Contributor

@algorandskiy algorandskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Just a thought - why not to init top online on startup and then maintain the list in acctonline while processing incoming blocks?

@cce
Copy link
Contributor Author

cce commented Jul 27, 2024

Just a thought - why not to init top online on startup and then maintain the list in acctonline while processing incoming blocks?

My first approach was to make it a field in the onlineAccounts tracker like the ao.voters sub-tracker, but as I kept cutting it down to MVP and removed its access to the onlineAccounts.deltas, and took it out of being called in onlineAccounts.newBlockImpl(), I moved it out since it had no dependencies left. It felt like I kept adding duplicate state that was already being maintained in onlineAccounts, and duplicate logic for looking it up. So I wanted to try just the absolute minimal approach to start, by relying totally on the onlineAccounts tracker's caching system (which we already know works correctly) rather than add a new cached list of online accounts and delta-processing code..

cmd/tealdbg/localLedger.go Outdated Show resolved Hide resolved
ledger/eval/eval.go Outdated Show resolved Hide resolved
ledger/eval/prefetcher/prefetcher_alignment_test.go Outdated Show resolved Hide resolved
ledger/toponline.go Outdated Show resolved Hide resolved
ledger/eval/eval.go Show resolved Hide resolved
Copy link

codecov bot commented Aug 9, 2024

Codecov Report

Attention: Patch coverage is 77.57009% with 24 lines in your changes missing coverage. Please review.

Project coverage is 56.27%. Comparing base (619d257) to head (9d46fa6).
Report is 8 commits behind head on feature/heartbeats.

Files with missing lines Patch % Lines
ledger/eval/eval.go 87.27% 4 Missing and 3 partials ⚠️
ledger/ledger.go 75.00% 2 Missing and 2 partials ⚠️
ledger/ledgercore/votersForRound.go 0.00% 4 Missing ⚠️
ledger/tracker.go 0.00% 3 Missing ⚠️
cmd/tealdbg/localLedger.go 0.00% 2 Missing ⚠️
daemon/algod/api/server/v2/dryrun.go 0.00% 2 Missing ⚠️
ledger/ledgercore/accountdata.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@                  Coverage Diff                   @@
##           feature/heartbeats    #6085      +/-   ##
======================================================
+ Coverage               56.22%   56.27%   +0.05%     
======================================================
  Files                     494      494              
  Lines                   69954    70040      +86     
======================================================
+ Hits                    39330    39416      +86     
+ Misses                  27947    27944       -3     
- Partials                 2677     2680       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ledger/eval/eval.go Show resolved Hide resolved
ledger/eval/eval.go Show resolved Hide resolved
ledger/ledger.go Outdated Show resolved Hide resolved
Co-authored-by: John Jannotti <[email protected]>
@cce cce requested a review from algorandskiy August 21, 2024 15:22
ledger/ledger.go Outdated Show resolved Hide resolved
@cce cce marked this pull request as ready for review October 4, 2024 17:31
@@ -458,7 +458,7 @@ func TestOnlineAcctModelSimple(t *testing.T) {
})
// test same scenario on double ledger
t.Run("DoubleLedger", func(t *testing.T) {
m := newDoubleLedgerAcctModel(t, protocol.ConsensusFuture, true)
m := newDoubleLedgerAcctModel(t, protocol.ConsensusV39, true) // TODO simulate heartbeats
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keep this on future?

Copy link
Contributor Author

@cce cce Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It fails because heartbeats aren't implemented, but proposers aren't being set, so the big accounts are challenged and kicked offline, and all the stake numbers don't match the test expectations. I could have tried to fix this by ensuring all the test accounts show up as proposers as often as necessary to avoid suspension, but I thought maybe it would be better to see after heartbeats were implemented whether that would make the tests pass without as much modification.

@@ -47,6 +47,7 @@ type roundCowParent interface {
// lookup retrieves agreement data about an address, querying the ledger if necessary.
lookupAgreement(basics.Address) (basics.OnlineAccountData, error)
onlineStake() (basics.MicroAlgos, error)
knockOfflineCandidates() (map[basics.Address]basics.OnlineAccountData, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a NIT: should we actually call this top online accounts or similar naming? It's very clear from comments that's what we are requesting, more a debate over if the name should be based on what it's sourced from vs the use-case we have for this atm.

Copy link
Contributor Author

@cce cce Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a potentially stale list of top online accounts, if new accounts appeared online in the last 256 rounds (since the last state proof) they wouldn't appear. So the word "candidates" was intended to make it seem a little less definitive that this was the complete list of top online accounts for the round... but happy to pick any other name, I wasn't particularly happy with this name.

This is already being used in a method JJ called "generateKnockOfflineAccountsList" in #5757 which is where the "knockOffline" part came from.

ledger/eval/eval.go Show resolved Hide resolved
@@ -810,6 +810,9 @@ func TestTotalWeightChanges(t *testing.T) {
a := require.New(fixtures.SynchronizedTest(t))

consensusParams := getDefaultStateProofConsensusParams()
consensusParams.Payouts = config.ProposerPayoutRules{} // TODO re-enable payouts when nodes aren't suspended
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we tracking these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I should make an issue to address the "update this test once heartbeats are implemented" TODOs in this PR

Copy link
Contributor

@jannotti jannotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will not have the top online cache before the first state proof, right? Maybe it would make sense to seed it during genesis (since the onlince accounts are listed out for us in the genesis file, I think). That could avoid special cases in the tests.

ledger/eval/eval.go Show resolved Hide resolved
func (eval *BlockEvaluator) endOfBlock() error {
// When generating a block, participating addresses are passed to prevent a
// proposer from suspending itself.
func (eval *BlockEvaluator) endOfBlock(participating ...basics.Address) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ...basics.Address instead of []basics.Address? I assume callers always have a slice, as opposed to call sites with, say, 5 explicit arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's true, this is just me optimizing for a smaller diff, to not change other endOfBlock callers, but the idea is to pass a slice — can change

IncentiveEligible bool // currently unused below, but may be needed in the future
}
candidates := make(map[basics.Address]candidateData)
partAddrs := util.MakeSet(participating...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we do anything else with this slice? Maybe we should push the Set type up through the callers, so that it is built as a Set when it is first created to pass to endOfBlock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used in GenerateBlock while making a map of end-of-block account state for participating addresses, to include in the UnfinishedBlock ... if we pushed it up to GenerateBlock then it could protect against looking up the same participating address twice, if duplicate addresses were passed to GenerateBlock.

ledger/eval/eval.go Show resolved Hide resolved
if maxSuspensions > 0 {
knockOfflineCandidates, err := eval.state.knockOfflineCandidates()
if err != nil {
// Log an error and keep going; generating lists of absent and expired
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this implies some nodes can "choose" not to search for absent/expired accounts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, when generating a block it is not required they put any accounts in the {Absent,Expired}ParticipationAccounts block headers, but if they are in the list, validation rules require that the accounts are actually absent or expired.


// Now, check these candidate accounts to see if they are expired or absent.
for accountAddr, acctData := range candidates {
if acctData.MicroAlgosWithRewards.IsZero() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% of time, zero balance implies being closed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct, my understanding is currently the only way you can have a zero balance at the end of the round is if your account has been closed.

//
// This function is passed a list of participating addresses so a node will not
// propose a block that suspends or expires itself.
func (eval *BlockEvaluator) generateKnockOfflineAccountsList(participating []basics.Address) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

participating is really "participating accounts excluding any I host"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here, the "participating" argument is the accounts that the node hosts.

Copy link
Contributor

@gmalouf gmalouf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good in general, a few small comments.

blkEval = l.nextBlock(t)
//require.Empty(t, vb.Block().ExpiredParticipationAccounts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this added commented out?

Copy link
Contributor Author

@cce cce Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test sets up a bunch of participating accounts that are separate from the ones that I'm interested in, and they do expire (before they didn't because they weren't noticed), but in a separate branch I was working on updating this

challenge := byte(0)
for i := uint64(0); i < uint64(1210); i++ { // A bit past one grace period (200) past challenge at 1000.
vb := l.endBlock(t, blkEval)
for i := uint64(0); i < uint64(1200); i++ { // Just before first suspension at 1171
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this not go past first suspension - why 1200?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's based on the values are set for certain accounts initializing LastHeartbeat/LastProposed earlier in the test

}

st := txn.Sign(keys[0])
err = eval.Transaction(st, transactions.ApplyData{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove all of these eval.Transaction calls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you no longer need to send transactions to cause GenerateBlock/BlockEvaluator to "notice" an account is expired or not participating

}

// fetch fresh data up to this round from online account cache. These accounts should all
// be in cache, as long as proto.StateProofTopVoters < onlineAccountsCacheMaxSize.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a condition to call out in the consensus file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added TestOnlineAccountsCacheSizeBiggerThanStateProofTopVoters

@cce cce changed the base branch from master to feature/heartbeats October 30, 2024 14:39
@cce cce force-pushed the track-incentive-candidates branch from a843630 to c558d59 Compare October 30, 2024 14:43
@cce
Copy link
Contributor Author

cce commented Oct 30, 2024

Merging this into feature/heartbeats branch now to follow with two follow-up PRs for addressing PR feedback here:

  • branch of changes to address @jannotti's feedback "Maybe it would make sense to seed it during genesis (since the online accounts are listed out for us in the genesis file, I think). That could avoid special cases in the tests"
  • branch of changes to make BlockEvaluator.endOfBlock(participating) a non-optional argument. Updates tests to require Proposer is set "realistically" to a participating account that is not the fee sink or rewards pool address in test ledger implementations that generate blocks, like evalTestLedger.endBlock() and simple/DoubleLedger.

@cce cce merged commit 28338ff into algorand:feature/heartbeats Oct 30, 2024
17 checks passed
@cce cce deleted the track-incentive-candidates branch October 30, 2024 19:41
@@ -1607,25 +1619,94 @@ type challenge struct {
// deltas and testing if any of them needs to be reset/suspended. Expiration
// takes precedence - if an account is expired, it should be knocked offline and
// key material deleted. If it is only suspended, the key material will remain.
func (eval *BlockEvaluator) generateKnockOfflineAccountsList() {
//
// Different ndoes may propose different list of addresses based on node state.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ndoes -> nodes

candidates[accountAddr] = candidateData{
VoteLastValid: acctData.VoteLastValid,
VoteID: acctData.VoteID,
Status: basics.Online, // from lookupOnlineAccountData, which only returns online accounts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess need a test to enforce knockOfflineCandidates -> lookupOnlineAccountData control flow

for addr := range voters.AddrToPos {
data, err := l.acctsOnline.lookupOnlineAccountData(rnd, addr)
if err != nil {
continue // skip missing / not online accounts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would voters ever return non-online account?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the voters are only calculating Top N every 256 rounds, so if a lookup for the current round (for the cached addr from the last state proof interval) being requested is that the account was closed/deleted, you could hit an error here.

Copy link
Contributor Author

@cce cce Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add a comment and write a test exercising this case, realizing it is kind of complicated now after writing it out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants