Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider reducing the frequency of pending validator indices queries from validator client #4388

Closed
jimmygchen opened this issue Jun 9, 2023 · 5 comments
Assignees
Labels
optimization Something to make Lighthouse run more efficiently. val-client Relates to the validator client binary

Comments

@jimmygchen
Copy link
Member

jimmygchen commented Jun 9, 2023

Description

Currently the Validator Client (VC) polls the validator states for all inactive validators from the Beacon Node (for indices retrieval), once every slot (12 seconds on mainnet)

loop {
// Run this poll before the wait, this should hopefully download all the indices
// before the block/attestation tasks need them.
poll_validator_indices(&duties_service).await;
if let Some(duration) = duties_service.slot_clock.duration_to_next_slot() {
sleep(duration).await;
} else {
// Just sleep for one slot if we are unable to read the system clock, this gives
// us an opportunity for the clock to eventually come good.
sleep(duties_service.slot_clock.slot_duration()).await;
}
}
},

This is mostly not an issue until the number of inactive validator validators reaches a large number ~1000, which is probably quite rare. However we've recently seen some performance issues when the endpoint beacon/states/{state_id}/validators/{validator_id} is called repeatedly in a short period of time. Here's a script created by @michaelsproul to spam this endpoint aggressively, and it turns out this could cause an OOM on the beacon node.

There has been some discussions on how to improve this, potentially queuing the requests, however they might take a while to implement. In the mean time, we can probably reduce the frequency of this query to once or twice per epoch to reduce the performance impact on the node, as validator activation only happens once every epoch, and it may not be necessary to query the indices so often.

@jimmygchen jimmygchen added the val-client Relates to the validator client binary label Jun 9, 2023
@jimmygchen jimmygchen changed the title Consider reducing the frequency of pending validator indices query Consider reducing the frequency of pending validator indices queries Jun 9, 2023
@jimmygchen jimmygchen changed the title Consider reducing the frequency of pending validator indices queries Consider reducing the frequency of pending validator indices queries from validator client Jun 9, 2023
@jimmygchen jimmygchen added the optimization Something to make Lighthouse run more efficiently. label Sep 28, 2023
@paulhauner
Copy link
Member

paulhauner commented Apr 15, 2024

One way to reduce the frequency of polling would be to make an assumption that the VC only needs to know a validator index for an active validator.

Validators are added to the BeaconState via Deposit objects in a BeaconBlock. This happens on a per slot basis, so new validators can land in the BeaconState (i.e. be assigned an index) each slot. However, validators are activated on a per epoch basis (process_epoch > process_registry_updates).

So, it seems that it would be safe for us to:

  1. Poll for all validator indices on startup.
  2. When we've received a None response for a validator, poll each epoch.
    • I think we should consider not polling in the first slot of the epoch. That slot is known to be a weak point for us. Validators are activated at least 1 + MAX_SEED_LOOKAHEAD == 5 epochs after they first appear in the BeaconState so there's no rush to discover the validator.

There's an edge-case here where a syncing BN might return None, then import several epochs of the chain and then return Some and that validator is already active. In per-slot polling we would resolve this within a slot, but with per-epoch processing it takes an epoch to resolve. I'm not sure how I feel about this edge-case, it's hard to weigh off the costs of optimising for weird sync cases vs optimising for the case of big operators running lots of undeposited validators.

@jimmygchen jimmygchen self-assigned this Apr 15, 2024
@jimmygchen
Copy link
Member Author

Thanks Paul!
As discussed the above edge case is quite unlikely and it's a much more common scenario for validators to get their infrastructure running before even making deposits. So the proposal above sounds like a good way to go 👍

@dapplion
Copy link
Collaborator

dapplion commented Apr 15, 2024

What's the UX benefit of discovering the deposit inclusion immediately on the next slot? You won't be activated until your deposit is finalized anyway, there's nothing useful the VC can do besides logging

@paulhauner
Copy link
Member

What's the UX benefit of discovering the deposit inclusion immediately on the next slot?

That's my argument here: #4388 (comment)

I think per-slot discovery has some benefits in very rare edge cases (e.g., syncing several epochs of the canonical chain and then switching to it). However that's very low impact for the operator and insignificant for the chain as a whole, so I'd say we let the edge-case suffer.

@chong-he
Copy link
Member

Closed by #5628

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimization Something to make Lighthouse run more efficiently. val-client Relates to the validator client binary
Projects
None yet
Development

No branches or pull requests

4 participants