Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EIP-6110 in-protocol deposits epic #5366

Closed
9 tasks
g11tech opened this issue Apr 15, 2023 · 10 comments · Fixed by #6042
Closed
9 tasks

EIP-6110 in-protocol deposits epic #5366

g11tech opened this issue Apr 15, 2023 · 10 comments · Fixed by #6042
Labels
Epic Issues used as milestones and tracking multiple issues.

Comments

@g11tech
Copy link
Contributor

g11tech commented Apr 15, 2023

The EIP aims to bring deposit mechanism in the beacon chain. Currently the deposits and fetched and processed wiith a sufficient lag (12 hrs) that sort of effectively rule out any change of re-org in terms of processing deposits. With now the proposed deposit receipt being the first order citizen of the beacon network and a beacon node bundle and pack them in a block to add them into canonical chain (like other messages like exit, bls changes etc)

Following would be the tracker items:

  • Add new deposit types and its processing as per the consensus specs
  • Add submission, pooling and bundling of such receipts in a block
  • Handle the gossip for receipts

However apart from these routine tasks, lodestar would need to refactor and handle its pubkey <> index map which is currently part of EpochContext which is a global object. Currently since deposit logs from EL are processed with such a big log, its very deterministic the order in which deposits would be processed and packed in the blocks and hence gets assigned indexes in EpochContext without any worries.

But with moving the deposit receipt processing in the beacon, this determinism goes away as now the indexes will now depend on the block receipt packing, more precisely varying between the forks. This leads to the precursor refactoring task.

  • Refactor the pubkey to index map in the state to be fork specific. This can be done in the following ways:
    • Now global epoch context's map corresponds to the finalized state of the forkchoice
    • Each fork represented by the tip node in the forkchoice protoarray (i.e no children) have an additional forkMap which is basically obtained by applying the receipts in order of the unfinalized blocks added in the fork.
    • Each time a child block is added in forkchoice the corresponding tip map is updated and assigned to the new tip
    • At any forkchoice node, the total map is: map of finalized in epoch context + map of the best Descendant and can be easily refered to (for any pubkey to find index, check first in epoch context's map and on miss check in forkchoice node's map)
    • These map doesn't necessarily need to be maintained in forkchoice but a reflecting structure can be maintained in epoch context so that its handily available to the beacon state

PS: still wip and pending discussions

cc @dapplion @wemeetagain

@philknows philknows added the Epic Issues used as milestones and tracking multiple issues. label Apr 15, 2023
@dapplion
Copy link
Contributor

dapplion commented Apr 17, 2023

Background

The current interface of the pubkey cache is

interface PubkeyCache {
  /** When processing signed objects must resolve validator indexes into a representation of the pubkey 
      optimized for fast aggregation */
  getPubkeyForAggregation(index: ValidatorIndex): bls.PublicKey;
  /** `process_deposit` requires to efficiently lookup if a pubkey is known in the state or not */
  getPubkeyIndex(pubkey: bytes): ValidatorIndex | null;
  /** Register new pubkey to cache. May be done when processing deposit or latter */
  addPubkey(pubkey: bytes, index: ValidatorIndex): void;
}

That API is currently implemented with two data structures which are globally shared on all states

type Index2PubkeyCache = PublicKey[];
type PubkeyIndexMap = Map<PubkeyHex, ValidatorIndex>();

Split cache

As @g11tech mentions we must have a cache that's fork aware. We should use the persistent-ts structurally shared data structures to make cloning cheap.

So attach to every state this two datastructures that are cloned on each state clone and can be safely mutated on block processing

unfinalizedPubkeyCache = immutable.List<PublicKey | null>
unfinalizedPubkeyIndexMap = immutable.Map<PubkeyHex, ValidatorIndex>

The mutliplex on both caches

function getPubkeyForAggregation(index: ValidatorIndex): bls.PublicKey {
  if (index < latestFinalizedValidatorsLen) {
    // Lookup global cache
    // - Should cover +99% of cases. Random access on fast Array
    return globalPubkeyCache[index];
  } else {
    // Lookup state's local cache
    // - Unlikely unless long non-finality. Slower access on immutable.List
    if (unfinalizedPubkeyCache[index] == null) {
      // Prime cache reading into the state.validators[index].pubkey
      // To prevent deserializing pubkey twice, keep global `unfinalizedPubkeyMap = Map<PubkeyHex, PublicKey>`
      // Deserialization may be done proactively if unfinalized validators become active
      primeUnfinalizedPubkeyCache(index);
    }
    return unfinalizedPubkeyCache[index];
  }
}
function getPubkeyIndex(pubkey: bytes): ValidatorIndex | null {
  // Lookup global cache
  return globalPubkeyIndexMap.get(pubkey)
  // Then lookup local cache
  // - Note now for new deposits both caches will be lookup
  || unfinalizedPubkeyIndexMap.get(pubkey);
}

Regarding cache insertion, we must ensure that each pubkey is only deserialized once, since those representations are memory heavy. getPubkeyForAggregation should be lazy for the unfinalized portion since it's unlikely to be called in normal cases. The local cache of the finalized state has to be rotated into the global cache

function addPubkey(pubkey: bytes, index: ValidatorIndex): void {
  unfinalizedPubkeyCache.set(index, null); // lazy cache, prime latter
  unfinalizedPubkeyIndexMap.set(pubkey, index);
}

Re-use indexes

@g11tech
Copy link
Contributor Author

g11tech commented Apr 18, 2023

I think we can make unfinalizedPubkeyCache = immutable.List<PubkeyHex | null>, and have a global hexToPublicKey: Map <PubkeyHex, PublicKey> in epoch context itself

@dapplion dapplion mentioned this issue Apr 19, 2023
13 tasks
@dapplion dapplion changed the title EIP-6110 strategy and tracker EIP-6110 in-protocol deposits epic May 29, 2023
@philknows philknows mentioned this issue Jul 12, 2023
15 tasks
@ensi321
Copy link
Contributor

ensi321 commented Aug 15, 2023

Background

The current interface of the pubkey cache is

interface PubkeyCache {
  /** When processing signed objects must resolve validator indexes into a representation of the pubkey 
      optimized for fast aggregation */
  getPubkeyForAggregation(index: ValidatorIndex): bls.PublicKey;
  /** `process_deposit` requires to efficiently lookup if a pubkey is known in the state or not */
  getPubkeyIndex(pubkey: bytes): ValidatorIndex | null;
  /** Register new pubkey to cache. May be done when processing deposit or latter */
  addPubkey(pubkey: bytes, index: ValidatorIndex): void;
}

That API is currently implemented with two data structures which are globally shared on all states

type Index2PubkeyCache = PublicKey[];
type PubkeyIndexMap = Map<PubkeyHex, ValidatorIndex>();

Split cache

As @g11tech mentions we must have a cache that's fork aware. We should use the persistent-ts structurally shared data structures to make cloning cheap.

So attach to every state this two datastructures that are cloned on each state clone and can be safely mutated on block processing

unfinalizedPubkeyCache = immutable.List<PublicKey | null>
unfinalizedPubkeyIndexMap = immutable.Map<PubkeyHex, ValidatorIndex>

The mutliplex on both caches

function getPubkeyForAggregation(index: ValidatorIndex): bls.PublicKey {
  if (index < latestFinalizedValidatorsLen) {
    // Lookup global cache
    // - Should cover +99% of cases. Random access on fast Array
    return globalPubkeyCache[index];
  } else {
    // Lookup state's local cache
    // - Unlikely unless long non-finality. Slower access on immutable.List
    if (unfinalizedPubkeyCache[index] == null) {
      // Prime cache reading into the state.validators[index].pubkey
      // To prevent deserializing pubkey twice, keep global `unfinalizedPubkeyMap = Map<PubkeyHex, PublicKey>`
      // Deserialization may be done proactively if unfinalized validators become active
      primeUnfinalizedPubkeyCache(index);
    }
    return unfinalizedPubkeyCache[index];
  }
}
function getPubkeyIndex(pubkey: bytes): ValidatorIndex | null {
  // Lookup global cache
  return globalPubkeyIndexMap.get(pubkey)
  // Then lookup local cache
  // - Note now for new deposits both caches will be lookup
  || unfinalizedPubkeyIndexMap.get(pubkey);
}

Regarding cache insertion, we must ensure that each pubkey is only deserialized once, since those representations are memory heavy. getPubkeyForAggregation should be lazy for the unfinalized portion since it's unlikely to be called in normal cases. The local cache of the finalized state has to be rotated into the global cache

function addPubkey(pubkey: bytes, index: ValidatorIndex): void {
  unfinalizedPubkeyCache.set(index, null); // lazy cache, prime latter
  unfinalizedPubkeyIndexMap.set(pubkey, index);
}

Re-use indexes

I have dug into the current use of index2pubkey in the codebase. It is used in block, attestation, slashing and sync committee related case for which all of them require active validators.

There is also a case where process_voluntary_exit() uses index2pubkey, but it was guarded by is_active_validator() check right before we validate the signature.

I believe there is no need to have the multiplex in getPubkeyForAggregation() and we don’t need to have unfinalizedPubkeyCache attached to the states

@g11tech
Copy link
Contributor Author

g11tech commented Aug 15, 2023

aha very nice analysis @naviechan

I believe there is no need to have the multiplex in getPubkeyForAggregation() and we don’t need to have unfinalizedPubkeyCache attached to the state

what about

  1. very long cases of non finality that we do see on the devnets if not testnets (and hopefully never on mainnet). i think the suggested getPubkeyForAggregation also talks about this.
  2. serving beacon apis querying for status of unfinalized validators

i suggest to keep the original approach proposed, where we prime them in a lazy fashion.

@dapplion
Copy link
Contributor

aha very nice analysis @naviechan

I believe there is no need to have the multiplex in getPubkeyForAggregation() and we don’t need to have unfinalizedPubkeyCache attached to the state

what about

1. very long cases of non finality that we do see on the devnets if not testnets (and hopefully never on mainnet). i think the suggested `getPubkeyForAggregation` also talks about this.

2. serving beacon apis querying for status of unfinalized validators

i suggest to keep the original approach proposed, where we prime them in a lazy fashion.

Having a fork aware auto-pruned index2pubkey for non-finalized pubkeys is a complexity we should avoid if not necessary. To verify signatures the fallback can be:

  • retrieve relevant state
  • go to state.validators[i].pubkey
  • deserialize and use for validation

awfully slow, but covers the case. We should have WARN logs if that codepath happens

For pubkey2index we 100% need the un-finalized cache

@g11tech
Copy link
Contributor Author

g11tech commented Aug 16, 2023

makes sense, so the we can encapsulate this away in getPubkeyForAggregation and obviously pass state to it to make it fork aware

(or put it in cache as well depending upon the optional flags to cache, but all that can be done later on based on usage requirements)

@ensi321
Copy link
Contributor

ensi321 commented Aug 29, 2023

I have dug into the current use of index2pubkey in the codebase. It is used in block, attestation, slashing and sync committee related case for which all of them require active validators.

There is also a case where process_voluntary_exit() uses index2pubkey, but it was guarded by is_active_validator() check right before we validate the signature.

I believe there is no need to have the multiplex in getPubkeyForAggregation() and we don’t need to have unfinalizedPubkeyCache attached to the states

I realized this is a bold claim without any supporting evidence.
I have written up a more thorough analysis as a proof of the needlessness of unfinalized index2pubkey:
https://hackmd.io/@adrninistrator1/SkHmz972n

Many thanks to @dapplion for suggestions and comments

@ensi321
Copy link
Contributor

ensi321 commented Sep 6, 2023

aha very nice analysis @naviechan

I believe there is no need to have the multiplex in getPubkeyForAggregation() and we don’t need to have unfinalizedPubkeyCache attached to the state

what about

  1. very long cases of non finality that we do see on the devnets if not testnets (and hopefully never on mainnet). i think the suggested getPubkeyForAggregation also talks about this.
  2. serving beacon apis querying for status of unfinalized validators

i suggest to keep the original approach proposed, where we prime them in a lazy fashion.

In the case of long non-finality:

  • finalized_checkpoint.epoch only affects new validators entering activation queue. Validators that are already in the queue will dequeue just fine
  • is_active_validator() is not affected by long non-finality, so any call to index2Pubkey will stay the same (attestation, sync committee etc.), and hence the conclusion of the needlessness of unfinalizedIndex2Pubkey remains unchanged
  • The only way unfinalizedPubkey2Index getting populated is through process_deposit() which the function already has the public key object, it makes sense we populate it directly because the validator index is just validators.length - 1. If we decide to do this lazily, we will need to search for the validator index later which will be quite slow. Something like [i for i, v in enumerate(state.validators) if v.pubkey == pubkey]
  • Lazy cache made sense in unfinalizedIndex2Pubkey as Lion initally proposed. Because 1) we can delay storing the pubkey 2) we can just check state.validators[i].pubkey with index i. But the idea of unfinalizedIndex2Pubkey is scraped.
  • I also believe there is no need to have a map Map<PubkeyHex, PublicKey>. Can someone confirm that?

@g11tech
Copy link
Contributor Author

g11tech commented Sep 6, 2023

I also believe there is no need to have a map Map<PubkeyHex, PublicKey>. Can someone confirm that?

yes no need i think 👍 , also no need for unfinalizedPubkey2Index which is state dependancy and can be computed on the fly for serving the apis via utility fn which takes state and pubkey as inputs

@ensi321
Copy link
Contributor

ensi321 commented Feb 21, 2024

Closing since #6042 is merged. Some minor follow-ups will be tracked along with other Electra EIPs in #6341

@ensi321 ensi321 closed this as completed Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Epic Issues used as milestones and tracking multiple issues.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants