
beacon, core, eth, miner: integrate witnesses into production Geth #30069

Merged: 2 commits into ethereum:master on Sep 20, 2024

Conversation

@karalabe (Member) commented Jun 25, 2024

This PR integrates witness-enabled block production, witness-creating payload execution and stateless cross-validation into the engine API. The purpose of the PR is to enable the following use-cases (for API details, please see the next section):

  • Cross validating locally created blocks:

    • Call forkchoiceUpdatedWithWitness instead of forkchoiceUpdated to trigger witness creation too.
    • Call getPayload as before to retrieve the new block and also the above created witness.
    • Call executeStatelessPayload against another client to cross-validate the block.
  • Cross validating locally processed blocks:

    • Call newPayloadWithWitness instead of newPayload to trigger witness creation too.
    • Call executeStatelessPayload against another client to cross-validate the block.
  • Block production for stateless clients (local or MEV builders):

    • Call forkchoiceUpdatedWithWitness instead of forkchoiceUpdated to trigger witness creation too.
    • Call getPayload as before to retrieve the new block and also the above created witness.
    • Propagate witnesses across the consensus libp2p network for stateless Ethereum.
  • Stateless validator validation:

    • Call executeStatelessPayload with the propagated witness to statelessly validate the block.

Note: the various WithWitness methods could also just be an additional boolean flag on the base methods, but this PR keeps the methods separate until a final consensus is reached on how to integrate them in production.


The following engine API types are introduced:

// StatelessPayloadStatusV1 is the result of a stateless payload execution.
type StatelessPayloadStatusV1 struct {
	Status          string      `json:"status"`
	StateRoot       common.Hash `json:"stateRoot"`
	ReceiptsRoot    common.Hash `json:"receiptsRoot"`
	ValidationError *string     `json:"validationError"`
}
  • Add forkchoiceUpdatedWithWitnessV1,2,3 with the same params and returns as forkchoiceUpdatedV1,2,3, but triggering stateless witness building if block production is requested.
  • Extend getPayloadV2,3 to return executionPayloadEnvelope with an additional witness field of type bytes iff created via forkchoiceUpdatedWithWitnessV2,3.
  • Add newPayloadWithWitnessV1,2,3,4 with the same params and returns as newPayloadV1,2,3,4, but triggering stateless witness creation during payload execution to allow cross-validating it.
  • Extend payloadStatusV1 with a witness field of type bytes if returned by newPayloadWithWitnessV1,2,3,4.
  • Add executeStatelessPayloadV1,2,3,4 with the same base params as newPayloadV1,2,3,4 and one additional param (witness) of type bytes. The method returns statelessPayloadStatusV1, which mirrors payloadStatusV1 but replaces latestValidHash with stateRoot and receiptsRoot.
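A minimal sketch of the first use-case above (cross-validating a locally built block), assuming JSON-RPC over HTTP against two engine API endpoints. The method names follow this PR; the endpoint URLs, the parameter maps, the result struct shapes and the omission of JWT authentication are illustrative simplifications, not the exact engine API wire format.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// rpcCall sends a single JSON-RPC request and decodes the "result" field into result.
func rpcCall(url, method string, params []any, result any) error {
	reqBody, err := json.Marshal(map[string]any{
		"jsonrpc": "2.0", "id": 1, "method": method, "params": params,
	})
	if err != nil {
		return err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(reqBody))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	var envelope struct {
		Result json.RawMessage `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&envelope); err != nil {
		return err
	}
	return json.Unmarshal(envelope.Result, result)
}

func main() {
	local, other := "http://127.0.0.1:8551", "http://127.0.0.1:8552" // hypothetical endpoints

	// 1. Trigger witness-enabled block production on the local client.
	fcState := map[string]any{"headBlockHash": "0x..", "safeBlockHash": "0x..", "finalizedBlockHash": "0x.."}
	attrs := map[string]any{} // payload attributes elided in this sketch
	var fcu struct {
		PayloadID string `json:"payloadId"`
	}
	if err := rpcCall(local, "engine_forkchoiceUpdatedWithWitnessV3", []any{fcState, attrs}, &fcu); err != nil {
		panic(err)
	}

	// 2. Fetch the built block; the envelope now also carries the witness bytes.
	var payload struct {
		ExecutionPayload json.RawMessage `json:"executionPayload"`
		Witness          string          `json:"witness"` // hex-encoded bytes
	}
	if err := rpcCall(local, "engine_getPayloadV3", []any{fcu.PayloadID}, &payload); err != nil {
		panic(err)
	}

	// 3. Cross-validate the block statelessly against another client.
	var status struct {
		Status       string `json:"status"`
		StateRoot    string `json:"stateRoot"`
		ReceiptsRoot string `json:"receiptsRoot"`
	}
	if err := rpcCall(other, "engine_executeStatelessPayloadV3", []any{payload.ExecutionPayload, payload.Witness}, &status); err != nil {
		panic(err)
	}
	fmt.Println("stateless validation:", status.Status, status.StateRoot)
}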

@karalabe marked this pull request as ready for review on June 25, 2024 16:59
@@ -1713,20 +1715,20 @@ func (bc *BlockChain) insertChain(chain types.Blocks, setHead bool) (int, error)
if setHead {
// First block is pruned, insert as sidechain and reorg only if TD grows enough
log.Debug("Pruned ancestor, inserting as sidechain", "number", block.Number(), "hash", block.Hash())
return bc.insertSideChain(block, it)
Member

insertSideChain is only called if: (a) the parent state is pruned and (b) the blocks are inserted via InsertChain.

Given that InsertChain is only used by the downloader, with the semantics of inserting the blocks as the canonical chain, we don't need to bother with witness construction here.

Member Author

I think this part was needed so that I could run the state tests for pre-merge networks too? Can't really say for sure, but I kind of remember there was one code path that felt weird to me too, but was needed to allow running the old tests. Maybe not this one?

@@ -1713,20 +1715,20 @@ func (bc *BlockChain) insertChain(chain types.Blocks, setHead bool) (int, error)
if setHead {
// First block is pruned, insert as sidechain and reorg only if TD grows enough
log.Debug("Pruned ancestor, inserting as sidechain", "number", block.Number(), "hash", block.Hash())
-return bc.insertSideChain(block, it)
+return bc.insertSideChain(block, it, makeWitness)
} else {
// We're post-merge and the parent is pruned, try to recover the parent state
log.Debug("Pruned ancestor", "number", block.Number(), "hash", block.Hash())
_, err := bc.recoverAncestors(block)
Member

We should pass the makeWitness flag into recoverAncestors

Member

And inside of recoverAncestors, we only need to generate the witness if makeWitness is true.

Member Author

Hmm, so we only want to generate a witness for the very last block. I.e. we only care about witnesses from newPayload calls. As far as I understand this code, we are only recovering the parent, and for that we never want to make a witness. Later, on line 1820, is when we actually make the witness. The reason we have the witness enabled on 1718 above is because that returns it directly, but the recover only preps the ancestor. At least according to the comment?

Member

recoverAncestors will be called at the call site of newPayload, specifically:

  • newPayload calls InsertBlockWithoutSetHead
  • InsertBlockWithoutSetHead calls recoverAncestors if the state of the parent block is not available
  • the block specified by newPayload is executed within recoverAncestors, after all of its ancestors

suggested diff

diff --git a/core/blockchain.go b/core/blockchain.go
index 8030fb84d1..f7c921fe64 100644
--- a/core/blockchain.go
+++ b/core/blockchain.go
@@ -1688,7 +1688,7 @@ func (bc *BlockChain) insertChain(chain types.Blocks, setHead bool, makeWitness
 		} else {
 			// We're post-merge and the parent is pruned, try to recover the parent state
 			log.Debug("Pruned ancestor", "number", block.Number(), "hash", block.Hash())
-			_, err := bc.recoverAncestors(block)
+			_, err := bc.recoverAncestors(block, makeWitness)
 			return nil, it.index, err
 		}
 	// Some other error(except ErrKnownBlock) occurred, abort.
@@ -2110,7 +2110,7 @@ func (bc *BlockChain) insertSideChain(block *types.Block, it *insertIterator, ma
 // all the ancestor blocks since that.
 // recoverAncestors is only used post-merge.
 // We return the hash of the latest block that we could correctly validate.
-func (bc *BlockChain) recoverAncestors(block *types.Block) (common.Hash, error) {
+func (bc *BlockChain) recoverAncestors(block *types.Block, makeWitness bool) (common.Hash, error) {
 	// Gather all the sidechain hashes (full blocks may be memory heavy)
 	var (
 		hashes  []common.Hash
@@ -2150,7 +2150,7 @@ func (bc *BlockChain) recoverAncestors(block *types.Block) (common.Hash, error)
 		} else {
 			b = bc.GetBlock(hashes[i], numbers[i])
 		}
-		if _, _, err := bc.insertChain(types.Blocks{b}, false, false); err != nil {
+		if _, _, err := bc.insertChain(types.Blocks{b}, false, makeWitness && i == 0); err != nil {
 			return b.ParentHash(), err
 		}
 	}
@@ -2387,7 +2387,7 @@ func (bc *BlockChain) SetCanonical(head *types.Block) (common.Hash, error) {
 
 	// Re-execute the reorged chain in case the head state is missing.
 	if !bc.HasState(head.Root()) {
-		if latestValidHash, err := bc.recoverAncestors(head); err != nil {
+		if latestValidHash, err := bc.recoverAncestors(head, false); err != nil {
 			return latestValidHash, err
 		}
 		log.Info("Recovered head state", "number", head.Number(), "hash", head.Hash())

Member Author

Applied

(An outdated review comment on eth/catalyst/api.go was marked resolved.)
@etan-status

For the binary representation, an SSZ-based approach would benefit latency and throughput, independently of the witnesses.

One can follow the same concept as the beacon-API and switch to SSZ based on the HTTP Accept / Content-Type headers being set to application/octet-stream instead of application/json.

SSZ supports optional fields with EIP-7495: SSZ StableContainer. This is currently being implemented across many libraries, with implementation progress documented at https://stabilitynow.box
With optional fields, witness data could be passed optionally: if present, it's the stateless call, otherwise it's a stateful call. Or, alternatively, just go with a regular SSZ Container if no optional support is needed. The calls would still be versioned to select which SSZ Profile to go with, i.e., the payload's fork is determined by the version.

The ExecutionPayload already has a native SSZ encoding as defined in consensus-specs. The additional fields for forkchoiceUpdated and newPayload, namely the PayloadAttributes and PayloadID, would require SSZ counterparts. Everything would then be bundled in a surrounding SSZ StableContainer (or a regular Container, but then PayloadAttributes would have to always be present, e.g. with zero data to denote absence).
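A minimal sketch of the content negotiation described above, assuming the beacon-API convention of selecting the encoding via the Accept header. The endpoint path and the plain-HTTP GET are illustrative only, not part of the engine API.

package main

import (
	"fmt"
	"io"
	"net/http"
)

// fetchPayload requests a payload, preferring SSZ over JSON via content negotiation.
func fetchPayload(url string) ([]byte, string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, "", err
	}
	// Prefer the binary SSZ representation, but accept JSON as a fallback.
	req.Header.Set("Accept", "application/octet-stream, application/json;q=0.5")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, "", err
	}
	// The returned Content-Type tells us which decoder to use downstream.
	return body, resp.Header.Get("Content-Type"), nil
}

func main() {
	body, ctype, err := fetchPayload("http://127.0.0.1:8551/payload") // hypothetical endpoint
	if err != nil {
		panic(err)
	}
	switch ctype {
	case "application/octet-stream":
		fmt.Println("got SSZ payload,", len(body), "bytes")
	default:
		fmt.Println("got JSON payload,", len(body), "bytes")
	}
}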

@1010adigupta

I am implementing this prototype in reth as part of my Ethereum Protocol Fellowship project; it would help with further benchmarking.

@rjl493456442
Member

Please address the one comment I left, otherwise lgtm

@holiman (Contributor) left a comment

LGTM if gary's point is addressed. Also, the new idea is to export the witness to the CL, and have the CL direct the validations, is that correct?
If so, I don't think that's ideal. It forces the data to be fed over HTTP to the CL, then (most likely) over HTTP again to some other EL, because afaik there's no "pipe" integration with any CL.

However, since the "extra-validator EL" client should be pretty low on RAM consumption and negligible on disk, it should be perfectly doable to run both geth + another witness-validating EL on the same machine. And then the network roundtrips can be avoided if geth puts the witness on disk and the other clients just read it back up.

@karalabe (Member Author)

Also, the new idea is to export the witness to the CL, and have the CL direct the validations, is that correct?

Yes.

If so, I don't think that's ideal. It forces the data to be fed over HTTP to the CL, then (most likely) over HTTP again to some other EL, because afaik there's no "pipe" integration with any CL.

HTTP itself is not an issue; as long as you're shuffling data in RAM it's fast. If your EL and CL are split across different machines, then you do have the networking latency/bandwidth to consider, but given that most people will run the EL/CL on the same machine, this is moot. For those who run them split, a gigabit network will incur a 30-40ms delay one way; myeah, not ideal, but also not that terrible. The reason I went with the engine API is because that's the only available channel for this interoperability (nobody will ship a new thing, that's guaranteed) + it will be needed for Verkle in a similar way (need to solve the same problems) + Verkle will make the witnesses small enough not to matter even so.
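For context, a rough back-of-the-envelope behind that 30-40ms figure, assuming a witness on the order of a few megabytes (the witness size used here is an assumption for illustration, not a number from this PR):

package main

import "fmt"

func main() {
	const (
		witnessBytes   = 4 * 1024 * 1024 // assumed ~4 MiB witness, illustrative only
		linkBitsPerSec = 1e9             // 1 gigabit per second
	)
	seconds := float64(witnessBytes) * 8 / linkBitsPerSec
	fmt.Printf("one-way transfer time: ~%.0f ms\n", seconds*1000) // ≈ 34 ms
}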

However, since the "extra-validator EL" client should be pretty low on ram consumption and negligible on disk, it should be perfectly doable to run both geth + another witness-validating EL on the same machine. And then the network roundtrips can be avoided if geth puts it on disk and the other clients just read it back up.

Yes, but that requires inventing a new communication pathway between ELs, which nobody will do. That is why I discarded all such attempts. ELs' existing evm runner is also unsuitable, so I can't piggyback on that either. All things considered, the goal was to make it shippable as a first version and iterate, vs overdesign and never ship.

@karalabe added this to the 1.14.10 milestone on Sep 17, 2024
@0x00101010 commented Sep 17, 2024

Regarding the order of operations (deletions) applied to the trie, mentioned in the paragraph below:
Pre-fetching trie nodes during execution is a wonderful way to speed up witness creation, but it's important to note that such tries may be incomplete. Before the block is finalized and the final root hash can be computed, the self-destructed accounts and deleted slots are removed from the tries. This can end up with trie paths being collapsed from full nodes to short nodes, resulting in sibling trie nodes being accessed for the hashing. Trie insertions that are on close paths might also interfere, causing different siblings to be accessed based on whether a delete or an insert happens first. To make this part of the witness deterministic, clients need to apply deletions first and updates afterwards (I think it produces smaller witnesses than the other way around, i.e. applying updates and then deletes).

What would be your thoughts on standardizing it as:

  1. updates first
  2. deletions later (but applied in sorted key order, since a hashmap does not guarantee ordering)

This way the witness is deterministic and can be shared across clients.
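A minimal sketch of that proposed ordering, assuming a generic trie interface; trieLike, commitChanges and the Update/Delete signatures below are stand-ins for illustration, not geth's actual trie API.

package witness

import "sort"

// trieLike is a minimal stand-in for a state trie.
type trieLike interface {
	Update(key, value []byte) error
	Delete(key []byte) error
}

// commitChanges applies a batch of changes in a deterministic order:
// all updates first, then all deletions, each in sorted key order.
func commitChanges(t trieLike, updates map[string][]byte, deletions map[string]struct{}) error {
	// 1. Apply all updates, in sorted key order for determinism.
	updKeys := make([]string, 0, len(updates))
	for k := range updates {
		updKeys = append(updKeys, k)
	}
	sort.Strings(updKeys)
	for _, k := range updKeys {
		if err := t.Update([]byte(k), updates[k]); err != nil {
			return err
		}
	}
	// 2. Apply deletions afterwards, again in sorted key order, so the set of
	// sibling nodes touched while collapsing paths is the same on every client.
	delKeys := make([]string, 0, len(deletions))
	for k := range deletions {
		delKeys = append(delKeys, k)
	}
	sort.Strings(delKeys)
	for _, k := range delKeys {
		if err := t.Delete([]byte(k)); err != nil {
			return err
		}
	}
	return nil
}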

cc: @karalabe

@karalabe merged commit 9326a11 into ethereum:master on Sep 20, 2024
3 checks passed