Improving blob propagation post-PeerDAS with Decentralized Blob Building #6268
base: unstable
Conversation
Some early testing results:
Bandwidth-limited fullnode: this shows EL inbound traffic (fetching blobs from peers) isn't too bad for MAX 6 blobs.

Next steps:
This was originally intended to be experimental, but it's been pretty stable: 800 epochs on devnet with 100% participation. I think we can merge this.
Force-pushed from dbf4bb8 to 01b91cb.
Bring the attack to prod.

@jimmygchen Any issues with the kzg libraries?

🙈 Trying not to mention the flag I use for testing. Luckily… Just realized we need the…

Nope, all good so far! I was just flagging the potential challenges I could imagine when we increase the blob count, but I haven't actually run into any issues yet. Will keep you updated, thanks!
Resolved review threads (now outdated) on:
beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs (×3)
beacon_node/network/src/network_beacon_processor/gossip_methods.rs
I think we should review blob publishing before merging this to avoid adding excessive bandwidth usage for Deneb.
It would be great to have #6403 merged first, as there will be some more conflicts that need to be resolved, and it's a bit easier in this order. We also need to prioritise block publishing; we could do that in this PR.
# Conflicts:
#   beacon_node/beacon_chain/src/data_availability_checker.rs
#   beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
#   beacon_node/execution_layer/src/engine_api.rs
#   beacon_node/execution_layer/src/engine_api/json_structures.rs
#   beacon_node/network/src/network_beacon_processor/gossip_methods.rs
#   beacon_node/network/src/network_beacon_processor/mod.rs
#   beacon_node/network/src/network_beacon_processor/sync_methods.rs
Remaining TODOs
Force-pushed from 63427e9 to 0a2c6f7.
    .await
}

async fn check_engine_blob_availability_and_import(
Should inline this function into process_engine_blobs?
I have a weak preference for not inlining it, for consistency with other similar functions and to keep process_rpc_blobs small.
/// This DOES NOT perform KZG verification because the KZG proofs should have been constructed
/// immediately prior to calling this function so they are assumed to be valid.
Why not change this function and process_engine_blobs to take KzgVerifiedBlobList and make it correct by construction?
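A minimal sketch of what taking KzgVerifiedBlobList could look like, assuming a simple wrapper type; the names mirror the suggestion but are illustrative stand-ins, not Lighthouse's actual definitions:

// Illustrative sketch of the "correct by construction" idea; simplified types.

/// A blob whose KZG proof has already been verified. The field is private, so
/// the only way to obtain one is through `verify`, which performs the check.
pub struct KzgVerifiedBlob<B> {
    blob: B,
}

impl<B> KzgVerifiedBlob<B> {
    pub fn verify(blob: B, proof_is_valid: impl Fn(&B) -> bool) -> Result<Self, &'static str> {
        if proof_is_valid(&blob) {
            Ok(Self { blob })
        } else {
            Err("invalid KZG proof")
        }
    }

    pub fn into_inner(self) -> B {
        self.blob
    }
}

pub type KzgVerifiedBlobList<B> = Vec<KzgVerifiedBlob<B>>;

/// Taking `KzgVerifiedBlobList` means callers cannot hand this function
/// unverified blobs, so the "proofs are assumed valid" doc comment becomes a
/// guarantee enforced by the type system.
fn process_engine_blobs<B>(blobs: KzgVerifiedBlobList<B>) {
    for verified in blobs {
        let _blob = verified.into_inner();
        // ... import into the data availability checker ...
    }
}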
);
let all_blobs_received = block_kzg_commitments_count_opt
    .map_or(false, |num_expected_blobs| {
        num_expected_blobs == num_received_blobs
Is it worth switching to >= in case there's some future bug?
Suggested change:
-            num_expected_blobs == num_received_blobs
+            num_received_blobs >= num_expected_blobs
If we received more valid blobs than we should, then we probably want it to fail rather than covering it up and importing them into the database?
Not sure how that's possible though, as the blob sidecar fixed list is constructed here, and it will fail if a blob does not have a corresponding kzg commitment:
lighthouse/beacon_node/beacon_chain/src/fetch_blobs.rs
Lines 298 to 304 in bf19769
fn build_blob_sidecars<E: EthSpec>(
    block: &Arc<SignedBeaconBlock<E, FullPayload<E>>>,
    response: Vec<Option<BlobAndProofV1<E>>>,
    signed_block_header: SignedBeaconBlockHeader,
    kzg_commitments_proof: &FixedVector<Hash256, E::KzgCommitmentsInclusionProofDepth>,
) -> Result<FixedBlobSidecarList<E>, FetchEngineBlobError> {
    let mut fixed_blob_sidecar_list = FixedBlobSidecarList::default();
let Some(verified_blobs) = verified_blobs
    .into_iter()
    .map(|b| b.map(|b| b.to_blob()))
    .take(num_blobs_expected)
Why is it necessary to take here? Do we consider the case where we have too many blobs?
This is an existing check from Deneb to make sure we have exactly the same number of blobs as kzg commitments; otherwise it returns AvailabilityCheckError::Unexpected.
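A simplified sketch of that exact-count check, assuming a plain Vec in place of the real FixedBlobSidecarList and a string error instead of AvailabilityCheckError::Unexpected:

// Simplified stand-in for the check described above; the real code operates on
// a fixed-length FixedBlobSidecarList and returns AvailabilityCheckError::Unexpected.
fn select_expected_blobs<T>(
    verified_blobs: Vec<Option<T>>, // one slot per possible blob index
    num_blobs_expected: usize,
) -> Result<Vec<T>, &'static str> {
    verified_blobs
        .into_iter()
        // Only consider the first `num_blobs_expected` slots.
        .take(num_blobs_expected)
        // Collecting Option<T> items into Option<Vec<T>> fails if any of those
        // slots is None, so success means exactly `num_blobs_expected` blobs
        // (assuming the input list has at least that many slots).
        .collect::<Option<Vec<T>>>()
        .ok_or("unexpected: missing blob for a KZG commitment")
}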
    log,
    "Publishing data columns from EL";
    "count" => publishable.len()
);
Here we can compute the efficiency of this delayed publish mechanism: track which columns were originally scheduled for publishing but were then seen on gossip during this for loop. We could track that as a counter named something like "already_seen_before_publish", and maybe print a summary after the for loop showing how many we published in total, how many we skipped, and how many we intended to publish at the start.
Also, do we log somewhere which specific column indices we publish? Otherwise, it might be helpful to log on this line a vec of the indices of the columns we are publishing.
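A rough sketch of the accounting this suggests; the names (DataColumn, observed_columns, already_seen_before_publish) are hypothetical and only for illustration, not the PR's actual code:

use std::collections::HashSet;
use slog::{debug, Logger};

// Hypothetical column type, for illustration only.
struct DataColumn {
    index: u64,
}

// Count columns that were scheduled for publishing but were seen on gossip
// before we got to them, and log a summary (including the exact indices
// published) after the loop.
fn publish_with_accounting(
    log: &Logger,
    publishable: Vec<DataColumn>,
    observed_columns: &HashSet<u64>,
    publish: impl Fn(&DataColumn),
) {
    let intended_to_publish = publishable.len();
    let mut published_indices: Vec<u64> = Vec::new();
    let mut already_seen_before_publish = 0usize;

    for column in publishable {
        if observed_columns.contains(&column.index) {
            // The column arrived on gossip while we were waiting, so skip it.
            already_seen_before_publish += 1;
            continue;
        }
        published_indices.push(column.index);
        publish(&column);
    }

    debug!(
        log,
        "Published data columns from EL";
        "intended" => intended_to_publish,
        "published" => published_indices.len(),
        "already_seen_before_publish" => already_seen_before_publish,
        "indices" => ?published_indices
    );
}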
        tokio::time::sleep(supernode_data_column_publication_batch_interval).await;
    }
},
"handle_data_columns_publish",
Not a blocker to merging, but since reviews are slow maybe there's time to de-dup this bit
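One way the duplicated batching logic could be factored out, sketched under the assumption that both call sites publish batches with a sleep between them; the helper name and signature are hypothetical:

use std::time::Duration;

// Rough sketch of a shared helper for the batched, rate-limited publish loop.
async fn publish_in_batches<T>(
    items: Vec<T>,
    batch_size: usize,
    batch_interval: Duration,
    mut publish_batch: impl FnMut(Vec<T>),
) {
    let mut iter = items.into_iter().peekable();
    while iter.peek().is_some() {
        // Publish the next batch, then wait before the following one to spread
        // bandwidth usage over the slot.
        let batch: Vec<T> = iter.by_ref().take(batch_size).collect();
        publish_batch(batch);
        tokio::time::sleep(batch_interval).await;
    }
}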
Issue Addressed
Built on top of #5829, an optimization raised by @michaelsproul to fetch blobs from the EL in order to reduce the delay to block import.
This PR goes further and publishes the blobs to the network, which helps improve the resiliency of the network by having nodes with more resources contribute to blob propagation. This experimental solution is an attempt to solve the self-building proposer bandwidth issue discussed on the R&D Discord and described in @dankrad's post here.
The benefits of this proposal are:
Proposed Changes
fetch_blobs_and_publish is triggered after a node has processed a gossip/RPC block and is still missing blob components. Once the node fetches the blobs from the EL, it publishes any remaining blobs that it hasn't seen on gossip to the network.

Next steps:
Challenges:
- The KZG libraries (c-kzg-4844 and rust-eth-kzg) may struggle with constructing a large number of cells and proofs at once due to the current memory allocation approach.
- Blob transactions on the EL are announced and pulled on demand (eth/68), hence don't incur the same gossip amplification cost (8x) as on the CL.

TODO before merging
engine_getBlobsV1: ethereum/consensus-specs#3864
engine_getBlobsV1: ethereum/execution-apis#559

Reference:
Get blobs from the EL's blob pool #5829