Reconstruct data columns without blocking processing and import #5990

Closed

jimmygchen
Member

@jimmygchen jimmygchen commented Jun 25, 2024

Issue Addressed

Part of #4983. This is a work in progress.

The current sequence of reconstruction:

  1. Receive blocks & data columns
  2. As soon as the supernode receives 50% of columns, start reconstruction (while holding the availability cache write lock)
  3. After reconstruction completes, the node imports the block and publishes the remaining 50% of columns
  4. The node may then also receive the remaining 50%, which get verified and then ignored

Proposed Changes

  • Changes to data column reconstruction:
    • Attempt reconstruction without holding the availability cache lock, so we can process other gossip / RPC data columns simultaneously
    • Check the availability cache again before publishing reconstructed columns to avoid publishing excess duplicates
  • Remove some unnecessary RuntimeVariableList conversions for data columns
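
The proposed flow can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `AvailabilityCache`, `snapshot_if_reconstructable`, `import_reconstructed` and `reconstruct_missing` are hypothetical names, the column count is reduced to 8, and the "reconstruction" only fabricates dummy bytes. The point is the locking pattern: snapshot under a short read lock, do the expensive work with no lock held, then re-check the cache under the write lock so that only columns that are still missing get imported and published.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

// Hypothetical stand-ins for illustration; these are not Lighthouse types.
type ColumnIndex = u64;
type Column = Vec<u8>;

// Kept small for the example (the logs in this PR show 128 columns).
const NUM_COLUMNS: u64 = 8;

struct AvailabilityCache {
    columns: RwLock<BTreeMap<ColumnIndex, Column>>,
}

impl AvailabilityCache {
    fn new() -> Self {
        Self { columns: RwLock::new(BTreeMap::new()) }
    }

    /// Inserts a column received via gossip or RPC.
    /// Returns `false` if the column was already present (a duplicate).
    fn insert(&self, index: ColumnIndex, column: Column) -> bool {
        let mut columns = self.columns.write().unwrap();
        if columns.contains_key(&index) {
            return false;
        }
        columns.insert(index, column);
        true
    }

    /// Clones the known columns under a short read lock, or returns `None`
    /// if fewer than 50% of the columns are present.
    fn snapshot_if_reconstructable(&self) -> Option<BTreeMap<ColumnIndex, Column>> {
        let columns = self.columns.read().unwrap();
        if (columns.len() as u64) < NUM_COLUMNS / 2 {
            return None;
        }
        // The read guard is dropped when this function returns,
        // before any expensive work starts.
        Some((*columns).clone())
    }

    /// Re-checks the cache and imports only the columns that are still missing.
    /// Returns the indices that are worth publishing.
    fn import_reconstructed(&self, recovered: BTreeMap<ColumnIndex, Column>) -> Vec<ColumnIndex> {
        let mut columns = self.columns.write().unwrap();
        let mut to_publish = Vec::new();
        for (index, column) in recovered {
            if !columns.contains_key(&index) {
                columns.insert(index, column);
                to_publish.push(index);
            }
        }
        to_publish
    }
}

/// Placeholder for the expensive erasure-code recovery: it only fabricates
/// dummy bytes for every missing index. No lock is held while this runs.
fn reconstruct_missing(known: &BTreeMap<ColumnIndex, Column>) -> BTreeMap<ColumnIndex, Column> {
    (0..NUM_COLUMNS)
        .filter(|index| !known.contains_key(index))
        .map(|index| (index, vec![index as u8]))
        .collect()
}
```

In this sketch, a column that arrives between `snapshot_if_reconstructable` and `import_reconstructed` is accepted by `insert` as usual, and `import_reconstructed` then leaves it out of the list of indices to publish.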

Additional Info

Below are logs from local testing. Notice that reconstruction blocks the processing of incoming data columns (due to holding the availability cache write lock), and they then get ignored as duplicates:

Jun 24 08:01:15.781 DEBG Successfully verified gossip data column sidecar, index: 63, block_root: 0x690d…0e3e, slot: 40556
Jun 24 08:01:16.280 DEBG Reconstructed columns                   count: 64, service: availability_cache, service: beacon
Jun 24 08:01:16.281 DEBG Writing data_columns to store           count: 128, block_root: 0x690d…0e3e, service: beacon
Jun 24 08:01:16.284 INFO Gossipsub data column processed, imported fully available block, block_root: 0x690d…0e3e
Jun 24 08:01:16.284 DEBG Sending pubsub messages                 topics: [DataColumnSidecar(DataColumnSubnetId(0)), DataColumnSidecar(DataColumnSubnetId(1)), DataColumnSidecar(DataColumnSubnetId(2)) .. DataColumnSidecar(DataColumnSubnetId(31))], count: 64, service: network
Jun 24 08:01:16.314 DEBG Successfully verified gossip data column sidecar, index: 64, block_root: 0x690d…0e3e, slot: 40556
Jun 24 08:01:16.314 DEBG Ignoring gossip column already imported, data_column_index: 64, block_root: 0x690d4885fe843d085464b4f36b774597525d2dbddcd26ceca95fd4fdf4930e3e
Jun 24 08:01:16.323 DEBG Successfully verified gossip data column sidecar, index: 65, block_root: 0x690d…0e3e, slot: 40556
Jun 24 08:01:16.323 DEBG Ignoring gossip column already imported, data_column_index: 65, block_root: 0x690d4885fe843d085464b4f36b774597525d2dbddcd26ceca95fd4fdf4930e3e
Jun 24 08:01:16.332 DEBG Successfully verified gossip data column sidecar, index: 66, block_root: 0x690d…0e3e, slot: 40556
Jun 24 08:01:16.332 DEBG Ignoring gossip column already imported, data_column_index: 66, block_root: 0x690d4885fe843d085464b4f36b774597525d2dbddcd26ceca95fd4fdf4930e3e
Jun 24 08:01:16.341 DEBG Successfully verified gossip data column sidecar, index: 67, block_root: 0x690d…0e3e, slot: 40556
Jun 24 08:01:16.341 DEBG Ignoring gossip column already imported, data_column_index: 67, block_root: 0x690d4885fe843d085464b4f36b774597525d2dbddcd26ceca95fd4fdf4930e3e
Jun 24 08:01:16.349 DEBG Successfully verified gossip data column sidecar, index: 68, block_root: 0x690d…0e3e, slot: 40556
Jun 24 08:01:16.349 DEBG Ignoring gossip column already imported, data_column_index: 68, block_root: 0x690d4885fe843d085464b4f36b774597525d2dbddcd26ceca95fd4fdf4930e3e
Jun 24 08:01:16.358 DEBG Successfully verified gossip data column sidecar, index: 69, block_root: 0x690d…0e3e, slot: 40556
Jun 24 08:01:16.358 DEBG Ignoring gossip column already imported, data_column_index: 69, block_root: 0x690d4885fe843d085464b4f36b774597525d2dbddcd26ceca95fd4fdf4930e3e
Jun 24 08:01:16.367 DEBG Successfully verified gossip data column sidecar, index: 70, block_root: 0x690d…0e3e, slot: 40556

mergify bot commented Jun 25, 2024

⚠️ The sha of the head commit of this PR conflicts with #5986. Mergify cannot evaluate rules on this PR. ⚠️

@jimmygchen jimmygchen added work-in-progress PR is a work-in-progress das Data Availability Sampling labels Jun 25, 2024
@jimmygchen jimmygchen self-assigned this Jun 26, 2024
@jimmygchen jimmygchen added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Jun 28, 2024
@jimmygchen
Member Author

Thanks for the review! I've addressed the comments and will start local testing.

@jimmygchen jimmygchen requested a review from dapplion June 28, 2024 04:43
@jimmygchen
Member Author

Thanks for the review! Addressed your comments in 7d2d826.

@jimmygchen jimmygchen requested a review from dapplion July 1, 2024 03:45
jimmygchen added a commit that referenced this pull request Jul 1, 2024
Squashed commit of the following:

commit 7d2d826
Author: Jimmy Chen <[email protected]>
Date:   Mon Jul 1 13:44:45 2024 +1000

    Send import results to sync after reconstruction. Add more logging and metrics.

commit 4b30ebe
Merge: f93e2b5 7206909
Author: Jimmy Chen <[email protected]>
Date:   Fri Jun 28 17:23:22 2024 +1000

    Merge branch 'das' into fork/reconstruct-without-blocking-import

commit f93e2b5
Author: Jimmy Chen <[email protected]>
Date:   Fri Jun 28 14:42:04 2024 +1000

    Code cleanup: add type aliases and update comments.

commit 6ac055d
Author: Jimmy Chen <[email protected]>
Date:   Fri Jun 28 14:26:40 2024 +1000

    Revert reconstruction behaviour to always go ahead rather than allowing one at a time. Address other review comments.

commit 1e3964e
Author: Jimmy Chen <[email protected]>
Date:   Tue Jun 25 00:02:19 2024 +1000

    Reconstruct columns without blocking processing and import.
@jimmygchen jimmygchen force-pushed the reconstruct-without-blocking-import branch from db43a2a to 0f355a7 Compare July 1, 2024 05:57
@@ -1044,7 +1017,7 @@ impl<T: BeaconChainTypes> NetworkBeaconProcessor<T> {
"block_root" => %block_root,
);

// Potentially trigger reconstruction
self.attempt_data_column_reconstruction(block_root).await;
Member Author

@jimmygchen jimmygchen Jul 1, 2024

Looks like this doesn't work, issues:

  1. Still seeing reconstruction blocking gossip data column processing
  2. This change should be made to the cached block instead of the clone:
  3. Because of 1., after the block is imported, I see a bunch of:
Jul 01 06:27:21.130 DEBG Attempted to publish duplicate message  kind: data_column_sidecar_31, service: libp2p

Member Author

Fixed #2.

I can't seem to figure out why reconstruction is blocking the processing of other incoming data columns. I don't see any lock being held during reconstruction; perhaps it is just waiting for a beacon processor worker to become available.

We could send it to the back of the beacon processor queue, but I'll do some more testing to make sure this is the case.
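
As a rough illustration of the "back of the queue" idea, here is a toy single-worker FIFO queue. It is a sketch under assumptions, not the beacon processor's actual design: `Work`, `run_queue`, the `threshold` parameter and the `reconstruction_queued` flag are all hypothetical. When the column that crosses the threshold is processed, reconstruction is enqueued behind the columns that are already waiting instead of being run inline.

```rust
use std::collections::VecDeque;

// Hypothetical work items; a real processor has many more kinds of work.
#[derive(Debug, Clone, PartialEq)]
enum Work {
    GossipDataColumn { index: u64 },
    Reconstruct,
}

/// Runs a single-worker FIFO queue and returns the order in which work was executed.
/// `columns_seen` is the number of columns already held before the queue is drained,
/// and `threshold` is the number of columns that triggers reconstruction.
fn run_queue(mut queue: VecDeque<Work>, mut columns_seen: usize, threshold: usize) -> Vec<Work> {
    let mut executed = Vec::new();
    // Assumption for this sketch: reconstruction is queued at most once.
    let mut reconstruction_queued = false;

    while let Some(work) = queue.pop_front() {
        if let Work::GossipDataColumn { .. } = work {
            columns_seen += 1;
            if columns_seen >= threshold && !reconstruction_queued {
                // Defer reconstruction to the back of the queue rather than
                // running it inline, so queued columns are handled first.
                queue.push_back(Work::Reconstruct);
                reconstruction_queued = true;
            }
        }
        executed.push(work);
    }
    executed
}
```

With 63 columns already held, a threshold of 64, and columns 63, 64 and 65 waiting in the queue, this sketch executes the three columns first and `Work::Reconstruct` last.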

@jimmygchen jimmygchen added work-in-progress PR is a work-in-progress and removed ready-for-review The code is ready for review labels Jul 1, 2024
@jimmygchen jimmygchen force-pushed the reconstruct-without-blocking-import branch from 808e84a to 3badc3b Compare July 11, 2024 04:41
@jimmygchen jimmygchen force-pushed the reconstruct-without-blocking-import branch from 3badc3b to efabb98 Compare July 11, 2024 04:53
@jimmygchen jimmygchen mentioned this pull request Jul 16, 2024
52 tasks
@jimmygchen jimmygchen added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Aug 16, 2024
@dapplion
Collaborator

Generally looks good to me! Lots of conflicts, tho

Do you want to port this to unstable?

@jimmygchen
Member Author

jimmygchen commented Aug 19, 2024

Thanks, I've just tried to resolve conflicts, but unfortunately this is going to conflict a lot with #6268, and it might be easier to implement it once that one is merged, as the two share quite a bit of common logic and changes. Since this one is very outdated, I think it makes sense to just create a new PR, using the branch from #6268 as the base branch.

@jimmygchen jimmygchen closed this Aug 19, 2024
@jimmygchen jimmygchen added do-not-merge and removed ready-for-review The code is ready for review labels Aug 22, 2024
@jimmygchen
Member Author

Leaving this PR open so we don't forget this.

@jimmygchen jimmygchen reopened this Aug 22, 2024
@mergify mergify bot deleted the branch sigp:das August 27, 2024 04:10
@mergify mergify bot closed this Aug 27, 2024
@michaelsproul
Member

Please update to point at unstable by either:

  1. Rebasing on unstable (if your branch has a small number of commits that are easy to tease out), or
  2. Merging origin/das into this PR: git fetch origin; git merge origin/das. This will result in the smallest number of conflict resolutions and is better for branches that already contain merge commits or have extensive history.

@jimmygchen
Member Author

Closing in favour of #6403

@jimmygchen jimmygchen closed this Sep 17, 2024
Labels
das Data Availability Sampling das-devnet-1 do-not-merge