Introduce NW BatchV2 in Protocol Version 12 & Pipe ProtocolConfig into NW #12178

Merged
merged 5 commits into from
May 30, 2023

Conversation

arun-koshy
Contributor

@arun-koshy arun-koshy commented May 24, 2023

Description

This is attempt 2 of getting ProtocolConfig into Narwhal. The previous attempt (PR#11519) had to be reverted because mismatched batch versions were causing validators to panic. The underlying issue was that we were not handling the protocol upgrade correctly. ProtocolConfig was passed in when NarwhalManager was created, but on epoch change monitor_reconfiguration does not recreate NarwhalManager if it still has a handle on the ValidatorComponents; it just calls start on the existing NarwhalManager. This is not a problem for the NarwhalConfiguration parameters, which are also passed in only on NarwhalManager creation, because those parameters take effect the moment the node is restarted for a binary update. With protocol upgrades, however, ProtocolConfig is only updated on the following epoch.

For example, if a validator restarts to update its binary from version N to version N+1, NarwhalManager is constructed with ProtocolConfig at version N (not N+1), because we need to ensure we have a majority quorum before actually moving to version N+1, which happens at epoch change. On epoch change, because the node still has a handle on ValidatorComponents, it just starts Narwhal with the existing NarwhalManager and ends up using ProtocolConfig at version N. That stale config is the root cause of the issues.
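
The stale-config problem can be illustrated with a minimal sketch. All types here are hypothetical stand-ins, not the actual Sui or Narwhal definitions: `ProtocolConfig` is reduced to a version number, and the two manager structs only contrast capturing the config at construction with receiving it on every start. The version numbers 11 and 12 in the usage note are illustrative values for N and N+1.

```rust
/// Hypothetical stand-in for the real ProtocolConfig; only the version matters here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ProtocolConfig {
    pub version: u64,
}

/// Problematic shape: the config is captured once, when the manager is constructed.
pub struct ManagerWithStoredConfig {
    protocol_config: ProtocolConfig,
}

impl ManagerWithStoredConfig {
    pub fn new(protocol_config: ProtocolConfig) -> Self {
        Self { protocol_config }
    }

    /// Called again on each epoch change; returns the version Narwhal would run with.
    /// The manager is reused across epochs, so this keeps returning the old version.
    pub fn start(&self) -> u64 {
        self.protocol_config.version
    }
}

/// Shape after the fix: the caller supplies the current config on every start.
pub struct ManagerWithConfigOnStart;

impl ManagerWithConfigOnStart {
    pub fn new() -> Self {
        Self
    }

    /// Returns the version Narwhal would run with for this epoch.
    pub fn start(&self, protocol_config: ProtocolConfig) -> u64 {
        protocol_config.version
    }
}
```

A manager built with `ProtocolConfig { version: 11 }` in the first shape still reports 11 from `start()` after the network has moved to 12, whereas the second shape reports whatever version the caller passes in for the new epoch.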

To resolve this, the following changes were made on top of the previous PR to fix the issue and make it more robust:

  • Filter received or fetched batches so that only the supported batch version is accepted.
  • Pass ProtocolConfig to NarwhalManager on start rather than on creation.
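
The batch filtering step could look roughly like the sketch below. The `Batch` enum is a simplified, hypothetical shape rather than the real Narwhal type, and the `versioned_metadata` flag stands in for the result of `protocol_config.narwhal_versioned_metadata()`. The mapping "flag set means only V2 is supported" is an assumption made for illustration.

```rust
/// Hypothetical, simplified batch enum; the real Narwhal types carry more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Batch {
    V1 { transactions: Vec<Vec<u8>> },
    V2 { transactions: Vec<Vec<u8>>, received_at: Option<u64> },
}

/// Returns true when the batch variant matches what the current config expects.
/// Assumption: versioned metadata enabled means V2 only, disabled means V1 only.
pub fn is_supported(batch: &Batch, versioned_metadata: bool) -> bool {
    match batch {
        Batch::V1 { .. } => !versioned_metadata,
        Batch::V2 { .. } => versioned_metadata,
    }
}

/// Drops every received or fetched batch whose version is not supported.
pub fn filter_supported(batches: Vec<Batch>, versioned_metadata: bool) -> Vec<Batch> {
    batches
        .into_iter()
        .filter(|batch| is_supported(batch, versioned_metadata))
        .collect()
}
```

Filtering at the point of receipt means a peer running with a different batch version has its batches dropped instead of triggering a panic further down the pipeline.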

Test Plan

Added unit tests & tested protocol upgrade in labnet from mainnet release branch to v12


Type of Change (Check all that apply)

  • protocol change
  • user-visible impact
  • breaking change for client SDKs
  • breaking change for FNs (FN binary must upgrade)
  • breaking change for validators or node operators (must upgrade binaries)
  • breaking change for on-chain data layout
  • necessitate either a data wipe or data migration

Release notes

Start using BatchV2 in Narwhal. BatchV2 introduces VersionedMetadata, which allows more granular tracking of NW batch execution latency.

Member

@mwtian mwtian left a comment

Still looking at batch_fetcher and batch_maker

Member

@mwtian mwtian left a comment

LGTM overall.

// TODO: Remove once we have upgraded to protocol version 12.
if self.protocol_config.narwhal_versioned_metadata() {
    // Set received_at timestamp for remote batches.
    let mut updated_new_batches = HashMap::new();
Member

Is it possible to make new_batches mut and update the batches in place? That would avoid the copy and perhaps some duplicated logic.

Contributor Author

I had issues trying to do this before, which is why I did it this way, but I will follow up in a separate PR if I can get it to work in place.
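
For reference, the in-place approach suggested in this thread might look like the following sketch. It is not the PR's code: the `Batch` struct, the `u64` map key (standing in for a batch digest), and the `stamp_received_at` helper are hypothetical, and the real code may have constraints that are not visible in this excerpt and that made in-place mutation harder.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified batch with an optional receive timestamp.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Batch {
    pub transactions: Vec<Vec<u8>>,
    pub received_at: Option<u64>,
}

/// Stamps every batch in the map in place, so no second HashMap is built.
/// The `u64` key stands in for a batch digest.
pub fn stamp_received_at(new_batches: &mut HashMap<u64, Batch>, now_ms: u64) {
    for batch in new_batches.values_mut() {
        batch.received_at = Some(now_ms);
    }
}
```

Because `values_mut` hands out mutable references to the stored values, the map itself is never copied; this only works when the caller owns the map or holds a mutable borrow of it.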

Contributor

@akichidis akichidis left a comment

LGTM. As we discussed offline, let's also confirm that the current cross-epoch network protection mechanism works, since it should have let us avoid the additional epoch changes.
