Batch execution with single execution adapter #818

aakoshh · 2022-08-18T20:04:11Z

This PR is a more aggressive alternative to #815.

The basic motivation is the same: let the executor know about every certificate, whether it's empty or not. But instead of having a separate method for empty certificates, this changes executor::Core to expect a new BatchExecutionState, which takes the whole certificate with all the batches it includes. It takes SerializedTransaction values, so the state can decide for itself how to deal with deserialization errors.
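
A minimal synchronous sketch of the shape described above, assuming hypothetical stand-in types (SerializedTransaction as raw bytes, Batch, ConsensusOutput); the actual trait in this PR is async and its signatures differ. What it illustrates is that the state is invoked once per certificate, including certificates with no batches, and receives undeserialized bytes.

```rust
// Hypothetical stand-ins for Narwhal types; the real definitions differ.
pub type SerializedTransaction = Vec<u8>;

/// A batch of serialized transactions referenced by a certificate.
pub struct Batch(pub Vec<SerializedTransaction>);

/// Consensus output for one certificate: all of its batches, possibly none.
pub struct ConsensusOutput {
    pub certificate_index: u64,
    pub batches: Vec<Batch>,
}

/// Sketch of a batch-oriented execution interface: the state sees every
/// certificate, empty or not, and gets raw bytes so it can decide how to
/// handle deserialization errors itself.
pub trait BatchExecutionState {
    type Error;
    fn handle_consensus(&mut self, output: &ConsensusOutput) -> Result<(), Self::Error>;
}

/// Toy implementation that counts certificates and transactions.
#[derive(Default)]
pub struct Counter {
    pub certificates: u64,
    pub transactions: usize,
}

impl BatchExecutionState for Counter {
    type Error = std::convert::Infallible;

    fn handle_consensus(&mut self, output: &ConsensusOutput) -> Result<(), Self::Error> {
        // Empty certificates are still observed here.
        self.certificates += 1;
        self.transactions += output.batches.iter().map(|b| b.0.len()).sum::<usize>();
        Ok(())
    }
}
```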

The trait also has a load_next_certificate_index method to support replay, which, as we discussed, @asonnino, is currently missing. Replay is expected to start from the index it returns, and if a whole certificate wasn't processed atomically, it is the state's responsibility to skip any transactions it has already processed.
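
One way a state could honour that replay contract, sketched under assumptions: the state persists a hypothetical cursor (certificate, batch, transaction) together with its own changes, reports the cursor's certificate from load_next_certificate_index, and on replay skips every transaction that sorts before the cursor. Transactions are single bytes here purely to keep the example small; Cursor and ReplayingState are illustrative names.

```rust
/// Hypothetical persisted cursor: the position of the next transaction to execute.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Cursor {
    pub certificate: u64,
    pub batch: u64,
    pub transaction: u64,
}

impl Cursor {
    fn as_tuple(&self) -> (u64, u64, u64) {
        (self.certificate, self.batch, self.transaction)
    }
}

/// Toy state: "executing" a transaction appends it to `executed`.
/// A real state would persist `cursor` atomically with its own changes.
pub struct ReplayingState {
    pub cursor: Cursor,
    pub executed: Vec<u8>,
}

impl ReplayingState {
    /// Replay should resume from the certificate the cursor points into.
    pub fn load_next_certificate_index(&self) -> u64 {
        self.cursor.certificate
    }

    pub fn handle_certificate(&mut self, certificate: u64, batches: &[Vec<u8>]) {
        for (b, batch) in batches.iter().enumerate() {
            for (t, tx) in batch.iter().enumerate() {
                let position = (certificate, b as u64, t as u64);
                if position < self.cursor.as_tuple() {
                    continue; // already applied before the restart
                }
                self.executed.push(*tx);
                self.cursor = Cursor { certificate, batch: b as u64, transaction: t as u64 + 1 };
            }
        }
        // Certificate finished: the next replay starts at the following one.
        // The guard keeps a stale certificate from moving the cursor backwards.
        if certificate >= self.cursor.certificate {
            self.cursor = Cursor { certificate: certificate + 1, batch: 0, transaction: 0 };
        }
    }
}
```

Comparing tuples lexicographically gives the (certificate, batch, transaction) ordering for free.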

For backwards compatibility, the PR also has a SingleExecutionState that keeps the existing method taking individual transactions, and a SingleExecutor adapter which maintains the ExecutionIndices for it. This way Sui doesn't have to change much, only in the application root where the execution state is instantiated.
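
A sketch of how such an adapter can wrap a per-transaction state. The ExecutionIndices field names mirror the ones used in Narwhal, but the rule for computing the next position after each transaction is an assumption of this example, as are the names handle_certificate, next_indices and Recorder.

```rust
/// Hypothetical mirror of ExecutionIndices: the position of the next transaction to execute.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct ExecutionIndices {
    pub next_certificate_index: u64,
    pub next_batch_index: u64,
    pub next_transaction_index: u64,
}

/// Per-transaction interface, like the one Sui's execution state already implements.
pub trait SingleExecutionState {
    type Transaction;
    type Outcome;
    fn handle_consensus_transaction(&mut self, indices: ExecutionIndices, transaction: Self::Transaction) -> Self::Outcome;
}

/// Adapter: accepts whole certificates and feeds transactions one at a time.
pub struct SingleExecutor<S> {
    pub inner: S,
}

/// Position right after transaction `tx` of batch `batch` (assumed rule).
fn next_indices(certificate: u64, batch: usize, tx: usize, num_batches: usize, batch_len: usize) -> ExecutionIndices {
    if tx + 1 < batch_len {
        ExecutionIndices { next_certificate_index: certificate, next_batch_index: batch as u64, next_transaction_index: (tx + 1) as u64 }
    } else if batch + 1 < num_batches {
        ExecutionIndices { next_certificate_index: certificate, next_batch_index: (batch + 1) as u64, next_transaction_index: 0 }
    } else {
        ExecutionIndices { next_certificate_index: certificate + 1, next_batch_index: 0, next_transaction_index: 0 }
    }
}

impl<S: SingleExecutionState> SingleExecutor<S> {
    pub fn handle_certificate(&mut self, certificate_index: u64, batches: Vec<Vec<S::Transaction>>) -> Vec<S::Outcome> {
        let num_batches = batches.len();
        let mut outcomes = Vec::new();
        for (b, batch) in batches.into_iter().enumerate() {
            let batch_len = batch.len();
            for (t, tx) in batch.into_iter().enumerate() {
                let indices = next_indices(certificate_index, b, t, num_batches, batch_len);
                outcomes.push(self.inner.handle_consensus_transaction(indices, tx));
            }
        }
        // An empty certificate produces no calls: the wrapped state never sees it.
        outcomes
    }
}

/// Toy state for the usage example: records what it was given.
#[derive(Default)]
pub struct Recorder {
    pub seen: Vec<(ExecutionIndices, u32)>,
}

impl SingleExecutionState for Recorder {
    type Transaction = u32;
    type Outcome = ();
    fn handle_consensus_transaction(&mut self, indices: ExecutionIndices, transaction: u32) {
        self.seen.push((indices, transaction));
    }
}
```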

The SingleExecutor also takes over the responsibility of feeding the output channel; that is, it feeds the (outcome, transaction_bytes) pairs to any observers. This didn't seem to make sense for the batch execution state, which could have extra outcomes without transactions. Arguably the execution state itself can forward its own output to wherever it has to go, so threading tx_confirmation or tx_output through the whole system seems vestigial; it's also easy to do with an adapter that inspects the return value.
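
An adapter that inspects the return value could look like the following sketch. It uses a standard-library channel in place of the real tx_output, whose type is not shown here; Reporting, Len and execute are hypothetical names.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical per-transaction state that knows nothing about observers.
pub trait SingleExecutionState {
    type Outcome;
    fn execute(&mut self, transaction: &[u8]) -> Self::Outcome;
}

/// Wraps any state and forwards (outcome, transaction bytes) to an observer.
pub struct Reporting<S: SingleExecutionState> {
    inner: S,
    tx_output: Sender<(S::Outcome, Vec<u8>)>,
}

impl<S: SingleExecutionState> Reporting<S>
where
    S::Outcome: Clone,
{
    /// Returns the adapter together with the receiving end for observers.
    pub fn new(inner: S) -> (Self, Receiver<(S::Outcome, Vec<u8>)>) {
        let (tx_output, rx) = channel();
        (Reporting { inner, tx_output }, rx)
    }

    pub fn execute(&mut self, transaction: &[u8]) -> S::Outcome {
        let outcome = self.inner.execute(transaction);
        // A dropped receiver only means nobody is listening; keep executing.
        let _ = self.tx_output.send((outcome.clone(), transaction.to_vec()));
        outcome
    }
}

/// Toy state: the outcome is just the transaction length.
pub struct Len;

impl SingleExecutionState for Len {
    type Outcome = usize;
    fn execute(&mut self, transaction: &[u8]) -> usize {
        transaction.len()
    }
}
```

Ignoring the send error is a policy choice for the sketch; a real node might prefer to log it.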

There is a slight discrepancy between the BatchExecutionState and the SingleExecutionState in that the latter takes a deserialized Transaction type rather than bytes. The SingleExecutor preserves the current behaviour of reporting an outcome even for malformed transactions, while only passing valid ones to the execution state. It also assumes bincode is used.
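
A sketch of that behaviour: every transaction in the batch yields an (outcome, bytes) pair, but only those that decode reach the state. To stay dependency-free it uses a hypothetical stand-in decoder (exactly 8 little-endian bytes encoding a u64) instead of bincode; Outcome, Summer and execute_batch are illustrative names.

```rust
use std::convert::TryInto;

/// Outcome reported to observers for each transaction.
#[derive(Debug, PartialEq)]
pub enum Outcome<T> {
    Executed(T),
    Malformed,
}

pub trait SingleExecutionState {
    type Transaction;
    type Output;
    fn handle(&mut self, transaction: Self::Transaction) -> Self::Output;
}

/// Stand-in for bincode: a transaction is exactly 8 little-endian bytes (a u64).
fn decode(bytes: &[u8]) -> Option<u64> {
    let arr: [u8; 8] = bytes.try_into().ok()?;
    Some(u64::from_le_bytes(arr))
}

/// Toy state: keeps a running total and returns it.
pub struct Summer {
    pub total: u64,
}

impl SingleExecutionState for Summer {
    type Transaction = u64;
    type Output = u64;
    fn handle(&mut self, transaction: u64) -> u64 {
        self.total += transaction;
        self.total
    }
}

/// Every input gets a report; malformed bytes never reach the state.
pub fn execute_batch<S: SingleExecutionState<Transaction = u64>>(state: &mut S, batch: &[Vec<u8>]) -> Vec<(Outcome<S::Output>, Vec<u8>)> {
    batch
        .iter()
        .map(|bytes| {
            let outcome = match decode(bytes) {
                Some(tx) => Outcome::Executed(state.handle(tx)),
                None => Outcome::Malformed,
            };
            (outcome, bytes.clone())
        })
        .collect()
}
```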

Overall I think this is a cleaner solution than the previous PR, and it makes it easier to extend to the patterns I mentioned there; for example, to enact committee changes at the end of (potentially empty) batches based on how many certificates we have seen, rather than upon individual transactions.
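
To illustrate that pattern, here is a hypothetical batch-level tracker that advances the epoch every epoch_length certificates. Because a BatchExecutionState sees empty certificates too, the count reflects consensus progress rather than transaction volume. EpochTracker and on_certificate are illustrative names, not part of the PR.

```rust
/// Hypothetical tracker that reconfigures every `epoch_length` certificates,
/// counting empty certificates as well.
pub struct EpochTracker {
    pub epoch: u64,
    pub epoch_length: u64,
    certificates_seen: u64,
}

impl EpochTracker {
    pub fn new(epoch_length: u64) -> Self {
        assert!(epoch_length > 0, "epoch_length must be positive");
        EpochTracker { epoch: 0, epoch_length, certificates_seen: 0 }
    }

    /// Called once per certificate, whether or not it carries transactions.
    /// Returns Some(new_epoch) when a committee change should be enacted.
    pub fn on_certificate(&mut self, num_transactions: usize) -> Option<u64> {
        let _ = num_transactions; // executing the transactions would happen here
        self.certificates_seen += 1;
        if self.certificates_seen % self.epoch_length == 0 {
            self.epoch += 1;
            Some(self.epoch)
        } else {
            None
        }
    }
}
```

With a per-transaction interface, empty certificates would never reach the state, so a count like this would drift from the actual consensus progress.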

Resolves MystenLabs/sui#5322

asonnino · 2022-08-18T21:15:50Z

executor/src/lib.rs

```diff
+
+    /// Execute the transactions and atomically persist the consensus index.
+    ///
+    /// TODO: This function should be allowed to return a new committee to reconfigure the system.
```


Reconfiguration can simply be done by sending the following message to the Primary and the Workers through the network: ReconfigureNotification::NewCommittee(new_committee).

Below is an example:

narwhal/node/src/restarter.rs

Lines 98 to 101 in ad05db4

```rust
let message = PrimaryWorkerMessage::Reconfigure(ReconfigureNotification::Shutdown);
let worker_cancel_handles = worker_network
    .unreliable_broadcast(addresses, &message)
    .await;
```

Thanks for the link! True, we discussed that we can send reconfiguration messages through gRPC.

I included this TODO because the current docs of handle_consensus_transaction say that it can return a new committee. While that isn't true (it clearly cannot), I thought the idea that it should was a good one. It captures committee transitions at the level of the ExecutionState, if they are to happen as part of transaction execution. The gRPC route seems like a technical detail that implementors of the ExecutionState should not have to concern themselves with.
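
A sketch of the signature the TODO suggests, under assumptions: the handler returns an optional Committee, and the core, not the application, turns it into a notification it can broadcast. Committee, Notification, Toy and execute_and_collect are local stand-ins; Notification only mirrors the shape of ReconfigureNotification::NewCommittee and is not the real Narwhal type.

```rust
/// Hypothetical committee description.
#[derive(Clone, Debug, PartialEq)]
pub struct Committee {
    pub epoch: u64,
    pub members: Vec<String>,
}

/// Local stand-in mirroring the shape of ReconfigureNotification::NewCommittee.
#[derive(Debug, PartialEq)]
pub enum Notification {
    NewCommittee(Committee),
}

pub trait ExecutionState {
    /// Returns Some(committee) if executing this transaction changes the committee.
    fn handle_consensus_transaction(&mut self, transaction: &[u8]) -> Option<Committee>;
}

/// The core decides how to deliver the change (e.g. broadcast it to primary and workers).
pub fn execute_and_collect<S: ExecutionState>(state: &mut S, transactions: &[Vec<u8>]) -> Vec<Notification> {
    transactions
        .iter()
        .filter_map(|tx| state.handle_consensus_transaction(tx))
        .map(Notification::NewCommittee)
        .collect()
}

/// Toy state: a transaction equal to b"reconfigure" starts a new epoch.
pub struct Toy {
    pub epoch: u64,
}

impl ExecutionState for Toy {
    fn handle_consensus_transaction(&mut self, transaction: &[u8]) -> Option<Committee> {
        if transaction == &b"reconfigure"[..] {
            self.epoch += 1;
            Some(Committee { epoch: self.epoch, members: vec!["a".into(), "b".into()] })
        } else {
            None
        }
    }
}
```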

But it's up to you, I just wanted it to be consistent with the other docs.

Cool, let's leave the TODO then (we haven't fully deployed Narwhal reconfiguration in DevNet yet, so this part may still change)

asonnino · 2022-08-18T21:25:21Z

executor/src/core.rs

```diff
+                    ))),
+                },
+            )
+            .collect::<Result<Vec<_>, _>>()?;
```


Could we collect a Vec rather than a Result so that we can skip faulty transactions rather than the whole batch?
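
The difference being asked about, in miniature: collecting an iterator of Results into Result<Vec<_>, _> stops at the first error and rejects the whole batch, while filter_map with .ok() drops the failures and keeps the rest. The decode function is a hypothetical stand-in for bincode that accepts exactly 4 little-endian bytes.

```rust
use std::convert::TryInto;

/// Stand-in decoder (not bincode): a transaction must be exactly 4 bytes.
fn decode(bytes: &[u8]) -> Result<u32, String> {
    let arr: [u8; 4] = bytes
        .try_into()
        .map_err(|_| format!("expected 4 bytes, got {}", bytes.len()))?;
    Ok(u32::from_le_bytes(arr))
}

/// Fail-fast: one malformed transaction rejects the whole batch.
pub fn decode_all_or_nothing(batch: &[Vec<u8>]) -> Result<Vec<u32>, String> {
    batch.iter().map(|b| decode(b)).collect::<Result<Vec<_>, _>>()
}

/// Skip-faulty: malformed transactions are dropped (a real node would at least log them).
pub fn decode_skipping(batch: &[Vec<u8>]) -> Vec<u32> {
    batch.iter().filter_map(|b| decode(b).ok()).collect()
}
```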

I thought about this while debugging one of the unit tests that failed to deserialise (the compiler inferred an i32 where a u64 was expected). I think this is a bit of a grey area. I see that you label the error as ClientExecutionError, which is treated as non-fatal during execution. However, it's not clear whether treating deserialisation errors as non-fatal would be correct.

That's because it's difficult to say why we can't deserialise a transaction: Is it because some malicious validator put a malformed message into one of the batches? Or is it because the Transaction type on our side is somehow incorrect, and our machine is the only one that fails to deserialize something? Perhaps other validators have added an extra variant to an enum, but we forgot to update our node? Skipping such transactions could lead to consensus failure down the road.

What we should do depends on how much pre-validation happens on the contents before quorum is reached over their availability. Can malicious validators use Narwhal to atomically broadcast absolute rubbish content? Or is there some vetting before we vote on it to see if it at least conforms to some basic expectations about format?

If there is such validation, then we could be sure that the honest majority thinks this transaction looks legit, so if we can't deserialize it, we should stop and fix our software. If there is no such validation, then we can't tell whose fault it is, ours or the batcher's, in which case I don't know what to do.

With that in mind, do you want to just issue a warning and skip?

Changed it back to skipping.

On further thought, changed it to return raw bytes at this stage.

It's a good point. Validators can input any rubbish as transactions, there are no checks on transactions format upon voting. Before voting, validators simply verify that the payload is available (and that the header doesn't break any safety or GC rules).

You are right that if the transaction serialisation format changes and some validators are late to the party, then it may be a safety problem. Should we open an issue for that and see what others think about it?

Opening an issue can't hurt; at least there would be a record of it. Someone using this library might want to protect themselves from clients sending invalid content. If the validation is encoded into a trait, anyone happy with raw bytes can trivially let everything in.
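
A sketch of what encoding this into a trait might look like. TransactionValidator, AcceptAll and MaxSize are hypothetical names; where such a check would run (for example, before a validator votes on a batch) is still the open question discussed in this thread.

```rust
/// Hypothetical hook: a library user can plug in checks on transaction bytes.
pub trait TransactionValidator {
    fn validate(&self, transaction: &[u8]) -> Result<(), String>;
}

/// For applications happy with raw bytes: everything is accepted.
pub struct AcceptAll;

impl TransactionValidator for AcceptAll {
    fn validate(&self, _transaction: &[u8]) -> Result<(), String> {
        Ok(())
    }
}

/// Example of a stricter policy: reject empty or oversized transactions.
pub struct MaxSize(pub usize);

impl TransactionValidator for MaxSize {
    fn validate(&self, transaction: &[u8]) -> Result<(), String> {
        if transaction.is_empty() {
            Err("empty transaction".to_string())
        } else if transaction.len() > self.0 {
            Err(format!("{} bytes exceeds limit of {}", transaction.len(), self.0))
        } else {
            Ok(())
        }
    }
}
```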


asonnino · 2022-08-18T21:39:17Z

executor/src/core.rs

```diff
+        let transactions = transactions
+            .into_iter()
+            .map(
+                |serialized| match bincode::deserialize::<State::Transaction>(&serialized) {
```


If this is annoying we can also do the deserialisation of transactions without the ExecutorState?

You mean to push the deserialization into the ExecutorState implementation itself?

Like I said, I think letting the system know about the Transaction type is a good thing: it keeps your options open for hardening in the future by rejecting malicious input at the periphery.

I highlighted another change in the PR description, which is that tx_output will only contain the outcome for transactions that we managed to at least deserialize (previously it had raw bytes, now it's the model). It doesn't look like this would be a problem though, because you weren't able to tell the difference in the errors anyway, so you didn't know whether the raw bytes were legit without trying to deserialize them again.

Let me know, though, because pushing deserialization into the state itself could restore full freedom.

Moving it out would also hand the choice of serialization format back to the application, i.e., bincode would not be prescribed by this library. But you can also achieve this by adding your own traits.

Pushed a change which moves the deserialisation into the SingleExecutor, to restore the current behaviour of reporting the outcome of all transactions, even if they are malformed. The BatchExecutionState now takes raw transactions, and it's up to the application to decide how much of them to process and what deserialisation scheme to use.


This is MystenLabs#4219, but it updates the Narwhal pointer to the commit immediately preceding MystenLabs/narwhal#818 and therefore avoids the compatibility issues brought by the refactoring of the Sui/NW interface.


This PR requires adaptations to Sui, see MystenLabs/sui#4219 (comment) for details. We've run out of time to land those changes and need to update the NW commit in Sui.


aakoshh added 4 commits August 18, 2022 14:04

Add ExecutionState::handle_consensus_without_transactions

5e8dfb8

Test that the empty handler is called.

5f3da1f

Separate out a BatchExecutionState and a SingleExecutionState

cc8e918

Merge remote-tracking branch 'origin/main' into batch-and-single-exec

6432daa

aakoshh marked this pull request as ready for review August 18, 2022 20:04

aakoshh requested a review from asonnino as a code owner August 18, 2022 20:04

aakoshh mentioned this pull request Aug 18, 2022

Add ExecutionState::handle_consensus_without_transactions #815

Closed

asonnino reviewed Aug 18, 2022


aakoshh added 2 commits August 19, 2022 10:32

Skip transactions that cannot be deserialized.

c89ce20

Push deserialization into the SingleExecutor so every transaction gets an output report.

054569c

aakoshh requested a review from asonnino August 19, 2022 12:50

asonnino approved these changes Aug 19, 2022


asonnino merged commit 0305170 into MystenLabs:main Aug 19, 2022

aakoshh deleted the batch-and-single-exec branch August 19, 2022 18:31

joyqvq mentioned this pull request Aug 22, 2022

chore: upgrade narwhal pointer MystenLabs/sui#4219

Closed

huitseeker mentioned this pull request Aug 23, 2022

chore(deps): update the Narwhal pointer (fast/easy version) MystenLabs/sui#4229

Merged

adlrocha mentioned this pull request Aug 23, 2022

🚧 | B3: Turn B1 into production-ready system for core actors and FVM consensus-shipyard/consensuslab#6

Closed

huitseeker mentioned this pull request Aug 25, 2022

Devnet 0.7.1 rc (for CI) #847

Merged

huitseeker added a commit to huitseeker/narwhal that referenced this pull request Aug 29, 2022

revert: 0305170 - Batch execution with single execution adapter (MystenLabs#818)

2c58436

huitseeker mentioned this pull request Aug 29, 2022

Revert 818 + increase the batch timeout #859

Merged

huitseeker added a commit to huitseeker/narwhal that referenced this pull request Aug 29, 2022

revert: 0305170 - Batch execution with single execution adapter (MystenLabs#818)

925fd7d

huitseeker added a commit that referenced this pull request Aug 30, 2022

revert: 0305170 - Batch execution with single execution adapter (#818)

6b65680

mwtian pushed a commit to mwtian/sui that referenced this pull request Sep 30, 2022

Batch execution with single execution adapter (MystenLabs/narwhal#818)

f78a4c5

Batch execution with single execution adapter

mwtian pushed a commit to mwtian/sui that referenced this pull request Sep 30, 2022

revert: f78a4c5 - Batch execution with single execution adapter (MystenLabs/narwhal#818)

8f1626a
