This repository has been archived by the owner on Oct 17, 2022. It is now read-only.

Batch execution with single execution adapter #818

Merged: 6 commits, Aug 19, 2022

Conversation

@aakoshh aakoshh commented Aug 18, 2022

This PR is a more aggressive alternative to #815.

The basic motivation is the same: let the executor know about every certificate, whether it's empty or not. But instead of having a different method for empty certificates, this changes the executor::Core to expect a new BatchExecutionState which takes the whole certificate with all the batches it includes. It takes SerializedTransaction so it can decide how to deal with deserialization errors.
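A rough sketch of the shape described above, not the code in this PR: `CertificateOutput`, `handle_consensus`, and `CountingState` are hypothetical names, and the sketch is synchronous for brevity. It only illustrates that the state is called once per certificate, including empty ones, and receives raw bytes.

```rust
/// Raw transaction bytes; decoding is left to the state.
pub type SerializedTransaction = Vec<u8>;

/// Hypothetical stand-in for a committed certificate and the batches it references.
pub struct CertificateOutput {
    pub certificate_index: u64,
    pub batches: Vec<Vec<SerializedTransaction>>,
}

/// Simplified, synchronous sketch of a batch-oriented execution state.
pub trait BatchExecutionState {
    type Error;

    /// Called once per certificate, even when it carries no transactions.
    fn handle_consensus(&mut self, output: &CertificateOutput) -> Result<(), Self::Error>;
}

/// Toy state that only counts what it has seen.
#[derive(Default)]
pub struct CountingState {
    pub certificates_seen: u64,
    pub transactions_seen: usize,
}

impl BatchExecutionState for CountingState {
    type Error = String;

    fn handle_consensus(&mut self, output: &CertificateOutput) -> Result<(), Self::Error> {
        // An empty certificate still counts as a certificate.
        self.certificates_seen += 1;
        self.transactions_seen += output.batches.iter().map(Vec::len).sum::<usize>();
        Ok(())
    }
}
```

With this shape, an application that only cares about certificate counts can ignore the batches entirely.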

The trait also has a load_next_certificate_index method to support replay (which, as we discussed, @asonnino, is currently missing). Replay is expected to start from the index it returns, and it is the responsibility of the state to skip any transactions it has already processed if the whole certificate wasn't processed atomically.
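To make the replay contract concrete, here is a minimal sketch under assumed names (`ReplayableState`, `handle_certificate`, and the fields of `ExecutionIndices` are hypothetical). The state records how far it got, reports the certificate to resume from, and skips transactions it already applied when a certificate was only partially processed. Real persistence is not modelled.

```rust
/// Hypothetical progress marker: which certificate, and which transaction in it, comes next.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct ExecutionIndices {
    pub next_certificate_index: u64,
    pub next_transaction_index: u64,
}

#[derive(Default)]
pub struct ReplayableState {
    /// In a real system this would be persisted atomically with the effects.
    pub indices: ExecutionIndices,
    pub applied: Vec<Vec<u8>>,
}

impl ReplayableState {
    /// Replay is expected to restart from this certificate.
    pub fn load_next_certificate_index(&self) -> u64 {
        self.indices.next_certificate_index
    }

    /// Apply a certificate's transactions, skipping any applied before a restart.
    pub fn handle_certificate(&mut self, certificate_index: u64, transactions: &[Vec<u8>]) {
        if certificate_index < self.indices.next_certificate_index {
            return; // the whole certificate was already processed
        }
        // Only the certificate we stopped in can be partially processed.
        let skip = if certificate_index == self.indices.next_certificate_index {
            self.indices.next_transaction_index as usize
        } else {
            0
        };
        for (i, tx) in transactions.iter().enumerate().skip(skip) {
            self.applied.push(tx.clone());
            self.indices = ExecutionIndices {
                next_certificate_index: certificate_index,
                next_transaction_index: i as u64 + 1,
            };
        }
        // Certificate finished (possibly empty): move on to the next one.
        self.indices = ExecutionIndices {
            next_certificate_index: certificate_index + 1,
            next_transaction_index: 0,
        };
    }
}
```

A state that processes each certificate atomically would never observe a non-zero `next_transaction_index` and could drop the skipping logic.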

For backwards compatibility, the PR also has a SingleExecutionState, which keeps the existing method that takes one transaction at a time, and a SingleExecutor adapter which maintains the ExecutionIndices for it. This way you don't have to change Sui much, only the application root where the execution state is instantiated.

The SingleExecutor also takes over the responsibility of feeding the output channel, that is, it feeds the pairs of (outcome, transaction_bytes) to any observers. This didn't seem to make sense for the batch execution state, which could have extra outcomes without transactions. Arguably the execution state itself can forward its own output to where it has to go, so threading the tx_confirmation or tx_output through the whole system seems to serve vestigial purposes; it's also easy to do using some adapter that inspects the return value.

There is a slight discrepancy between the BatchExecutionState and the SingleExecutionState in that the latter takes a deserialized Transaction type, rather than bytes. The SingleExecutor preserves the current behaviour of reporting outcome even on malformed transactions, and only passing valid ones to the execution state. It also assumes bincode is used.
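The adapter idea can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: it is synchronous, the output channel is replaced by a `Vec`, and a hand-rolled `decode_u64` stands in for bincode so the example needs only the standard library. `SumState`, `decode_u64`, and the rule that malformed transactions also advance the index are hypothetical.

```rust
/// Per-transaction state; `Transaction` is already deserialized.
pub trait SingleExecutionState {
    type Transaction;
    fn handle_transaction(&mut self, tx_index: u64, tx: Self::Transaction) -> Result<(), String>;
}

/// Adapter that walks a certificate's raw transactions, tracks the running
/// index, and records an outcome for every transaction, malformed or not.
pub struct SingleExecutor<S: SingleExecutionState> {
    pub state: S,
    pub next_transaction_index: u64,
    /// Stand-in for the output channel: (outcome, transaction_bytes) pairs.
    pub output: Vec<(Result<(), String>, Vec<u8>)>,
    /// Stand-in for bincode: turns raw bytes into the state's transaction type.
    pub decode: fn(&[u8]) -> Result<S::Transaction, String>,
}

impl<S: SingleExecutionState> SingleExecutor<S> {
    pub fn handle_certificate(&mut self, transactions: Vec<Vec<u8>>) {
        for bytes in transactions {
            let index = self.next_transaction_index;
            self.next_transaction_index += 1;
            // Only well-formed transactions reach the state; errors are still reported.
            let outcome = (self.decode)(&bytes)
                .and_then(|tx| self.state.handle_transaction(index, tx));
            self.output.push((outcome, bytes));
        }
    }
}

/// Toy state that sums u64 transactions.
#[derive(Default)]
pub struct SumState {
    pub total: u64,
}

impl SingleExecutionState for SumState {
    type Transaction = u64;
    fn handle_transaction(&mut self, _tx_index: u64, tx: u64) -> Result<(), String> {
        self.total += tx;
        Ok(())
    }
}

/// Toy decoder: exactly eight little-endian bytes.
pub fn decode_u64(bytes: &[u8]) -> Result<u64, String> {
    if bytes.len() != 8 {
        return Err(format!("malformed transaction: {} bytes", bytes.len()));
    }
    let mut array = [0u8; 8];
    array.copy_from_slice(bytes);
    Ok(u64::from_le_bytes(array))
}
```

Keeping the decoder as a field is one way to avoid prescribing a serialization format; the adapter in the PR assumes bincode instead.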

Overall I think this is a cleaner solution than the previous PR, and it makes it easier to expand the solution to patterns I mentioned there; for example to enact committee changes at the end of (potentially empty) batches based on how many certificates we have seen, rather than upon individual transactions.

Resolves MystenLabs/sui#5322

@aakoshh aakoshh marked this pull request as ready for review August 18, 2022 20:04
@aakoshh aakoshh requested a review from asonnino as a code owner August 18, 2022 20:04

/// Execute the transactions and atomically persist the consensus index.
///
/// TODO: This function should be allowed to return a new committee to reconfigure the system.
Contributor

Reconfiguration can simply be done by sending the following message to the Primary and the Workers through the network: ReconfigureNotification::NewCommittee(new_committee).

Below is an example:

let message = PrimaryWorkerMessage::Reconfigure(ReconfigureNotification::Shutdown);
let worker_cancel_handles = worker_network
    .unreliable_broadcast(addresses, &message)
    .await;

Contributor Author

Thanks for the link! True, we discussed that we can send reconfiguration messages through gRPC.

I included this TODO because the current docs of handle_consensus_transaction say that it can return a new committee. While that's not true, as it clearly cannot, I thought the idea that it should do so was a good one. It captures committee transitions at the level of the ExecutionState, if they are to happen as part of transaction execution. The gRPC way seems like a technical detail that the implementors of the ExecutionState should not have to concern themselves with.

But it's up to you, I just wanted it to be consistent with the other docs.

Contributor

Cool, let's leave the TODO then (we haven't fully deployed Narwhal reconfiguration in DevNet yet, so this part may still change)

))),
},
)
.collect::<Result<Vec<_>, _>>()?;
Contributor

Could we collect a Vec rather than a Result so that we can skip faulty transactions rather than the whole batch?
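For readers following along, the difference between the two options looks roughly like this. `decode` is a hypothetical one-byte decoder standing in for the bincode call in the diff, and both function names are made up for the illustration.

```rust
/// Hypothetical decoder standing in for `bincode::deserialize`: one byte per transaction.
fn decode(bytes: &[u8]) -> Result<u8, String> {
    match bytes {
        [value] => Ok(*value),
        _ => Err(format!("expected 1 byte, got {}", bytes.len())),
    }
}

/// Collecting into a `Result`: the first failure rejects the whole batch.
pub fn decode_all_or_nothing(batch: &[Vec<u8>]) -> Result<Vec<u8>, String> {
    batch.iter().map(|bytes| decode(bytes)).collect()
}

/// Collecting into a `Vec`: faulty transactions are dropped, the rest are kept.
pub fn decode_skipping_faulty(batch: &[Vec<u8>]) -> Vec<u8> {
    batch.iter().filter_map(|bytes| decode(bytes).ok()).collect()
}
```

The second form silently discards the error, so a real version would probably want to log it before skipping.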

Contributor Author

I thought about this while debugging one of the unit tests that failed to deserialise (the compiler decided to send i32 instead of u64). I think this is a bit of a grey area. I see that you label the error as ClientExecutionError which, during execution, is treated as a non-fatal one. However, it's not clear whether treating deserialisation errors as non-fatal would be correct.

That's because it's difficult to say why we can't deserialise a transaction: Is it because some malicious validator put a malformed message into one of the batches? Or is it because the Transaction type on our side is somehow incorrect, and our machine is the only one that fails to deserialize something? Perhaps other validators have added an extra variant to an enum, but we forgot to update our node? Skipping such transactions could lead to consensus failure down the road.

What we should do depends on how much pre-validation happens on the contents before quorum is reached over their availability. Can malicious validators use Narwhal to atomically broadcast absolute rubbish content? Or is there some vetting before we vote on it to see if it at least conforms to some basic expectations about format?

If there is such validation, then we could be sure that the honest majority thinks this transaction looks legit, so if we can't deserialize it, we should stop and fix our software. If there is no such validation, then we can't decide whose fault it is, ours or the batcher's, in which case I don't know what to do.

With that in mind, do you want to just issue a warning and skip?

Contributor Author

Changed it back to skipping.

Contributor Author

On further thought, changed it to return raw bytes at this stage.

Contributor

It's a good point. Validators can input any rubbish as transactions, there are no checks on transactions format upon voting. Before voting, validators simply verify that the payload is available (and that the header doesn't break any safety or GC rules).

You are right that if the transaction serialisation format changes and some validators are late to the party, then it may be a safety problem. Should we open an issue for that and see what others think about it?

Contributor Author

Opening an issue can't hurt; at least there would be a story about it. Someone using this library might want to protect themselves from clients sending invalid content. If it's encoded into a trait, anyone happy with raw bytes can trivially allow everything in.
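One way to read "encoded into a trait" is sketched below. `TransactionValidator`, `AllowAll`, `MaxSize`, and `accepted` are hypothetical names for illustration only; nothing like them is claimed to exist in Narwhal.

```rust
/// Hypothetical hook that vets raw transaction bytes before they are accepted.
pub trait TransactionValidator {
    fn validate(&self, bytes: &[u8]) -> Result<(), String>;
}

/// For applications happy with raw bytes: everything is allowed in.
pub struct AllowAll;

impl TransactionValidator for AllowAll {
    fn validate(&self, _bytes: &[u8]) -> Result<(), String> {
        Ok(())
    }
}

/// Example of a stricter policy: reject empty or oversized payloads.
pub struct MaxSize(pub usize);

impl TransactionValidator for MaxSize {
    fn validate(&self, bytes: &[u8]) -> Result<(), String> {
        if bytes.is_empty() {
            Err("empty transaction".to_string())
        } else if bytes.len() > self.0 {
            Err(format!("transaction of {} bytes exceeds {}", bytes.len(), self.0))
        } else {
            Ok(())
        }
    }
}

/// Keep only the transactions the validator accepts.
pub fn accepted<'a, V: TransactionValidator>(validator: &V, batch: &'a [Vec<u8>]) -> Vec<&'a [u8]> {
    batch
        .iter()
        .map(Vec::as_slice)
        .filter(|bytes| validator.validate(bytes).is_ok())
        .collect()
}
```

An application-specific validator could attempt a full deserialization here, which would move the format check to the edge of the system.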

let transactions = transactions
    .into_iter()
    .map(
        |serialized| match bincode::deserialize::<State::Transaction>(&serialized) {
Contributor

If this is annoying we can also do the deserialisation of transactions without the ExecutorState?

Contributor Author

You mean to push the deserialization into the ExecutorState implementation itself?

Like I said, I think letting the system know about the Transaction type is a good thing; it keeps your options open for hardening in the future by rejecting malicious input at the peripheries.

Contributor Author

I highlighted another change in the PR description, which is that tx_output will only contain the outcome for transactions that we managed to at least deserialize (previously it had raw bytes, now it's the model). It doesn't look like this would be a problem though, because you weren't able to tell the difference in the errors anyway, so you didn't know whether the raw bytes were legit without trying to deserialize again.

Let me know, though, because pushing deserialization into the state itself could restore full freedom.

Contributor Author

Moving it out would also put the decision of what serialization to use back with the application, i.e. bincode would not be prescribed by this library. But you can achieve this by adding your own traits as well.

Contributor Author

Pushed a change which moves the deserialisation into the SingleExecutor, to restore the current behaviour of reporting the outcome of all transactions, even if they are malformed. The BatchExecutionState now takes raw transactions, and it's up to the application to decide how much of it to process, and what deserialisation scheme to use.

@aakoshh aakoshh requested a review from asonnino August 19, 2022 12:50
@asonnino asonnino merged commit 0305170 into MystenLabs:main Aug 19, 2022
@aakoshh aakoshh deleted the batch-and-single-exec branch August 19, 2022 18:31
huitseeker added a commit to huitseeker/sui that referenced this pull request Aug 23, 2022
This is MystenLabs#4219, but it updates the Narwhal pointer to the commit immediately preceding
MystenLabs/narwhal#818

and therefore avoids the compatibility issues brought by the refactoring of the Sui/NW interface.
huitseeker added a commit to MystenLabs/sui that referenced this pull request Aug 23, 2022
This is #4219, but it updates the Narwhal pointer to the commit immediately preceding
MystenLabs/narwhal#818

and therefore avoids the compatibility issues brought by the refactoring of the Sui/NW interface.
huitseeker added a commit that referenced this pull request Aug 24, 2022
This PR requires adaptations to Sui, see MystenLabs/sui#4219 (comment) for details.
We've run out of time to land those changes and need to update the NW commit in Sui.
huitseeker added a commit to huitseeker/narwhal that referenced this pull request Aug 29, 2022
mwtian pushed a commit to mwtian/sui that referenced this pull request Sep 30, 2022
Batch execution with single execution adapter

Successfully merging this pull request may close these issues.

Pass empty batches/transactions to ExecutorState
2 participants