-
Notifications
You must be signed in to change notification settings - Fork 67
Batch execution with single execution adapter #818
Conversation
|
||
/// Execute the transactions and atomically persist the consensus index. | ||
/// | ||
/// TODO: This function should be allowed to return a new committee to reconfigure the system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reconfiguration can simply be done by sending the following message to the Primary and the Workers through the network: ReconfigureNotification::NewCommittee(new_committee)
.
Below is an example:
Lines 98 to 101 in ad05db4
let message = PrimaryWorkerMessage::Reconfigure(ReconfigureNotification::Shutdown); | |
let worker_cancel_handles = worker_network | |
.unreliable_broadcast(addresses, &message) | |
.await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the link! True, we discussed that we can send reconfiguration messages through gRPC.
I included this TODO because the current docs of the handle_consensus_transaction
say that it can return a new committee. While that's not true, as it clearly cannot, I thought the idea that it should do so was a good one. It capture committee transitions on the level of the ExecutionState
, if it is to happen as part of transaction execution. The gRPC way seems like a technical detail that the implementors of the ExecutionState
should not have to concern themselves with.
But it's up to you, I just wanted it to be consistent with the other docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, let's leave the TODO then (we haven't fully deployed Narwhal reconfiguration in DevNet yet, so this part may still change)
executor/src/core.rs
Outdated
))), | ||
}, | ||
) | ||
.collect::<Result<Vec<_>, _>>()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we collect a Vec rather than a Result so that we can skip faulty transactions rather than the whole batch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this while debugging one of the unit tests that failed to deserialise (the compiler decided to send i32
instead of u64
). I think this is a bit of a grey area. I see that you label the error as ClientExecutionError
which, during execution, is treated as a non-fatal one. However, it's not clear whether treating deseralisation errors as non-fatal would be correct.
That's because it's difficult to say why we can't deserialise a transaction: Is it because some malicious validator put a malformed message into one of the batches? Or is it because the Transaction
type on our side is somehow incorrect, and our machine is the only one that fails to deserialize something? Perhaps other validators have added an extra variant to an enum, but we forgot to update our node? Skipping such transactions could lead to consensus failure down the road.
What we should do depends on how much pre-validation happens on the contents before quorum is reached over their availability. Can malicious validators use Narwhal to atomically broadcast absolute rubbish content? Or is there some vetting before we vote on it to see if it at least conforms to some basic expectations about format?
If there is such validation, then we could be sure that the honest majority thinks this transaction looks legit, so if we can't deserialize it, we should stop and fix our software. If there is no such validation, then we can't decide whose fault is it, ours or the batcher's, in which case I don't know what to do.
With that in mind, do you want to just issue a warning and skip?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it back to skipping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On further thought, changed it to return raw bytes at this stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good point. Validators can input any rubbish as transactions, there are no checks on transactions format upon voting. Before voting, validators simply verify that the payload is available (and that the header doesn't break any safety or GC rules).
You are right that if the transaction serialisation format changes and some validators are late to the party, then it may be a safety problem. Should we open an issue for that and see what others think about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opening an issue can't hurt, at least there would be a story about it it. Someone using this library might want to protect themselves from the clients sending invalid content. If it's encoded into a trait, anyone happy with raw bytes can trivially allow everything in.
executor/src/core.rs
Outdated
let transactions = transactions | ||
.into_iter() | ||
.map( | ||
|serialized| match bincode::deserialize::<State::Transaction>(&serialized) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is annoying we can also do the deserialisation of transactions without the ExecutorState?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean to push the deserialization into the ExecutorState
implementation itself?
Like I said I think letting the system know about the Transaction
type is a good thing, it keeps your options open for hardening in the future by rejecting malicious input at the peripheries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I highlighted another change in the PR description, which is that tx_output
will only contain the outcome for transactions that we managed to at least deserialize (previously it had raw bytes, now it's the model). It doesn't look like this would be a problem though, because you weren't able to tell the difference in the errors anyway, so you didn't now whether the raw bytes were legit without trying to deserialize again.
Let me know, though, because pushing deserialization into the state itself could restore full freedom.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving it out would also put the decision of what serialization to use back with the application, ie. bincode
would not be prescribed by this library. But you can achieve this by adding your own traits as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed a change which moves the deserialisation into the SingleExecutor
, to restore the current behaviour of reporting the outcome of all transactions, even if they are malformed. The BatchExecuttionState
now takes raw transactions, and it's up to the application to decide how much of it to process, and what deserialisation scheme to use.
This is MystenLabs#4219, but it updates the Narwhal pointer to the commit immediately preceding MystenLabs/narwhal#818 and therefore avoids the compatibility issues brought by the refactoring of the Sui/NW interface.
This is #4219, but it updates the Narwhal pointer to the commit immediately preceding MystenLabs/narwhal#818 and therefore avoids the compatibility issues brought by the refactoring of the Sui/NW interface.
This PR requires adaptations to Sui, see MystenLabs/sui#4219 (comment) for details. We've run out of time to land those changes and need to update the NW commit in Sui.
Batch execution with single execution adapter
This PR is a more aggressive alternative for #815
The basic motivation is the same: let the executor know about every certificate, whether it's empty or not. But instead of having a different method for empty certificates, this changes the
executor::Core
to expect a newBatchExecutionState
which takes the whole certificate with all the batches it includes. It takesSerializedTransaction
so it can decide how to deal with deserialization errors.The trait also has a
load_next_certificate_index
method to support replay (which we discussed @asonnino is currently missing). The replay is expected to happen from the index it returns, and it's the responsibility of the state to skip any transactions it has already processed, if the whole certificate wasn't processed atomically.For backwards compatibility, the the PR also has a
SingleExecutionState
that has the same method that takes singular transactions, and aSingleExecutor
adapter which maintains theExecutionIndices
for it. This way you don't have to change Sui too much, only in the application root where the execution state is instantiated.The
SingleExecutor
also takes over the responsiblity of feeding the output channel, that is, it feeds the pairs of(outcome, transaction_bytes)
to any observers. This didn't seem to make sense for the batch execution state, which could have extra outcome without transactions. Arguably the execution state itself can forward its own output to where it has to go, so threading thetx_confirmation
ortx_output
through the whole system seems to serve vestigial purposes; and it's also easy to do using some adapter that inspects the return value.There is a slight discrepancy between the
BatchExecutionState
and theSingleExecutionState
in that the latter takes a deserializedTransaction
type, rather than bytes. TheSingleExecutor
preserves the current behaviour of reporting outcome even on malformed transactions, and only passing valid ones to the execution state. It also assumesbincode
is used.Overall I think this is a cleaner solution than the previous PR, and it makes it easier to expand the solution to patterns I mentioned there; for example to enact committee changes at the end of (potentially empty) batches based on how many certificates we have seen, rather than upon individual transactions.
Resolves MystenLabs/sui#5322