fix(swingset): don't deduplicate inbound mailbox messages #3492
The mailbox device tracked the highest inbound message number and ack for
each peer so that it could de-duplicate repeated messages and reduce the
amount of kernel activity. Each call to `deliverInbound` would return a
boolean to indicate whether the messages/ack were new, and thus whether the
kernel needed to be cycled.
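
For readers unfamiliar with the old behavior, here is a minimal sketch of that kind of in-memory dedup. It is an illustration, not the device's actual code: `makeDedupingMailbox`, the `highest` table, and the `[msgnum, body]` message shape are assumptions; only the name `deliverInbound` and its boolean result come from the description above.

```js
// Hypothetical model of the OLD mailbox behavior (not the real device code).
function makeDedupingMailbox() {
  // peer -> { msgnum, ack }: highest values seen so far, held in RAM only
  const highest = new Map();

  function deliverInbound(peer, messages, ack) {
    const seen = highest.get(peer) || { msgnum: 0, ack: 0 };
    let isNew = false;
    // messages is assumed to be an array of [msgnum, body] pairs
    for (const [msgnum] of messages) {
      if (msgnum > seen.msgnum) {
        seen.msgnum = msgnum;
        isNew = true;
      }
    }
    if (ack > seen.ack) {
      seen.ack = ack;
      isNew = true;
    }
    highest.set(peer, seen);
    // true means "something new arrived, so the kernel must be cycled"
    return isNew;
  }

  return { deliverInbound };
}

const mailbox = makeDedupingMailbox();
const firstDelivery = mailbox.deliverInbound('peer1', [[1, 'hello']], 0);
const repeatDelivery = mailbox.deliverInbound('peer1', [[1, 'hello']], 0);
console.log(firstDelivery, repeatDelivery); // true false
```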
However, the device was holding this tracking data in non-durable state, so
if/when the kernel was restarted, the state would be lost. A duplicate
message/ack arriving in the restarted process would trigger kernel activity
that would not have run in the original process. These extra cranks caused
divergence between validators when one of them was restarted and the client
sent a duplicate message (such as the pre-emptive `ack` all clients send at
startup). The extra crank does not get very far, because vattp does its own
deduplication, so the divergence was only visible in the slog. But when #3442
is implemented, even a single extra crank will flag the validator as out of
consensus.
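
The failure mode can be reproduced in a toy model. The sketch below is an assumption-laden illustration, not SwingSet code: `makeOldValidator`, `restart`, and `getCranks` are hypothetical names, and a "crank" is modeled as a simple counter that survives restart while the dedup table does not.

```js
// Hypothetical model: dedup table in RAM, crank count as the durable history.
function makeOldValidator() {
  let cranks = 0; // stands in for recorded kernel activity
  let highestAck = new Map(); // non-durable dedup state

  return {
    deliverInbound(peer, ack) {
      const isNew = ack > (highestAck.get(peer) || 0);
      if (isNew) {
        highestAck.set(peer, ack);
        cranks += 1; // only "new" input cycles the kernel
      }
      return isNew;
    },
    // A restart loses the dedup table but not the history of cranks.
    restart() {
      highestAck = new Map();
    },
    getCranks: () => cranks,
  };
}

const stayedUp = makeOldValidator();
const restarted = makeOldValidator();

// Both validators see the same ack from the client.
for (const v of [stayedUp, restarted]) v.deliverInbound('client', 5);

// One validator restarts, then the client re-sends the same ack.
restarted.restart();
for (const v of [stayedUp, restarted]) v.deliverInbound('client', 5);

console.log(stayedUp.getCranks(), restarted.getCranks()); // 1 2
```

The two validators received identical input, yet the restarted one ran one extra crank.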
The fix is to remove the mailbox device's dedup code, and rely upon vattp for
this function. The test was also updated to match, and a new test (comparing
two parallel kernels, one restarted, one not) was added.
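
As a rough illustration of why this helps, the sketch below models the fixed arrangement and compares two parallel instances, one restarted and one not. It is not the test added in this PR: `makeFixedValidator`, `snapshot`, and the `durable` record are hypothetical, and the model assumes that vattp's dedup state is part of the durable state that survives a restart.

```js
// Hypothetical model of the FIXED behavior: the mailbox keeps no dedup state,
// every inbound delivery cycles the kernel, and vattp discards duplicates.
function makeFixedValidator() {
  // Assumed durable: everything here survives restart() in this model.
  const durable = { cranks: 0, vattpHighest: new Map(), delivered: [] };

  return {
    deliverInbound(peer, msgnum, body) {
      durable.cranks += 1; // always cycle the kernel
      const highest = durable.vattpHighest.get(peer) || 0;
      if (msgnum > highest) {
        durable.vattpHighest.set(peer, msgnum);
        durable.delivered.push(body); // vattp passes new messages along
      }
    },
    restart() {
      // Nothing non-durable to lose: the mailbox holds no tracking data.
    },
    snapshot: () => ({
      cranks: durable.cranks,
      delivered: [...durable.delivered],
    }),
  };
}

const continuous = makeFixedValidator();
const bounced = makeFixedValidator();

for (const v of [continuous, bounced]) v.deliverInbound('client', 1, 'hello');
bounced.restart();
// The client re-sends message 1 after the restart.
for (const v of [continuous, bounced]) v.deliverInbound('client', 1, 'hello');

const same =
  JSON.stringify(continuous.snapshot()) === JSON.stringify(bounced.snapshot());
console.log(same); // true
```

Both instances run the same number of cranks and deliver the same messages, at the cost of one crank per duplicate that the old mailbox code would have suppressed.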
closes #3471