Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible consensus bug with proposer slashings #1065

Closed
JustinDrake opened this issue Apr 27, 2020 · 9 comments · Fixed by #1086
Closed

Possible consensus bug with proposer slashings #1065

JustinDrake opened this issue Apr 27, 2020 · 9 comments · Fixed by #1086
Assignees
Labels
bug Something isn't working

Comments

@JustinDrake
Copy link
Contributor

JustinDrake commented Apr 27, 2020

The process_proposer_slashings function has two key steps, namely verifying the proposer slashings in parallel and then calling slash_validator once per proposer_slashing. The parallel proposer slashing verification seems to assume proposer slashings can be verified independently but there is some state dependency in assert is_slashable_validator(proposer, get_current_epoch(state)) (see spec here).

Specifically, if the same (unslashed) validator is put several times in process_proposer_slashings then the second proposer slashing should throw an error. As far as I can tell slash_validator in Lighthouse will not throw an error if the validator is already slashed and process_proposer_slashings will not throw an error if the same validator is slashed several times.

@protolambda
Copy link

The spec has no explicit concurrency, Lighthouse may have optimized it wrong, and the edge case deserves a test. After first slashing, validator.slashed = True, and is-slashable will be False and abort the 2nd slashing.

@protolambda
Copy link

Can confirm the bug;

The operations are verified with a parallel iterator, only reading the state. Rust doesn't complain here. And then the slashings are applied, and the two proposer slashings that were valid on the pre-state, but not the specced intermediate state, would be confirmed.

Maybe it's a good idea to adjust slash_validator in the spec to explicitly check is_slashable_validator, so this can never happen again?

@paulhauner
Copy link
Member

Thanks @JustinDrake, this is a great pickup.

I think this comes from back when I was working on serenity-benches, so I'll take responsibility for this.

We do this type of parallel execution for several block-operations processing functions because we saw a significant gain from doing BLS verification on multiple cores. Now we're doing batch BLS verification and we never* actually verify signatures during per_block_processing/per_state_processing I think we can realistically remove all these par_iter. @michaelsproul keen to hear your thoughts, as always.

*: we still verify signatures during deposits processing. Perhaps keeping only concurrent signature verification here is sensible.

@paulhauner paulhauner added the bug Something isn't working label Apr 27, 2020
@michaelsproul
Copy link
Member

michaelsproul commented Apr 28, 2020

I had a look at attestation processing, and it looks like it's safe because it only verifies the indexed attestations in parallel, and does the actual slashing in series, checking the non-empty slashing condition here:

verify!(!slashable_indices.is_empty(), Invalid::NoSlashableIndices);

For proposer slashings, I agree we could remove the parallel iterator entirely. And I guess we could defensively remove all parallel iterators from state processing?


Oh damn, and I just checked the op pool, which I thought handled this correctly, but it looks like it doesn't :\ It takes great care to avoid including duplicate attester slashings, but just piles in all the proposer slashings that are compatible with a single state:

let proposer_slashings = filter_limit_operations(
self.proposer_slashings.read().values(),
|slashing| {
state
.validators
.get(slashing.signed_header_1.message.proposer_index as usize)
.map_or(false, |validator| !validator.slashed)
},
T::MaxProposerSlashings::to_usize(),
);

@JustinDrake
Copy link
Contributor Author

I think we can realistically remove all these par_iter

In this particular instance it may be wise to remove the par_iter especially if the performance gains are small. Having said that, as more of a meta comment for the codebase as a whole, I quite like that you guys are going "beyond" the spec to squeeze out performance :)

@JustinDrake
Copy link
Contributor Author

As a meta question for @zedt3ster, why didn't the fuzzing infrastructure catch this bug or the recent consensus bug in Prysm regarding reward calculations?

@protolambda
Copy link

protolambda commented Apr 28, 2020

@JustinDrake Prysm introduced the bug just a few days before testnet launch with a last-minute v0.11 update. Can't catch something with fuzzing when the fuzzer doesn't run the code. But I'm sure it could have, if it were up to date.
For lighthouse, it's more questionable. Maybe the double proposer slashing is too specific to find with the fuzzer. The fuzzer would need to figure out to duplicate the exact slashing operation, not impossible though.

@zedt3ster
Copy link
Member

So our beacon fuzz fuzzers were targeting v0.10 up until a few days ago. As mentioned by @protolambda, the code that was instrumented didn't include the bug in question. We've also ran into multiple issues with the Prysm build (that are now resolved) which prevented us from integrating them into our continuous fuzzing.

We're currently revamping the differential fuzzer completely to include new mutation engines and leveraging structural fuzzing. In fact, we just identified an SSZ decoding/processing bug in Prysm. The differential part of the fuzzers is currently "on hold", we'll be sharing a new design this week - this type of bug could definitely be picked up in the future.

@michaelsproul
Copy link
Member

I'm going to fix this

michaelsproul added a commit that referenced this issue Apr 29, 2020
Remove parallelism for proposer slashing verification.

Closes #1065
michaelsproul added a commit that referenced this issue Apr 30, 2020
Remove parallelism from proposer slashing verification.

Closes #1065
michaelsproul added a commit that referenced this issue Apr 30, 2020
Remove parallelism from proposer slashing verification.

Closes #1065
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants