Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle block invariant checks during replay appropriately and in particular ensure --force-all-checks is respected #693

Closed
arhag opened this issue Sep 3, 2024 · 0 comments · Fixed by #715, #721, #735 or #740
Assignees
Labels
bug The product is not working as was intended. 👍 lgtm
Milestone

Comments

@arhag
Copy link
Member

arhag commented Sep 3, 2024

The bug in #691 revealed that there is a more sensible way of handling the checks for block invariants involving QCs.

Specifically, verify_qc_claim was trying to do too much, and it revealed that important block invariants were not being checked for Legacy and Transition blocks.

Those invariants were captured in the issue description of #691.

In particular there were four specific categories of block invariant validation:

  1. Validations for Legacy blocks
  2. Validations for Transition blocks
  3. Basic validations for Proper Savanna blocks
  4. Complete QC validation for Proper Savanna blocks that have a QC block extension present

The changes of #691 are now validating all of these categories of block invariants (as appropriate for the specific block type).

Additionally, there was some clean up in the functions that handle these different categories of validation. However, the Basic validation for Proper Savanna blocks (which is cheap to validation) and the complete QC validation for Proper Savanna blocks that have a QC block extension present (which is more computationally expensive to validate) were still both carried out (in order) in the verify_qc_claim function.

This monolithic approach to checking the invariants of a Proper Savanna block almost led to a bug where posting the verify_qc_claim check to a separate thread (as an optimization) would have prevented a check for a critical block invariant that was assumed as a pre-condition in later parts of the code that was indirectly called when constructing the block_state; see https://github.com/AntelopeIO/spring/pull/538/files#r1717405576.

So in addition to any other (protocol compatible) refactoring of these different categories of checks within the implementation that may be desired to do for code clarity reasons, this issue may present an opportunity (for reasons that will be more clear when the issue gets into the actual bug described by this issue) to separate out the checks for basic validation for Proper Savanna blocks from the complete QC validation for Proper Savanna blocks that have a QC block extension present. And maybe even to go further and optimize the validation by carrying out the complete QC validation for Proper Savanna blocks that have a QC block extension present in parallel with the construction of the block_state. However, it is still critical to finish the basic validation for Proper Savanna blocks prior to starting the construction the block_state and to finish the complete QC validation for Proper Savanna blocks that have a QC block extension present prior to calling integrate_received_qc_to_block.

But the main goal of this issue is to address a bug with the way these block invariant checks are handled during replay.

For optimization reasons, it is desirable to not do the complete QC validation during replay normally since it is computationally expensive. It is debatable whether checking block invariants of Legacy blocks are useful to do during replay (but they should also be cheap). But the block_state construction that needs to happen during replay assumes some pre-conditions that must be true and that are only validated when checking the block invariants of Transition blocks and doing the basic checks for Proper Savanna blocks. The pre-conditions would only be checked if nodeos is compiled in DEBUG mode; otherwise undefined behavior can happen. So it would be wise to not simply assume that the block logs on disk are correct and instead always do those checks during replay. And since the block invariant checks of Legacy blocks are cheap as well, it makes sense to do always do all the block invariant checks of Legacy, Transition, and Proper Savanna blocks (other than the complete QC validation) during replay.

The complete QC validation should not normally be done during replay. But there is an additional bug where the complete QC validation is not done during a replay with --force-all-checks. The --force-all-checks option is there so that the operator can do a replay without having any trust in the integrity of the blocks log they have in their data folder. It should be no different in terms of trust than if they received the blocks from their peers in the P2P network. So it is critical to actually validate the QCs attached in the blocks (along with all other block validation checks that are normally done when syncing) when running a replay with --force-all-checks.

@arhag arhag added bug The product is not working as was intended. 👍 lgtm labels Sep 3, 2024
@arhag arhag added this to the Spring v1.0.1 milestone Sep 3, 2024
@enf-ci-bot enf-ci-bot moved this to Todo in Team Backlog Sep 3, 2024
@arhag arhag removed the triage label Sep 3, 2024
@arhag arhag changed the title Always check basic block invariants involving QCs for all block types; handle replay with --force-all-checks appropriately Handle replay with --force-all-checks appropriately Sep 4, 2024
@arhag arhag changed the title Handle replay with --force-all-checks appropriately Handle block invariant checks during replay appropriately and in particular ensure --force-all-checks is respected Sep 4, 2024
@linh2931 linh2931 moved this from Todo to In Progress in Team Backlog Sep 5, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Team Backlog Sep 9, 2024
@arhag arhag linked a pull request Sep 10, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment