You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The bug in #691 revealed that there is a more sensible way of handling the checks for block invariants involving QCs.
Specifically, verify_qc_claim was trying to do too much, and it revealed that important block invariants were not being checked for Legacy and Transition blocks.
Those invariants were captured in the issue description of #691.
In particular there were four specific categories of block invariant validation:
Validations for Legacy blocks
Validations for Transition blocks
Basic validations for Proper Savanna blocks
Complete QC validation for Proper Savanna blocks that have a QC block extension present
The changes of #691 are now validating all of these categories of block invariants (as appropriate for the specific block type).
Additionally, there was some clean up in the functions that handle these different categories of validation. However, the Basic validation for Proper Savanna blocks (which is cheap to validation) and the complete QC validation for Proper Savanna blocks that have a QC block extension present (which is more computationally expensive to validate) were still both carried out (in order) in the verify_qc_claim function.
This monolithic approach to checking the invariants of a Proper Savanna block almost led to a bug where posting the verify_qc_claim check to a separate thread (as an optimization) would have prevented a check for a critical block invariant that was assumed as a pre-condition in later parts of the code that was indirectly called when constructing the block_state; see https://github.com/AntelopeIO/spring/pull/538/files#r1717405576.
So in addition to any other (protocol compatible) refactoring of these different categories of checks within the implementation that may be desired to do for code clarity reasons, this issue may present an opportunity (for reasons that will be more clear when the issue gets into the actual bug described by this issue) to separate out the checks for basic validation for Proper Savanna blocks from the complete QC validation for Proper Savanna blocks that have a QC block extension present. And maybe even to go further and optimize the validation by carrying out the complete QC validation for Proper Savanna blocks that have a QC block extension present in parallel with the construction of the block_state. However, it is still critical to finish the basic validation for Proper Savanna blocks prior to starting the construction the block_state and to finish the complete QC validation for Proper Savanna blocks that have a QC block extension present prior to calling integrate_received_qc_to_block.
But the main goal of this issue is to address a bug with the way these block invariant checks are handled during replay.
For optimization reasons, it is desirable to not do the complete QC validation during replay normally since it is computationally expensive. It is debatable whether checking block invariants of Legacy blocks are useful to do during replay (but they should also be cheap). But the block_state construction that needs to happen during replay assumes some pre-conditions that must be true and that are only validated when checking the block invariants of Transition blocks and doing the basic checks for Proper Savanna blocks. The pre-conditions would only be checked if nodeos is compiled in DEBUG mode; otherwise undefined behavior can happen. So it would be wise to not simply assume that the block logs on disk are correct and instead always do those checks during replay. And since the block invariant checks of Legacy blocks are cheap as well, it makes sense to do always do all the block invariant checks of Legacy, Transition, and Proper Savanna blocks (other than the complete QC validation) during replay.
The complete QC validation should not normally be done during replay. But there is an additional bug where the complete QC validation is not done during a replay with --force-all-checks. The --force-all-checks option is there so that the operator can do a replay without having any trust in the integrity of the blocks log they have in their data folder. It should be no different in terms of trust than if they received the blocks from their peers in the P2P network. So it is critical to actually validate the QCs attached in the blocks (along with all other block validation checks that are normally done when syncing) when running a replay with --force-all-checks.
The text was updated successfully, but these errors were encountered:
arhag
changed the title
Always check basic block invariants involving QCs for all block types; handle replay with --force-all-checks appropriately
Handle replay with --force-all-checks appropriately
Sep 4, 2024
arhag
changed the title
Handle replay with --force-all-checks appropriately
Handle block invariant checks during replay appropriately and in particular ensure --force-all-checks is respected
Sep 4, 2024
The bug in #691 revealed that there is a more sensible way of handling the checks for block invariants involving QCs.
Specifically,
verify_qc_claim
was trying to do too much, and it revealed that important block invariants were not being checked for Legacy and Transition blocks.Those invariants were captured in the issue description of #691.
In particular there were four specific categories of block invariant validation:
The changes of #691 are now validating all of these categories of block invariants (as appropriate for the specific block type).
Additionally, there was some clean up in the functions that handle these different categories of validation. However, the Basic validation for Proper Savanna blocks (which is cheap to validation) and the complete QC validation for Proper Savanna blocks that have a QC block extension present (which is more computationally expensive to validate) were still both carried out (in order) in the
verify_qc_claim
function.This monolithic approach to checking the invariants of a Proper Savanna block almost led to a bug where posting the
verify_qc_claim
check to a separate thread (as an optimization) would have prevented a check for a critical block invariant that was assumed as a pre-condition in later parts of the code that was indirectly called when constructing theblock_state
; see https://github.com/AntelopeIO/spring/pull/538/files#r1717405576.So in addition to any other (protocol compatible) refactoring of these different categories of checks within the implementation that may be desired to do for code clarity reasons, this issue may present an opportunity (for reasons that will be more clear when the issue gets into the actual bug described by this issue) to separate out the checks for basic validation for Proper Savanna blocks from the complete QC validation for Proper Savanna blocks that have a QC block extension present. And maybe even to go further and optimize the validation by carrying out the complete QC validation for Proper Savanna blocks that have a QC block extension present in parallel with the construction of the
block_state
. However, it is still critical to finish the basic validation for Proper Savanna blocks prior to starting the construction theblock_state
and to finish the complete QC validation for Proper Savanna blocks that have a QC block extension present prior to callingintegrate_received_qc_to_block
.But the main goal of this issue is to address a bug with the way these block invariant checks are handled during replay.
For optimization reasons, it is desirable to not do the complete QC validation during replay normally since it is computationally expensive. It is debatable whether checking block invariants of Legacy blocks are useful to do during replay (but they should also be cheap). But the
block_state
construction that needs to happen during replay assumes some pre-conditions that must be true and that are only validated when checking the block invariants of Transition blocks and doing the basic checks for Proper Savanna blocks. The pre-conditions would only be checked if nodeos is compiled in DEBUG mode; otherwise undefined behavior can happen. So it would be wise to not simply assume that the block logs on disk are correct and instead always do those checks during replay. And since the block invariant checks of Legacy blocks are cheap as well, it makes sense to do always do all the block invariant checks of Legacy, Transition, and Proper Savanna blocks (other than the complete QC validation) during replay.The complete QC validation should not normally be done during replay. But there is an additional bug where the complete QC validation is not done during a replay with
--force-all-checks
. The--force-all-checks
option is there so that the operator can do a replay without having any trust in the integrity of the blocks log they have in their data folder. It should be no different in terms of trust than if they received the blocks from their peers in the P2P network. So it is critical to actually validate the QCs attached in the blocks (along with all other block validation checks that are normally done when syncing) when running a replay with--force-all-checks
.The text was updated successfully, but these errors were encountered: