Remove canonical_head_block_root from PersistedBeaconChain #1784

michaelsproul · 2020-10-19T01:40:35Z

Description

In #1639 the canonical_head_block_root field of PersistedBeaconChain was rendered obsolete by the use of fork choice to derive the head block on startup. We intended to remove it entirely when we did the breaking schema change for v0.3.0, but that PR (#1638) got closed and forgotten about ☹️

As a practice migration, I think we should remove the field entirely in a future release, i.e. automatically update the user's database from the old schema (with the block root) to the new (without) on startup.
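A rough sketch of what such an on-startup migration could look like. Everything here is a simplified assumption for illustration: the `V1`/`V2` struct names, the `genesis_block_root` field, the in-memory `Db` stand-in, and the version numbers 2 and 3 are hypothetical and not taken from the Lighthouse code.

```rust
/// Illustrative schema version wrapper; the concrete numbers below are placeholders.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SchemaVersion(pub u64);

pub const CURRENT_SCHEMA_VERSION: SchemaVersion = SchemaVersion(3);

/// Old layout (hypothetical, simplified): still carries the obsolete head block root.
#[derive(Debug, Clone, PartialEq)]
pub struct PersistedBeaconChainV1 {
    pub canonical_head_block_root: [u8; 32],
    pub genesis_block_root: [u8; 32],
}

/// New layout (hypothetical, simplified): the head is derived from fork choice instead.
#[derive(Debug, Clone, PartialEq)]
pub struct PersistedBeaconChainV2 {
    pub genesis_block_root: [u8; 32],
}

#[derive(Debug, Clone, PartialEq)]
pub enum StoredChain {
    V1(PersistedBeaconChainV1),
    V2(PersistedBeaconChainV2),
}

/// In-memory stand-in for the on-disk database (assumption for the sketch).
pub struct Db {
    pub schema_version: SchemaVersion,
    pub chain: StoredChain,
}

#[derive(Debug, PartialEq)]
pub enum MigrationError {
    UnsupportedVersion(SchemaVersion),
    SchemaMismatch,
}

/// Upgrade the database in place on startup, dropping the obsolete field.
pub fn migrate_on_startup(db: &mut Db) -> Result<(), MigrationError> {
    match db.schema_version {
        // Already up to date: nothing to do.
        v if v == CURRENT_SCHEMA_VERSION => Ok(()),
        // Old schema: rewrite the persisted chain without the block root.
        SchemaVersion(2) => {
            let new = match &db.chain {
                StoredChain::V1(old) => PersistedBeaconChainV2 {
                    genesis_block_root: old.genesis_block_root,
                },
                StoredChain::V2(_) => return Err(MigrationError::SchemaMismatch),
            };
            db.chain = StoredChain::V2(new);
            db.schema_version = CURRENT_SCHEMA_VERSION;
            Ok(())
        }
        // Unknown or too-old version: refuse to start rather than guess.
        other => Err(MigrationError::UnsupportedVersion(other)),
    }
}
```

In a real database the rewrite of the struct and the version bump would presumably need to be committed together, so that a crash mid-migration can't leave the two out of sync.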


## Issue Addressed

Closes #800
Closes #1713

## Proposed Changes

Implement the temporary state storage algorithm described in #800. Specifically:

* Add `DBColumn::BeaconStateTemporary`, for storing 0-length temporary marker values.
* Store intermediate states immediately as they are created, marked temporary. Delete the temporary flag if the block is processed successfully.
* Add a garbage collection process to delete leftover temporary states on start-up.
* Bump the database schema version to 2 so that a DB with temporary states can't accidentally be used with older versions of the software. The auto-migration is a no-op, but puts in place some infra that we can use for future migrations (e.g. #1784)

## Additional Info

There are two known race conditions, one potentially causing permanent faults (hopefully rare), and the other insignificant.

### Race 1: Permanent state marked temporary

EDIT: this has been fixed by the addition of a lock around the relevant critical section

There are 2 threads that are trying to store 2 different blocks that share some intermediate states (e.g. they both skip some slots from the current head). Consider this sequence of events:

1. Thread 1 checks if state `s` already exists, and seeing that it doesn't, prepares an atomic commit of `(s, s_temporary_flag)`.
2. Thread 2 does the same, but also gets as far as committing the state txn, finishing the processing of its block, and _deleting_ the temporary flag.
3. Thread 1 is (finally) scheduled again, and marks `s` as temporary with its transaction.
4. a) The process is killed, or thread 1's block fails verification and the temp flag is not deleted. This is a permanent failure! Any attempt to load state `s` will fail... hope it isn't on the main chain! Alternatively (4b) happens...

   b) Thread 1 finishes, and re-deletes the temporary flag. In this case the failure is transient, state `s` will disappear temporarily, but will come back once thread 1 finishes running.

I _hope_ that steps 1-3 only happen very rarely, and 4a even more rarely. It's hard to know.

This once again begs the question of why we're using LevelDB (#483), when it clearly doesn't care about atomicity! A ham-fisted fix would be to wrap the hot and cold DBs in locks, which would bring us closer to how other DBs handle read-write transactions. E.g. [LMDB only allows one R/W transaction at a time](https://docs.rs/lmdb/0.8.0/lmdb/struct.Environment.html#method.begin_rw_txn).

### Race 2: Temporary state returned from `get_state`

I don't think this race really matters, but in `load_hot_state`, if another thread stores a state between when we call `load_state_temporary_flag` and when we call `load_hot_state_summary`, then we could end up returning that state even though it's only a temporary state. I can't think of any case where this would be relevant, and I suspect if it did come up, it would be safe/recoverable (having data is safer than _not_ having data).

This could be fixed by using a LevelDB read snapshot, but that would require substantial changes to how we read all our values, so I don't think it's worth it right now.
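The temporary-flag scheme and a lock-based way of avoiding Race 1 can be sketched roughly as follows. This is a minimal in-memory model under stated assumptions, not the actual fix: `HotDb`, `put_temporary_state`, `make_permanent` and `garbage_collect` are hypothetical names, a `u64` stands in for a state root, and a single `Mutex` stands in for "a lock around the relevant critical section". The key point is that the existence check and the commit of `(s, s_temporary_flag)` happen under the same lock, so a thread can never re-mark an already permanent state as temporary.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

/// Stand-in for a 32-byte state root (assumption for brevity).
pub type StateRoot = u64;

#[derive(Default)]
struct Inner {
    /// Serialized states keyed by root.
    states: HashMap<StateRoot, Vec<u8>>,
    /// Zero-length "temporary" markers, modelled as set membership.
    temporary: HashSet<StateRoot>,
}

/// Hypothetical hot database guarded by one lock.
#[derive(Default)]
pub struct HotDb {
    inner: Mutex<Inner>,
}

impl HotDb {
    /// Store `bytes` as a temporary state unless the state already exists.
    /// The check and the commit share one critical section.
    /// Returns `true` if this call inserted the state.
    pub fn put_temporary_state(&self, root: StateRoot, bytes: Vec<u8>) -> bool {
        let mut inner = self.inner.lock().unwrap();
        if inner.states.contains_key(&root) {
            return false; // Never touch the flag of an existing state.
        }
        inner.states.insert(root, bytes);
        inner.temporary.insert(root);
        true
    }

    /// Delete the temporary flag once the block has been processed successfully.
    pub fn make_permanent(&self, root: StateRoot) {
        self.inner.lock().unwrap().temporary.remove(&root);
    }

    /// Load a state, treating temporary states as absent.
    /// Holding the lock for both lookups also sidesteps Race 2 in this model.
    pub fn get_state(&self, root: StateRoot) -> Option<Vec<u8>> {
        let inner = self.inner.lock().unwrap();
        if inner.temporary.contains(&root) {
            return None;
        }
        inner.states.get(&root).cloned()
    }

    /// Start-up garbage collection: delete every state still flagged temporary.
    /// Returns the number of states removed.
    pub fn garbage_collect(&self) -> usize {
        let mut inner = self.inner.lock().unwrap();
        let stale: Vec<StateRoot> = inner.temporary.drain().collect();
        for root in &stale {
            inner.states.remove(root);
        }
        stale.len()
    }
}
```

A single coarse lock like this is essentially the ham-fisted option mentioned above: simple to reason about, at the cost of serializing all readers and writers.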

## Issue Addressed

Closes #1891
Closes #1784

## Proposed Changes

Implement checkpoint sync for Lighthouse, enabling it to start from a weak subjectivity checkpoint.

## Additional Info

- [x] Return unavailable status for out-of-range blocks requested by peers (#2561)
- [x] Implement sync daemon for fetching historical blocks (#2561)
- [x] Verify chain hashes (either in `historical_blocks.rs` or the calling module)
- [x] Consistency check for initial block + state
- [x] Fetch the initial state and block from a beacon node HTTP endpoint
- [x] Don't crash fetching beacon states by slot from the API
- [x] Background service for state reconstruction, triggered by CLI flag or API call.

Considered out of scope for this PR:

- Drop the requirement to provide the `--checkpoint-block` (this would require some pretty heavy refactoring of block verification)

Co-authored-by: Diva M <[email protected]>

michaelsproul added A1 database labels Oct 19, 2020

This was referenced Oct 19, 2020

[Merged by Bors] - Fix head tracker concurrency bugs #1771

Closed

[Merged by Bors] - Implement database temp states to reduce memory usage #1798

Closed

paulhauner added A1 and removed A1 labels Nov 8, 2020

michaelsproul mentioned this issue Dec 8, 2020

Handle ungraceful shutdown on first startup #2067

Open

michaelsproul mentioned this issue Mar 5, 2021

[Merged by Bors] - Implement checkpoint sync #2244

Closed

7 tasks

michaelsproul closed this as completed Oct 4, 2021
