Keep existing blocks when restoring a Snapshot #8643
ethcore/src/snapshot/service.rs
Outdated
// Writing changes to DB and logging every now and then
if block_number % 1_000 == 0 {
    next_db.write_buffered(batch);
`write_buffered` will keep them in memory. We need to flush periodically.
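To illustrate the point, here is a minimal sketch of buffering with a periodic flush. `BufferedDb` and `migrate_with_periodic_flush` are hypothetical stand-ins written for this example, not the database type used in the PR; the sketch only shows that flushing inside the loop bounds how many entries sit in memory.

```rust
use std::collections::HashMap;

/// Hypothetical toy model of a buffered key-value store (not the real DB type).
#[derive(Default)]
pub struct BufferedDb {
    /// Writes accepted by `write_buffered` that are still only in memory.
    pending: Vec<(u64, Vec<u8>)>,
    /// Entries that have been flushed ("on disk" in this toy model).
    persisted: HashMap<u64, Vec<u8>>,
}

impl BufferedDb {
    /// Queue a batch in memory only.
    pub fn write_buffered(&mut self, batch: Vec<(u64, Vec<u8>)>) {
        self.pending.extend(batch);
    }

    /// Move everything queued so far into the persisted map.
    pub fn flush(&mut self) {
        for (key, value) in self.pending.drain(..) {
            self.persisted.insert(key, value);
        }
    }

    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }

    pub fn persisted_len(&self) -> usize {
        self.persisted.len()
    }
}

/// Migrate blocks `1..=last`, writing and flushing every `flush_every` blocks.
/// Returns the largest number of unflushed entries observed at any point.
pub fn migrate_with_periodic_flush(db: &mut BufferedDb, last: u64, flush_every: u64) -> usize {
    assert!(flush_every > 0, "flush_every must be non-zero");
    let mut max_pending = 0;
    let mut batch = Vec::new();
    for block_number in 1..=last {
        // The block payload is faked here; only the bookkeeping matters.
        batch.push((block_number, block_number.to_be_bytes().to_vec()));
        if block_number % flush_every == 0 {
            db.write_buffered(std::mem::take(&mut batch));
            max_pending = max_pending.max(db.pending_len());
            db.flush();
        }
    }
    // Write and flush whatever is left after the last full batch.
    db.write_buffered(batch);
    max_pending = max_pending.max(db.pending_len());
    db.flush();
    max_pending
}
```

With `flush_every = 1_000`, the buffer never holds more than 1,000 entries regardless of how many blocks are migrated.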
ethcore/src/snapshot/service.rs
Outdated
let block_receipts = block_receipts.receipts;

next_chain.insert_unordered_block(&mut batch, &raw_block, block_receipts, parent_diff, false, true);
parent_diff = Some(diff);
`parent_diff` is meant to be a total difficulty.
Oops, right...
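For reference, a small sketch of the distinction: the value passed along must be the parent's accumulated total difficulty, not the difficulty of a single block. The `Header` struct and `total_difficulties` helper are hypothetical, and `u128` stands in for the 256-bit integer a real chain would use.

```rust
/// Hypothetical header: only the fields this sketch needs.
/// `u128` is a stand-in for a 256-bit difficulty value.
pub struct Header {
    pub number: u64,
    pub difficulty: u128,
}

/// Given the total difficulty of the parent of the first header, return
/// `(block_number, total_difficulty)` for each header in order.
pub fn total_difficulties(parent_total: u128, headers: &[Header]) -> Vec<(u64, u128)> {
    let mut total = parent_total;
    headers
        .iter()
        .map(|header| {
            // Total difficulty accumulates: parent total + this block's difficulty.
            total += header.difficulty;
            (header.number, total)
        })
        .collect()
}
```

Each block's total then becomes the `parent_diff` for the next block, rather than the raw per-block difficulty.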
ethcore/src/snapshot/service.rs
Outdated
// Try to include every block that will need to be downloaded from the current chain
// Break when no more blocks are available from it.
match (next_chain_info.ancient_block_number, next_chain_info.first_block_number) {
    (Some(next_ancient_block), Some(next_first_block)) if next_ancient_block + 1 < next_first_block => {
`ancient_block_number` is always going to be `1` (or `0`, can't remember) after a restoration, isn't it?
The blocks we want to import from the old chain: anything available from genesis to `first_block_number`, but maintaining the invariant that only a single gap exists.
Yep that's right, it's supposed to always be `1`. It does ensure that there is still only one gap of blocks.
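A rough sketch of the single-gap bookkeeping discussed here, mirroring the `match` guard in the diff. `ChainInfo` is reduced to the two fields named in the review and `missing_range` is a hypothetical helper, not a function from the codebase.

```rust
use std::ops::Range;

/// Reduced chain info: only the two fields named in the review.
/// `None` means the chain records no gap.
pub struct ChainInfo {
    pub ancient_block_number: Option<u64>,
    pub first_block_number: Option<u64>,
}

/// Block numbers inside the single gap, as a half-open range, or `None`
/// when there is nothing left to fill.
pub fn missing_range(info: &ChainInfo) -> Option<Range<u64>> {
    match (info.ancient_block_number, info.first_block_number) {
        // Same guard as in the diff: a gap exists only if at least one
        // block number lies strictly between the two markers.
        (Some(ancient), Some(first)) if ancient + 1 < first => Some(ancient + 1..first),
        _ => None,
    }
}
```

Filling the range contiguously from one end keeps the invariant: the gap shrinks but never splits in two.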
ethcore/src/snapshot/service.rs
Outdated
next_ancient_block, next_first_block,
);

let mut block_number = next_ancient_block + 1;
This really needs to check that the first ancient block we pull from the old client has the right `parent_hash`.
ethcore/src/snapshot/service.rs
Outdated
}
// Break if we already imported some blocks in the current batch and there
// are no more left
} else if parent_diff.is_some() {
Why only break if `parent_diff` is `Some`? If there are no more blocks we can import, it seems like it should always break the loop.
Yep that's right, didn't push this change yet.
ethcore/src/snapshot/service.rs
Outdated
while block_number < next_first_block {
    let chain = cur_chain.read();

    if let Some(block_hash) = chain.block_hash(block_number) {
We don't have an exclusive lock on the chain's DB. This is racing with the client's normal block import. (In practice this isn't a problem right now, but let's not lean on assumptions from outside the module.)
Should it lock the DB for the time of the restoration? The blocks won't change during it.
It should rather reference blocks by hash as opposed to number. `Hash -> Block` is stable, `Number -> Block` varies.
This is easiest when we import blocks in reverse because the `parent_hash` is always available. But then we should always make sure that `next_first_block.parent_hash` corresponds to the hash of a block that we import as its parent.
So could it be just a check that, for each block, the parent's hash is the one expected, i.e. the one we got from the previous block? If there is a mismatch, then just stop right there.
Since GitHub marks that comment as hidden I'll continue here:
Sure, but that's a little weird because we know that those blocks are still probably in the DB, just that there was a reorg in the meantime.
But do we keep blocks in the DB that have been reorged?
Yes. Because there could be a reorg back at any point, there is no reason to discard them.
@rphmeier I updated the PR so that it starts at the best available ancient block, and it iterates backwards from the parent's hash.
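A minimal sketch of the backward walk described above, under simplified assumptions: `Hash` is a `u64` stand-in for a 256-bit hash, the old chain is a plain `Hash -> Block` map, and `collect_ancestors` is a hypothetical helper rather than the code in the PR. It starts from the `parent_hash` of the snapshot's first block and stops as soon as a parent is missing or does not sit at the expected height.

```rust
use std::collections::HashMap;

/// Stand-in for a 256-bit block hash.
pub type Hash = u64;

#[derive(Clone, Debug, PartialEq)]
pub struct Block {
    pub hash: Hash,
    pub parent_hash: Hash,
    pub number: u64,
}

/// Walk the old chain backwards by hash, starting at the parent of the
/// snapshot's first block and stopping just above the ancient block.
/// Returns the blocks in the order visited (highest number first).
pub fn collect_ancestors(
    old_chain: &HashMap<Hash, Block>,
    first_block_number: u64,
    first_block_parent_hash: Hash,
    ancient_block_number: u64,
) -> Vec<Block> {
    let mut migrated = Vec::new();
    let mut expected_hash = first_block_parent_hash;
    let mut expected_number = first_block_number;
    while expected_number > ancient_block_number.saturating_add(1) {
        expected_number -= 1;
        match old_chain.get(&expected_hash) {
            // Only accept a block that has the expected hash AND height.
            Some(block) if block.number == expected_number => {
                expected_hash = block.parent_hash;
                migrated.push(block.clone());
            }
            // Missing or inconsistent parent: stop right there.
            _ => break,
        }
    }
    migrated
}
```

Because lookups go through `Hash -> Block`, a reorg on the old chain during the walk cannot change which blocks are picked; a sibling block at the same height is simply never visited.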
This is a very important pr, but imo still requires some polishing :)
ethcore/src/client/client.rs
Outdated
@@ -845,6 +845,11 @@ impl Client {
*self.exit_handler.lock() = Some(Box::new(f));
}

/// Returns the chain reference
pub fn chain(&self) -> &RwLock<Arc<BlockChain>> {
It's bad practice to expose `&RwLock`. There is no guarantee this will not deadlock.
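One possible alternative, sketched with `std::sync::RwLock` (the codebase may use a different lock type): keep the lock private and hand out a cloned `Arc`, so callers never hold a guard. `chain_snapshot` and `replace_chain` are hypothetical names, and `BlockChain` is reduced to a single field for the example.

```rust
use std::sync::{Arc, RwLock};

/// Reduced stand-in for the real chain type.
pub struct BlockChain {
    pub best_block_number: u64,
}

pub struct Client {
    // The lock stays private: callers cannot hold a guard across other calls.
    chain: RwLock<Arc<BlockChain>>,
}

impl Client {
    pub fn new(chain: BlockChain) -> Self {
        Client { chain: RwLock::new(Arc::new(chain)) }
    }

    /// Returns a cloned `Arc`; the read guard is released before the
    /// caller gets to use the value.
    pub fn chain_snapshot(&self) -> Arc<BlockChain> {
        self.chain.read().expect("chain lock poisoned").clone()
    }

    /// Swaps in a new chain, e.g. after a restoration.
    pub fn replace_chain(&self, chain: BlockChain) {
        *self.chain.write().expect("chain lock poisoned") = Arc::new(chain);
    }
}
```

The trade-off is that a caller may keep reading an older chain after a swap, but it can no longer block the writer or deadlock by re-entering the lock.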
ethcore/src/snapshot/service.rs
Outdated
@@ -16,6 +16,7 @@

//! Snapshot network service implementation.

// use std::cmp;
should be removed
ethcore/src/snapshot/service.rs
Outdated
@@ -220,7 +221,7 @@ pub struct ServiceParams {
/// Usually "<chain hash>/snapshot"
pub snapshot_root: PathBuf,
/// A handle for database restoration.
pub db_restore: Arc<DatabaseRestore>,
pub client: Arc<Client>,
It's contradictory to @0x7CFE's refactor of `ethcore`. We don't want any modules to require `Client`, but only the interface that is actually being used.
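A sketch of what depending on a narrow interface could look like. The trait name `DatabaseRestore` comes from the diff above, but its method signature, the `Service` struct, and the `RecordingRestore` test double are assumptions made for this example.

```rust
use std::sync::{Arc, Mutex};

/// Narrow interface: the only capability the service needs.
/// The method signature is an assumption for this sketch.
pub trait DatabaseRestore: Send + Sync {
    fn restore_db(&self, new_db: &str) -> Result<(), String>;
}

/// Hypothetical service that depends on the trait, not on `Client`.
pub struct Service {
    db_restore: Arc<dyn DatabaseRestore>,
}

impl Service {
    pub fn new(db_restore: Arc<dyn DatabaseRestore>) -> Self {
        Service { db_restore }
    }

    pub fn finalize_restoration(&self, path: &str) -> Result<(), String> {
        self.db_restore.restore_db(path)
    }
}

/// Test double that records the paths it was asked to restore.
#[derive(Default)]
pub struct RecordingRestore {
    pub calls: Mutex<Vec<String>>,
}

impl DatabaseRestore for RecordingRestore {
    fn restore_db(&self, new_db: &str) -> Result<(), String> {
        self.calls
            .lock()
            .map_err(|e| e.to_string())?
            .push(new_db.to_owned());
        Ok(())
    }
}
```

A side benefit is testability: the service can be exercised with the recording double without constructing a full client.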
@@ -103,6 +102,9 @@ fn restored_is_equivalent() {

#[test]
fn guards_delete_folders() {
    let gas_prices = vec![1.into(), 2.into(), 3.into(), 999.into()];
    let client = generate_dummy_client_with_spec_and_data(Spec::new_null, 400, 5, &gas_prices);
IIRC, this helper function generates a client with 400 blocks. You should check in this test whether they were actually migrated correctly.
I'm not sure I understand. This test only tests if the guarded folders are deleted.
from #6350
When restoring from a snapshot parity should try and re-import ancient blocks existing in the current database starting from genesis.
We need a test for that ;)
Ah yeah sure, I thought you were commenting on the `guards_delete_folders` test!
@@ -838,6 +827,80 @@ impl BlockChain {
}
}

/// Update the best ancient block to the given hash, after checking that
/// it's directly linked to the currently known best ancient block
pub fn update_best_ancient_block(&self, hash: &H256) {
This function is called only from a single place. Is it possible for hash to not be linked to any known ancient block? If yes, what does it mean? Should it be handled somehow? Currently the result of this function execution is unknown which makes it very difficult to debug
So this function will go from the given hash, and will ensure there is a link between the block at the given hash and the last known best ancient block. Thus, it only updates the best ancient block if there is a link.
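To make the outcome observable, as the review asks, the function could report whether the link was found. The following is a sketch under simplified assumptions, not the PR's implementation: `Hash` is a `u64` stand-in, `Chain` holds a plain header map, and the `AncientUpdate` enum is hypothetical.

```rust
use std::collections::HashMap;

/// Stand-in for a 256-bit block hash.
pub type Hash = u64;

pub struct Header {
    pub parent_hash: Hash,
}

/// Hypothetical result type so callers can log or handle the outcome.
#[derive(Debug, PartialEq)]
pub enum AncientUpdate {
    Updated { from: Hash, to: Hash },
    NotLinked,
}

pub struct Chain {
    headers: HashMap<Hash, Header>,
    best_ancient: Hash,
}

impl Chain {
    pub fn new(best_ancient: Hash, headers: HashMap<Hash, Header>) -> Self {
        Chain { headers, best_ancient }
    }

    pub fn best_ancient(&self) -> Hash {
        self.best_ancient
    }

    /// Follow parent hashes from `hash` down to the current best ancient
    /// block; update only if the link exists, and say which case occurred.
    pub fn update_best_ancient_block(&mut self, hash: Hash) -> AncientUpdate {
        let mut current = hash;
        // Bound the walk by the number of known headers so that malformed
        // data (e.g. a parent cycle) cannot loop forever.
        for _ in 0..=self.headers.len() {
            if current == self.best_ancient {
                let from = self.best_ancient;
                self.best_ancient = hash;
                return AncientUpdate::Updated { from, to: hash };
            }
            match self.headers.get(&current) {
                Some(header) => current = header.parent_hash,
                None => return AncientUpdate::NotLinked,
            }
        }
        AncientUpdate::NotLinked
    }
}
```

Returning a value (or logging at the call site) addresses the debugging concern: a caller can tell "not linked, nothing changed" apart from a successful update.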
Please reopen when ready
Rebased on master
Let's get this merged and tested in the wild, it's too long overdue already.
let mut block_hash = *hash;
let mut is_linked = false;

loop {
^^ @ngotchac is this addressed?
ethcore/sync/src/chain/supplier.rs
Outdated
if let Some(mut receipts_bytes) = io.chain().encoded_block_receipts(&rlp.val_at::<H256>(i)?) {
if let Some(receipts) = io.chain().block_receipts(&rlp.val_at::<H256>(i)?) {
let mut receipts_bytes = ::rlp::encode(&receipts).into_vec();
// if let Some(mut receipts_bytes) = io.chain().encoded_block_receipts(&rlp.val_at::<H256>(i)?) {
remove please?
Finally took the time to go through this. Looks good!
Co-Authored-By: ngotchac <[email protected]>
🎉 !!!
* Rename db_restore => client
* First step: make it compile!
* Second step: working implementation!
* Refactoring
* Fix tests
* PR Grumbles
* PR Grumbles WIP
* Migrate ancient blocks iterating backward
* Early return in block migration if snapshot is aborted
* Remove RwLock getter (PR Grumble I)
* Remove dependency on `Client`: only used Traits
* Add test for recovering aborted snapshot recovery
* Add test for migrating old blocks
* Fix build
* PR Grumble I
* PR Grumble II
* PR Grumble III
* PR Grumble IV
* PR Grumble V
* PR Grumble VI
* Fix one test
* Fix test
* PR Grumble
* PR Grumbles
* PR Grumbles II
* Fix tests
* Release RwLock earlier
* Revert Cargo.lock
* Update _update ancient block_ logic: set local in `commit`
* Update typo in ethcore/src/snapshot/service.rs

Co-Authored-By: ngotchac <[email protected]>
* version: bump beta to 2.2.2
* Add experimental RPCs flag (#9928)
  * WiP
  * Enable experimental RPCs.
* Keep existing blocks when restoring a Snapshot (#8643)
  * Rename db_restore => client
  * First step: make it compile!
  * Second step: working implementation!
  * Refactoring
  * Fix tests
  * PR Grumbles
  * PR Grumbles WIP
  * Migrate ancient blocks iterating backward
  * Early return in block migration if snapshot is aborted
  * Remove RwLock getter (PR Grumble I)
  * Remove dependency on `Client`: only used Traits
  * Add test for recovering aborted snapshot recovery
  * Add test for migrating old blocks
  * Fix build
  * PR Grumble I
  * PR Grumble II
  * PR Grumble III
  * PR Grumble IV
  * PR Grumble V
  * PR Grumble VI
  * Fix one test
  * Fix test
  * PR Grumble
  * PR Grumbles
  * PR Grumbles II
  * Fix tests
  * Release RwLock earlier
  * Revert Cargo.lock
  * Update _update ancient block_ logic: set local in `commit`
  * Update typo in ethcore/src/snapshot/service.rs Co-Authored-By: ngotchac <[email protected]>
* Adjust requests costs for light client (#9925)
  * PIP Table Cost relative to average peers instead of max peers
  * Add tracing in PIP new_cost_table
  * Update stat peer_count
  * Use number of leeching peers for Light serve costs
  * Fix test::light_params_load_share_depends_on_max_peers (wrong type)
  * Remove (now) useless test
  * Remove `load_share` from LightParams.Config Prevent div. by 0
  * Add LEECHER_COUNT_FACTOR
  * PR Grumble: u64 to u32 for f64 casting
  * Prevent u32 overflow for avg_peer_count
  * Add tests for LightSync::Statistics
* Fix empty steps (#9939)
  * Don't send empty step twice or empty step then block.
  * Perform basic validation of locally sealed blocks.
  * Don't include empty step twice.
* prevent silent errors in daemon mode, closes #9367 (#9946)
* Fix a deadlock (#9952)
  * Update informant:
    * decimal in Mgas/s
    * print every 5s (not randomly between 5s and 10s)
  * Fix dead-lock in `blockchain.rs`
  * Update locks ordering
* Fix light client informant while syncing (#9932)
  * Add `is_idle` to LightSync to check importing status
  * Use SyncStateWrapper to make sure is_idle gets updates
  * Update is_major_import to use verified queue size as well
  * Add comment for `is_idle`
  * Add Debug to `SyncStateWrapper`
  * `fn get` -> `fn into_inner`
* ci: rearrange pipeline by logic (#9970)
  * ci: rearrange pipeline by logic
  * ci: rename docs script
* fix docker build (#9971)
* Deny unknown fields for chainspec (#9972)
  * Add deny_unknown_fields to chainspec
  * Add tests and fix existing one
  * Remove serde_ignored dependency for chainspec
  * Fix rpc test eth chain spec
  * Fix starting_nonce_test spec
* Improve block and transaction propagation (#9954)
  * Refactor sync to add priority tasks.
  * Send priority tasks notifications.
  * Propagate blocks, optimize transactions.
  * Implement transaction propagation. Use sync_channel.
  * Tone down info.
  * Prevent deadlock by not waiting forever for sync lock.
  * Fix lock order.
  * Don't use sync_channel to prevent deadlocks.
  * Fix tests.
* Fix unstable peers and slowness in sync (#9967)
  * Don't sync all peers after each response
  * Update formatting
  * Fix tests: add `continue_sync` to `Sync_step`
  * Update ethcore/sync/src/chain/mod.rs Co-Authored-By: ngotchac <[email protected]>
* fix rpc middlewares
* fix Cargo.lock
* json: resolve merge in spec
* rpc: fix starting_nonce_test
* ci: allow nightly job to fail
Closes #6350
In order to restore existing blocks, this PR iterates over the blocks of the current DB before swapping it with the snapshot one, from 1 (or the first needed block) until all the blocks have been imported.
To test the feature, one can sync without warp-sync for a few thousand blocks, then restart with `--warp-barrier`.
Block restoration seemed pretty fast (~15 seconds for 150_000 blocks); if it turns out to be too slow, it could be improved by skipping caches.