Treat blocks, that already downloaded, as synced #11264

grbIzl · 2019-11-15T11:12:55Z

Important: it's recreated #10864 I've decided to re-create it from the scratch, because it had a lot of not relevant commits, that were made during its testing. Now the fix is verified (in duet with #10895) and ready for review.

(Copypasted description)

This PR aims to fix the following scenario in the block sync:

Node downloads the next block number N and starts importing it. Due to block's complexity or some configuration issues block's import takes more time (I've seen this issue, when import took 5-10 seconds).
Meanwhile the next sync round has begun and Parity tries to sync block N once more time, obviously import of this block has no success with error AlreadyQueued. As a result the sync round has completed with 0 blocks synced
As no new blocks were synced during sync round Parity starts descending in block number, treating this situation as error during blocks download: https://github.com/paritytech/parity-ethereum/blob/master/ethcore/sync/src/block_sync.rs#L429 and repeats sync for (already downloaded) blocks
Attempt to sync block number N-1 returns the same error and causes attempt to download block N-2 and so on

See the sequence diagram:

This downfall only finishes, when (if) block number N is imported and Parity catches up the current best block
As a result huge DDOS of GetBlock requests outcomes from the node. This also can affect current block's verification and exacerbates the issue.
Suggested solution:
Treat downloaded blocks on step 2 (marked * in the diagram) and further, that exist already in the chain or in the queue, as actually synced on this round and (as result) do not start error handling procedure with blocks descending.

Comments:

Relates to #10724.
Also such DDOS can cause the hanging of other node (see #10821 ), that replies to these massive GetBlocks requests.
This issue has similarities to the described in #9531 but has different scenario and root causes are not connected.

dvdplm · 2019-11-17T09:23:21Z

Thank you for the great PR description.

Attempt to sync block number N-1 returns the same error

I don't understand this part. Even if block N takes a very long time to import, why does this mean that N-1 will also fail? Actually, isn't N-1 already imported?

This downfall only finishes, when (if) block number N is imported and Parity catches up the current best block

What are the conditions that could lead block N to not finish the import at all?

grbIzl · 2019-11-18T10:58:28Z

I don't understand this part. Even if block N takes a very long time to import, why does this mean that N-1 will also fail? Actually, isn't N-1 already imported?

I've created a sequence diagram in order to illustrate the scenario.

What are the conditions that could lead block N to not finish the import at all?

Any block import can fail.

dvdplm · 2019-11-18T11:30:57Z

I've created a sequence diagram in order to illustrate the scenario.

Neat, thank you. Questions:

Any block import can fail.

Sure, but specifically in the scope of the bug this is fixing, what is causing the import to be so slow, do we know?

grbIzl · 2019-11-18T11:57:47Z

Should this be N?

I think, I've named last synced block parameter not correctly. It's actually last_imported_block.The diagram is modified accordingly. See retraction logic here: https://github.com/paritytech/parity-ethereum/blob/master/ethcore/sync/src/block_sync.rs#L432

Sure, but specifically in the scope of the bug this is fixing, what is causing the import to be so slow, do we know?

I think, that import time is pretty standard (depends on block ofc). The reason of such behavior is not the slowness of blocks' import, but network latency. Or it would be better to say, lack of network latency. In regular configurations (when archive runs on the separate node) requesting and downloading takes more time, so this retraction doesn't happens so fast. In the issue, archive and full node run on the same machine, that makes the issue easier to reproduce.

ascjones

From memory this code is difficult to reason about (and test) - so the sequence diagram is super useful.

Indeed it looks as if this will solve the immediate issue, so we just need to make sure there are no unintended consequences.

So the question is: can we think of a scenario where having imported.len() == 0 for AlreadyQueued is required, in order to trigger a retraction?

grbIzl · 2019-12-03T11:50:26Z

So the question is: can we think of a scenario where having imported.len() == 0 for AlreadyQueued is required, in order to trigger a retraction?

From my point of view this was a bug in the general logic of retraction. imported vector is converted directly into imported_this_round and the main point, where it's used and considered, is here: https://github.com/paritytech/parity-ethereum/blob/master/ethcore/sync/src/block_sync.rs#L432 So let's look into this place of code: do we want to start retraction, if on this round we do download blocks but they are in importing queue already? It's kind of intermediate state for block sync, its status is not defined yet: do we need to proceed further or do we need to retract back? I suggest, that because of its "intermediateness" we should stay on this place until blocks are imported. IMHO it's the only solution for such state.

ascjones

LGTM 👍

ordian · 2019-12-03T15:00:39Z

@grbIzl please add patch labels if you think it should be backported

Treat only blocks in queue as synced

1f9e119

grbIzl added A0-pleasereview 🤓 Pull request needs code review. M4-core ⛓ Core client code / Rust. labels Nov 15, 2019

grbIzl mentioned this pull request Nov 15, 2019

Treat blocks, that already downloaded, as synced #10864

Closed

ordian requested a review from ascjones November 15, 2019 11:35

ordian added this to the 2.7 milestone Nov 15, 2019

ascjones reviewed Nov 28, 2019

View reviewed changes

ascjones approved these changes Dec 3, 2019

View reviewed changes

ordian approved these changes Dec 3, 2019

View reviewed changes

ordian merged commit 2895e3b into master Dec 3, 2019

ordian deleted the PreventBlocksReimport2 branch December 3, 2019 14:59

ordian added A8-looksgood 🦄 Pull request is reviewed well. and removed A0-pleasereview 🤓 Pull request needs code review. labels Dec 3, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Treat blocks, that already downloaded, as synced #11264

Treat blocks, that already downloaded, as synced #11264

grbIzl commented Nov 15, 2019 •

edited

Loading

dvdplm commented Nov 17, 2019

grbIzl commented Nov 18, 2019

dvdplm commented Nov 18, 2019

grbIzl commented Nov 18, 2019

ascjones left a comment •

edited

Loading

grbIzl commented Dec 3, 2019

ascjones left a comment

ordian commented Dec 3, 2019

Treat blocks, that already downloaded, as synced #11264

Treat blocks, that already downloaded, as synced #11264

Conversation

grbIzl commented Nov 15, 2019 • edited Loading

dvdplm commented Nov 17, 2019

grbIzl commented Nov 18, 2019

dvdplm commented Nov 18, 2019

grbIzl commented Nov 18, 2019

ascjones left a comment • edited Loading

Choose a reason for hiding this comment

grbIzl commented Dec 3, 2019

ascjones left a comment

Choose a reason for hiding this comment

ordian commented Dec 3, 2019

grbIzl commented Nov 15, 2019 •

edited

Loading

ascjones left a comment •

edited

Loading