-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Treat blocks, that already downloaded, as synced #11264
Conversation
Thank you for the great PR description.
I don't understand this part. Even if block N takes a very long time to import, why does this mean that N-1 will also fail? Actually, isn't N-1 already imported?
What are the conditions that could lead block N to not finish the import at all? |
I've created a sequence diagram in order to illustrate the scenario.
Any block import can fail. |
I think, I've named last synced block parameter not correctly. It's actually last_imported_block.The diagram is modified accordingly. See retraction logic here: https://github.com/paritytech/parity-ethereum/blob/master/ethcore/sync/src/block_sync.rs#L432
I think, that import time is pretty standard (depends on block ofc). The reason of such behavior is not the slowness of blocks' import, but network latency. Or it would be better to say, lack of network latency. In regular configurations (when archive runs on the separate node) requesting and downloading takes more time, so this retraction doesn't happens so fast. In the issue, archive and full node run on the same machine, that makes the issue easier to reproduce. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From memory this code is difficult to reason about (and test) - so the sequence diagram is super useful.
Indeed it looks as if this will solve the immediate issue, so we just need to make sure there are no unintended consequences.
So the question is: can we think of a scenario where having imported.len() == 0
for AlreadyQueued
is required, in order to trigger a retraction?
From my point of view this was a bug in the general logic of retraction. imported vector is converted directly into imported_this_round and the main point, where it's used and considered, is here: https://github.com/paritytech/parity-ethereum/blob/master/ethcore/sync/src/block_sync.rs#L432 So let's look into this place of code: do we want to start retraction, if on this round we do download blocks but they are in importing queue already? It's kind of intermediate state for block sync, its status is not defined yet: do we need to proceed further or do we need to retract back? I suggest, that because of its "intermediateness" we should stay on this place until blocks are imported. IMHO it's the only solution for such state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
@grbIzl please add patch labels if you think it should be backported |
Important: it's recreated #10864 I've decided to re-create it from the scratch, because it had a lot of not relevant commits, that were made during its testing. Now the fix is verified (in duet with #10895) and ready for review.
(Copypasted description)
This PR aims to fix the following scenario in the block sync:
See the sequence diagram:
This downfall only finishes, when (if) block number N is imported and Parity catches up the current best block
As a result huge DDOS of GetBlock requests outcomes from the node. This also can affect current block's verification and exacerbates the issue.
Suggested solution:
Treat downloaded blocks on step 2 (marked * in the diagram) and further, that exist already in the chain or in the queue, as actually synced on this round and (as result) do not start error handling procedure with blocks descending.
Comments:
Relates to #10724.
Also such DDOS can cause the hanging of other node (see #10821 ), that replies to these massive GetBlocks requests.
This issue has similarities to the described in #9531 but has different scenario and root causes are not connected.