This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Fix pubsub new_blocks notifications to include all blocks #9987

Merged
merged 20 commits into master from mr-9865-missing-new-blocks-notifications on Dec 19, 2018

Conversation

mattrutherford
Contributor

Remove the duplicate check for is_empty. Check that the total queue size is < 10 before sending new-blocks notifications (to indicate we are not doing a major sync). I initially tried a queue size of 3 (as used in pub fn is_major_importing_or_waiting in rpc/src/v1/helpers/block_import.rs), but during overnight testing it missed a block; increasing the threshold to 10 returned all canonical blocks in the same test setup.

Closes #9865
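
For context, a minimal sketch of the condition this PR changes, using stub types in place of the real client API (the field names here are assumptions, not the actual BlockQueueInfo):

```rust
// Stub standing in for the real queue-info struct (names assumed).
struct QueueInfo {
    unverified: usize,
    verifying: usize,
    verified: usize,
}

impl QueueInfo {
    fn total_queue_size(&self) -> usize {
        self.unverified + self.verifying + self.verified
    }

    fn is_empty(&self) -> bool {
        self.total_queue_size() == 0
    }
}

// Old behaviour: notify only when this import round fully drained the queue.
fn should_notify_before(imported_blocks: &[u64], q: &QueueInfo) -> bool {
    !imported_blocks.is_empty() && q.is_empty()
}

// This PR: a small backlog (< 10 queued blocks) still counts as
// "not doing a major sync", so the notification is sent anyway.
fn should_notify_after(q: &QueueInfo) -> bool {
    q.total_queue_size() < 10
}
```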

@parity-cla-bot

It looks like @mattrutherford signed our Contributor License Agreement. 👍

Many thanks,

Parity Technologies CLA Bot

@mattrutherford mattrutherford added the A0-pleasereview 🤓 Pull request needs code review. label Nov 29, 2018
@5chdn 5chdn added the M4-core ⛓ Core client code / Rust. label Nov 29, 2018
@5chdn 5chdn added this to the 2.3 milestone Nov 29, 2018
Collaborator

@sorpaas sorpaas left a comment

That second if is_empty is indeed redundant, but I don't think the check !imported_blocks.is_empty() && is_empty is duplicated. The first one checks whether the imported block list is empty, and the second one checks whether the remaining queue is empty. In this PR, the behavior for determining whether the client is syncing changes as follows:

  • Previously, we checked whether the block queue is empty after importing max_round_blocks_to_import blocks. If not, we say that the client is syncing.
  • In this PR, we check whether the block queue has a length of less than 10 after importing max_round_blocks_to_import blocks. If not, we say that the client is syncing.

However, as you can see, the changed behavior in this PR is mostly identical to setting the --max-round-blocks-to-import config to a higher number.

To fix the reported issue, maybe we can just change the default of --max-round-blocks-to-import to a higher number like 22? I think the previous definition of "syncing" is nicer because it avoids an extra parameter.

duration,
);
});
if client.queue_info().total_queue_size() < 10 {
Collaborator

I'm not sure whether calling queue_info here is expensive. We fetch a lot of other, unused info through it, and it acquires three atomics. And I suppose this function is something that needs to be really fast, so it may matter.

Contributor Author

You're right that it includes a bunch of stuff that we don't need. It currently requires 3 Mutex locks. I have a separate suggestion (WIP) for a way to avoid the need to lock, which could mitigate the risk here.
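
For illustration, the kind of length-caching wrapper being suggested: a sketch, not Parity's actual implementation, where the queue mirrors its length into an atomic so hot paths can read the size without taking any lock:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

struct LenCachedQueue<T> {
    inner: Mutex<Vec<T>>,
    len: AtomicUsize,
}

impl<T> LenCachedQueue<T> {
    fn push(&self, item: T) {
        let mut guard = self.inner.lock().unwrap();
        guard.push(item);
        // Publish the new length while still holding the lock.
        self.len.store(guard.len(), Ordering::Release);
    }

    /// Lock-free length read for hot paths such as import notification.
    fn cached_len(&self) -> usize {
        self.len.load(Ordering::Acquire)
    }
}
```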

duration,
);
});
if client.queue_info().total_queue_size() < 10 {
Collaborator

@sorpaas sorpaas Nov 29, 2018

This hard-coded 10 won't fit slower network conditions. Can we parameterize it?

Contributor Author

You mean 10 might be too high? Maybe checking the distance from the most recent network block would be a better way to do this? Do we have a reference for that?

@mattrutherford
Contributor Author

I don't think increasing max_round_blocks_to_import will fix this because we miss around 1 in 50 block notifications in normal operation.

@sorpaas
Collaborator

sorpaas commented Nov 29, 2018

@mattrutherford Do you mind explaining? I still don't see why we would get different results compared with increasing max_round_blocks_to_import.

@mattrutherford
Contributor Author

I've put two printlns in the code to illustrate (the first at the top of the fn and the second at the bottom), so you can check against the logs and compare with the output of the nodejs test script. is_empty checks the size of VerificationQueue.processing, while drain takes blocks out of VerificationQueue.verification. Look at what happens around a reorg. Perhaps we could use the len_caching_mutex I mentioned above to cache verification.verified.len()?
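
Roughly where the two printlns sit (a sketch with assumed names; the matching output is interleaved in the logs below):

```rust
struct BlockQueue { size: usize }

impl BlockQueue {
    fn len(&self) -> usize { self.size }
    fn is_empty(&self) -> bool { self.size == 0 }
}

fn import_verified_blocks(queue: &BlockQueue, max_blocks_to_import: usize) {
    // First println, at the top of the fn, before draining the queue.
    println!("Queue size: {}, max_blocks_to_import: '{}'",
        queue.len(), max_blocks_to_import);

    // ... drain up to max_blocks_to_import blocks and import them ...

    // Second println, at the bottom, after the round completes.
    println!("Queue size: {}, is_empty: '{}'",
        queue.len(), queue.is_empty());
}
```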
Script output columns: block hash, block no., parent hash, block no. == previous block no. + 1

0x38baf4054ce364111dffdc550a51bff185709dd244d2353c41f8137aa0bdc4e6, 6795365, 0xf821fb57215b18989c181d9a66bace98978924b1a29921eb8c40fd1c7ac76756, true
0x9936054797840cd6dc167852f464c8f6e77132a9e09c8e3afbb3ef86ad7d19b4, 6795366, 0x38baf4054ce364111dffdc550a51bff185709dd244d2353c41f8137aa0bdc4e6, true
0x3541f9b24701de2eb3ef24268b0ae54f60d8164c065173e2a6799c5250ee5799, 6795368, 0x2d0107139d12abed524b460175fe4295e2ff73b255e37e9d687322368abbf730, false
0x170dd36a5a5f81e758715b39e84699e9d9d173f7585801006c249ad2c0d9538f, 6795369, 0x3541f9b24701de2eb3ef24268b0ae54f60d8164c065173e2a6799c5250ee5799, true
0x49332458724822585e233c4d625ee966fffe46f5e9170a09eefb31ff7e86e073, 6795371, 0x8c11a9aae79da9304bf53f83c9de374cebc9ce372b461f5ddd522ec5a151473c, false
0xd530a13946a1ee86c32177b9efe32a899667530c00c6829222f3ef3756e2fe1c, 6795372, 0x49332458724822585e233c4d625ee966fffe46f5e9170a09eefb31ff7e86e073, true
0x5cb9338308c9dc13c8e19047e4ca21310d2fb26cc3981f895abefc4cd5105242, 6795373, 0xd530a13946a1ee86c32177b9efe32a899667530c00c6829222f3ef3756e2fe1c, true
2018-11-29 16:52:10 UTC Imported #6795362 0x8cf1…f3c8 (113 txs, 7.99 Mgas, 510 ms, 28.34 KiB)
Queue size: 1, max_blocks_to_import: '12'
Queue size: 0, is_empty: 'true'
2018-11-29 16:52:12 UTC Imported #6795363 0xfcc0…3340 (7 txs, 0.23 Mgas, 35 ms, 1.54 KiB)
2018-11-29 16:52:12 UTC   22/25 peers      6 MiB chain  107 MiB db  0 bytes queue    2 MiB sync  RPC:  7 conn,    0 req/s,   95 µs
Queue size: 1, max_blocks_to_import: '12'
Queue size: 1, max_blocks_to_import: '12'
Queue size: 1, is_empty: 'false'
Queue size: 0, is_empty: 'true'
2018-11-29 16:52:42 UTC Imported #6795364 0xf821…6756 (51 txs, 8.00 Mgas, 392 ms, 11.16 KiB)
2018-11-29 16:52:46 UTC   22/25 peers      7 MiB chain  108 MiB db  0 bytes queue    2 MiB sync  RPC:  7 conn,    0 req/s,   95 µs
Queue size: 1, max_blocks_to_import: '12'
2018-11-29 16:53:02 UTC Reorg to #6795365 0x38ba…c4e6 (0x6b8b…b9e1 #6795363 0xfcc0…3340 0xf821…6756)
Queue size: 0, is_empty: 'true'
2018-11-29 16:53:02 UTC Imported #6795365 0x38ba…c4e6 (0 txs, 0.00 Mgas, 17 ms, 0.53 KiB)
Queue size: 1, max_blocks_to_import: '12'
Queue size: 0, is_empty: 'true'
2018-11-29 16:53:13 UTC Imported #6795366 0x9936…19b4 (74 txs, 7.99 Mgas, 262 ms, 26.93 KiB)
Queue size: 1, max_blocks_to_import: '12'
Queue size: 1, max_blocks_to_import: '12'
Queue size: 1, is_empty: 'false'
Queue size: 0, is_empty: 'true'
2018-11-29 16:53:19 UTC Imported #6795368 0x3541…5799 (48 txs, 7.85 Mgas, 406 ms, 11.93 KiB)
2018-11-29 16:53:19 UTC   21/25 peers      7 MiB chain  107 MiB db  0 bytes queue    2 MiB sync  RPC:  7 conn,    0 req/s,   95 µs
Queue size: 1, max_blocks_to_import: '12'
Queue size: 0, is_empty: 'true'
2018-11-29 16:53:23 UTC Imported #6795368 0x01cd…27df (35 txs, 7.94 Mgas, 388 ms, 17.50 KiB)
Queue size: 1, max_blocks_to_import: '12'
Queue size: 0, is_empty: 'true'
2018-11-29 16:53:45 UTC Imported #6795369 0x170d…538f (23 txs, 8.00 Mgas, 53 ms, 24.96 KiB)
2018-11-29 16:53:51 UTC   22/25 peers      5 MiB chain  107 MiB db  0 bytes queue    2 MiB sync  RPC:  7 conn,    0 req/s,   95 µs
Queue size: 1, max_blocks_to_import: '12'
Queue size: 1, is_empty: 'false'
Queue size: 1, max_blocks_to_import: '12'
Queue size: 0, is_empty: 'true'
2018-11-29 16:54:11 UTC Imported #6795370 0xf9f5…24fe (152 txs, 8.00 Mgas, 422 ms, 29.77 KiB)
Queue size: 1, max_blocks_to_import: '12'
Queue size: 0, is_empty: 'true'
Queue size: 1, max_blocks_to_import: '12'
Queue size: 0, is_empty: 'true'
2018-11-29 16:54:18 UTC Imported #6795371 0x4933…e073 (114 txs, 7.99 Mgas, 521 ms, 27.65 KiB) + another 1 block(s) containing 144 tx(s)
2018-11-29 16:54:23 UTC   20/25 peers      5 MiB chain  108 MiB db  0 bytes queue    2 MiB sync  RPC:  7 conn,    0 req/s,   95 µs
Queue size: 1, max_blocks_to_import: '12'
Queue size: 0, is_empty: 'true'
2018-11-29 16:54:32 UTC Imported #6795372 0xd530…fe1c (38 txs, 7.97 Mgas, 256 ms, 7.17 KiB)
Queue size: 1, max_blocks_to_import: '12'
Queue size: 0, is_empty: 'true'
2018-11-29 16:54:38 UTC Imported #6795373 0x5cb9…5242 (65 txs, 7.98 Mgas, 322 ms, 10.67 KiB)
Queue size: 1, max_blocks_to_import: '12'
Queue size: 0, is_empty: 'true'
2018-11-29 16:54:54 UTC Imported #6795374 0x258b…d755 (60 txs, 7.99 Mgas, 232 ms, 21.52 KiB)
2018-11-29 16:54:59 UTC   21/25 peers      3 MiB chain  107 MiB db  0 bytes queue    2 MiB sync  RPC:  7 conn,    0 req/s,   95 µs

@sorpaas
Collaborator

sorpaas commented Nov 30, 2018

@mattrutherford From the log you provided, isn't it the case that whenever the queue size is 0, is_empty is always true? Are we really sure the bug is caused by the is_empty check?

From the code, it looks like we add items to VerificationQueue::processing when adding blocks to the queue, and when we drain blocks from the queue via VerificationQueue::drain, we always call mark_as_good or mark_as_bad, which clears items from VerificationQueue::processing. So when the queue is empty, processing should always be empty as well. Otherwise we probably have a bug there.
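
A simplified sketch of the invariant described here (field and method names are assumptions drawn from this thread, not the exact VerificationQueue code):

```rust
use std::collections::{HashMap, VecDeque};

type H256 = [u8; 32];
struct Block;

struct VerificationQueue {
    // Every block known to the queue; `is_empty` consults this set.
    processing: HashMap<H256, Block>,
    // Fully verified blocks waiting for import; `drain` pops from here.
    verified: VecDeque<H256>,
}

impl VerificationQueue {
    fn is_empty(&self) -> bool {
        self.processing.is_empty()
    }

    fn drain(&mut self, max: usize) -> Vec<H256> {
        let n = max.min(self.verified.len());
        self.verified.drain(..n).collect()
    }

    // Import code is expected to call this (or mark_as_bad) for every
    // drained block; if that ever fails to happen, e.g. around a reorg,
    // `processing` stays non-empty even though `verified` is drained.
    fn mark_as_good(&mut self, hashes: &[H256]) {
        for h in hashes {
            self.processing.remove(h);
        }
    }
}
```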

Collaborator

@tomusdrw tomusdrw left a comment

Why not always trigger the notification and let the receivers decide when to handle it?

@mattrutherford mattrutherford changed the title fix: new_blocks notifications sometimes not triggered during normal conditions [WIP] fix: new_blocks notifications sometimes not triggered during normal conditions Nov 30, 2018
@mattrutherford
Contributor Author

@sorpaas Thanks - what you say makes sense. @tomusdrw I also think this is good because it allows more flexibility.

@mattrutherford
Contributor Author

I decided to take the suggestion from @tomusdrw because I think it's pragmatic to give each implementation the information and let it decide what action to take. All existing functionality (except for the updated pubsub impl) remains as before.

The current solution has been tested over a 48-hour period and appears stable, with no missed notifications; previously a block notification was missed every ~10 minutes or so.

I think we should open a new issue to examine the way the queue works and decide if it's working as expected.

Another note - it may be possible to optimise NewBlocks and have the members be references rather than owned values, allowing each impl to clone only when necessary.
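
In outline, the shape of the change (a sketch; the flag name follows the final commit message, and the surrounding types are simplified stand-ins):

```rust
use std::time::Duration;

type H256 = [u8; 32];

/// Bundles what `new_blocks()` used to take as separate arguments, plus
/// the flag callers previously used to gate the notification themselves.
struct NewBlocks {
    imported: Vec<H256>,
    duration: Duration,
    has_more_blocks_to_import: bool,
    // ... invalid, route, sealed, proposed elided ...
}

trait ChainNotify {
    fn new_blocks(&self, new_blocks: NewBlocks);
}

struct EthPubSub;

impl ChainNotify for EthPubSub {
    fn new_blocks(&self, new_blocks: NewBlocks) {
        // eth_pubsub now ignores the flag and notifies on every import,
        // so subscribers no longer miss blocks around reorgs.
        publish_headers(&new_blocks.imported);
    }
}

struct OtherListener;

impl ChainNotify for OtherListener {
    fn new_blocks(&self, new_blocks: NewBlocks) {
        // All other implementations keep the old behaviour and skip
        // notifications while more blocks are still queued.
        if new_blocks.has_more_blocks_to_import {
            return;
        }
        publish_headers(&new_blocks.imported);
    }
}

fn publish_headers(_imported: &[H256]) { /* dispatch elided */ }
```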

@mattrutherford mattrutherford changed the title [WIP] fix: new_blocks notifications sometimes not triggered during normal conditions new_blocks notifications sometimes not triggered during normal conditions Dec 4, 2018
@mattrutherford mattrutherford force-pushed the mr-9865-missing-new-blocks-notifications branch from ddf19da to d78639c Compare December 4, 2018 10:23
Collaborator

@sorpaas sorpaas left a comment

LGTM overall. Left a small grumble.

Note that this does change the behavior of RPC pubsub -- previously we didn't report blocks while syncing, but now we do. Fortunately our RPC specs don't have any restrictions on that.

Also (I think totally optional before merging this PR, but nice to have) it may be good if we can test syncing older blocks with this branch, just to make sure performance is okay.

_proposed: Vec<Bytes>,
_duration: Duration,
) {
fn new_blocks(&self, new_blocks: NewBlocks) {
Collaborator

Can you add a check below this line so that if we have no log_subscribers and no head_subscribers we return early? We currently initialize some structs below regardless of whether there are subscribers, which may make normal syncing slow.
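
Something like the following guard at the top of the function (a sketch; the subscriber-map shapes are assumptions):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

struct EthPubSubClient {
    heads_subscribers: RwLock<HashMap<u64, ()>>,
    logs_subscribers: RwLock<HashMap<u64, ()>>,
}

impl EthPubSubClient {
    fn new_blocks(&self /* , new_blocks: NewBlocks */) {
        // Return early when nobody is subscribed, so the header/log
        // structures below are never built during normal syncing.
        if self.heads_subscribers.read().unwrap().is_empty()
            && self.logs_subscribers.read().unwrap().is_empty()
        {
            return;
        }
        // ... construct and dispatch notifications ...
    }
}
```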

@sorpaas sorpaas added A5-grumble 🔥 Pull request has minor issues that must be addressed before merging. B7-releasenotes 📜 Changes should be mentioned in the release notes of the next minor version release. and removed A0-pleasereview 🤓 Pull request needs code review. labels Dec 5, 2018
@sorpaas
Collaborator

sorpaas commented Dec 5, 2018

@mattrutherford Also, please update the PR description to make @5chdn's job easier!

proposed_blocks.clone(),
duration,
NewBlocks::new(
imported_blocks.clone(),
Collaborator

Could we pass references here instead of cloning the Vec every time?
Or, maybe better, pass Arc<NewBlocks> to avoid cloning the content if references are not possible?

Contributor Author

That's what I was thinking too (see my comment above). There might be more improvements to make, but I was wondering where to draw the line for this PR. I think this one is worth doing, however.

Collaborator

I'm fine with a separate PR improving this, just make sure to create an issue for this and add patch labels.
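
For reference, a sketch of the Arc<NewBlocks> variant being discussed for a follow-up (assumed signatures; not what this PR implements):

```rust
use std::sync::Arc;

struct NewBlocks { /* fields elided */ }

trait ChainNotify {
    fn new_blocks(&self, new_blocks: Arc<NewBlocks>);
}

// The notifier builds the payload once and shares it; each listener
// clones only the fields it actually needs.
fn notify_all(listeners: &[Box<dyn ChainNotify>], blocks: NewBlocks) {
    let shared = Arc::new(blocks);
    for listener in listeners {
        listener.new_blocks(Arc::clone(&shared));
    }
}
```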

rpc/src/v1/impls/eth_pubsub.rs (outdated, resolved)
Implement new struct to pass to new_blocks() with extra
parameter - processing_is_empty, which was previously
used to determine whether the notification should be
sent. Now it's up to each implementation to decide.

Updated all implementations to behave as before,
except eth_pubsub, which ignores processing_is_empty.

Update original tests for new_blocks
@mattrutherford mattrutherford force-pushed the mr-9865-missing-new-blocks-notifications branch from 725b979 to 23cd3e7 Compare December 18, 2018 09:29
@mattrutherford mattrutherford changed the title new_blocks notifications sometimes not triggered during normal conditions Fix pubsub new_blocks notifications to include all blocks Dec 18, 2018
Collaborator

@tomusdrw tomusdrw left a comment

LGTM, small suggestion on the processing_is_empty bool variable, but it's not critical.

ethcore/src/client/chain_notify.rs (outdated, resolved)
@sorpaas sorpaas added A7-looksgoodtestsfail 🤖 Pull request is reviewed well, but cannot be merged due to tests failing. and removed A5-grumble 🔥 Pull request has minor issues that must be addressed before merging. labels Dec 18, 2018
_proposed: Vec<Bytes>,
_duration: Duration,
) {
fn new_blocks( &self, _new_blocks: NewBlocks) {
Collaborator

@sorpaas sorpaas Dec 19, 2018

Suggested change
fn new_blocks( &self, _new_blocks: NewBlocks) {
fn new_blocks(&self, _new_blocks: NewBlocks) {

@mattrutherford mattrutherford removed the A7-looksgoodtestsfail 🤖 Pull request is reviewed well, but cannot be merged due to tests failing. label Dec 19, 2018
@mattrutherford mattrutherford merged commit 215602d into master Dec 19, 2018
@mattrutherford mattrutherford deleted the mr-9865-missing-new-blocks-notifications branch December 19, 2018 09:24
@5chdn 5chdn mentioned this pull request Dec 28, 2018
5chdn pushed a commit that referenced this pull request Jan 9, 2019
Fix: new blocks notifications sometimes missing in pubsub RPC

Implement new struct to pass to `new_blocks()` with extra parameter - `has_more_blocks_to_import`, which was previously used to determine whether the notification should be sent. Now it's up to each implementation to decide what to do.

Updated all implementations to behave as before, except `eth_pubsub`, which will send notification even when the queue is not empty.

Update tests.
5chdn added a commit that referenced this pull request Jan 9, 2019
* version: bump beta to v2.2.6

* Update pwasm-utils to 0.6.1 (#10134)

* version: mark upgrade critical on kovan

* Identity fix (#10128)

* fix #10125

fix service transaction version detection if --identity is enabled, change test to match how --identity actually works

* fix wrong var

* get the index of  v, not /

* idx, not idx.len()

* Update ethcore/sync/src/chain/propagator.rs

Co-Authored-By: joshua-mir <[email protected]>

* Update ethcore/sync/src/chain/propagator.rs

Co-Authored-By: joshua-mir <[email protected]>

* change version prefix to a const

* space

Co-Authored-By: joshua-mir <[email protected]>

* [Hit return to continue]

* ethcore: update hardcoded headers for foundation

* ethcore: update hardcoded headers for ropsten

* ethcore: update hardcoded headers for kovan

* ethcore: use consistent formatting

* ethcore: restore spaces after colon in chain spec

* ethcore: fix bootnodes in chain specs

* ethcore: fix bootnodes in chain specs

* ethcore: enforce newline at the end of chainspecs

* Follow-up to #10105 (#10107)

* HF in POA Sokol (2019-01-04) (#10077)

* HF in POA Sokol (2019-01-04)

poanetwork/poa-chain-spec#91

* Update POA Core bootnodes

* Autogen docs for the "Configuring Parity Ethereum" wiki page. (#10067)

* publish docs changes for autogen config docs

* Update publish-docs.sh

adding an environment variable so js knows not to download git master and just grab the local repo

* Update publish-docs.sh

made some changes making this unnecessary

* fix env variable

env variable passes to node properly now

* use yarn

* test pipeline, revert me

* fix test pipeline, revert me

* change runner tag

* change runner tag 2

* change runner tag

* global git config

* suppress upload_files output

* Update .gitlab-ci.yml

reverting testing changes

* Replace tag if exists

Very unlikely to be important/useful

* finality: dont require chain head to be in the chain (#10054)

* Fill transaction hash on ethGetLog of light client. (#9938)

* Fill transaction hash on ethGetLog of light client. This is inefficient
but keeps us aligned with the spec.

* Using better variables names.

* Expose old fast light get log as `parity_getLogsLight`.

* Update rpc/src/v1/helpers/light_fetch.rs

Fix indent.

Co-Authored-By: cheme <[email protected]>

* Factor common code between light_logs and logs.

* Remove useless check

* Rename parity light logs to 'parity_getLogsNoTransactionHash'.
Fix indent and extra white lines.

* Use vec instead of tree map to avoid inner function.

* better loop

* fix pubsub new_blocks notifications to include all blocks (#9987)

Fix: new blocks notifications sometimes missing in pubsub RPC

Implement new struct to pass to `new_blocks()` with extra parameter - `has_more_blocks_to_import`, which was previously used to determine whether the notification should be sent. Now it's up to each implementation to decide what to do.

Updated all implementations to behave as before, except `eth_pubsub`, which will send notification even when the queue is not empty.

Update tests.

* Revert part of 70a6db7

* HF in POA Core (2019-01-18) - Constantinople (#10155)

poanetwork/poa-chain-spec#100

* ci: re-enable snap publishing (#10142)

* ci: enable snap publishing~

* ci: add publish snap script

* ci: add snapcraft skeleton

* ci: group export statements

* ci: enable snaps on pr branch

* ci: enable snaps on pr branch

* ci: set default BUILD_ARCH

* ci: enable snaps on pr branch

* ci: enable snaps on pr branch

* ci: add libdb to snap

* ci: reinitiate gitlabci

* ci: reinitiate publish-snap script

* ci: fix yaml syntax

* cargo/gitlab env vars

* debug, revert me

* version?

* debug vars

* vars

* vars fix

* vars fix

* revert

* Update scripts/gitlab/publish-snap.sh

Co-Authored-By: 5chdn <[email protected]>

* ci: read track from cargo toml

* Make sure parent block is not in importing queue when importing ancient blocks (#10138)

* Make sure parent block is not in importing queue when importing ancient blocks

* Clear queue when an ancient import fails

* Lock only once in clear

* Add comments why queued check is needed

* Should push the value back to the queue

* Directly check in chain.read()

* Remove extra empty line

* Revert unused verification change