-
Notifications
You must be signed in to change notification settings - Fork 973
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use raw SHA256 as message-id and reduce size to 8 bytes #2044
Conversation
Suggestion, if we are changing the message ID function: should we truncate to e.g. 20 out of 32 bytes of the hash? It's neat to see the full |
specs/phase0/p2p-interface.md
Outdated
@@ -246,9 +246,8 @@ Likewise, clients MUST NOT emit or propagate messages larger than this limit. | |||
The `message-id` of a gossipsub message MUST be: | |||
|
|||
```python | |||
message-id: base64(SHA256(message.data)) | |||
message-id: SHA256(message.data). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the "." at the end was a typo
message-id: SHA256(message.data). | |
message-id: SHA256(message.data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am used to finish a sentence (even after a colon) with a dot within the math environment if the sentence is not continued afterwards, but we can remove that of course if you prefer it.
I lean toward shortening the byte sequence Why 20 instead of say 16? |
Being a little conservative, and it doesn't have much value to be a power of 2 here. We can look at the spec how bad a collision would really be. If it's hard enough to mine within the time span of the gossip message cache, it may not even matter much, and could be short. Also need to consider what happens to scoring if there would be a forged collision. And look at the gossip to say how much each byte here really means in terms of traffic (could be significant or not). |
yeah, the seen_ttl is ~1 epoch. Def not worried on the |
You mean |
For reference there is a config parameter which specifies the maximum number of message_ids in an IHAVE message. The default we have at the moment is 5000 (and a peer can at most send 10 per heartbeat, but a good peer should only send 1 to each peer it decides to). So rough calculations give (including 1 byte protobuf varint for encoding) is like 5000*21 = 105kb per heartbeat per peer per topic. For a 16 byte message id its about 85kb. In total, I think across gossip we should expect around V/8 (aggregate messages) + V (subnets) messages per epoch in total. So lets say an extreme case of 4M vals, thats 4.5M messages per epoch (which is the time our duplicate cache remembers messages). So I think the probability of at least one collision per epoch with message-id bit size If this back of envelope calc is right:
Disclaimer, this is just a super quick calc, some of my logic may be wrong here.... If we do get a collision, I imagine we would just drop collision messages. Which ever message comes first will be accepted and its counterpart dropped by the duplicate cache, which may be different for different peers. Edit: @blacktemplar pointed out flaws in my math - edited to his suggestions |
Based on the calculations @AgeManning presented I would propose even using only 8 bytes. It results in a collision probability of The worst that can happen in case of a collision anyway is missing a block (but the probability for that is like 700,000 times smaller as a general collision) or in the most common case missing an attestation. Edit: Just did some worst case calculations on traffic: I only look at the attestation topics (aggregate + subnets) and assume 50 peers and the worst case that all peers are in all subnets. Then the average amount of messages in the cache would be:
In this calculation we can almost ignore the IHAVE size limit of 5000 per topic, since no topic gets close to 5000 except for the So this gives us with 50 peers and 8 byte ids a traffic (outgoing and ingoing) of
In numbers for different amounts of validators:
This is only the traffic for IHAVE messages. To compare that with the worst case traffic on the mesh assuming 8 peers per mesh we get a mesh traffic per second of
Using the maximum sizes calculated in https://benjaminion.xyz/eth2-annotated-spec/phase0/beacon-chain/ we get
which results in a mesh traffic of
Note that those calculations average over the time horizon. In reality most messages will arrive at the beginning of each slot, so the traffic at the beginning of a slot may be significantly higher. So I think 8 byte message ids result in a reasonable IHAVE traffic compared to mesh traffic. Note again that all the calculations above are the absolute (theoretical) worst case. Edit: Edited for correct numbers considering the Edit2: I just remembered that our message cache remembers messages for |
Awesome! Thanks @blacktemplar @AgeManning Interesting -- the ratio of IHAVE to actual messages is pretty high even with the 8 byte message ID. And this is assuming max attestation size, which at lower validator counts, you don't fully hit that size. I definitely agree going down to the 8 bytes here. Seeing a higher IHAVE cost or even at parity with the actual messages themselves is not desirable. Lets go ahead and get this into the PR I think we might consider putting a cap (maybe dynamic) on the number of message IDs included in the IHAVEs to further reduce the overhead of the metadata gossip. For example for the attestation subnets, you could probably get away with max IHAVE ids per message being Thoughts on reducing max sizes here? Oh and an aside, the number of target aggregates is |
I just updated my calculations based on your comment @djrtwo and based on a mistake on the number of heartbeats we propagate the same message. I think its hard to calculate the effects of a restrictive IHAVE size limit maybe its easier to model it and check how the latencies change. I think the question is do we need a size limit or are we happy with the traffic as is (note that my calculations all assume that a node and all its peers is susbscribed to all subnets which is only interesting for a node that has many validators). To answer your other questions:
The current spec (and implementation) defines the IHAVE limits as a spam protection measure and not really as a traffic reduction tool. Currently only the same limit can be used for each topic and they are not configurable individually. |
Nice. The ratio on meta-data messages to actual looks better.
Yeah, agreed. The bandwidth looks okay for our purposes right now and in the context of needing to ship, but some more modeling and potential avenues for optimizing bandwidth might be warranted down the line |
## Issue Addressed ethereum/consensus-specs#2044 ## Proposed Changes Shifts the gossipsub message id to use the first 8 bytes of the SHA256 hash of the gossipsub message data field. ## Additional Info We should merge this in once the spec has been decided on. It will cause issues with gossipsub scoring and gossipsub propagation rates (as we won't receive IWANT) messages from clients that also haven't made this update.
Squashed commit of the following: commit 74ed1be Author: pawan <[email protected]> Date: Tue Sep 22 19:27:14 2020 +0530 fmt commit d61f956 Author: pawan <[email protected]> Date: Tue Sep 22 19:03:16 2020 +0530 Fix merge issues commit 282a973 Merge: 6759bd1 a97ec31 Author: pawan <[email protected]> Date: Tue Sep 22 18:30:10 2020 +0530 Merge branch 'master' into directory-restructure commit a97ec31 Author: Pawan Dhananjay <[email protected]> Date: Tue Sep 22 07:29:34 2020 +0000 Subscribe to subnets an epoch in advance (#1600) ## Issue Addressed N/A ## Proposed Changes Subscibe to subnet an epoch in advance of the attestation slot instead of 4 slots in advance. commit 7aceff4 Author: Michael Sproul <[email protected]> Date: Tue Sep 22 05:40:04 2020 +0000 Add `safe_sum` and use it in state_processing (#1620) ## Issue Addressed Closes #1098 ## Proposed Changes Add a `SafeArithIter` trait with a `safe_sum` method, and use it in `state_processing`. This seems to be the only place in `consensus` where it is relevant -- i.e. where we were using `sum` and the integer_arith lint is enabled. ## Additional Info This PR doesn't include any Clippy linting to prevent `sum` from being called. It seems there is no existing Clippy lint that suits our purpose, but I'm going to look into that and maybe schedule writing one as a lower-priority task. This theoretically _is_ a consensus breaking change, but it shouldn't impact Medalla (or any other testnet) because `slashings` shouldn't overflow! commit 4fca306 Author: Michael Sproul <[email protected]> Date: Tue Sep 22 05:40:02 2020 +0000 Update BLST, add force-adx support (#1595) ## Issue Addressed Closes #1504 Closes #1505 ## Proposed Changes * Update `blst` to the latest version, which is more portable and includes finer-grained compilation controls (see below). * Detect the case where a binary has been explicitly compiled with ADX support but it's missing at runtime, and report a nicer error than `SIGILL`. ## Known Issues * None. The previous issue with `make build-aarch64` (supranational/blst#27), has been resolved. ## Additional Info I think we should tweak our release process and our Docker builds so that we provide two options: Binaries: * `lighthouse`: compiled with `modern`/`force-adx`, for CPUs 2013 and newer * `lighthouse-portable`: compiled with `portable` for older CPUs Docker images: * `sigp/lighthouse:latest`: multi-arch image with `modern` x86_64 and vanilla aarch64 binary * `sigp/lighthouse:latest-portable`: multi-arch image with `portable` builds for x86_64 and aarch64 And relevant Docker images for the releases (as per #1574 (comment)), tagged `v0.x.y` and `v0.x.y-portable` commit d85d5a4 Author: Paul Hauner <[email protected]> Date: Tue Sep 22 04:45:15 2020 +0000 Bump to v0.2.11 (#1645) ## Issue Addressed NA ## Proposed Changes - Bump version to v0.2.11 - Run `cargo update`. ## Additional Info NA commit bd39cc8 Author: Paul Hauner <[email protected]> Date: Tue Sep 22 02:06:10 2020 +0000 Apply hotfix for inconsistent head (#1639) ## Issue Addressed - Resolves #1616 ## Proposed Changes If we look at the function which persists fork choice and the canonical head to disk: https://github.com/sigp/lighthouse/blob/1db8daae0c7bb34bf2e05644fa6bf313c2bea98e/beacon_node/beacon_chain/src/beacon_chain.rs#L234-L280 There is a race-condition which might cause the canonical head and fork choice values to be out-of-sync. I believe this is the cause of #1616. I managed to recreate the issue and produce a database that was unable to sync under the `master` branch but able to sync with this branch. These new changes solve the issue by ignoring the persisted `canonical_head_block_root` value and instead getting fork choice to generate it. This ensures that the canonical head is in-sync with fork choice. ## Additional Info This is hotfix method that leaves some crusty code hanging around. Once this PR is merged (to satisfy the v0.2.x users) we should later update and merge #1638 so we can have a clean fix for the v0.3.x versions. commit 14ff385 Author: Pawan Dhananjay <[email protected]> Date: Tue Sep 22 01:12:36 2020 +0000 Add trusted peers (#1640) ## Issue Addressed Closes #1581 ## Proposed Changes Adds a new cli option for trusted peers who always have the maximum possible score. commit 5d17eb8 Author: Michael Sproul <[email protected]> Date: Mon Sep 21 11:53:53 2020 +0000 Update LevelDB to v0.8.6, removing patch (#1636) Removes our dependency on a fork of LevelDB now that skade/leveldb-sys#17 is merged commit 1db8daa Author: Age Manning <[email protected]> Date: Mon Sep 21 02:00:38 2020 +0000 Shift metadata to the global network variables (#1631) ## Issue Addressed N/A ## Proposed Changes Shifts the local `metadata` to `network_globals` making it accessible to the HTTP API and other areas of lighthouse. ## Additional Info N/A commit 7b97c4a Author: Pawan Dhananjay <[email protected]> Date: Mon Sep 21 01:06:25 2020 +0000 Snappy additional sanity checks (#1625) ## Issue Addressed N/A ## Proposed Changes Adds the following check from the spec > A reader SHOULD NOT read more than max_encoded_len(n) bytes after reading the SSZ length-prefix n from the header. commit 371e1c1 Author: Paul Hauner <[email protected]> Date: Fri Sep 18 06:41:29 2020 +0000 Bump version to v0.2.10 (#1630) ## Issue Addressed NA ## Proposed Changes Bump crate version so we can cut a new release with the fix from #1629. ## Additional Info NA commit a17f748 Author: Paul Hauner <[email protected]> Date: Fri Sep 18 05:14:31 2020 +0000 Fix bad assumption when checking finalized descendant (#1629) ## Issue Addressed - Resolves #1616 ## Proposed Changes Fixes a bug where we are unable to read the finalized block from fork choice. ## Detail I had made an assumption that the finalized block always has a parent root of `None`: https://github.com/sigp/lighthouse/blob/e5fc6bab485fa54d7e518b325f4eb9201ba5c6a1/consensus/fork_choice/src/fork_choice.rs#L749-L752 This was a faulty assumption, we don't set parent *roots* to `None`. Instead we *sometimes* set parent *indices* to `None`, depending if this pruning condition is satisfied: https://github.com/sigp/lighthouse/blob/e5fc6bab485fa54d7e518b325f4eb9201ba5c6a1/consensus/proto_array/src/proto_array.rs#L229-L232 The bug manifested itself like this: 1. We attempt to get the finalized block from fork choice 1. We try to check that the block is descendant of the finalized block (note: they're the same block). 1. We expect the parent root to be `None`, but it's actually the parent root of the finalized root. 1. We therefore end up checking if the parent of the finalized root is a descendant of itself. (note: it's an *ancestor* not a *descendant*). 1. We therefore declare that the finalized block is not a descendant of (or eq to) the finalized block. Bad. ## Additional Info In reflection, I made a poor assumption in the quest to obtain a probably negligible performance gain. The performance gain wasn't worth the risk and we got burnt. commit 49ab414 Author: Age Manning <[email protected]> Date: Fri Sep 18 02:05:36 2020 +0000 Shift gossipsub validation (#1612) ## Issue Addressed N/A ## Proposed Changes This will consider all gossipsub messages that have either the `from`, `seqno` or `signature` field as invalid. ## Additional Info We should not merge this until all other clients have been sending empty fields for a while. See ethereum/consensus-specs#1981 for reference commit 2074bec Author: Age Manning <[email protected]> Date: Fri Sep 18 02:05:34 2020 +0000 Gossipsub message id to shortened bytes (#1607) ## Issue Addressed ethereum/consensus-specs#2044 ## Proposed Changes Shifts the gossipsub message id to use the first 8 bytes of the SHA256 hash of the gossipsub message data field. ## Additional Info We should merge this in once the spec has been decided on. It will cause issues with gossipsub scoring and gossipsub propagation rates (as we won't receive IWANT) messages from clients that also haven't made this update. commit e5fc6ba Author: Michael Sproul <[email protected]> Date: Mon Sep 14 10:58:15 2020 +0000 Remove redundant decompression in process_deposit (#1610) ## Issue Addressed Closes #1076 ## Proposed Changes Remove an extra unnecessary decompression of the deposit public key from `process_deposit`. The key is decompressed and used to verify the signature in `verify_deposit_signature`, making this initial decompression redundant. ## Additional Info This is _not_ a consensus-breaking change because keys which previously failed the early decompression check will not be found in the pubkey cache (they are invalid), and will be checked and rejected as part of `verify_deposit_signature`. commit c9596fc Author: Age Manning <[email protected]> Date: Sun Sep 13 23:58:49 2020 +0000 Temporary Sync Work-Around (#1615) ## Issue Addressed #1590 ## Proposed Changes This is a temporary workaround that prevents finalized chain sync from swapping chains. I'm merging this in now until the full solution is ready. commit c6abc56 Author: Age Manning <[email protected]> Date: Fri Sep 11 02:33:36 2020 +0000 Prevent large step-size parameters (#1583) ## Issue Addressed Malicious users could request very large block ranges, more than we expect. Although technically legal, we are now quadraticaly weighting large step sizes in the filter. Therefore users may request large skips, but not a large number of blocks, to prevent requests forcing us to do long chain lookups. ## Proposed Changes Weight the step parameter in the RPC filter and prevent any overflows that effect us in the step parameter. ## Additional Info commit 7f1b936 Author: blacktemplar <[email protected]> Date: Fri Sep 11 01:43:15 2020 +0000 ignore too early / too late attestations instead of penalizing them (#1608) ## Issue Addressed NA ## Proposed Changes This ignores attestations that are too early or too late as it is specified in the spec (see https://github.com/ethereum/eth2.0-specs/blob/v0.12.1/specs/phase0/p2p-interface.md#global-topics first subpoint of `beacon_aggregate_and_proof`) commit 810de2f Author: Daniel Schonfeld <[email protected]> Date: Fri Sep 11 01:43:13 2020 +0000 Static testnet configs (#1603) ## Issue Addressed #1431 ## Proposed Changes Added an archived zip file with required files manually ## Additional Info 1) Used zip, instead of tar.gz to add a single dependency instead of two. 2) I left the download from github code for now, waiting to hear if you'd like it cleaned up or left to be used for some tooling needs. commit 0525876 Author: Pawan Dhananjay <[email protected]> Date: Fri Sep 11 00:52:27 2020 +0000 Dial cached enr's before making subnet discovery query (#1376) ## Issue Addressed Closes #1365 ## Proposed Changes Dial peers in the `cached_enrs` who aren't connected, aren't banned and satisfy the subnet predicate before making a subnet discovery query. commit d79366c Author: Age Manning <[email protected]> Date: Thu Sep 10 04:43:22 2020 +0000 Prevent printing binary in RPC errors (#1604) ## Issue Addressed #1566 ## Proposed Changes Prevents printing binary characters in the RPC error response from peers. commit b19cf02 Author: Age Manning <[email protected]> Date: Thu Sep 10 03:51:06 2020 +0000 Penalise bad peer behaviour (#1602) ## Issue Addressed #1386 ## Proposed Changes Penalises peers in our scoring system that produce invalid attestations or blocks. commit dfe5077 Author: Paul Hauner <[email protected]> Date: Thu Sep 10 00:24:41 2020 +0000 Remove references to rust-docs (#1601) ## Issue Addressed - Resolves #897 - Resolves #821 ## Proposed Changes Removes references to the rust docs that we're no long maintaining. ## Additional Info NA commit 0821e6b Author: Paul Hauner <[email protected]> Date: Wed Sep 9 02:28:35 2020 +0000 Bump version to v0.2.9 (#1598) ## Issue Addressed NA ## Proposed Changes - Bump version tags - Run `cargo update` ## Additional Info NA commit 6759bd1 Author: pawan <[email protected]> Date: Tue Sep 8 12:39:22 2020 +0530 Add `strict-slashing-protection` flag commit 1644717 Author: pawan <[email protected]> Date: Fri Sep 4 16:46:43 2020 +0530 Remove `get_default_base_dir` commit e4783bf Author: pawan <[email protected]> Date: Fri Sep 4 16:24:45 2020 +0530 Address review comments commit 89b832c Author: pawan <[email protected]> Date: Thu Sep 3 00:30:53 2020 +0530 Fix tests commit 5940a52 Author: pawan <[email protected]> Date: Wed Sep 2 21:27:57 2020 +0530 Print out path information in account manager commit 5ca1341 Author: pawan <[email protected]> Date: Wed Sep 2 20:49:29 2020 +0530 Use datadir flag for account manager commit 8c82769 Author: pawan <[email protected]> Date: Wed Sep 2 20:10:44 2020 +0530 Rename BASE_DIR_FLAG to WALLET_DIR_FLAG commit ad34a0f Author: pawan <[email protected]> Date: Wed Sep 2 18:17:45 2020 +0530 Add validator-dir cli option commit 04fd9f7 Merge: 0e3ccba 8718120 Author: pawan <[email protected]> Date: Wed Sep 2 14:53:27 2020 +0530 Merge branch 'master' into directory-restructure commit 0e3ccba Author: pawan <[email protected]> Date: Wed Sep 2 14:40:36 2020 +0530 Minor fixes commit ad22178 Author: pawan <[email protected]> Date: Tue Sep 1 19:14:44 2020 +0530 Migration for ValidatorDefinitions commit 1022f58 Author: pawan <[email protected]> Date: Tue Sep 1 16:02:07 2020 +0530 Fix lints and udeps commit 4ddbbfe Author: pawan <[email protected]> Date: Tue Sep 1 12:56:12 2020 +0530 Remove more hardocoded values commit 4170e4c Author: pawan <[email protected]> Date: Tue Sep 1 12:50:43 2020 +0530 Remove some hardcoded values commit 5c46823 Merge: ae533f3 8301a98 Author: pawan <[email protected]> Date: Tue Sep 1 12:02:35 2020 +0530 Merge branch 'master' into directory-restructure commit ae533f3 Author: pawan <[email protected]> Date: Mon Aug 17 20:47:20 2020 +0530 Docs fixes commit fe13b1b Author: pawan <[email protected]> Date: Mon Aug 17 19:58:09 2020 +0530 lint commit e4a7e8c Merge: 01e6166 9a97a0b Author: pawan <[email protected]> Date: Mon Aug 17 19:47:27 2020 +0530 Merge branch 'master' into directory-restructure commit 01e6166 Author: pawan <[email protected]> Date: Mon Aug 17 19:23:57 2020 +0530 All binaries get directory info from common::directory crate commit 8a7c130 Author: pawan <[email protected]> Date: Mon Aug 17 17:18:44 2020 +0530 Add migration code for restructure commit a4c33cd Author: pawan <[email protected]> Date: Mon Aug 17 15:51:24 2020 +0530 Fix bug in secrets dir commit f4b8529 Author: pawan <[email protected]> Date: Mon Aug 17 13:05:28 2020 +0530 Restrucure validator directory paths commit 54b4ae6 Author: pawan <[email protected]> Date: Mon Aug 17 12:20:40 2020 +0530 Restructure account_manager directory paths commit 53892af Author: pawan <[email protected]> Date: Thu Aug 13 19:43:44 2020 +0530 Move get_testnet_dir to clap utils commit d1d269a Author: pawan <[email protected]> Date: Thu Aug 13 19:33:34 2020 +0530 Move beacon directory
Resolves #2039
Note to implementers: message-id was also reduced to 8-bytes. See discussion on bandwidth analysis below