
Respect message list limits when creating messages to send #218

Merged — 5 commits into libp2p:develop on Feb 2, 2022

Conversation

ajsutton
Contributor

@ajsutton ajsutton commented Feb 1, 2022

A somewhat naive approach to ensuring that created messages stay within the message list limits. Before each message part is added to the new message, all list limits are checked. If the limits would be exceeded, the message part is re-added to the pending pool.

Some concerns:

  1. Is it too expensive to perform these checks for every part we add?
  2. Is it possible to deduplicate the checks in validateMessageListLimits (single message) and validateMergedMessageListLimits (two messages)?
  3. Is there a better approach to handling parts that don't fit than just adding them back to the queue?
  4. Will something actually trigger sending the remaining parts in a later message or do they get stuck?
  5. How do we test this properly?
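The per-part check and re-queueing described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `buildMessage`, the plain string parts, and the single `MAX_PARTS_PER_MESSAGE` limit are stand-ins for the actual message-part types and the several list limits the PR validates.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of the "check limits before adding each part" approach.
public class NaiveMessageBuilder {
    static final int MAX_PARTS_PER_MESSAGE = 3; // stand-in for the real list limits

    // Drains parts from the pending pool into one outgoing message,
    // re-queueing any part that would exceed the limit (concern #3 above).
    static List<String> buildMessage(Queue<String> pendingPool) {
        List<String> message = new ArrayList<>();
        List<String> deferred = new ArrayList<>();
        while (!pendingPool.isEmpty()) {
            String part = pendingPool.poll();
            if (message.size() + 1 > MAX_PARTS_PER_MESSAGE) {
                deferred.add(part); // doesn't fit: defer to a later message
            } else {
                message.add(part);
            }
        }
        // Concern #4 above: something must later trigger sending these.
        pendingPool.addAll(deferred);
        return message;
    }

    public static void main(String[] args) {
        Queue<String> pool = new ArrayDeque<>(List.of("a", "b", "c", "d", "e"));
        System.out.println(buildMessage(pool)); // first message, capped at 3 parts
        System.out.println(pool.size());        // 2 parts remain queued
    }
}
```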

@Nashatyrev
Collaborator

Some background from slack:
@ajsutton wrote:

I’ve been investigating a problem with gossip messages not making it through in a custom network for a while now.
Turns out that at startup Teku is activating 3 different forks (phase0, altair and bellatrix) because they're all activating within 2 epochs. Gossip topics are fork-specific, so Teku is subscribing to 3 times as many topics as it would normally follow all at once at startup. Since it's told to subscribe to all subnets, that's 64 individual attestation subnets plus the blocks, aggregates etc. subnets, times 3 forks. In total we're subscribing to 217 topics.
To avoid DOS attacks, Teku limits the number of topics you can ask to subscribe to in a single message and the total number of topics a peer can be subscribed to at once. We set them to absurdly high values that were more than twice what you'd normally need: 100 topics per subscription request and a total of 200 topics that can be subscribed to at once. Because the limits were so high, we never actually got Teku to split up requests, so the only subscription message that's sent gets ignored on both sides, and hence the two Teku instances aren't actually subscribed to any gossip at all.
It also violates the maxSubscriptions and maxGraft settings for libp2p gossip (both set to 200). libp2p probably should automatically split the messages it sends to stay under those limits, but isn't.
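The arithmetic behind the failure can be checked directly. The numbers (217 topics, 100 topics per subscribe request, 200 total) come from the thread; the class and method names below are purely illustrative.

```java
// Back-of-envelope check of the limits quoted above.
public class SubscriptionLimits {
    static final int MAX_TOPICS_PER_SUBSCRIBE = 100; // per-request limit
    static final int MAX_SUBSCRIBED_TOPICS = 200;    // total per-peer limit

    // Minimum number of subscribe messages needed to stay under the per-request cap.
    static int messagesNeeded(int totalTopics) {
        return (totalTopics + MAX_TOPICS_PER_SUBSCRIBE - 1) / MAX_TOPICS_PER_SUBSCRIBE;
    }

    public static void main(String[] args) {
        int totalTopics = 217; // three concurrently active forks at startup
        System.out.println(totalTopics > MAX_TOPICS_PER_SUBSCRIBE); // true: one request can't carry them all
        System.out.println(totalTopics > MAX_SUBSCRIBED_TOPICS);    // true: even split, the total cap is exceeded
        System.out.println(messagesNeeded(totalTopics));            // 3
    }
}
```

Splitting alone is not enough here: even three valid subscribe messages would push the peer past the 200-topic total, which is why the discussion below also touches on raising the limits themselves.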

@Nashatyrev wrote:

Did recently something slightly similar with mplex frames here: #205

@Nashatyrev
Collaborator

Basically the approach looks good.
Yep, I also feel the urge to get rid of the duplicate validate method. I'll check whether clone() followed by validating a single message is fast enough.
I would optimistically build the whole message and then validate it. If validation fails, handle it with the proposed method. Overly large messages look like exceptional cases, mostly.
I could handle testing as well.

@Nashatyrev
Collaborator

Nashatyrev commented Feb 1, 2022

BTW, does it make sense to simply increase the subscription limits as soon as we hit such a case?

@Nashatyrev
Collaborator

What do you think about this variant? 965636b

  • Optimistically builds and validates the whole message first (expected to be the vast majority of cases, so the performance cost can probably be neglected)
  • Uses the single validate method
  • Stops filling a single message once any list hits its limit. This looks safer: if we tried to keep filling all the lists, the following invalid case could occur: the topic subscription is left out of the first message (due to the limit) while a publication for that topic is included
  • Sends all messages when a split occurs. There is a potential message rate limit issue, but I believe it is far from happening with the current setup
  • Added an initial test. Going to add more once we agree on the approach
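The build-then-validate-then-split flow in the bullets above can be sketched as follows. This is an illustration under assumptions, not the code in 965636b: it uses a single size-based limit and made-up names, whereas the real change validates several gossip list limits via the one validate method.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: build the full message optimistically, validate once,
// and split only when validation fails (the expected-rare case).
public class OptimisticMessageBuilder {
    static final int MAX_PARTS_PER_MESSAGE = 100; // stand-in for the real limits

    // The single validate method, applied to a fully built message.
    static boolean validate(List<String> message) {
        return message.size() <= MAX_PARTS_PER_MESSAGE;
    }

    // Returns one or more messages, each within the limit. Filling stops as
    // soon as the limit is hit, so part order is preserved across messages
    // (a publish never ends up ahead of its topic's subscription).
    static List<List<String>> buildMessages(List<String> parts) {
        List<String> whole = new ArrayList<>(parts);
        if (validate(whole)) {
            return List.of(whole); // the common case: no split needed
        }
        List<List<String>> messages = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String part : parts) {
            if (current.size() == MAX_PARTS_PER_MESSAGE) {
                messages.add(current); // limit hit: start a new message
                current = new ArrayList<>();
            }
            current.add(part);
        }
        if (!current.isEmpty()) {
            messages.add(current);
        }
        return messages; // all split messages are then sent immediately
    }

    public static void main(String[] args) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < 217; i++) parts.add("topic" + i);
        buildMessages(parts).forEach(m -> System.out.println(m.size())); // 100, 100, 17
    }
}
```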

@ajsutton
Contributor Author

ajsutton commented Feb 1, 2022

Yeah, I really like that approach. You're right that it's quite unusual to exceed these message limits (the only case I know of is when three forks are scheduled for subsequent epochs, which only happens in devnets).

In terms of increasing the subscription limits, I'm assuming that at the libp2p level the current limits must be somewhat agreed between clients, though I'm not really sure of that. At the Teku level, yeah, I think we can just increase the limits to make things fit without opening up any real risks — there is still a limited number of "relevant" topics that we allow subscriptions to, and it won't require that big an increase on the current limits.

@Nashatyrev
Collaborator

Fixed an issue here: 9d39652. When a new peer connected, onPeerActive didn't use the collectPeerMessages() method.
Also added some more tests.

@Nashatyrev Nashatyrev marked this pull request as ready for review February 2, 2022 10:17
Collaborator

@Nashatyrev Nashatyrev left a comment


LGTM

Contributor Author

@ajsutton ajsutton left a comment


Thanks Anton, looks good.

@ajsutton ajsutton merged commit 6a8de4c into libp2p:develop Feb 2, 2022
@ajsutton ajsutton deleted the limit-list-lengths branch February 2, 2022 22:26