
feat: Enforce service specific rate limits for Lightpush #667

Closed
chaitanyaprem opened this issue Aug 23, 2023 · 21 comments · Fixed by #1024
Assignees
Labels
E:3.2: Basic DoS protection in production (see https://github.com/waku-org/pm/issues/70 for details), status-waku-integ (all issues relating to the Status Waku integration)

Comments

@chaitanyaprem
Collaborator

Problem

As discussed in #594 (comment), we should enforce rate limits for heavy protocols such as store; otherwise a node can be DoSed by store requests. Example: only allow serving x requests/second, or a given number of bytes/second; beyond that, temporarily blacklist.

Suggested solution

We can use go-libp2p's scaling limit config, e.g. the limits go-libp2p sets for its default services: https://github.com/libp2p/go-libp2p/blob/master/limits.go#L15. We can control the maximum number of streams and the amount of memory allocated to each service.
Provide configuration for these rate limits, along with good defaults.
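For intuition only: go-libp2p's resource manager enforces per-service stream caps natively, but the core idea amounts to a counting semaphore per service. A minimal stdlib-only sketch (not go-libp2p's actual API; all names below are hypothetical):

```go
package main

import (
	"errors"
	"fmt"
)

// serviceLimiter caps the number of concurrently open streams for one
// service, similar in spirit to go-libp2p's per-service resource limits.
type serviceLimiter struct {
	slots chan struct{} // buffered channel used as a counting semaphore
}

func newServiceLimiter(maxStreams int) *serviceLimiter {
	return &serviceLimiter{slots: make(chan struct{}, maxStreams)}
}

var errLimitExceeded = errors.New("service stream limit exceeded")

// Acquire reserves a stream slot, failing fast when the cap is reached.
func (l *serviceLimiter) Acquire() error {
	select {
	case l.slots <- struct{}{}:
		return nil
	default:
		return errLimitExceeded
	}
}

// Release frees a slot when the stream closes.
func (l *serviceLimiter) Release() { <-l.slots }

func main() {
	lightpush := newServiceLimiter(2)
	fmt.Println(lightpush.Acquire()) // <nil>
	fmt.Println(lightpush.Acquire()) // <nil>
	fmt.Println(lightpush.Acquire()) // service stream limit exceeded
	lightpush.Release()
	fmt.Println(lightpush.Acquire()) // <nil>
}
```

In production the real rcmgr-based limits would be preferred, since they also account for memory and file descriptors.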

Alternatives considered

None

Acceptance criteria

Validate by pumping service traffic and verifying rate limits are applied as configured.

@chaitanyaprem chaitanyaprem changed the title feat: Enforce Service specific rate limits feat: Enforce service specific rate limits Aug 23, 2023
@chaitanyaprem
Collaborator Author

chaitanyaprem commented Sep 11, 2023

Rate-limiting would be required for the following protocols:
Store
Filter
Lightpush
Peer exchange
Rendezvous

@jm-clius, @alrevuelta: since RLN would be enabled, I don't think we would need to enforce additional rate-limiting on lightPush. Do confirm this.

This task would require analyzing what sort of limits are to be applied: req/sec, max streams, max connections, etc. We also need to come up with the configurations that we want to expose to users.

@jm-clius , @fryorcraken : Is this something that is required for the launch of The Waku network? I think it would be required.

@fryorcraken
Collaborator

RLN rules need to be applied on light push on server side so that the node relaying doesn't get descored on gossipsub for forwarding messages with incorrect or no proof (when RLN applies)

@jm-clius

Since RLN would be enabled, I don't think we would require to enforce additional rate-limiting on lightPush.

I think this is about limiting the number of requests that can be opened for the protocol on the server side. I imagine we want this rate limit to apply at a higher layer than the message generation rate (which is what RLN limits). Perhaps some token bucket mechanism applying to all request-response protocols?

Is this something that is required for the launch of The Waku network?

I'd say yes, at least in a basic form. One thing here is that we may not want to make this overly configurable at first. Could choose some value (e.g. max 5 requests per second) and design something simply around that.
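The token-bucket mechanism mentioned above could be sketched as follows. This is a minimal stdlib-only sketch; the type names and the values (e.g. 5 req/sec) are illustrative, not a committed design:

```go
package main

import (
	"fmt"
	"time"
)

// tokenBucket is a minimal token-bucket limiter: at most `capacity` tokens,
// refilled at `rate` tokens per second. One token is spent per request.
type tokenBucket struct {
	capacity   float64
	tokens     float64
	rate       float64 // tokens added per second
	lastRefill time.Time
}

func newTokenBucket(capacity, ratePerSec float64) *tokenBucket {
	return &tokenBucket{capacity: capacity, tokens: capacity, rate: ratePerSec, lastRefill: time.Now()}
}

// Allow reports whether a request may proceed, consuming a token if so.
func (b *tokenBucket) Allow() bool {
	now := time.Now()
	b.tokens += now.Sub(b.lastRefill).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.lastRefill = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	bucket := newTokenBucket(5, 5) // burst of 5, sustained 5 req/s
	allowed := 0
	for i := 0; i < 10; i++ {
		if bucket.Allow() {
			allowed++
		}
	}
	fmt.Println(allowed) // 5: the burst is served, the rest are dropped
}
```

One such bucket per request-response protocol (or one shared bucket, per the later discussion) would give the "max 5 requests per second" behaviour suggested here.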

@chaitanyaprem
Collaborator Author

RLN rules need to be applied on light push on server side so that the node relaying doesn't get descored on gossipsub for forwarding messages with incorrect or no proof (when RLN applies)

Makes sense. cc @richard-ramos in case it is not already being done. We can take this up as part of #655?

@chaitanyaprem
Collaborator Author

chaitanyaprem commented Sep 12, 2023

Is this something that is required for the launch of The Waku network?

I'd say yes, at least in a basic form. One thing here is that we may not want to make this overly configurable at first. Could choose some value (e.g. max 5 requests per second) and design something simply around that.

Agreed, we can start simple, without any configuration, by enforcing the two types of limits below. Do provide any comments.

  1. Number of connections/peers per service (this can be derived as servicePeers / number of services enabled).
  2. Allowed message rate from a peer per service (default: 5 req/sec).

But we may need to modify the ratio of relay to service peers defined in waku-org/nwaku#1813. This sub-item can be taken up as part of #679.
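The second limit (a per-peer request budget per service) could be tracked with a simple fixed-window counter keyed by peer ID. A hedged stdlib-only sketch, with all names hypothetical:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// peerRateLimiter enforces a fixed-window request budget per peer,
// e.g. the default of 5 requests per second per service peer.
type peerRateLimiter struct {
	mu      sync.Mutex
	window  time.Duration
	max     int
	counts  map[string]int       // peer ID -> requests in current window
	started map[string]time.Time // peer ID -> current window start
}

func newPeerRateLimiter(max int, window time.Duration) *peerRateLimiter {
	return &peerRateLimiter{window: window, max: max, counts: map[string]int{}, started: map[string]time.Time{}}
}

// Allow returns true if the peer is still within its budget for this window.
func (p *peerRateLimiter) Allow(peerID string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	now := time.Now()
	if start, ok := p.started[peerID]; !ok || now.Sub(start) >= p.window {
		p.started[peerID] = now // start a fresh window for this peer
		p.counts[peerID] = 0
	}
	if p.counts[peerID] >= p.max {
		return false
	}
	p.counts[peerID]++
	return true
}

func main() {
	limiter := newPeerRateLimiter(5, time.Second)
	for i := 0; i < 6; i++ {
		fmt.Println(limiter.Allow("peerA")) // true x5, then false
	}
	fmt.Println(limiter.Allow("peerB")) // true: independent budget per peer
}
```

A production version would also need eviction of idle peers from the maps to bound memory.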

@chaitanyaprem
Collaborator Author

Since RLN would be enabled, I don't think we would require to enforce additional rate-limiting on lightPush.

I think this is about limiting the number of requests that can be opened for the protocol on the server side. I imagine we want this rate limit to apply at a higher layer than the message generation rate (which is what RLN limits). Perhaps some token bucket mechanism applying to all request-response protocols?

Yes, correct. We can use a token bucket filter to achieve such a limit for the other req/resp protocols; refer to the comment above on what limits we can apply.
But for lightPush, since a publisher will have to use RLN, having RLN validation on the lightPush server side should achieve this. We may not need additional rate-limiting on top of RLN. Or am I missing some scenario here?

@chair28980 chair28980 assigned chair28980 and harsh-98 and unassigned chair28980 Sep 12, 2023
@richard-ramos
Member

Makes sense. cc @richard-ramos in case it is not already being done. We can take this up as part of #655?

Already done! In go-libp2p-pubsub, messages are evaluated by validators before being broadcast, so relaying nodes will not be scored negatively due to lightpush messages containing invalid proofs / waku messages. However, validating proofs does have some cost associated with it, so perhaps it makes sense to add a service rate limit to avoid lightpush clients DoSing lightpush nodes?

@fryorcraken
Collaborator

@jm-clius where does this fit best in the Network roadmap? waku-org/research#3

@chaitanyaprem
Collaborator Author

@jm-clius where does this fit best in the Network roadmap? waku-org/research#3

Probably under DoS Protection track?

@jm-clius

Yeah, I'd include this under 3.2 Basic DoS protection in production, even though the epic description would have to be expanded to include service protocol request rate-limiting. This is because the roadmap only tracks the essential items to protect the network and not individual nodes. This rate limiting is of course very good to have, though. @vpavlin @Ivansete-status we may want to track similar work for nwaku in an issue (rate limiting requests in request-response protocols, similar to how nimbus does it).

@harsh-98 harsh-98 added the E:3.2: Basic DoS protection in production label and removed the E:2023-peer-mgmt label Sep 13, 2023
@jm-clius

After some thinking, I believe we might need a new epic/milestone for issues like this that do not fall neatly into existing epics or into the strictly necessary scope for the gen 0 network launch. Taking this conversation up elsewhere.

@Ivansete-status

Good morning and interesting point! 🥇

In my opinion, I vote for having only one rate limit per node and not having different rate limits per service because:

  1. The fundamental limitation falls under the node's available resources (CPU, memory, file descriptors, etc.)
  2. Simpler to understand and explain.
  3. Simpler to test. For example, the limit of the Lightpush protocol might differ whether the Store protocol is being stressed or not.
  4. Simpler to make scaling up/down decisions if we plan to make nodes be part of a Kubernetes cluster or AWS ECS, for example.

Nevertheless, we cannot make any decision until we measure the actual limits.

@jm-clius, @vpavlin - I've created a draft issue in nwaku repo. Feel free to change or adapt anything you consider appropriate :) waku-org/nwaku#2032

@chaitanyaprem
Collaborator Author

In my opinion, I vote for having only one rate limit per node and not having different rate limits per service [...] we cannot make any decision until we measure the actual limits.

Valid points @Ivansete-status.
But wouldn't the usage of the node's resources vary per service protocol?
E.g. a store query uses different resources than a lightPush request or a filter subscription. Hence I was suggesting specifying different rate limits per service.
To begin with, these rate limits could be the same (hardcoded) for all services.
But it would also give us an easier path in the future to apply different rate limits per service, once we measure the resource usage per protocol in various tests.

@Ivansete-status

You are absolutely right @chaitanyaprem in the sense that it may require different resources.
IMHO, the approach of setting separate limits would make more sense if we had some kind of "microservice" architecture. But, given we have a single node attending multiple services, the load handled by one service will impact other services. Viewed from a different perspective: a node with only Relay could have a higher limit than a node with Relay, Store, etc.

I'm not saying that your approach is wrong. I think we should take measures of such limits, that will help us decide better the correct approach. For me, the simpler the better, though :)

@chaitanyaprem
Collaborator Author

chaitanyaprem commented Sep 15, 2023

Thinking about it in detail, I think the following are the two reasons for having such rate limits:

  1. To prevent service nodes from getting DoSed, since RLN won't be there to prevent it. For this we can start with a simple global service-level rate limit, as you suggested. This should be sufficient to prevent nodes from getting DoSed.
  2. The nodes on which users choose to run services alongside relay will be heterogeneous: each operator may want to allocate different usage buckets/patterns to services, based on the app(s) they want to support in the network. That would call for fine-grained rate limits set at the service level, so that each operator can choose these values individually.
     E.g. if I (as a node operator) want to support more filter and lightpush than store for a particular shard, but also want to support store with the remaining resources, it would help to set these limits differently; my req/sec for store might be 10%-20% of that for filter and lightpush.

The second point would also probably need a protocol upgrade to indicate to light clients what a service node's capacity is and how much it can allocate to the client, as part of some capability exchange. That would give the light client an idea of whether to continue using this service node. But anyway, that is for later.

I agree with you for now: since the objective is only DoS prevention, defining an overall rate limit would suffice.

cc @harsh-98: as per the discussion above, we can start with a simple overall rate-limit implementation to prevent DoS. Considering fine-grained per-service rate limits will need measurements and more analysis first.
Since this issue has the full history of the discussion, it can be referenced for that future work; please create a new issue for the simple rate-limit implementation and refer back to this one.
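If fine-grained per-service limits are revisited later, the operator-weighted allocation discussed above (e.g. store at 10%-20% of filter/lightpush) boils down to splitting a node-wide budget by weights. A hypothetical sketch, with illustrative numbers:

```go
package main

import "fmt"

// splitBudget divides a node-wide req/sec budget across services in
// proportion to operator-chosen weights.
func splitBudget(total float64, weights map[string]float64) map[string]float64 {
	var sum float64
	for _, w := range weights {
		sum += w
	}
	out := make(map[string]float64, len(weights))
	for svc, w := range weights {
		out[svc] = total * w / sum
	}
	return out
}

func main() {
	// Store weighted at ~15% of filter/lightpush, per the example above.
	limits := splitBudget(100, map[string]float64{
		"lightpush": 1.0,
		"filter":    1.0,
		"store":     0.15,
	})
	fmt.Printf("lightpush=%.1f filter=%.1f store=%.1f\n",
		limits["lightpush"], limits["filter"], limits["store"])
}
```

Each per-service value could then seed its own token bucket, keeping the global budget as the invariant.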

@chair28980
Contributor

Review Q1 2024 -- not yet implemented in nwaku.

@chaitanyaprem
Collaborator Author

chaitanyaprem commented Feb 1, 2024

This would be required for status desktop as it will be providing Filter and Lightpush services as per status-im/status-go#4655.

Scope of this can be limited to only the Lightpush service for now, since Filter is already restricted by the number of Filter subscriptions set in the status-go PR mentioned above.

Have a simple Leaky or Token bucket to restrict req/sec for Lightpush.

Some sort of rate-limiting for Store also has to be enabled if StoreNode functionality gets enabled in status-desktop.
@richard-ramos any idea if that is planned?

@chair28980 adding this to status-waku-integ board for tracking.
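The leaky-bucket variant mentioned above (as a meter: each request adds one unit, the bucket drains at a fixed rate, and requests arriving while the bucket is full are rejected) could look like this minimal, hypothetical sketch:

```go
package main

import (
	"fmt"
	"time"
)

// leakyBucket implements the leaky-bucket-as-meter variant: each request
// adds one unit to the bucket, which drains at a fixed leak rate.
type leakyBucket struct {
	capacity float64
	level    float64
	leakRate float64 // units drained per second
	lastLeak time.Time
}

func newLeakyBucket(capacity, leakRatePerSec float64) *leakyBucket {
	return &leakyBucket{capacity: capacity, leakRate: leakRatePerSec, lastLeak: time.Now()}
}

// Allow drains the bucket for the elapsed time, then admits the request
// only if adding one unit would not overflow the bucket.
func (b *leakyBucket) Allow() bool {
	now := time.Now()
	b.level -= now.Sub(b.lastLeak).Seconds() * b.leakRate
	if b.level < 0 {
		b.level = 0
	}
	b.lastLeak = now
	if b.level+1 > b.capacity {
		return false
	}
	b.level++
	return true
}

func main() {
	bucket := newLeakyBucket(3, 5) // burst of 3, sustained 5 req/s
	accepted := 0
	for i := 0; i < 10; i++ {
		if bucket.Allow() {
			accepted++
		}
	}
	fmt.Println(accepted) // 3 accepted from an instantaneous burst of 10
}
```

Compared with a token bucket, the leaky bucket bounds burst size directly by its capacity, which suits the "restrict req/sec for Lightpush" goal here.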

@chaitanyaprem chaitanyaprem added the status-waku-integ label Feb 1, 2024
@chaitanyaprem chaitanyaprem changed the title feat: Enforce service specific rate limits feat: Enforce service specific rate limits for Lightpush and Store Feb 1, 2024
@chaitanyaprem chaitanyaprem changed the title feat: Enforce service specific rate limits for Lightpush and Store feat: Enforce service specific rate limits for Lightpush Feb 1, 2024
@richard-ramos richard-ramos self-assigned this Feb 1, 2024
@richard-ramos
Member

richard-ramos commented Feb 1, 2024

Not planned for status. The functionality was removed from status-go. I'll attempt to implement a leaky bucket since it's a simple addition to the code.

@chair28980 chair28980 removed the status-waku-integ label Feb 2, 2024
@chair28980
Contributor

Removing from status-waku-integ per Richard's comment.

@chaitanyaprem
Collaborator Author

Removing from status-waku-integ per Richard's comment.

I think @richard-ramos was talking about Store.

But for lightpush, since we have enabled it on desktop, this is still required as part of status-waku-integ in order to protect the desktop node from being DoSed with lightpush requests.

cc @chair28980

@chair28980
Contributor

@chaitanyaprem thank you for the clarity :)

@chair28980 chair28980 added the status-waku-integ label Feb 2, 2024