
feat: Enforce service specific rate limits for Lightpush #667

Closed
chaitanyaprem opened this issue Aug 23, 2023 · 21 comments · Fixed by #1024
Assignees
Labels
E:3.2: Basic DoS protection in production (see https://github.com/waku-org/pm/issues/70 for details), status-waku-integ (all issues relating to the Status Waku integration)

Comments

@chaitanyaprem
Collaborator

Problem

As discussed in #594 (comment), we should enforce rate limits for heavy protocols such as store; otherwise a node can be DoSed by store requests. Example: only allow serving x requests/second, or a given number of bytes/second; beyond that, temporarily blacklist.

Suggested solution

We can use go-libp2p's scaling limit config, e.g. the limits go-libp2p sets for its default services: https://github.com/libp2p/go-libp2p/blob/master/limits.go#L15. We can control the maximum number of streams and the amount of memory allocated to each service.
Provide configuration for these rate limits, along with good defaults.
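For intuition only: go-libp2p's resource manager enforces per-service stream caps natively, but the core idea amounts to a counting semaphore per service. A minimal stdlib-only sketch (not go-libp2p's actual API; all names below are hypothetical):

```go
package main

import (
	"errors"
	"fmt"
)

// serviceLimiter caps the number of concurrently open streams for one
// service, similar in spirit to go-libp2p's per-service resource limits.
type serviceLimiter struct {
	slots chan struct{} // buffered channel used as a counting semaphore
}

func newServiceLimiter(maxStreams int) *serviceLimiter {
	return &serviceLimiter{slots: make(chan struct{}, maxStreams)}
}

var errLimitExceeded = errors.New("service stream limit exceeded")

// Acquire reserves a stream slot, failing fast when the cap is reached.
func (l *serviceLimiter) Acquire() error {
	select {
	case l.slots <- struct{}{}:
		return nil
	default:
		return errLimitExceeded
	}
}

// Release frees a slot when the stream closes.
func (l *serviceLimiter) Release() { <-l.slots }

func main() {
	lightpush := newServiceLimiter(2)
	fmt.Println(lightpush.Acquire()) // <nil>
	fmt.Println(lightpush.Acquire()) // <nil>
	fmt.Println(lightpush.Acquire()) // service stream limit exceeded
	lightpush.Release()
	fmt.Println(lightpush.Acquire()) // <nil>
}
```

In production the real rcmgr-based limits would be preferred, since they also account for memory and file descriptors.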

Alternatives considered

None

Acceptance criteria

Validate by pumping service traffic and verifying rate limits are applied as configured.

@chaitanyaprem chaitanyaprem changed the title feat: Enforce Service specific rate limits feat: Enforce service specific rate limits Aug 23, 2023
@chaitanyaprem
Collaborator Author

chaitanyaprem commented Sep 11, 2023

Rate-limiting would be required for the following protocols:
Store
Filter
Lightpush
Peer exchange
Rendezvous

@jm-clius, @alrevuelta: since RLN would be enabled, I don't think we would need to enforce additional rate-limiting on lightPush. Do confirm this.

This task would require analyzing what sort of limits are to be applied: req/sec, max streams, max connections, etc. We also need to come up with the configurations that we want to expose to users.

@jm-clius , @fryorcraken : Is this something that is required for the launch of The Waku network? I think it would be required.

@fryorcraken
Collaborator

RLN rules need to be applied on light push on server side so that the node relaying doesn't get descored on gossipsub for forwarding messages with incorrect or no proof (when RLN applies)

@jm-clius

Since RLN would be enabled, I don't think we would require to enforce additional rate-limiting on lightPush.

I think this is about limiting the number of requests that can be opened for the protocol on the server side. I imagine we want this rate limit to apply at a higher layer than the message generation rate (which is what RLN limits). Perhaps some token bucket mechanism applying to all request-response protocols?

Is this something that is required for the launch of The Waku network?

I'd say yes, at least in a basic form. One thing here is that we may not want to make this overly configurable at first. Could choose some value (e.g. max 5 requests per second) and design something simply around that.
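The token-bucket mechanism mentioned above could be sketched as follows. This is a minimal stdlib-only sketch; the type names and the values (e.g. 5 req/sec) are illustrative, not a committed design:

```go
package main

import (
	"fmt"
	"time"
)

// tokenBucket is a minimal token-bucket limiter: at most `capacity` tokens,
// refilled at `rate` tokens per second. One token is spent per request.
type tokenBucket struct {
	capacity   float64
	tokens     float64
	rate       float64 // tokens added per second
	lastRefill time.Time
}

func newTokenBucket(capacity, ratePerSec float64) *tokenBucket {
	return &tokenBucket{capacity: capacity, tokens: capacity, rate: ratePerSec, lastRefill: time.Now()}
}

// Allow reports whether a request may proceed, consuming a token if so.
func (b *tokenBucket) Allow() bool {
	now := time.Now()
	b.tokens += now.Sub(b.lastRefill).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.lastRefill = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	bucket := newTokenBucket(5, 5) // burst of 5, sustained 5 req/s
	allowed := 0
	for i := 0; i < 10; i++ {
		if bucket.Allow() {
			allowed++
		}
	}
	fmt.Println(allowed) // 5: the burst is served, the rest are dropped
}
```

One such bucket per request-response protocol (or one shared bucket, per the later discussion) would give the "max 5 requests per second" behaviour suggested here.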

@chaitanyaprem
Collaborator Author

RLN rules need to be applied on light push on server side so that the node relaying doesn't get descored on gossipsub for forwarding messages with incorrect or no proof (when RLN applies)

Makes sense. cc @richard-ramos in case it is not already being done. We can take this up as part of #655?

@chaitanyaprem
Collaborator Author

chaitanyaprem commented Sep 12, 2023

Is this something that is required for the launch of The Waku network?

I'd say yes, at least in a basic form. One thing here is that we may not want to make this overly configurable at first. Could choose some value (e.g. max 5 requests per second) and design something simply around that.

Agreed, we can start simple, without any configuration, by enforcing the two types of limits below. Do provide any comments.

  1. Number of connections/peers per service (this can be derived as servicePeers / number of services enabled).
  2. Allowed message rate from a peer per service (default: 5 req/sec).

But we may need to modify the ratio of relay to service peers defined in waku-org/nwaku#1813. This sub-item can be taken up as part of #679.
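The second limit (a per-peer request budget per service) could be tracked with a simple fixed-window counter keyed by peer ID. A hedged stdlib-only sketch, with all names hypothetical:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// peerRateLimiter enforces a fixed-window request budget per peer,
// e.g. the default of 5 requests per second per service peer.
type peerRateLimiter struct {
	mu      sync.Mutex
	window  time.Duration
	max     int
	counts  map[string]int       // peer ID -> requests in current window
	started map[string]time.Time // peer ID -> current window start
}

func newPeerRateLimiter(max int, window time.Duration) *peerRateLimiter {
	return &peerRateLimiter{window: window, max: max, counts: map[string]int{}, started: map[string]time.Time{}}
}

// Allow returns true if the peer is still within its budget for this window.
func (p *peerRateLimiter) Allow(peerID string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	now := time.Now()
	if start, ok := p.started[peerID]; !ok || now.Sub(start) >= p.window {
		p.started[peerID] = now // start a fresh window for this peer
		p.counts[peerID] = 0
	}
	if p.counts[peerID] >= p.max {
		return false
	}
	p.counts[peerID]++
	return true
}

func main() {
	limiter := newPeerRateLimiter(5, time.Second)
	for i := 0; i < 6; i++ {
		fmt.Println(limiter.Allow("peerA")) // true x5, then false
	}
	fmt.Println(limiter.Allow("peerB")) // true: independent budget per peer
}
```

A production version would also need eviction of idle peers from the maps to bound memory.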

@chaitanyaprem
Collaborator Author

Since RLN would be enabled, I don't think we would require to enforce additional rate-limiting on lightPush.

I think this is about limiting the number of requests that can be opened for the protocol on the server side. I imagine we want this rate limit to apply at a higher layer than the message generation rate (which is what RLN limits). Perhaps some token bucket mechanism applying to all request-response protocols?

Yes, correct. We can use a token bucket filter to achieve such a limit for the other req/resp protocols; refer to the comment above on what limits we can apply.
But for lightPush, since a publisher will have to use RLN, having RLN validation on the lightPush server side should achieve this. We may not need additional rate-limiting on top of RLN. Or am I missing some scenario here?

@chair28980 chair28980 assigned chair28980 and harsh-98 and unassigned chair28980 Sep 12, 2023
@richard-ramos
Member

Makes sense. cc @richard-ramos in case it is not already being done. We can take this up as part of #655?

Already done! In go-libp2p-pubsub, messages are evaluated by validators before being broadcast, so relaying nodes will not be scored negatively due to lightpush messages containing invalid proofs / waku messages. However, validating proofs does have some cost associated with it, so perhaps it makes sense to add a service rate limit to avoid lightpush clients DoSing lightpush nodes?

@fryorcraken
Collaborator

@jm-clius where does this fit best in the Network roadmap? waku-org/research#3

@chaitanyaprem
Collaborator Author

@jm-clius where does this fit best in the Network roadmap? waku-org/research#3

Probably under DoS Protection track?

@jm-clius

Yeah, I'd include this under 3.2 Basic DoS protection in production, even though the epic description would have to be expanded to include service protocol request rate-limiting. This is because the roadmap only tracks the essential items to protect the network and not individual nodes. This rate limiting is of course very good to have, though. @vpavlin @Ivansete-status we may want to track similar work for nwaku in an issue (rate limiting requests in request-response protocols, similar to how nimbus does it).

@harsh-98 harsh-98 added the E:3.2: Basic DoS protection in production label and removed the E:2023-peer-mgmt label Sep 13, 2023
@jm-clius

After some thinking, I believe we might need a new epic/milestone for issues like this that do not fall neatly into existing epics or into the strictly necessary scope for the gen 0 network launch. Taking this conversation up elsewhere.

@Ivansete-status

Good morning and interesting point! 🥇

In my opinion, I vote for having only one rate limit per node and not having different rate limits per service because:

  1. The fundamental limitation falls under the node's available resources (CPU, memory, file descriptors, etc.)
  2. Simpler to understand and explain.
  3. Simpler to test. For example, the limit of the Lightpush protocol might differ whether the Store protocol is being stressed or not.
  4. Simpler to make scaling up/down decisions if we plan to make nodes be part of a Kubernetes cluster or AWS ECS, for example.

Nevertheless, we cannot make any decision until we measure the actual limits.

@jm-clius, @vpavlin - I've created a draft issue in nwaku repo. Feel free to change or adapt anything you consider appropriate :) waku-org/nwaku#2032

@chaitanyaprem
Collaborator Author

In my opinion, I vote for having only one rate limit per node and not having different rate limits per service [...] we cannot make any decision until we measure the actual limits.

Valid points @Ivansete-status.
But wouldn't the usage of the node's resources vary per service protocol?
E.g. a store query uses different resources than a lightPush request or a filter subscription. Hence I was suggesting specifying different rate limits per service.
To begin with, these rate limits could be the same (hardcoded) for all services.
But it would also give us an easier path in the future to apply different rate limits per service, once we measure the resource usage per protocol in various tests.

@Ivansete-status

You are absolutely right @chaitanyaprem in the sense that it may require different resources.
IMHO, the approach of setting separate limits would make more sense if we had some kind of "microservice" architecture. But, given we have a single node attending multiple services, the load handled by one service will impact other services. Viewed from a different perspective: a node with only Relay could have a higher limit than a node with Relay, Store, etc.

I'm not saying that your approach is wrong. I think we should take measures of such limits, that will help us decide better the correct approach. For me, the simpler the better, though :)

@chaitanyaprem
Collaborator Author

chaitanyaprem commented Sep 15, 2023

Thinking about it in detail, I think the following are the two reasons for having such rate limits:

  1. To prevent service nodes from getting DoSed, since RLN won't be there to prevent it. For this we can start with a simple global service-level rate limit, as you suggested. This should be sufficient to prevent nodes from getting DoSed.
  2. The nodes on which users choose to run services alongside relay will be heterogeneous: each operator may want to allocate different usage buckets/patterns to services, based on the app(s) they want to support in the network. That would call for fine-grained rate limits set at the service level, so that each operator can choose these values individually.
     E.g. if I (as a node operator) want to support more filter and lightpush than store for a particular shard, but also want to support store with the remaining resources, it would help to set these limits differently; my req/sec for store might be 10%-20% of that for filter and lightpush.

The second point would also probably need a protocol upgrade to indicate to light clients what a service node's capacity is and how much it can allocate to the client, as part of some capability exchange. That would give the light client an idea of whether to continue using this service node. But anyway, that is for later.

I agree with you for now: since the objective is only DoS prevention, defining an overall rate limit would suffice.

cc @harsh-98: as per the discussion above, we can start with a simple overall rate-limit implementation to prevent DoS. Considering fine-grained per-service rate limits will need measurements and more analysis first.
Since this issue has the full history of the discussion, it can be referenced for that future work; please create a new issue for the simple rate-limit implementation and refer back to this one.
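If fine-grained per-service limits are revisited later, the operator-weighted allocation discussed above (e.g. store at 10%-20% of filter/lightpush) boils down to splitting a node-wide budget by weights. A hypothetical sketch, with illustrative numbers:

```go
package main

import "fmt"

// splitBudget divides a node-wide req/sec budget across services in
// proportion to operator-chosen weights.
func splitBudget(total float64, weights map[string]float64) map[string]float64 {
	var sum float64
	for _, w := range weights {
		sum += w
	}
	out := make(map[string]float64, len(weights))
	for svc, w := range weights {
		out[svc] = total * w / sum
	}
	return out
}

func main() {
	// Store weighted at ~15% of filter/lightpush, per the example above.
	limits := splitBudget(100, map[string]float64{
		"lightpush": 1.0,
		"filter":    1.0,
		"store":     0.15,
	})
	fmt.Printf("lightpush=%.1f filter=%.1f store=%.1f\n",
		limits["lightpush"], limits["filter"], limits["store"])
}
```

Each per-service value could then seed its own token bucket, keeping the global budget as the invariant.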

@chair28980
Contributor

Review Q1 2024 -- not yet implemented in nwaku.

@chaitanyaprem
Collaborator Author

chaitanyaprem commented Feb 1, 2024

This would be required for status desktop as it will be providing Filter and Lightpush services as per status-im/status-go#4655.

Scope of this can be limited to only the Lightpush service for now, since Filter is already restricted by the number of Filter subscriptions set in the status-go PR mentioned above.

Have a simple Leaky or Token bucket to restrict req/sec for Lightpush.

Some sort of rate-limiting for Store also has to be enabled if StoreNode functionality gets enabled in status-desktop.
@richard-ramos any idea if that is planned?

@chair28980 adding this to status-waku-integ board for tracking.
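The leaky-bucket variant mentioned above (as a meter: each request adds one unit, the bucket drains at a fixed rate, and requests arriving while the bucket is full are rejected) could look like this minimal, hypothetical sketch:

```go
package main

import (
	"fmt"
	"time"
)

// leakyBucket implements the leaky-bucket-as-meter variant: each request
// adds one unit to the bucket, which drains at a fixed leak rate.
type leakyBucket struct {
	capacity float64
	level    float64
	leakRate float64 // units drained per second
	lastLeak time.Time
}

func newLeakyBucket(capacity, leakRatePerSec float64) *leakyBucket {
	return &leakyBucket{capacity: capacity, leakRate: leakRatePerSec, lastLeak: time.Now()}
}

// Allow drains the bucket for the elapsed time, then admits the request
// only if adding one unit would not overflow the bucket.
func (b *leakyBucket) Allow() bool {
	now := time.Now()
	b.level -= now.Sub(b.lastLeak).Seconds() * b.leakRate
	if b.level < 0 {
		b.level = 0
	}
	b.lastLeak = now
	if b.level+1 > b.capacity {
		return false
	}
	b.level++
	return true
}

func main() {
	bucket := newLeakyBucket(3, 5) // burst of 3, sustained 5 req/s
	accepted := 0
	for i := 0; i < 10; i++ {
		if bucket.Allow() {
			accepted++
		}
	}
	fmt.Println(accepted) // 3 accepted from an instantaneous burst of 10
}
```

Compared with a token bucket, the leaky bucket bounds burst size directly by its capacity, which suits the "restrict req/sec for Lightpush" goal here.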

@chaitanyaprem chaitanyaprem added the status-waku-integ label Feb 1, 2024
@chaitanyaprem chaitanyaprem changed the title feat: Enforce service specific rate limits feat: Enforce service specific rate limits for Lightpush and Store Feb 1, 2024
@chaitanyaprem chaitanyaprem changed the title feat: Enforce service specific rate limits for Lightpush and Store feat: Enforce service specific rate limits for Lightpush Feb 1, 2024
@richard-ramos richard-ramos self-assigned this Feb 1, 2024
@richard-ramos
Member

richard-ramos commented Feb 1, 2024

Not planned for status. The functionality was removed from status-go. I'll attempt to implement a leaky bucket since it's a simple addition to the code.

@chair28980 chair28980 removed the status-waku-integ label Feb 2, 2024
@chair28980
Contributor

Removing from status-waku-integ per Richard's comment.

@chaitanyaprem
Collaborator Author

Removing from status-waku-integ per Richard's comment.

I think @richard-ramos was talking about Store.

But for lightpush, since we have enabled it on desktop, this is still required as part of status-waku-integ in order to protect the desktop node from being DoSed with lightpush requests.

cc @chair28980

@chair28980
Contributor

@chaitanyaprem thank you for the clarity :)

@chair28980 chair28980 added the status-waku-integ label Feb 2, 2024