Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dos and monitoring docs #160

Merged
merged 11 commits into from
Aug 15, 2022
Merged

Add dos and monitoring docs #160

merged 11 commits into from
Aug 15, 2022

Conversation

MarcoPolo
Copy link
Contributor

Addresses libp2p/go-libp2p#1711.

Adds some documentation and links for libp2p monitoring and dos mitigation.

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @MarcoPolo for writing this up!


# General tactics that work across libp2p implementations

## Limit the number of concurrent streams your protocol needs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: I think a similar section on the number of connections would be helpful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

content/reference/dos-mitigation.md Outdated Show resolved Hide resolved
content/reference/dos-mitigation.md Outdated Show resolved Hide resolved
content/reference/monitoring.md Outdated Show resolved Hide resolved
Copy link
Contributor

@BigLep BigLep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this @MarcoPolo . It's great starting to get to see how this will be exposed to users.

I'm copying in some of the comments I had in https://www.notion.so/pl-strflt/Guide-for-how-to-respond-to-resource-exhaustion-attacks-b10f55cc9a3d4917ae80c9b914e05e8c

The information above is really helpful/good. Looking at this fresh, I think we need to paint a big picture. I view these a layering that someone should think about. I’m apt to order chronologically based on the lifecycle of a libp2p application from design, development, testing, and operating.

  1. What are the steps I can take to protect my application? Basically what are all the defenses I can have in place before my application is deployed?
    1. (Should be thought about early) Architect application well with limiting blast radius as described above
    2. Give clear signal of problems as highlighted above by using canonicallog
    3. Setup/tune resource management for each libp2p node. Here we can link to resource manager docs. We have to make sure this is user-friendly. There are followups in More documentation around limits go-libp2p-resource-manager#68 to handle.
  2. How do I know the health of my node? This is the monitoring doc.
  3. What are the steps I can take when being attacked? If the steps above were taken, an attack shouldn’t fully compromise my node but it may degrade performance. Followup actions may be needed.
    1. Ban offending IPs. It’s best to automate this as shown above using fail2ban.
    2. Maybe something about using the denylist?

content/reference/dos-mitigation.md Outdated Show resolved Hide resolved
content/reference/dos-mitigation.md Outdated Show resolved Hide resolved
content/reference/dos-mitigation.md Outdated Show resolved Hide resolved
Comment on lines +63 to +64
Depending on your use case, it can help to limit the number of inbound
connections. You can use go-libp2p's
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we give guidance on when to use this mechanism vs. the resource manager? I see this is a good hook for custom logic, but it seems like what Prysm is doing could be covered by go-libp2p resource manager right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we give guidance on when to use this mechanism vs. the resource manager?

No, but I could add something here.

I see this is a good hook for custom logic, but it seems like what Prysm is doing could be covered by go-libp2p resource manager right?

Not really. If you're trying to avoid an adversary that can connect to you and give you a ton of work to do all at once the rcmgr doesn't protect at all. This attack can easily be mitigated by rate limiting though.

Not all applications will want this rate limiting, or they may want to rate limit certain things (e.g. something in the protocol rather than in the connections). For example, if I'm Google I wouldn't want to rate limit any new connection to me. I would rather rate limit work per connection.

Should the rcmgr do this? I don't think so. It's not directly related to limiting the resources being used, and if it can be handled by a smaller component that already exists the better.

I hope that makes sense, but happy to expand more as well.

content/reference/dos-mitigation.md Outdated Show resolved Hide resolved
content/reference/dos-mitigation.md Show resolved Hide resolved
content/reference/dos-mitigation.md Outdated Show resolved Hide resolved
content/reference/dos-mitigation.md Outdated Show resolved Hide resolved
content/reference/dos-mitigation.md Outdated Show resolved Hide resolved
content/reference/dos-mitigation.md Outdated Show resolved Hide resolved
Copy link
Contributor

@BigLep BigLep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MarcoPolo for all the work here. I'm game to take one last look once comments are incorporated.

content/reference/dos-mitigation.md Outdated Show resolved Hide resolved
content/reference/dos-mitigation.md Outdated Show resolved Hide resolved
Comment on lines 22 to 23
Using a stream for a short period of time and then closing it is fine. It's
really the number of _concurrent_ streams that you need to be careful of.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know more than me here. Would it be useful to use Identify as an example?

content/reference/dos-mitigation.md Outdated Show resolved Hide resolved
content/reference/dos-mitigation.md Outdated Show resolved Hide resolved
own resource usage. So limiting connections can have a leveraged effect on your
resource usage.

In go-libp2p the number of active connections is managed by the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From reading through this doc I don't think it's clear for a user on when to use the connmgr or the resource manager for go-libp2p.

`ConnManager` will trim connections when you hit the high watermark number of
connections. You can protect certain connections with the `.Protect` method.

In rust-libp2p handlers should implement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a stream for a short period of time and then closing it is fine. It's
really the number of _concurrent_ streams that you need to be careful of.

## Limit the number of connections your application needs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given connections happen before streams in an application's lifecycle, maybe move this above the section above?


Here are some more specific recommendations

## Limit the number of concurrent streams per connection your protocol needs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we give pointers on how to do this?

For go-libp2p this means using resource manager right?

For rust, this isn't at the connection level, but somewhere I think we should be linking to https://docs.rs/libp2p/latest/libp2p/swarm/struct.SwarmBuilder.html#method.max_negotiating_inbound_streams . Maybe we say, "rust-libp2p relies on each protocol to limit the number of streams per connection in XXX. A global upperbound on negotiating/transient inbound streams can be set using https://docs.rs/libp2p/latest/libp2p/swarm/struct.SwarmBuilder.html#method.max_negotiating_inbound_streams."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should be more clear.

Here I'm talking about limiting the number of concurrent streams you need by design of the protocol, as opposed to using an existing protocol and trying to limit the streams at the end. For example imagine a RPC style protocol whose procedures are async and often take a long time to return (say > 1min). Here are two ways you could implement it:

  1. Open a stream for each RPC call, and keep that stream open until the rpc call returns.
  2. Open a stream for the start of the call then close it. The remote side will open a new stream with the answer.

Assume you make a lot of concurrent calls, method 1 would result in a large number of concurrent and mostly inactive streams. Method 2 would result in a fewer number of concurrent streams, and thus lower memory footprint.

If you add a limit here of say 10 streams, then method 1 will mean you can only have 10 concurrent RPC calls, while method 2 would let you have a much larger number of concurrent RPC calls.

Does that make sense? I should rephrase this to focus on the fact this is about protocol design (the inception stage of a p2p application) not about the deployed stage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it - makes sense.

Side: lets find/create a place to point to https://docs.rs/libp2p/latest/libp2p/swarm/struct.SwarmBuilder.html#method.max_negotiating_inbound_streams . Maybe there's a section about transient/negotiating connections and resources and that those should guarded against too. Go and Rust both have some protections here.

content/reference/dos-mitigation.md Outdated Show resolved Hide resolved
@MarcoPolo
Copy link
Contributor Author

@BigLep Thanks for the review! I appreciate the help in getting us some good docs.

I believe I addressed all the comments. Another review would be very much appreciated.

@MarcoPolo MarcoPolo requested a review from BigLep August 12, 2022 20:58
Copy link
Contributor

@BigLep BigLep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @MarcoPolo - good stuff. A few last comments but feel free to ship once incorporated. Good times!

The `ConnManager` will trim connections when you hit the high watermark number of
connections, and try to keep the number of connections above the low watermark.
You can protect certain connections with the
[`.Protect`](https://github.com/libp2p/rust-libp2p/blob/ea487aebfe6eb672b05d2bec2d9d79bbd92450ba/protocols/kad/src/handler.rs#L562)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a rust-libp2p link.

2. Open a stream for the start of the call then close it. The remote side will
open a new stream with the response.

Assume we make a lot of concurrent calls, method 1 would result in a large
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Assume we make a lot of concurrent calls, method 1 would result in a large
Assume we make a lot of concurrent calls. Method 1 would result in a large

Comment on lines 205 to 206
implements their (Connection
Gater)[https://github.com/prysmaticlabs/prysm/blob/63a8690140c00ba6e3e4054cac3f38a5107b7fb2/beacon-chain/p2p/connection_gater.go#L43].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
implements their (Connection
Gater)[https://github.com/prysmaticlabs/prysm/blob/63a8690140c00ba6e3e4054cac3f38a5107b7fb2/beacon-chain/p2p/connection_gater.go#L43].
implements their (ConnectionGater)[https://github.com/prysmaticlabs/prysm/blob/63a8690140c00ba6e3e4054cac3f38a5107b7fb2/beacon-chain/p2p/connection_gater.go#L43].

Fixing the rendering issue here: https://bafybeid4zqncc4v5epc4urfvl5ajgnmqeksksk4xrgabdwaxxswpsigh6y.on.fleek.co/reference/dos-mitigation/#leverage-the-resource-manager-to-limit-resource-usage-go-libp2p-only

image

Copy link
Contributor Author

@MarcoPolo MarcoPolo Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a misuse of (foo)[bar] vs [foo](bar)

content/reference/dos-mitigation.md Outdated Show resolved Hide resolved
usage. So limiting connections can have a leveraged effect on your resource
usage.

In go-libp2p the number of active connections is managed by the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nitpics here:

  1. We use connmgr and ConnManager
  2. When we're hyperlinking, I think it's good to remove the ticks so it's clear that it's a hyperlink. (see screenshot of rendering)
  3. We do "ConnManager" no space and "Resource Manager" with space.
  4. Maybe the first time we talk about Resource Manager here we make it a hyperlink?

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do "ConnManager" no space and "Resource Manager" with space.

In go-libp2p-core they're called ConnManager and ResourceManager. Using Conn Manager feels weird and so does ResourceManager although happy to make one change or the other if you feel strongly.

When we're hyperlinking, I think it's good to remove the ticks so it's clear that it's a hyperlink. (see screenshot of rendering)

I think our template should support the fixed width + hyperlink. I'll see if I can fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our template should support the fixed width + hyperlink. I'll see if I can fix it.

Fixed this by adding a text-decoration: underline css property.

Screen Shot 2022-08-15 at 4 25 20 PM

Copy link
Contributor

@BigLep BigLep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great - lets ship!

@MarcoPolo MarcoPolo merged commit 5b2b5d4 into master Aug 15, 2022
@MarcoPolo MarcoPolo deleted the marco/monitor-and-dos-docs branch August 15, 2022 23:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants