rcmgr: consider removing memory tracking, except for muxers #1708

Open
marten-seemann opened this issue Jun 23, 2022 · 6 comments
Comments

@marten-seemann
Contributor

Having looked through traces / rcmgr metrics, the only significant (tracked) consumers of memory in our stack are the stream multiplexers. The only memory allocations outside of the stream muxers are allocations of a few kB to parse protobufs.

One might argue that this only shows that more application protocols need to adopt the resource manager in order to track every memory allocation. I'm not convinced by that argument:

  1. The interface provided by the resource manager is not really usable in practice. In most application protocols, forward progress is only possible when memory can be allocated. The resource manager doesn't provide a way to block until a memory allocation is possible (and it's questionable whether that would even be wise under memory pressure); see the sketch after this list.
  2. and more importantly: this seems like overreach on libp2p's side. We're not in the business of building a kernel. All you can reasonably expect from a networking stack is that it solves your p2p problems (and doesn't kill your application while doing so). You can't ask it to solve your general resource allocation problems.
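A minimal sketch of what point 1 looks like from a protocol handler's perspective, assuming the ReserveMemory / ReleaseMemory scope API (the import path and exact scope accessor may differ across go-libp2p versions): the reservation either succeeds or fails immediately, so the only real option on failure is to give up.

```go
package handler

import (
	"github.com/libp2p/go-libp2p/core/network"
)

// handleStream is an illustrative protocol handler. There is no way to wait
// until memory becomes available; a failed reservation means we simply can't
// make forward progress.
func handleStream(s network.Stream) {
	const msgSize = 4 << 10 // a few kB, e.g. for a protobuf message
	if err := s.Scope().ReserveMemory(msgSize, network.ReservationPriorityAlways); err != nil {
		s.Reset() // nothing else we can do without the allocation
		return
	}
	defer s.Scope().ReleaseMemory(msgSize)
	// ... read and decode the message into a buffer of msgSize bytes ...
}
```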

We should therefore consider dropping the memory tracking from all scopes, except (see the sketch after this list):

  • the peer scope: that's how we make sure that you can't increase your stream flow control windows to 8 MB on 500 streams
  • the system scope: to provide a global limit for all peer scopes
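Purely to illustrate the shape of this, here is a sketch using hypothetical types (not the actual rcmgr limit config API) of what a limit configuration would reduce to if only these two scopes kept memory accounting:

```go
package limits

// ScopeLimit and LimitConfig are hypothetical types used only to sketch the
// proposal; they are not the rcmgr API.
type ScopeLimit struct {
	Memory int64 // bytes; 0 would mean "memory is not tracked in this scope"
}

type LimitConfig struct {
	System      ScopeLimit // global memory cap across all peers
	PeerDefault ScopeLimit // per-peer cap, sized for muxer flow-control windows
	// Transient, service, protocol, connection and stream scopes would no
	// longer carry a memory limit at all.
}

// Example values: a 1 GiB system-wide budget and 64 MiB per peer.
var Example = LimitConfig{
	System:      ScopeLimit{Memory: 1 << 30},
	PeerDefault: ScopeLimit{Memory: 64 << 20},
}
```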

Thoughts, @MarcoPolo @vyzo @Stebalien?

@marten-seemann
Contributor Author

Going one step further, one could argue that this is purely a yamux (and mplex?) problem, and that reasonable stream multiplexers will have a connection-level flow control window. For QUIC, we use 15 MB per connection:

```go
MaxConnectionReceiveWindow: 15 * (1 << 20), // 15 MB
```
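For reference, this is roughly where that value lives in a quic-go Config (the field name is quic-go's; the import path is the one in use in 2022, and the surrounding code is illustrative):

```go
package transport

import "github.com/lucas-clemente/quic-go"

// A 15 MB connection-level flow-control window: a single connection can
// never buffer more than 15 MB of unread stream data, regardless of how
// many streams are open on it.
var quicConfig = &quic.Config{
	MaxConnectionReceiveWindow: 15 * (1 << 20), // 15 MB
}
```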

At reasonable numbers of connections (a few hundred), this shouldn't cause any problems except on the very weakest of nodes.

@vyzo
Contributor

vyzo commented Jun 23, 2022

That's a step backwards and a premature optimization, I think.

@Stebalien
Member

We need to go the other way and actually build a resource manager that can be used by applications (ideally one that doesn't live in libp2p, but that is shared by go-ipfs, go-libp2p, and anything else). Libp2p itself needs blocking allocation and oversubscription so we can actually utilize system resources.
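To make "blocking allocation" concrete, here is a minimal sketch of what such an API could look like, built on golang.org/x/sync/semaphore; this is not an existing rcmgr interface, just an illustration of the idea:

```go
package blocking

import (
	"context"

	"golang.org/x/sync/semaphore"
)

// MemoryScope is a hypothetical scope whose reservations block until the
// requested memory is available (or the context is cancelled), instead of
// failing immediately the way the current rcmgr reservation does.
type MemoryScope struct {
	sem *semaphore.Weighted
}

func NewMemoryScope(limit int64) *MemoryScope {
	return &MemoryScope{sem: semaphore.NewWeighted(limit)}
}

// ReserveMemory blocks until size bytes can be reserved or ctx is cancelled.
func (s *MemoryScope) ReserveMemory(ctx context.Context, size int64) error {
	return s.sem.Acquire(ctx, size)
}

func (s *MemoryScope) ReleaseMemory(size int64) {
	s.sem.Release(size)
}
```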

Going one step further, one could argue that this is purely a yamux (and mplex?) problem, and that reasonable stream multiplexers will have a connection-level flow control window
...
At reasonable numbers of connections (a few 100), this shouldn't cause any problems except on the very weakest of nodes.

We can create 200 connections in a single dht query (we need to fix that, but that's another issue). Bursting up to 1k connections is normal, and we should support thousands of connections.

We can't just say "well, each connection takes 15MB, so 1k connections requires 15GiB of memory". That's absurd and would make libp2p unusable.

@marten-seemann
Contributor Author

We need to go the other way and actually build a resource manager that can be used by applications (ideally one that doesn't live in libp2p, but that is shared by go-ipfs, go-libp2p, and anything else).

I don't necessarily disagree with this. It would be nice to have a component that does this.
However, I'm a bit skeptical that this is practical (in Go):

  • asking a resource manager for memory allowance every time you make an allocation is highly disruptive. Half of your code will be interactions with the resource manager (the other half being error handling :P)
  • if it were feasible, someone would probably have built it already. We're not the first ones to run into resource limits
  • really, what you need for this is custom allocators. You can't expect the developer to ask a resource manager for memory for every single operation that allocates

We can create 200 connections in a single dht query (we need to fix that, but that's another issue). Bursting up to 1k connections is normal, and we should support thousands of connections.

We already don't. Once we introduce autoscaling limits (#48), the connection limit for the weakest node (assumed to have no more than 1 GB of RAM) will be 128, scaling up to much higher values as available system memory increases.
Now 128 * 15 MB is still more than 1 GB (and you still want to leave room for other allocations), but not by a lot. I'm not sure we can somehow close that gap, which is why I suggested keeping memory accounting for muxers in the peer and the system scope.

@MarcoPolo
Collaborator

Prologue caveat: I don't think removing memory tracking is a high-priority issue. It's fine as is; we can also tweak this later, hopefully with more experience (e.g. do folks end up using it and loving it? Keep it. Does nobody use it after a year? Probably prune it).

I generally agree with Marten here, especially the point that it doesn't make sense for libp2p to try to manage the resources of the whole application. I, as an application developer, don't want to jump through hoops to have my application use libp2p.

However, libp2p should make sure that it uses a reasonable amount of memory itself (I think both Marten & Steven agree with this). Libp2p owns streams and owns connections, and it should make sure the memory usage of those owned resources doesn't balloon. 15 GiB of memory for 1k connections is crazy town. I agree with Steven that 1k burst connections is normal; here's my node that is doing nothing and hit 1k connections. I haven't thought too much about what the correct solution is here, but it's probably a mix of letting the user limit their number of connections and globally scaling resource usage down across all connections to stay within limits (maybe even be smart and reduce your window size when the connection's latency is low).

What makes more sense to me is to rely on existing tools and patterns for managing resources. However, there doesn't seem to be a great story around this in Go. See: golang/go#14162, vitessio/vitess#8897. But maybe golang/go#48409 would help here.
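For context, golang/go#48409 is the runtime soft memory limit proposal (GOMEMLIMIT / runtime/debug.SetMemoryLimit, shipped in Go 1.19); from the application side it looks roughly like this, with the 1 GiB value as an arbitrary example:

```go
package main

import "runtime/debug"

func main() {
	// Ask the Go runtime to keep total memory use under roughly 1 GiB by
	// running the GC more aggressively as the process approaches the limit.
	// It's a soft limit: it shapes GC behaviour rather than enforcing a cap.
	debug.SetMemoryLimit(1 << 30)

	// ... start the libp2p node ...
}
```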


if it was feasible, someone would probably have built it already. We're not the first ones to run into resource limits

Good point.

@MarcoPolo
Collaborator

Quick clarification: we don't allocate the max window buffer up front. The 1k connections won't use 15 GiB of memory under normal circumstances; that would only happen if all 1k connections were sending us a lot of data with high latency, which is probably exactly the case we want to protect against.

@marten-seemann marten-seemann transferred this issue from libp2p/go-libp2p-resource-manager Aug 19, 2022
@marten-seemann marten-seemann changed the title consider removing memory tracking, except for muxers rcmgr: consider removing memory tracking, except for muxers Aug 19, 2022