rcmgr: consider removing memory tracking, except for muxers #1708
Going one step further, one could argue that this is purely a yamux (and mplex?) problem, and that reasonable stream multiplexers will have a connection-level flow control window. For QUIC, we use 15 MB per connection (see go-libp2p/p2p/transport/quic/transport.go, line 45 at commit 86aad88).
At reasonable numbers of connections (a few hundred), this shouldn't cause any problems except on the very weakest of nodes.
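For reference, here is a minimal sketch of how such a connection-level cap is expressed with quic-go's Config; the 15 MB figure is the one quoted above, the per-stream value is illustrative, and the field names and import path may differ across quic-go versions:

```go
package example

import "github.com/quic-go/quic-go"

// quicConfig caps how much data a single QUIC connection may buffer in its
// receive windows, bounding per-connection memory regardless of stream count.
func quicConfig() *quic.Config {
	return &quic.Config{
		MaxConnectionReceiveWindow: 15 << 20, // 15 MB per connection, as quoted above
		MaxStreamReceiveWindow:     10 << 20, // illustrative per-stream cap
	}
}
```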
That's a step backwards and premature optimization, I think.
We need to go the other way and actually build a resource manager that can be used by applications (ideally one that doesn't live in libp2p, but that is shared by go-ipfs, go-libp2p, and anything else). Libp2p itself needs blocking allocation and oversubscription so we can actually utilize system resources.
We can create 200 connections in a single DHT query (we need to fix that, but that's another issue). Bursting up to 1k connections is normal, and we should support thousands of connections. We can't just say "well, each connection takes 15 MB, so 1k connections require 15 GiB of memory". That's absurd and would make libp2p unusable.
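The disagreement here is over the worst case, which is simply window size times connection count; a quick sketch of that arithmetic with the numbers from this thread:

```go
package main

import "fmt"

func main() {
	const connWindow = 15 << 20 // 15 MB connection-level receive window
	for _, conns := range []int64{100, 1000, 10000} {
		worstCase := conns * connWindow // upper bound if every window is full
		fmt.Printf("%5d connections -> up to %d MiB of receive buffers\n",
			conns, worstCase>>20)
	}
}
```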
I don't necessarily disagree with this. It would be nice to have a component that does this.
We already don't. Once we introduce autoscaling limits (#48), the connection limit for the weakest node (assumed to have no more than 1 GB of RAM) will be 128, scaling up to much higher values as available system memory increases.
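Purely as an illustration of such memory-based autoscaling (the formula below is hypothetical; the thread only fixes the 128-connection floor for a 1 GB node):

```go
package example

// scaledConnLimit is a hypothetical scaling curve: 128 connections for a node
// with 1 GB of RAM, growing linearly with additional memory. The real
// autoscaling limits may use different bases and increments.
func scaledConnLimit(memoryBytes int64) int {
	const base = 128  // floor for the weakest node (<= 1 GB RAM)
	const perGB = 128 // illustrative increment per additional GB
	gb := int(memoryBytes >> 30)
	if gb <= 1 {
		return base
	}
	return base + (gb-1)*perGB
}
```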
Prologue caveat: I don't think removing memory tracking is a high-priority issue. It's fine as is, and we can tweak this later, hopefully with more experience (e.g. do folks end up using it and loving it? keep it; does nobody use it after a year? probably prune it).

I generally agree with Marten here, especially the part about it not making sense for libp2p to try to manage the resources of the whole application. I, as an application developer, don't want to jump through hoops to have my application use libp2p. However, libp2p should make sure that it uses a reasonable amount of memory itself (I think both Marten and Steven agree with this). Libp2p owns streams and owns connections, and it should make sure the memory usage of those owned resources doesn't balloon. 15 GiB of memory for 1k connections is crazy town.

I agree with Steven that 1k burst connections is normal; here's my node that is doing nothing and hit 1k connections. I haven't thought too much about the correct solution here, but it's probably a mix of letting the user limit their number of connections and globally scaling resource usage down across all connections to stay within limits (maybe even be smart and reduce your window size when latency is low).

What makes more sense to me is to rely on existing tools and patterns for managing resources. However, there doesn't seem to be a great story around this in Go. See golang/go#14162 and vitessio/vitess#8897. But maybe golang/go#48409 would help here.
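golang/go#48409 is the Go soft memory limit proposal (runtime/debug.SetMemoryLimit in Go 1.19); a minimal sketch of leaning on that runtime mechanism rather than per-allocation tracking (how libp2p would react to memory pressure is left open):

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Apply a process-wide soft memory limit (Go 1.19+, the outcome of
	// golang/go#48409); the GC works harder as the limit is approached.
	prev := debug.SetMemoryLimit(1 << 30) // 1 GiB, illustrative
	fmt.Println("previous limit:", prev)

	// The application owns this knob; libp2p would still need its own
	// backpressure (e.g. refusing new streams) when memory gets tight.
}
```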
Good point.
Quick clarification: we don't allocate the max window buffer at the start. The 1k connections won't use 15 GiB of memory under normal circumstances. This would only happen if all 1k connections were sending us a lot of data with high latency, which is probably the exact case we want to protect against.
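That high-bandwidth, high-latency case is just the bandwidth-delay product; a quick sketch of how much throughput a single connection needs before a 15 MB window actually fills:

```go
package main

import "fmt"

func main() {
	const window = 15 << 20 // bytes of connection-level receive window
	for _, rttMs := range []float64{50, 200} {
		// bandwidth needed to keep a window of this size full at the given RTT
		bytesPerSec := float64(window) / (rttMs / 1000)
		fmt.Printf("RTT %3.0f ms: ~%.0f Mbit/s per connection\n", rttMs, bytesPerSec*8/1e6)
	}
}
```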
Having looked through traces / rcmgr metrics, the only significant (tracked) consumers of memory in our stack are the stream multiplexers. The only memory allocations outside of the stream muxers are allocations of a few kB to parse protobufs.
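For context, those few-kB reservations go through the resource manager's scope API. Here is a minimal sketch of that pattern, assuming go-libp2p's network.ResourceScope interface (ReserveMemory / ReleaseMemory); the import path differs across go-libp2p versions, and the priority constant is just a plausible choice:

```go
package example

import "github.com/libp2p/go-libp2p/core/network"

// parseWithScope sketches the typical non-muxer use of memory accounting:
// reserve a few kB against a resource scope, decode a message, release it.
func parseWithScope(scope network.ResourceScope, msg []byte) error {
	if err := scope.ReserveMemory(len(msg), network.ReservationPriorityAlways); err != nil {
		return err // the scope's memory limit would be exceeded
	}
	defer scope.ReleaseMemory(len(msg))

	// ... unmarshal the protobuf from msg here ...
	return nil
}
```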
One might argue that this only shows that more application protocols need to adopt the resource manager in order to track every memory allocation. I'm not convinced by that argument:
We should therefore consider dropping the memory tracking from all scopes except the stream muxers.
Thoughts, @MarcoPolo @vyzo @Stebalien?