Increase fd limits (and warn as we approach them) #7237
Conversation
ACK adbf9bc
Force-pushed 4640693 to f4c76a4
Force-pushed f4c76a4 to f858d34
Added the connecting map to the memleak detection callback in connectd. Also included a drive-by fix for tracepoints memory initialization.
Fixed up the test's assumption that the hard limit exceeds the soft limit (on CI it seems to be 65536 for both!).
Force-pushed f858d34 to e663658
This works fine for me locally, but the CI seems unable to raise the fd limit after pytest constrains it. I noticed that if I run pytest as root, it exhibits the same behavior.
It would be nice to test this in Docker, Kubernetes, and the like to make sure it doesn't have the same constraints.
Agreed. Meanwhile, this doesn't make things worse if we can't increase fd limits, and at least now we'll get a warning if the limit seems to be a problem.
So we can log when we hit fd limits on accept/recvmsg.
Signed-off-by: Rusty Russell <[email protected]>

This can happen if we're totally out of fds, but previously we gave no log message indicating this!
Signed-off-by: Rusty Russell <[email protected]>
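The failure mode described in the commit above can be made visible with a thin wrapper around accept(). This is an illustrative sketch, not the PR's actual code; accept_with_log is a hypothetical name:

```c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: accept() a connection, but log loudly if the
 * failure is due to running out of file descriptors (EMFILE for the
 * process limit, ENFILE for the system-wide one), since a silent -1
 * here is easy to misdiagnose. */
static int accept_with_log(int listen_fd)
{
	int fd = accept(listen_fd, NULL, NULL);
	if (fd < 0 && (errno == EMFILE || errno == ENFILE))
		fprintf(stderr,
			"UNUSUAL: accept() failed: out of file descriptors\n");
	return fd;
}
```

The same check applies to recvmsg() when it is used to pass fds between daemons, as connectd does.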
Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
I thought I was going to want a convenient way of counting these, but it turns out to be unnecessary. Still, this is simpler and slightly more efficient, so I am including it.
Signed-off-by: Rusty Russell <[email protected]>
We use a crude heuristic: if we were trying to contact them, it's a "deliberate" connection, and should be preserved.
Changelog-Changed: connectd: prioritize peers with channels (and log!) if we run low on file descriptors.
Signed-off-by: Rusty Russell <[email protected]>
… channels. 1024 is a common limit, and people are starting to hit that many channels, so we should increase it: twice the number of channels seems reasonable, though we only do this at restart time.
Changelog-Changed: lightningd: we now try to increase the number of file descriptors, if it's less than twice the number of channels at startup (and log if we cannot!).
Signed-off-by: Rusty Russell <[email protected]>
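The startup-time idea can be sketched as follows. This is a minimal illustration under assumptions, not the PR's code; the function name and log text are invented:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical sketch: ensure the soft fd limit is at least
 * 2 * num_channels, raising it toward the hard limit if needed,
 * and log if even the hard limit is too low. */
static void maybe_raise_fd_limit(size_t num_channels)
{
	struct rlimit rl;
	rlim_t want = (rlim_t)num_channels * 2;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return;
	if (rl.rlim_cur >= want)
		return;
	/* The soft limit may be raised up to the hard limit without
	 * privileges. */
	rl.rlim_cur = want > rl.rlim_max ? rl.rlim_max : want;
	if (setrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur < want)
		fprintf(stderr,
			"UNUSUAL: fd limit %llu below twice channel count %zu\n",
			(unsigned long long)rl.rlim_cur, num_channels);
}
```

Doing this only at restart is why the heuristic assumes nodes restart every so often, as the next commit notes.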
They're cheap. The 2x channels heuristic is nice, but it does assume they restart every so often. If someone hits 64k connections I would like to know anyway!
Signed-off-by: Rusty Russell <[email protected]>
Force-pushed e663658 to 995fefd
We expect: UNUSUAL.*WARNING: we have 1 channels but can file descriptors limited to 65536
We get: lightningd: WARNING: we have 1 channels but can file descriptors limited to 32768!
This is strange, since the expected value is Python's hard limit. Presumably something is restricting the fd limit of children.
Signed-off-by: Rusty Russell <[email protected]>
Force-pushed 995fefd to 71c039e
ACK 71c039e
Passing the GitHub CI should also give us some clue about the limits in VMs?
On many systems, the default fd limit is 1024, and we've got users with over 900 channels. But that's a soft limit, and you can simply ask for an increase!
Also, we should log when we are running low, so people don't have to ask me what to do!