connectd: increase queue length to 250,000.
The original complaint which caused my investigation was the 100% CPU
consumption of connectd, which we traced to the queue to gossipd.

However, the issue is not really connectd's overproduction, but
gossipd's underconsumption, probably caused by its own queueing issues
with the trace messages to lightningd, which the prior patch fixed.

Nonetheless, gossipd *can* get busy, and if we were to ask multiple
nodes for full gossip, we could see a few hundred thousand messages
come in at once.  Hence I'm increasing the warning limit to 250,000
messages.

This commit is also where we attach the Changelog message, even
though it was really "common/msg_queue: use membuf for greater efficiency."
and "gossipd: fix excessive msg_queue length from status_trace()" that
solved the problem.

Here's the backtrace from a previous debug patch:

```
lightning_connectd: msg_queue length excessive (version v24.08.1-17-ga780ad4-modded)
0x5580534051f0 send_backtrace
        common/daemon.c:33
0x55805340bd5b do_enqueue
        common/msg_queue.c:66
0x55805340bde5 msg_enqueue
        common/msg_queue.c:82
0x5580534057ce daemon_conn_send
        common/daemon_conn.c:161
0x5580533fe3ff handle_gossip_in
        connectd/multiplex.c:624
0x5580533ff23b handle_message_locally
        connectd/multiplex.c:763
0x5580533ff2d6 read_body_from_peer_done
        connectd/multiplex.c:1112
```

Reported-by: https://github.com/JssDWt
Signed-off-by: Rusty Russell <[email protected]>
Changelog-Fixed: `connectd` and `gossipd` message queues are much more efficient.
rustyrussell committed Nov 1, 2024
1 parent 1efa065 commit 183da39
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion common/msg_queue.c
@@ -63,7 +63,7 @@ static void do_enqueue(struct msg_queue *q, const u8 *add TAKES)
 
 	*msg = tal_dup_talarr(q, u8, add);
 
-	if (!warned_once && msg_queue_length(q) > 100000) {
+	if (!warned_once && msg_queue_length(q) > 250000) {
 		/* Can cause re-entry, so set flag first! */
 		warned_once = true;
 		send_backtrace("excessive queue length");
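The "set flag first" comment in the hunk above matters because reporting the warning can itself enqueue a message. Below is a minimal, self-contained sketch of that warn-once pattern, not the actual `common/msg_queue.c` code. Every `demo_`-prefixed name is hypothetical, and the re-entry is simulated by having the stand-in reporter call the enqueue function directly, which is an assumption about how the real re-entry arises.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the real queue: only the length matters here. */
struct demo_queue {
	size_t len;
	size_t warnings;	/* how many times the warning fired */
};

/* Same threshold as the patched check. */
#define DEMO_WARN_LIMIT 250000

static bool demo_warned_once;

static void demo_enqueue(struct demo_queue *q);

/* Stand-in for send_backtrace(): we assume the report ends up queuing
 * a message of its own, so it re-enters demo_enqueue(). */
static void demo_report(struct demo_queue *q, const char *why)
{
	q->warnings++;
	fprintf(stderr, "warning: %s (length %zu)\n", why, q->len);
	demo_enqueue(q);
}

static void demo_enqueue(struct demo_queue *q)
{
	q->len++;

	if (!demo_warned_once && q->len > DEMO_WARN_LIMIT) {
		/* Set the flag before reporting: the report re-enters
		 * this function, and would otherwise recurse forever. */
		demo_warned_once = true;
		demo_report(q, "excessive queue length");
	}
}

/* Enqueue n messages and return the total number of warnings so far. */
static size_t demo_fill(struct demo_queue *q, size_t n)
{
	for (size_t i = 0; i < n; i++)
		demo_enqueue(q);
	return q->warnings;
}
```

With the flag set after the report instead, the nested enqueue would see the queue still over the limit and warn again, recursing without bound. Note that the flag in this sketch is process-wide, so only the first queue to cross the limit would report.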