channel updates and node announcements not propagating #7276
I have gathered a view of other nodes on this node's channels, especially the last update. It appears 'some' gossip has been trickling in to other nodes, especially around 14-15 April, but very sparse. So it's not that all gossip has not propagated. Most of the gossip has not propagated. Does cln periodically update the channel policy? Can you point me to the place in the code where that happens? |
I also have the issue that my CLN channel updates are not propagating to the network. In my case I traced it to an issue with LND nodes in the network that mark some of my channels as "zombie" channels and refuse to propagate any gossip about them, even though they are healthy channels. It should hopefully be fixed by LND in the 0.18 release. |
Querying the node announcement (with a little program written with LDK) from this node directly just now gives me a node announcement with the timestamp |
Mh! Looking at this while thinking about how to debug the following one: lightningdevkit/rust-lightning#3075. I think @rustyrussell or @endothermicdev know the code base well, but if they do not have time I can start looking into it, because at some point I should understand it well. |
I customized the LDK program to just send a gossip query to the node and log the returned node announcement message of the node in question. It is really the only node announcement returned by the node and doesn't have to do with ordering of the messages. |
Oh sorry, I did not notice that you created a custom script to poll gossip :) I was talking about the general problem. |
It does appear to be sending channel announcement first, then channel update, then node announcement though. |
Looking at this a little bit more. If I query the gossip directly on the node itself, almost all channels have an update after
$ lightning-cli listchannels source=02442d4249f9a93464aaf8cd8d522faa869356707b5f1537a8d6def2af50058c5b | jq '.channels[] | .last_update | strftime("%Y-%m-%dT%H:%M:%S %Z")'
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:36 UTC"
"2024-05-30T10:09:58 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:02 UTC"
"2024-05-28T18:19:02 UTC"
"2024-05-28T18:19:36 UTC"
"2024-05-28T18:19:03 UTC"
"2024-05-28T18:19:05 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:03 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:18:00 UTC"
"2024-05-28T18:18:00 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-29T18:57:07 UTC"
"2024-05-28T18:20:02 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-31T21:49:11 UTC"
"2024-05-28T18:19:02 UTC"
"2024-05-28T18:19:03 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-28T18:19:06 UTC"
"2024-05-31T23:35:12 UTC"
"2024-05-28T18:19:03 UTC"
"2024-05-28T18:18:00 UTC"
"2024-05-18T17:58:17 UTC"
"2024-06-02T10:31:33 UTC"
"2024-06-02T10:31:33 UTC"
If I send a gossip timestamp filter to this node with
The oldest age of channels on the node is 127 days. So I think the opening timestamp is used to return channel updates, rather than the update date. |
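For anyone repeating the check above, the long list of timestamps is easier to read when grouped by day. The following is a minimal sketch, assuming the JSON printed by `lightning-cli listchannels source=<node_id>` has been parsed into a dict; the function name `last_update_histogram` is made up for illustration, and only the `channels` and `last_update` fields seen in the query above are used.

```python
from collections import Counter
from datetime import datetime, timezone


def last_update_histogram(listchannels: dict) -> Counter:
    """Count channels per UTC calendar day of their last_update timestamp.

    `listchannels` is assumed to be the parsed JSON output of
    `lightning-cli listchannels source=<node_id>`.
    """
    days = Counter()
    for chan in listchannels.get("channels", []):
        ts = chan.get("last_update")
        if ts is None:
            # Skip entries without an update timestamp.
            continue
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        days[day] += 1
    return days


# Hand-made sample data (not taken from the node in this issue).
sample = {
    "channels": [
        {"last_update": 1716920346},  # 2024-05-28T18:19:06 UTC
        {"last_update": 1716920346},
        {"last_update": 1717324293},  # 2024-06-02T10:31:33 UTC
    ]
}
histogram = last_update_histogram(sample)
for day in sorted(histogram):
    print(day, histogram[day])
```

To run it against a real node, the parsed output of the command can be passed in instead of `sample`, for example via `json.load(sys.stdin)` in a small wrapper script.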
Reading this code here where the node announcement is broadcast: lightning/lightningd/channel_gossip.c Line 1101 in 72079cc
It seems to me that if the node announcement hasn't changed, it does not get signed and sent again with a new timestamp. |
Connecting out to a peer should send the peer gossip messages, especially when the peer requests it with a GOSSIP_TIMESTAMP_FILTER, but it doesn't happen (cln 24.05):
|
How does a node announcement get "too old"? |
OK, let's clear some things up:
Woah, that's weird! But we definitely filter by update timestamp: I just checked. However, in the latest versions of CLN we only start from the beginning of the gossip_store if the timestamp is 0; otherwise we start from the first entry which is more recent than two hours rather than sweeping the entire store, i.e. only "recent" ones. (We still check that the timestamps are in the range you ask for; we just don't search everything.) I'm surprised you're seeing anything more than all or nothing though. And even before you send the filter message, we will spew all our own gossip. |
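To make the behaviour described in this comment easier to reason about, here is a toy model of it. It is only one reading of the comment, not CLN's actual code: `StoreEntry`, `entries_for_filter` and the list-based store are hypothetical, and the assumption that "more recent than two hours" is judged by the message timestamp is mine. The range check follows the `gossip_timestamp_filter` semantics (`first_timestamp <= timestamp < first_timestamp + timestamp_range`).

```python
from dataclasses import dataclass
from typing import List

TWO_HOURS = 2 * 60 * 60


@dataclass
class StoreEntry:
    name: str       # label for the gossip message (illustrative only)
    timestamp: int  # timestamp carried by the gossip message


def entries_for_filter(store: List[StoreEntry], first_timestamp: int,
                       timestamp_range: int, now: int) -> List[StoreEntry]:
    """Toy model: which entries a peer gets for a given timestamp filter."""
    if first_timestamp == 0:
        # A zero timestamp means: sweep the store from the beginning.
        start = 0
    else:
        # Otherwise start at the first entry newer than two hours
        # (assumption: judged by the message timestamp).
        start = len(store)
        for i, entry in enumerate(store):
            if entry.timestamp > now - TWO_HOURS:
                start = i
                break
    end = first_timestamp + timestamp_range
    # The requested range is still enforced on everything that is scanned.
    return [e for e in store[start:] if first_timestamp <= e.timestamp < end]


now = 1_000_000
store = [
    StoreEntry("old_update", 1_000),
    StoreEntry("recent_update", now - 60),
    StoreEntry("late_appended_old_update", 2_000),
]
print([e.name for e in entries_for_filter(store, 0, 2**32 - 1, now)])
print([e.name for e in entries_for_filter(store, 1, 2**32 - 1, now)])
```

In this model a non-zero `first_timestamp` skips `old_update` even though it is inside the requested range, while entries stored after the first recent one are still returned. That would be one way to see partial results rather than all or nothing.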
When a peer connects, we always send all our own gossip (even if they had set the timestamp filters to filter it out). But we weren't forcing it out to them when it changed, so this logic only applied to unstable or frequently-restarting nodes. So now, we tell all the peers whenever we tell gossipd about our new gossip. Fixes: ElementsProject#7276 Signed-off-by: Rusty Russell <[email protected]> Changelog-Changed: Protocol: We now send current peers our changed gossip (even if they set timestamp_filter otherwise), not just on reconnect.
OK. I think I've found it, and figured out why you are seeing this. We force our gossip on peers when they reconnect, but not if they're already connected. Obviously we should do this, and it can be noticed on nodes with stable connections. |
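The difference between the two behaviours can be illustrated with a small simulation. This is a sketch under stated assumptions, not lightningd's implementation: the `Node` and `Peer` classes and the `push_on_change` flag are invented for the example. With the flag off, a peer only learns about our gossip when it connects; with it on, peers that are already connected also receive every change.

```python
from typing import List


class Peer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.received: List[str] = []


class Node:
    """Toy node that forces its own gossip on peers (hypothetical model)."""

    def __init__(self, push_on_change: bool) -> None:
        self.push_on_change = push_on_change
        self.own_gossip: List[str] = []
        self.connected: List[Peer] = []

    def connect(self, peer: Peer) -> None:
        # On (re)connect we always send all of our own gossip.
        self.connected.append(peer)
        peer.received.extend(self.own_gossip)

    def update_own_gossip(self, msg: str) -> None:
        self.own_gossip.append(msg)
        if self.push_on_change:
            # Behaviour described in the fix: also tell current peers.
            for peer in self.connected:
                peer.received.append(msg)


before_fix, after_fix = Node(push_on_change=False), Node(push_on_change=True)
stable_peer_a, stable_peer_b = Peer("a"), Peer("b")
before_fix.connect(stable_peer_a)
after_fix.connect(stable_peer_b)
before_fix.update_own_gossip("channel_update#1")
after_fix.update_own_gossip("channel_update#1")
print(stable_peer_a.received)  # peer with a stable connection, old behaviour
print(stable_peer_b.received)  # same situation with push-on-change
```

A peer that stays connected to `before_fix` never sees the update until it reconnects, which matches the observation that stable nodes are the ones where the problem shows up.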
I'm not seeing this when connecting to the peer with an LDK made program. It could be this is our specific node that has many many private channels and only a few public ones, running into the Sharing the gossip is most important to peers we have public channels with, because they are most likely to forward that gossip. Perhaps they can get an 'advantage' in the spam redundancy model?
I see, I had in mind that a 2 week old node announcement would get you removed from the graph. I'd have to double check that. I at least verified that LND does not discard the old node announcement if it doesn't have it yet when receiving it. I'll check the graph pruning logic too.
This is great news!
So this explains the behavior of the timestamp filtering? First they are filtered by entry date (which is the channel entry date?) and then by timestamp? |
@rustyrussell another observation is that restarts/disconnects didn't change anything. Only deleting the gossip_store file was a solution to the gossip propagation issues. Do you have an explanation for that? |
Issue and Steps to Reproduce
lightningd appears to fail to propagate its own channel updates through the network.
Yesterday, one of our nodes hadn't sent out a channel update or node announcement in over 22 days.
This corresponds with the latest restart of the node.
So it appears gossip is only exchanged when the node is restarted.
During this period there were several updates made to channel policies, but they also had not been propagated through the network.
Today the node was restarted and (some, but not all) gossip made it to other nodes.
I need some guidance on how to troubleshoot this issue.
getinfo
output