
failing link: unable to update commitment: cannot add duplicate keystone with error: internal error #6485

Closed
C-Otto opened this issue Apr 30, 2022 · 27 comments · Fixed by #6518
Labels: commitments (Commitment transactions containing the state of the channel), force closes

Comments

@C-Otto
Contributor

C-Otto commented Apr 30, 2022

Background

The error message "failing link: unable to update commitment: cannot add duplicate keystone with error: internal error" appeared in my logs, for reasons I don't understand. Afterwards the channel was unusable and my peer (running CLN) immediately force-closed the channel at 01:26.

The force-close transaction contains one outgoing HTLC (timeout, according to lncli closedchannels) of 200,003 sat.

My peer says:

I’m getting hit by several force closes for all kind of reasons in the last couple days. Ours is just “internal error” in my log

Relevant snippet from my logs:

01:26:09.438 [INF] PEER: NodeKey(PUBKEY) loading ChannelPoint(CHANPOINT)
01:26:09.439 [DBG] CNCT: New ChainEventSubscription(id=15) for ChannelPoint(CHANPOINT)
01:26:09.439 [INF] HSWC: ChannelLink(CHANPOINT): starting
01:26:09.439 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(CHANPOINT)
01:26:09.439 [INF] HSWC: ChannelLink(CHANPOINT): HTLC manager started, bandwidth=3663215408 mSAT
01:26:09.439 [INF] HSWC: ChannelLink(CHANPOINT): attempting to re-synchronize
01:26:09.439 [INF] PEER: Negotiated chan series queries with PUBKEY
01:26:09.519 [ERR] RPCS: [connectpeer]: error connecting to peer: already connected to peer: PUBKEY@IP2:48304
01:26:09.519 [ERR] RPCS: [/lnrpc.Lightning/ConnectPeer]: already connected to peer: PUBKEY@IP2:48304
01:26:09.839 [INF] HSWC: ChannelLink(CHANPOINT): received re-establishment message from remote side
01:26:09.851 [DBG] HSWC: ChannelLink(CHANPOINT): loaded 0 fwd pks
01:26:11.237 [DBG] HSWC: ChannelLink(CHANPOINT): queueing keystone of ADD open circuit: (Chan ID=0:0:0, HTLC ID=6133225)->(Chan ID=CHAN_ID, HTLC ID=4619)
01:26:11.991 [DBG] HSWC: ChannelLink(CHANPOINT): removing Add packet (Chan ID=0:0:0, HTLC ID=6133225) from mailbox
01:26:13.366 [DBG] HSWC: ChannelLink(CHANPOINT): settle-fail-filter &{1 [0]}
01:26:13.366 [DBG] HSWC: ChannelLink(CHANPOINT): Failed to send 500059997 mSAT
01:26:15.297 [DBG] CNCT: ChannelArbitrator(CHANPOINT): attempting state step with trigger=chainTrigger from state=StateDefault
01:26:15.297 [DBG] CNCT: ChannelArbitrator(CHANPOINT): new block (height=734163) examining active HTLC's
01:26:15.297 [DBG] CNCT: ChannelArbitrator(CHANPOINT): checking commit chain actions at height=734163, in_htlc_count=0, out_htlc_count=2
01:26:15.297 [DBG] CNCT: ChannelArbitrator(CHANPOINT): no actions for chain trigger, terminating
01:26:15.297 [DBG] CNCT: ChannelArbitrator(CHANPOINT): terminating at state=StateDefault
01:26:16.485 [DBG] HSWC: ChannelLink(CHANPOINT): settle-fail-filter &{1 [0]}
01:26:16.485 [DBG] HSWC: ChannelLink(CHANPOINT): Failed to send 1000047000 mSAT
01:26:26.905 [DBG] HSWC: ChannelLink(CHANPOINT): queueing keystone of ADD open circuit: (Chan ID=0:0:0, HTLC ID=6133240)->(Chan ID=CHAN_ID, HTLC ID=4620)

01:26:26.956 [ERR] HSWC: ChannelLink(CHANPOINT): failing link: unable to update commitment: cannot add duplicate keystone with error: internal error

01:26:26.956 [INF] HSWC: ChannelLink(CHANPOINT): exited
01:26:26.957 [INF] HSWC: ChannelLink(CHANPOINT): stopping

Your environment

  • lnd 0.14.3-beta-rc1
  • Linux server 5.10.0-10-amd64 #1 SMP Debian 5.10.84-1 (2021-12-08) x86_64 GNU/Linux
  • bitcoind v23

Steps to reproduce

Have a non-anchor channel with a CLN peer behind Tor. Have a somewhat flaky connection. Send HTLCs to the peer.

@Roasbeef
Member

Roasbeef commented May 2, 2022

lnd 0.14.3-beta-rc1

Any reason this node isn't running the production release of this version?

I wonder if this was actually an inadvertent HTLC replay attempt....

@C-Otto
Contributor Author

C-Otto commented May 2, 2022

I checked the commits following rc1 and didn't see anything that interested me, so I didn't update (I might be wrong, of course; updating as we speak). What do you mean by "production release"? I'm running my own version with a few patches (logging, non-random channel picking for parallel channels).

@Crypt-iQ
Collaborator

Crypt-iQ commented May 2, 2022

This could happen if c-lightning was sending settles earlier than expected, but I don't think they're doing that: #6246

Do you have more in-depth logs you can share leading up to this point? Are PEER debug logs off?

@C-Otto
Contributor Author

C-Otto commented May 2, 2022

I have debuglevel=info,CNCT=debug,DISC=warn,HSWC=debug,NANN=debug,WTCL=warn. What exactly do you need? The logs above show everything I have for the CHANPOINT (EDIT: In that timeframe).

@Crypt-iQ
Collaborator

Crypt-iQ commented May 2, 2022

Did your logs rotate or something? The channel has a history of 4620 HTLCs, so hopefully there are more logs from before the reestablish message was received.

@C-Otto
Contributor Author

C-Otto commented May 2, 2022

I have lots of logs, no worries. Just tell me what you need; I won't send over gigabytes of logs, though :)

@Crypt-iQ
Collaborator

Crypt-iQ commented May 2, 2022

I think you should have at least one log line that is identical to the one pasted above (HTLC ID=4620 is what matters):

queueing keystone of ADD open circuit: (Chan ID=0:0:0, HTLC ID=<can be anything>)->(Chan ID=CHAN_ID, HTLC ID=4620)

If you could send from ~1 min before the earliest identical log line until the point pasted in the OP, that would help narrow things down. I don't need the channel IDs/payment hashes/IPs/pubkeys etc.; HSWC logs only.
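
For readers following along: in lnd's switch, a forwarded ADD opens a circuit keyed by its incoming (channel, HTLC) pair, and the "keystone" is the outgoing (channel, HTLC) pair bound to it; the log line above shows exactly that mapping. Below is a minimal sketch of how such a map could reject a second bind of the same outgoing slot, which is the shape of the "cannot add duplicate keystone" failure. Types and names are illustrative, not lnd's actual circuitmap code.

package main

import (
	"errors"
	"fmt"
)

// CircuitKey identifies one HTLC on one channel, mirroring the
// "(Chan ID=..., HTLC ID=...)" halves of the log lines above.
type CircuitKey struct {
	ChanID string
	HtlcID uint64
}

// ErrDuplicateKeystone mirrors the reported failure mode: an outgoing
// (channel, HTLC) slot is being bound a second time.
var ErrDuplicateKeystone = errors.New("cannot add duplicate keystone")

// CircuitMap binds incoming circuits to outgoing keystones. Purely
// illustrative; lnd's real circuit map is persisted and more involved.
type CircuitMap struct {
	opened    map[CircuitKey]CircuitKey // incoming -> outgoing
	keystones map[CircuitKey]struct{}   // outgoing keys already in use
}

func NewCircuitMap() *CircuitMap {
	return &CircuitMap{
		opened:    make(map[CircuitKey]CircuitKey),
		keystones: make(map[CircuitKey]struct{}),
	}
}

// OpenCircuit records that the ADD arriving on `in` was forwarded as `out`.
// If `out` is already taken, e.g. because a restarted link re-forwards an
// ADD whose keystone was already recorded, the bind is rejected.
func (m *CircuitMap) OpenCircuit(in, out CircuitKey) error {
	if _, ok := m.keystones[out]; ok {
		return ErrDuplicateKeystone
	}
	m.opened[in] = out
	m.keystones[out] = struct{}{}
	return nil
}

func main() {
	cm := NewCircuitMap()
	out := CircuitKey{ChanID: "CHAN_ID", HtlcID: 4619}

	// First forward binds the keystone successfully.
	fmt.Println(cm.OpenCircuit(CircuitKey{ChanID: "0:0:0", HtlcID: 6133225}, out))
	// A second ADD targeting the same outgoing HTLC slot is rejected,
	// which is the kind of condition behind the error reported above.
	fmt.Println(cm.OpenCircuit(CircuitKey{ChanID: "0:0:0", HtlcID: 6133240}, out))
}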

@C-Otto
Contributor Author

C-Otto commented May 2, 2022

Yes.

22:10:16.275 [INF] HSWC: ChannelLink(CHANPOINT): HTLC manager started, bandwidth=4163275405 mSAT
22:10:16.275 [INF] HSWC: ChannelLink(CHANPOINT): attempting to re-synchronize
22:10:16.276 [INF] PEER: Negotiated chan series queries with PUBKEY
22:10:18.611 [INF] HSWC: ChannelLink(CHANPOINT): received re-establishment message from remote side
22:10:18.618 [DBG] HSWC: ChannelLink(CHANPOINT): loaded 46 fwd pks
22:10:18.618 [DBG] HSWC: ChannelLink(CHANPOINT): removing completed fwd pkg for height=13475
[...]
22:10:19.218 [DBG] HSWC: ChannelLink(CHANPOINT): removing completed fwd pkg for height=13520
22:10:32.588 [DBG] HSWC: ChannelLink(CHANPOINT): queueing keystone of ADD open circuit: (Chan ID=0:0:0, HTLC ID=6124133)->(Chan ID=CHANID, HTLC ID=4618)
22:10:32.659 [DBG] HSWC: ChannelLink(CHANPOINT): removing Add packet (Chan ID=0:0:0, HTLC ID=6124133) from mailbox
22:11:08.935 [DBG] HSWC: ChannelLink(CHANPOINT): queueing keystone of ADD open circuit: (Chan ID=OTHER_CHANID, HTLC ID=7856)->(Chan ID=CHANID, HTLC ID=4619)
22:11:39.744 [DBG] HSWC: ChannelLink(CHANPOINT): queueing keystone of ADD open circuit: (Chan ID=0:0:0, HTLC ID=6124179)->(Chan ID=CHANID, HTLC ID=4620)
22:11:44.869 [DBG] CNCT: ChannelArbitrator(CHANPOINT): attempting state step with trigger=chainTrigger from state=StateDefault
22:11:44.869 [DBG] CNCT: ChannelArbitrator(CHANPOINT): new block (height=734142) examining active HTLC's
22:11:44.869 [DBG] CNCT: ChannelArbitrator(CHANPOINT): checking commit chain actions at height=734142, in_htlc_count=0, out_htlc_count=1
22:11:44.869 [DBG] CNCT: ChannelArbitrator(CHANPOINT): no actions for chain trigger, terminating
22:11:44.869 [DBG] CNCT: ChannelArbitrator(CHANPOINT): terminating at state=StateDefault
22:12:04.253 [DBG] HSWC: ChannelLink(CHANPOINT): queueing keystone of ADD open circuit: (Chan ID=0:0:0, HTLC ID=6124191)->(Chan ID=CHANID, HTLC ID=4621)
22:12:08.995 [ERR] HSWC: ChannelLink(CHANPOINT): failing link: unable to complete dance with error: remote unresponsive
22:12:08.995 [INF] HSWC: ChannelLink(CHANPOINT): exited
22:12:08.995 [INF] HSWC: ChannelLink(CHANPOINT): stopping

@C-Otto
Contributor Author

C-Otto commented May 2, 2022

In between those (22:12:08 and 01:26:09) I don't have any HSWC log message for that peer/channel.

@Crypt-iQ
Collaborator

Crypt-iQ commented May 2, 2022

Do you see this, and if so, at what time? I might be missing a space or something; this will be under OTHER_CHANID's chanpoint:

queueing removal of SETTLE closed circuit: (Chan ID=OTHER_CHANID, HTLC ID=7856)->(Chan ID=CHANID, HTLC ID=4619)

@C-Otto
Contributor Author

C-Otto commented May 2, 2022

22:11:08.935 [DBG] HSWC: ChannelLink(CHANPOINT): queueing keystone of ADD open circuit: (Chan ID=OTHER_CHANID, HTLC ID=7856)->(Chan ID=CHAN_ID, HTLC ID=4619)
22:12:08.996 [DBG] HSWC: ChannelLink(OTHER_CHANPOINT): queueing removal of FAIL closed circuit: (Chan ID=OTHER_CHANID, HTLC ID=7856)->(Chan ID=0:0:0, HTLC ID=0)
22:12:09.018 [DBG] HSWC: ChannelLink(OTHER_CHANPOINT): removing Fail/Settle packet (Chan ID=OTHER_CHANID, HTLC ID=7856) from mailbox

@Crypt-iQ
Collaborator

Crypt-iQ commented May 2, 2022

OK, I might have an idea of what's going on. If I'm right, this is an lnd problem, not a C-Lightning problem.

@Roasbeef
Member

Roasbeef commented May 3, 2022

Looks like this was introduced inadvertently in this PR (according to @Crypt-iQ): #4183

@Roasbeef
Member

Roasbeef commented May 4, 2022

Noted a related thread in #6482: in some cases we may not be properly cancelling inbound HTLCs if we attempt to send a commitment but the remote peer never replies. This is a bit trickier since we've technically already sent out that valid commitment, so we need to keep that HTLC in play (which may lead to a force close, since we want to be able to safely time out that incoming HTLC).
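
The force-close angle here comes from HTLC deadlines: once an incoming HTLC can no longer be failed off-chain, the only safe move near its expiry is to go on-chain so it can be timed out. A rough sketch of that deadline check follows, with an illustrative broadcast delta and simplified types; this is not lnd's actual contractcourt logic.

package main

import "fmt"

// htlc carries only what the deadline check needs: its absolute timeout
// height and whether we still have an off-chain way to remove it.
type htlc struct {
	expiryHeight        uint32
	cancellableOffchain bool
}

// broadcastDelta is how many blocks before expiry we act; lnd makes this
// configurable, the value here is only illustrative.
const broadcastDelta = 10

// needForceClose reports whether we must go on-chain now to be able to
// safely time out the HTLC.
func needForceClose(h htlc, height uint32) bool {
	if h.cancellableOffchain {
		return false // it can still be failed back cooperatively
	}
	return height+broadcastDelta >= h.expiryHeight
}

func main() {
	stuck := htlc{expiryHeight: 734200, cancellableOffchain: false}
	fmt.Println(needForceClose(stuck, 734163)) // false: deadline not reached yet
	fmt.Println(needForceClose(stuck, 734191)) // true: close to expiry, go on-chain
}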

@Crypt-iQ
Collaborator

Crypt-iQ commented May 4, 2022

We are cancelling back HTLCs properly in #6482.

@Crypt-iQ
Collaborator

Crypt-iQ commented May 4, 2022

Looked into it more, and #4183 did not introduce this, but I think it made it harder to recover from when peered with another lnd node.

@Filouman

I have seen 4 or 5 of these force closures in the last few days! All of them with CLN nodes.

Latest one:
2022-05-20T17:09:20.572Z UNUSUAL 03641a88d80a2a85bbecd770577aca9b5495616e9fef63d66ef2631b7cca1d395d-chan#4496: Peer permanent failure in CHANNELD_NORMAL: channeld: received ERROR error channel bea859af4b46d48f52d39b8e0e54b94c7e9bcb15785bccb0dbbc294759775115: internal error

Please let me know if I can offer any additional info that may help.

@zerofeerouting

zerofeerouting commented May 23, 2022

I have had about 20 of these over the last couple of days. This is a really pressing issue.

@Roasbeef
Member

Worth noting that we'll send an error there, but we won't actually attempt to force close the channel, to see if a reconnection fixes things. Some CL nodes force close on any error message received. #5602 will enable us to send the new warning message instead, but some older nodes also still rely on that error message.

We have a fix here (for the issue that leads to us failing to process a certain HTLC action): #6518.
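
For context on the warning-vs-error distinction mentioned above: BOLT 1 defines both message types (warning is type 1, error is type 17), and only error obliges the receiver to fail the channel. Here is a hedged sketch of the choice a sender might make; the types and helper below are illustrative, not lnd's peer code.

package main

import "fmt"

// Simplified BOLT 1 message type numbers: warning is 1, error is 17.
const (
	msgWarning uint16 = 1
	msgError   uint16 = 17
)

// channelMsg is a pared-down warning/error message.
type channelMsg struct {
	msgType   uint16
	channelID [32]byte // all-zero means "applies to all channels"
	data      string
}

// notifyPeer picks a warning for recoverable, link-local problems, so a
// spec-compliant peer is not obliged to fail the channel, and reserves
// error for genuinely unrecoverable channel state.
func notifyPeer(chanID [32]byte, recoverable bool, reason string) channelMsg {
	t := msgError
	if recoverable {
		t = msgWarning
	}
	return channelMsg{msgType: t, channelID: chanID, data: reason}
}

func main() {
	var cid [32]byte // stand-in channel id for the example
	m := notifyPeer(cid, true, "temporary link failure, please reconnect")
	fmt.Printf("type=%d data=%q\n", m.msgType, m.data)
}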

@Roasbeef
Member

AFAICT, something might have been inadvertently introduced on the CL end during their recent rewrite of some peer/channel handling code. Also, is everyone here running CL on the point release that fixed a force close issue?

@Filouman

Is there any relation between the 'Internal Error' and 'Unable to Read Message' errors? Have been seeing both of these regularly over the past few days with my CLN peers.

#6564

@Roasbeef
Member

Unable to read message happens when the other peer "forcibly" hangs up the connection; that might have been in response to an lnd node sending an error message. Some CL nodes (or all of them still?) will auto force close on any error message.

@zerofeerouting

I run the latest CLN (0.11.1) - my node has force closed multiple channels when my peers told me they had an internal error.

According to BOLT 1 (https://github.com/lightning/bolts/blob/master/01-messaging.md#requirements-2):

The receiving node:

upon receiving error:
    if channel_id is all zero:
        MUST fail all channels with the sending node.
    otherwise:
        MUST fail the channel referred to by channel_id, if that channel is with the sending node.

So the CLN behaviour is up to spec, AFAICT.
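
A small sketch of the receiver rule quoted above, i.e. why a CLN peer that receives error ends up force closing: fail every channel with the sender if channel_id is all zero, otherwise fail just the referenced channel. The code is illustrative, not CLN's implementation.

package main

import "fmt"

// isZeroChanID reports whether a channel_id is all zero, which per the
// quoted requirement means "fail all channels with the sending node".
func isZeroChanID(id [32]byte) bool {
	for _, b := range id {
		if b != 0 {
			return false
		}
	}
	return true
}

// handleError applies the quoted receiver rules to an incoming error
// message. failChannel stands in for whatever "fail the channel" means
// for the node (e.g. force close).
func handleError(peerChannels map[[32]byte]bool, chanID [32]byte, failChannel func([32]byte)) {
	if isZeroChanID(chanID) {
		for id := range peerChannels {
			failChannel(id) // MUST fail all channels with the sender
		}
		return
	}
	if peerChannels[chanID] {
		failChannel(chanID) // MUST fail the referenced channel
	}
}

func main() {
	var cid [32]byte
	cid[0] = 0xbe // stand-in channel id
	chans := map[[32]byte]bool{cid: true}
	handleError(chans, cid, func(id [32]byte) { fmt.Println("failing channel", id[0]) })
}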

@Roasbeef
Member

I run the latest CLN (0.11.1) - my node has force closed multiple channels when my peers told me they had an internal error.

OK, good to know; just trying to pinpoint things here, as I know there was a p2p overhaul on their end and we haven't had any significant changes in this area lately. We're reviewing and testing #6518 to see if it addresses the issue. It also seems related to another reported issue where writing to a zombie TCP socket (the other side no longer listening on it) would eventually cause force closures.
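
On the zombie-socket point: a write to a peer that silently went away can sit in kernel buffers and appear to succeed for a long time, so the sender needs write deadlines (or application-level pings) to notice the peer is gone. A generic Go sketch of that pattern follows; it is unrelated to lnd's actual peer/brontide code.

package main

import (
	"fmt"
	"net"
	"time"
)

// writeWithDeadline bounds how long a single write may block, so a
// connection to a dead peer surfaces an error instead of lingering until
// TCP itself gives up.
func writeWithDeadline(conn net.Conn, payload []byte, timeout time.Duration) error {
	if err := conn.SetWriteDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	_, err := conn.Write(payload)
	return err
}

func main() {
	conn, err := net.DialTimeout("tcp", "127.0.0.1:9735", 2*time.Second)
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()

	if err := writeWithDeadline(conn, []byte("ping"), 5*time.Second); err != nil {
		fmt.Println("peer looks dead:", err)
	}
}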

@btweenthebars

Is there any relation between the 'Internal Error' and 'Unable to Read Message' errors? Have been seeing both of these regularly over the past few days with my CLN peers.

#6564

Is there any "keystone" error in your log? I doubt they are the same issue.

@Filouman

Is there any "Keystone" error in your log ? I doubt they are the same issue

This one?

failing link: unable to update commitment: cannot add duplicate keystone with error: internal error

@btweenthebars

failing link: unable to update commitment: cannot add duplicate keystone with error: internal error

Did it happen with the same peers for which you see the "Unable to Read Message" error?
