Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retransmission subsystem in channel link #231

Conversation

andrewshvv
Copy link
Contributor

  • ask Laolu about counter persistence, and add it.
  • check the generateRevocation function and make sure that we are passing right indexes.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.7%) to 73.298% when pulling 49aa23a263861a1a9c116ecdbac36406fd9284cf on AndrewSamokhvalov:retransmission_in_channellink into 2e6800e on lightningnetwork:master.

@andrewshvv andrewshvv force-pushed the retransmission_in_channellink branch 2 times, most recently from 0584e7e to e2eaf0c Compare July 11, 2017 18:44
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.5%) to 73.52% when pulling e2eaf0cddaa4d720d7443941f54cb43d7bc3b937 on AndrewSamokhvalov:retransmission_in_channellink into 2e6800e on lightningnetwork:master.

@andrewshvv andrewshvv changed the title Add retransmission subsytem in channell ink Add retransmission subsystem in channel link Jul 11, 2017
@andrewshvv andrewshvv force-pushed the retransmission_in_channellink branch 2 times, most recently from 122bd8f to ade9065 Compare July 11, 2017 19:56
@coveralls
Copy link

coveralls commented Jul 11, 2017

Coverage Status

Coverage decreased (-0.5%) to 73.528% when pulling ade906540d38e3d4bbd360d47db0b2c8ed7819c5 on AndrewSamokhvalov:retransmission_in_channellink into 2e6800e on lightningnetwork:master.

Roasbeef added a commit that referenced this pull request Aug 1, 2017
…dust

Note that this commit is temporary, and should be reverted once #231 is
merged. The reason we need to do this for now, is that we don’t
properly track the exact state of the remote party’s commitment. In
this test case, the resulting HTLC’s added are dust to one party, but
non-dust to another. So upon restart, the states (balance wise) has
diverged.
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR! Looking forward to finally getting in proper channel message retransmission into our next major release.

The set of changes in this diff are very clean, however they're incomplete. You'll find most of the details within the body of the review, but at a high level, the following is missing:

  • We need to track the delta between the prior state, and the state we created for the remote party. This is required as we must retransmit all update_* messages, not only just the HTLC additions.
  • We also need to store the exact state of the current commitment chain tail for the remote party. This is required as due to the possibility of asymmetric dust thresholds for both parties, upon restart, it's possible for our commitment states to have diverged.

@@ -48,23 +46,92 @@ func messageToString(msg lnwire.Message) string {
return spew.Sdump(msg)
}

// expectedMessage struct hols the message which travels from one peer to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hols -> holds

// Skip logging of extend revocation window messages.
switch m := m.(type) {
case *lnwire.RevokeAndAck:
var zeroHash chainhash.Hash
if bytes.Equal(zeroHash[:], m.Revocation[:]) {
return
return false, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above fragment can now be removed as of the latest master, as there is no longer a revocation window to be extended.

// createLogFunc is a helper function which returns the function which will be
// used for logging message are received from another peer.
func createLogFunc(name string, channelID lnwire.ChannelID) messageInterceptor {
return func(m lnwire.Message) {
if getChanID(m) == channelID {
return func(m lnwire.Message) (bool, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the significance of the first parameter? The godoc comment on this function should be modified to detail it.

}

expectedMessage := expectToReceive[0]
expectToReceive = expectToReceive[1:]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The element at expectToReceive[0] should be set to nil, otherwise we risk a memory leak.


// Record is used to set the function which will be triggered when new
// lnwire message was received.
func (s *mockServer) record(f messageInterceptor) {
s.recordFuncs = append(s.recordFuncs, f)
func (s *mockServer) intersect(f messageInterceptor) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The godoc comment above wasn't updated.

Also I think you meant to call this function intercept instead?

logCommitTimer: time.NewTimer(300 * time.Millisecond),
overflowQueue: newWaitingQueue(),
quit: make(chan struct{}),
cfg: cfg,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you reference #194 in the commit body? This commit resolves the issue encountered, thanks!

//
// NOTE: This MUST be run as a goroutine.
func (s *Switch) htlcForwarder() {
defer s.wg.Done()
func (s *Switch) handleControl() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What deadlock? Can you add some detail to the commit's message explaining the scenario that caused you to make this change? Thanks!

// down active onion routed payments. Each active channel is modeled as
// networked device with metadata such as the available payment bandwidth, and
// total link capacity.
// handleControl...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a godoc comment here.

case req := <-s.linkControl:
switch cmd := req.(type) {
case *addLinkCmd:
cmd.err <- s.addLink(cmd.link)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that we'll need to introduce mutexes internally for this new set of changes to be concurrent safe.

@@ -751,6 +751,7 @@ func (s *Switch) addLink(link ChannelLink) error {
s.interfaceIndex[peerPub][link] = struct{}{}

if err := link.Start(); err != nil {
s.removeLink(link.ChanID())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

In this commit the reestablish message have been added, which serves as
channel state synchronization message. Before exchanging the messages
for particular channel peers have to send it to each other as the
first message in order to be sure that non of the updates have been
lost because of the previous disconnect.
Add message interceptor which checks the order and may skip the
messages which were denoted to be skipeed.
This commit where added as a measure to avoid the panic during several
server simultanoius fault. The panic happened becuase *t.Testing
structure is not concurrent safe.
Add js-like future object which might be used to wait for the response
to be received or return the error otherwise.
In order to be able to properly restart switch several times we should
have the sequential process of channel link stop. In other words if we
stopped the switch we should be sure that all channel links have been
stopped too. Addition of the goroutine during the force close was added
because of the deadlock:

Trace:
1. link:force_close_notification
2. link:wipe_channel
3. peer:switch_remove_link
4. switch:stop_link
5. link:wait <-- deadlock
In this commit BOLT№2 retranmission logic for the channel link have
been added. Now if channel link have been initialised with the
'SyncState' field than it will send the lnwire.ChannelReestablish
message and will be waiting for receiving the same message from remote
side. Exchange of this message allow both sides understand which
updates they should exchange with each other in order sync their
states.
After addition of the channel reestablish message exchange we couldn't
use the Bandwidth() function, at least in the test framework.
After addition of the retransmission logic in the channel link, we
should make the onion blobs persistant, the proper way to do this is
include the onion blobs in the payment descriptor rather than storing
them in the distinct struct in the channel link.
In this commit htlc channeldb representation have been augmented
with onion blob field, and (de)serialisaion functions have been changed
to make the onion blob persistant.
@Roasbeef
Copy link
Member

Closing at this has been taken over and fully implemented.

@Roasbeef Roasbeef closed this Nov 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants