
txnsync: delete-able incoming message queue #3126

Merged

merged 12 commits into algorand:master from tsachi/deletable_msgq on Oct 25, 2021

Conversation

@tsachiherman (Contributor) commented Oct 22, 2021

Summary

Running betanet, we've seen errors such as

```
unable to enqueue incoming message into peer incoming message backlog. disconnecting from peer.
```

as well as multiple

```
Peer 162.202.32.72:56543 disconnected: SlowConnection
```

The goal of this PR is to remove, from the pending incoming message queue, entries that are associated with network peers that have either been disconnected (by the network package) or been scheduled for disconnection (by the transaction sync package).

Removing these entries frees up space in the incoming message queue and prevents redundant disconnections.
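
As a rough illustration of the approach (not the actual txnsync code; Peer, incomingMessage, pendingQueue, and erase are hypothetical names used only for this sketch), dropping a disconnected peer's backlog entries could look like:

```go
package main

// Hypothetical stand-ins for the txnsync types, used only for this sketch.
type Peer struct{ addr string }

type incomingMessage struct {
	peer *Peer
	data []byte
}

// pendingQueue is a simplified backlog of incoming messages.
type pendingQueue struct {
	messages []incomingMessage
}

// erase drops every queued message that belongs to the given peer, freeing
// backlog space instead of letting the queue fill up and force disconnects.
func (q *pendingQueue) erase(peer *Peer) {
	kept := q.messages[:0]
	for _, msg := range q.messages {
		if msg.peer != peer {
			kept = append(kept, msg)
		}
	}
	q.messages = kept
}

func main() {
	p1, p2 := &Peer{addr: "1.2.3.4:1234"}, &Peer{addr: "5.6.7.8:5678"}
	q := &pendingQueue{messages: []incomingMessage{{peer: p1}, {peer: p2}, {peer: p1}}}
	q.erase(p1) // p1 was disconnected; its two entries are removed
	_ = q.messages
}
```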

Test Plan

  • Unit tests updated.
  • Ran an s1 network, with and without load, successfully.

@codecov-commenter commented Oct 22, 2021

Codecov Report

Merging #3126 (92a8c4c) into master (b93ad64) will increase coverage by 0.07%.
The diff coverage is 88.61%.


```diff
@@            Coverage Diff             @@
##           master    #3126      +/-   ##
==========================================
+ Coverage   43.69%   43.77%   +0.07%
==========================================
  Files         390      391       +1
  Lines       86700    86877     +177
==========================================
+ Hits        37887    38033     +146
- Misses      42788    42807      +19
- Partials     6025     6037      +12
```
| Impacted Files | Coverage Δ |
| --- | --- |
| txnsync/incoming.go | 94.82% <78.94%> (-3.54%) ⬇️ |
| txnsync/incomingMsgQ.go | 89.44% <89.44%> (ø) |
| txnsync/mainloop.go | 87.32% <100.00%> (+1.47%) ⬆️ |
| txnsync/outgoing.go | 93.10% <100.00%> (+0.09%) ⬆️ |
| crypto/merkletrie/trie.go | 66.42% <0.00%> (-2.19%) ⬇️ |
| crypto/merkletrie/node.go | 91.62% <0.00%> (-1.87%) ⬇️ |
| data/abi/abi_type.go | 92.03% <0.00%> (-1.00%) ⬇️ |
| ledger/acctupdates.go | 64.25% <0.00%> (-0.60%) ⬇️ |
| catchup/service.go | 69.84% <0.00%> (ø) |
| data/transactions/verify/txn.go | 44.29% <0.00%> (ø) |

... and 3 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b93ad64...92a8c4c.

@tsachiherman tsachiherman requested a review from a user October 22, 2021 14:25
@tsachiherman tsachiherman marked this pull request as ready for review October 22, 2021 14:32
ghost previously approved these changes Oct 22, 2021

@ghost left a comment


This looks good, I had a couple of questions.

txnsync/incomingMsgQ.go (outdated)
```go
}
goto writeOutboundMessage
}
}
```
Should the mutex be locked while waiting?

@tsachiherman (Contributor, Author):
Can you find a code path that would reach

imq.enqueuedPeersCond.Wait()

without the lock being held? imq.enqueuedPeersCond.Wait() internally unlocks the mutex, waits for the signal, and then re-acquires the lock.
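
For reference, a minimal, self-contained sketch of the sync.Cond contract described above; the queue type and its fields are illustrative stand-ins, not the actual incoming message queue implementation:

```go
package main

import "sync"

// queue is an illustrative stand-in for the incoming message queue,
// showing only the condition-variable handshake.
type queue struct {
	mu      sync.Mutex
	cond    *sync.Cond
	pending []string
}

func newQueue() *queue {
	q := &queue{}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// pop blocks until an item is available. The mutex must be held when
// calling Wait: Wait atomically releases it, sleeps until a Signal or
// Broadcast arrives, and re-acquires it before returning.
func (q *queue) pop() string {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.pending) == 0 {
		q.cond.Wait()
	}
	item := q.pending[0]
	q.pending = q.pending[1:]
	return item
}

// push appends an item and wakes up one waiting pop.
func (q *queue) push(v string) {
	q.mu.Lock()
	q.pending = append(q.pending, v)
	q.mu.Unlock()
	q.cond.Signal()
}

func main() {
	q := newQueue()
	go q.push("message")
	_ = q.pop()
}
```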

txnsync/incomingMsgQ.go (outdated)
```go
	outboundPeerCh   chan incomingMessage
	enqueuedPeersMap map[*Peer]*queuedMsgEntry
	messages         queuedMsgList
	freelist         queuedMsgList
```
Can you explain the need for having both the messages and freelist lists? Could we just use list.List, which also keeps track of the length of the linked list?

@tsachiherman (Contributor, Author):
Yes, I was thinking about that; however, list.List is not optimal here, both because of the interface{} casting and because of the per-element dynamic allocation.

Can we just use one message linked list and keep a counter for the number of items, instead of having two lists?

@tsachiherman (Contributor, Author):
I think it's doable, but it would make the testing a bit less straightforward. I'd like to see that this solution addresses the issue first; if it does, we can merge the two lists into a single list plus a counter.
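
For context, a minimal sketch of the two-list idea (a message list plus a freelist of recycled entries); the types below are modeled loosely on the snippet shown earlier but are hypothetical, not the actual txnsync code. Reusing entries from a freelist avoids container/list's per-element allocation and interface{} boxing:

```go
package main

// queuedMsgEntry and queuedMsgList are hypothetical simplifications of the
// fields shown in the snippet above.
type queuedMsgEntry struct {
	msg  []byte
	next *queuedMsgEntry
}

// queuedMsgList is a simple FIFO linked list of entries.
type queuedMsgList struct {
	head, tail *queuedMsgEntry
}

func (l *queuedMsgList) push(e *queuedMsgEntry) {
	e.next = nil
	if l.tail == nil {
		l.head = e
	} else {
		l.tail.next = e
	}
	l.tail = e
}

func (l *queuedMsgList) pop() *queuedMsgEntry {
	e := l.head
	if e == nil {
		return nil
	}
	l.head = e.next
	if l.head == nil {
		l.tail = nil
	}
	e.next = nil
	return e
}

// msgQueue keeps pending messages in one list and recycled entries in a
// freelist, so enqueuing reuses nodes instead of allocating new ones.
type msgQueue struct {
	messages queuedMsgList // entries waiting to be processed
	freelist queuedMsgList // spare entries, ready for reuse
}

func (q *msgQueue) enqueue(msg []byte) {
	e := q.freelist.pop()
	if e == nil {
		e = &queuedMsgEntry{} // grow only when the freelist is empty
	}
	e.msg = msg
	q.messages.push(e)
}

func (q *msgQueue) dequeue() []byte {
	e := q.messages.pop()
	if e == nil {
		return nil
	}
	msg := e.msg
	e.msg = nil
	q.freelist.push(e) // recycle the entry rather than dropping it
	return msg
}

func main() {
	q := &msgQueue{}
	q.enqueue([]byte("hello"))
	_ = q.dequeue()
}
```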

@tsachiherman tsachiherman merged commit ca3e877 into algorand:master Oct 25, 2021
@tsachiherman tsachiherman deleted the tsachi/deletable_msgq branch October 25, 2021 19:17
onetechnical pushed a commit that referenced this pull request Oct 25, 2021
tsachiherman added a commit to tsachiherman/go-algorand that referenced this pull request Nov 2, 2021