Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kad/executor: Add timeout for writting frames #277

Merged
merged 1 commit into from
Oct 30, 2024

Conversation

lexnv
Copy link
Collaborator

@lexnv lexnv commented Oct 29, 2024

This PR adds a WRITE_TIMEOUT for sending frames to remote peers.

The WRITE_TIMEOUT is set to 15 seconds to mirror the READ_TIMEOUT.

Closes: #231

@lexnv lexnv added the enhancement New feature or request label Oct 29, 2024
@lexnv lexnv self-assigned this Oct 29, 2024
@@ -91,16 +93,24 @@ impl QueryExecutor {
/// Send message to remote peer.
pub fn send_message(&mut self, peer: PeerId, message: Bytes, mut substream: Substream) {
self.futures.push(Box::pin(async move {
match substream.send_framed(message).await {
Ok(_) => QueryContext {
match tokio::time::timeout(WRITE_TIMEOUT, substream.send_framed(message)).await {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you already considered this, but a timeout is typically not needed for write operations, because send_framed(message)).await will return an error if ACK is not received for sent data on the stream level.

On the other hand, the timeout can complicate sending bigger payloads on slow connections, so I would be extra careful when adding it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it again, it should be safe to add a 15-sec timeout for writing if we already have a 15-sec timeout for reading 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I missed that ACK on the stream level implementation 🤔 Yep, I would keep the timeout more as a safety net in case we do anything wrong on the lower level implementations 🙏

@lexnv lexnv merged commit 9ad20ff into master Oct 30, 2024
8 checks passed
@lexnv lexnv deleted the lexnv/kad-executor-timeout branch October 30, 2024 09:21
lexnv added a commit that referenced this pull request Nov 4, 2024
## [0.8.0] - 2024-11-01

This release adds support for content provider advertisement and
discovery to Kademlia protocol implementation (see libp2p
[spec](https://github.com/libp2p/specs/blob/master/kad-dht/README.md#content-provider-advertisement-and-discovery)).
Additionally, the release includes several improvements and memory leak
fixes to enhance the stability and performance of the litep2p library.

### Added

- kad: Providers part 8: unit, e2e, and `libp2p` conformance tests
([#258](#258))
- kad: Providers part 7: better types and public API, public addresses &
known providers ([#246](#246))
- kad: Providers part 6: stop providing
([#245](#245))
- kad: Providers part 5: `GET_PROVIDERS` query
([#236](#236))
- kad: Providers part 4: refresh local providers
([#235](#235))
- kad: Providers part 3: publish provider records (start providing)
([#234](#234))

### Changed

- transport_service: Improve connection stability by downgrading
connections on substream inactivity
([#260](#260))
- transport: Abort canceled dial attempts for TCP, WebSocket and Quic
([#255](#255))
- kad/executor: Add timeout for writting frames
([#277](#277))
- kad: Avoid cloning the `KademliaMessage` and use reference for
`RoutingTable::closest`
([#233](#233))
- peer_state: Robust state machine transitions
([#251](#251))
- address_store: Improve address tracking and add eviction algorithm
([#250](#250))
- kad: Remove unused serde cfg
([#262](#262))
- req-resp: Refactor to move functionality to dedicated methods
([#244](#244))
- transport_service: Improve logs and move code from tokio::select macro
([#254](#254))

### Fixed

- tcp/websocket/quic: Fix cancel memory leak
([#272](#272))
- transport: Fix pending dials memory leak
([#271](#271))
- ping: Fix memory leak of unremoved `pending_opens`
([#274](#274))
- identify: Fix memory leak of unused `pending_opens`
([#273](#273))
- kad: Fix not retrieving local records
([#221](#221))

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request Nov 5, 2024
This PR updates litep2p to the latest release.

- `KademliaEvent::PutRecordSucess` is renamed to fix word typo
- `KademliaEvent::GetProvidersSuccess` and
`KademliaEvent::IncomingProvider` are needed for bootnodes on DHT work
and will be utilized later


### Added

- kad: Providers part 8: unit, e2e, and `libp2p` conformance tests
([#258](paritytech/litep2p#258))
- kad: Providers part 7: better types and public API, public addresses &
known providers ([#246](paritytech/litep2p#246))
- kad: Providers part 6: stop providing
([#245](paritytech/litep2p#245))
- kad: Providers part 5: `GET_PROVIDERS` query
([#236](paritytech/litep2p#236))
- kad: Providers part 4: refresh local providers
([#235](paritytech/litep2p#235))
- kad: Providers part 3: publish provider records (start providing)
([#234](paritytech/litep2p#234))

### Changed

- transport_service: Improve connection stability by downgrading
connections on substream inactivity
([#260](paritytech/litep2p#260))
- transport: Abort canceled dial attempts for TCP, WebSocket and Quic
([#255](paritytech/litep2p#255))
- kad/executor: Add timeout for writting frames
([#277](paritytech/litep2p#277))
- kad: Avoid cloning the `KademliaMessage` and use reference for
`RoutingTable::closest`
([#233](paritytech/litep2p#233))
- peer_state: Robust state machine transitions
([#251](paritytech/litep2p#251))
- address_store: Improve address tracking and add eviction algorithm
([#250](paritytech/litep2p#250))
- kad: Remove unused serde cfg
([#262](paritytech/litep2p#262))
- req-resp: Refactor to move functionality to dedicated methods
([#244](paritytech/litep2p#244))
- transport_service: Improve logs and move code from tokio::select macro
([#254](paritytech/litep2p#254))

### Fixed

- tcp/websocket/quic: Fix cancel memory leak
([#272](paritytech/litep2p#272))
- transport: Fix pending dials memory leak
([#271](paritytech/litep2p#271))
- ping: Fix memory leak of unremoved `pending_opens`
([#274](paritytech/litep2p#274))
- identify: Fix memory leak of unused `pending_opens`
([#273](paritytech/litep2p#273))
- kad: Fix not retrieving local records
([#221](paritytech/litep2p#221))

See release changelog for more details:
https://github.com/paritytech/litep2p/releases/tag/v0.8.0

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kad: Include a timeout for QueryExecutor::send_message
2 participants