Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

address_store: Improve address tracking and add eviction algorithm #250

Merged
merged 31 commits into from
Oct 25, 2024

Conversation

lexnv
Copy link
Collaborator

@lexnv lexnv commented Sep 24, 2024

This PR improves the transport manager's address tracking to keep a healthier view of addresses.

General changes:

  • Fixes a bug where listener addresses were tracked instead of dialing addresses (incoming connections may be established with ephemeral ports and there's no guarantee the remote is listening on them)
  • PeerIdMismatch error coming from the noise handshake redirects the address to the appropriate peer for healthier addresses
  • Addresses are tracked first, regardless of the peer state to ensure we update our view of the addresses (reachable or not)

Address Store changes:

  • The store is bounded to a maximum of 64 tracked addresses
  • Removing and reinsertion of addresses into the store is prone to error and instead the address store updates addresses in place
  • Introduces an eviction algorithm for tracking addresses with higher score

Testing Done

  • added extra tests to the address store
  • tested with subp2p-explorer for discovering kusama

This PR is part of a bigger refactor to keep track of healthier addresses.
It is essentially a breakdown of #248 for an easier review process.

@lexnv lexnv added the enhancement New feature or request label Sep 24, 2024
@lexnv lexnv self-assigned this Sep 24, 2024
Comment on lines 40 to 41
/// Score for a connection with a peer using a different ID than expected.
pub const DIFFERENT_PEER_ID: i32 = 50i32;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realistically we can't establish a connection to this peer, so I would make the score the same as CONNECTION_FAILURE.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep makes sense! Have simplified the code kept around 1 connection failure and 1 address failure which punishes more, thanks for the suggestion! 🙏

src/transport/manager/address.rs Outdated Show resolved Hide resolved
src/transport/manager/address.rs Outdated Show resolved Hide resolved
src/transport/manager/address.rs Outdated Show resolved Hide resolved
src/transport/manager/address.rs Outdated Show resolved Hide resolved
continue;
};

// This can correspond to the provided peerID or to a different one.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would generate an error if we try to add a peer's address with a different peer_id, because likely what is happening is not what the user / client code wants.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have added an warning here, good point 🙏


return 0usize;
// Add the provided peer ID to the address.
let address = address.with(Protocol::P2p(multihash::Multihash::from(peer.clone())));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this create an invalid address if the address already contains the peer id?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have refactored the continue with an if else branch, looked a bit confusing indeed.
I think we check that the last element is protocol::p2p, let me know if this is sounds right

src/transport/manager/mod.rs Outdated Show resolved Hide resolved
src/transport/manager/mod.rs Show resolved Hide resolved
src/transport/manager/address.rs Outdated Show resolved Hide resolved
@paritytech paritytech deleted a comment from dmitry-markin Oct 25, 2024
// that already failed (with negative score).
if num_addresses >= self.max_capacity {
let Some(min_record) = self.addresses.values().min().cloned() else {
return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't we lose the address due to returning if the values() are empty?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The num_addresses is initialized to self.addresses.len() 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have removed the num_addreses variable

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a chance of getting None here? May be expect() with a comment then?

src/transport/manager/address.rs Outdated Show resolved Hide resolved
return 0usize;
// It is important to keep track of all addresses to have a healthy
// address store to dial from.
peer_addresses.entry(peer_id).or_insert_with(HashSet::new).insert(address);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not discard the address with invalid peer ID? If the address with invalid peer ID ended up here, something bad has already happened. In theory, we could rewrite the peer ID part, but ideally we should have returned an error to the user long before coming here with invalid peer ID.

This would also simplify the code below by inserting only one peer in the map.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oki doki that makes sense. I think I'm convinced that tracking all peer addresses is not the best thing here, since the user explicitly provided addresses of a single "peer". I've discarded addresses that belong to a different peer,this should make the code a bit easier to follow, thanks for the suggestion 🙏

// that already failed (with negative score).
if num_addresses >= self.max_capacity {
let Some(min_record) = self.addresses.values().min().cloned() else {
return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a chance of getting None here? May be expect() with a comment then?

src/transport/manager/handle.rs Outdated Show resolved Hide resolved
src/transport/manager/handle.rs Outdated Show resolved Hide resolved
lexnv and others added 3 commits October 25, 2024 16:45
@lexnv lexnv merged commit 2ef3751 into master Oct 25, 2024
8 checks passed
@lexnv lexnv deleted the lexnv/enhanve-address-store branch October 25, 2024 14:03
lexnv added a commit that referenced this pull request Oct 28, 2024
This PR refactors the peer state. Previously the `AddressStore` and
`PeerState` where intertwined. The `AddressStore` contained an optional
`ConnectionID`, while the `PeerState` states contained a mandatory
score.

This PR separates the address store and the peer state, while moving the
logic of the peer state transitioning to a separate module. While at it,
have added documentation and testing for state machine transitions.

Changes include:
- Secondary connection from the PeerContext is merged into the
`PeerState`
- Connection Ids are removed from the Address Store
- Transport manager is refactored to use the new state machine, this
keeps a minimal code around in the manager

### Testing Done
- added extra tests to the peer state
- tested with subp2p-explorer for discovering kusama

This builds upon #250 for a
bigger refactor effort to track addresses in a healthier

Aims at improving:
- #239
- #238
- #237

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
lexnv added a commit that referenced this pull request Nov 4, 2024
## [0.8.0] - 2024-11-01

This release adds support for content provider advertisement and
discovery to Kademlia protocol implementation (see libp2p
[spec](https://github.com/libp2p/specs/blob/master/kad-dht/README.md#content-provider-advertisement-and-discovery)).
Additionally, the release includes several improvements and memory leak
fixes to enhance the stability and performance of the litep2p library.

### Added

- kad: Providers part 8: unit, e2e, and `libp2p` conformance tests
([#258](#258))
- kad: Providers part 7: better types and public API, public addresses &
known providers ([#246](#246))
- kad: Providers part 6: stop providing
([#245](#245))
- kad: Providers part 5: `GET_PROVIDERS` query
([#236](#236))
- kad: Providers part 4: refresh local providers
([#235](#235))
- kad: Providers part 3: publish provider records (start providing)
([#234](#234))

### Changed

- transport_service: Improve connection stability by downgrading
connections on substream inactivity
([#260](#260))
- transport: Abort canceled dial attempts for TCP, WebSocket and Quic
([#255](#255))
- kad/executor: Add timeout for writting frames
([#277](#277))
- kad: Avoid cloning the `KademliaMessage` and use reference for
`RoutingTable::closest`
([#233](#233))
- peer_state: Robust state machine transitions
([#251](#251))
- address_store: Improve address tracking and add eviction algorithm
([#250](#250))
- kad: Remove unused serde cfg
([#262](#262))
- req-resp: Refactor to move functionality to dedicated methods
([#244](#244))
- transport_service: Improve logs and move code from tokio::select macro
([#254](#254))

### Fixed

- tcp/websocket/quic: Fix cancel memory leak
([#272](#272))
- transport: Fix pending dials memory leak
([#271](#271))
- ping: Fix memory leak of unremoved `pending_opens`
([#274](#274))
- identify: Fix memory leak of unused `pending_opens`
([#273](#273))
- kad: Fix not retrieving local records
([#221](#221))

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request Nov 5, 2024
This PR updates litep2p to the latest release.

- `KademliaEvent::PutRecordSucess` is renamed to fix word typo
- `KademliaEvent::GetProvidersSuccess` and
`KademliaEvent::IncomingProvider` are needed for bootnodes on DHT work
and will be utilized later


### Added

- kad: Providers part 8: unit, e2e, and `libp2p` conformance tests
([#258](paritytech/litep2p#258))
- kad: Providers part 7: better types and public API, public addresses &
known providers ([#246](paritytech/litep2p#246))
- kad: Providers part 6: stop providing
([#245](paritytech/litep2p#245))
- kad: Providers part 5: `GET_PROVIDERS` query
([#236](paritytech/litep2p#236))
- kad: Providers part 4: refresh local providers
([#235](paritytech/litep2p#235))
- kad: Providers part 3: publish provider records (start providing)
([#234](paritytech/litep2p#234))

### Changed

- transport_service: Improve connection stability by downgrading
connections on substream inactivity
([#260](paritytech/litep2p#260))
- transport: Abort canceled dial attempts for TCP, WebSocket and Quic
([#255](paritytech/litep2p#255))
- kad/executor: Add timeout for writting frames
([#277](paritytech/litep2p#277))
- kad: Avoid cloning the `KademliaMessage` and use reference for
`RoutingTable::closest`
([#233](paritytech/litep2p#233))
- peer_state: Robust state machine transitions
([#251](paritytech/litep2p#251))
- address_store: Improve address tracking and add eviction algorithm
([#250](paritytech/litep2p#250))
- kad: Remove unused serde cfg
([#262](paritytech/litep2p#262))
- req-resp: Refactor to move functionality to dedicated methods
([#244](paritytech/litep2p#244))
- transport_service: Improve logs and move code from tokio::select macro
([#254](paritytech/litep2p#254))

### Fixed

- tcp/websocket/quic: Fix cancel memory leak
([#272](paritytech/litep2p#272))
- transport: Fix pending dials memory leak
([#271](paritytech/litep2p#271))
- ping: Fix memory leak of unremoved `pending_opens`
([#274](paritytech/litep2p#274))
- identify: Fix memory leak of unused `pending_opens`
([#273](paritytech/litep2p#273))
- kad: Fix not retrieving local records
([#221](paritytech/litep2p#221))

See release changelog for more details:
https://github.com/paritytech/litep2p/releases/tag/v0.8.0

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants