Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reputation updates are handled inefficiently #2257

Open
altonen opened this issue Nov 9, 2023 · 7 comments
Open

Reputation updates are handled inefficiently #2257

altonen opened this issue Nov 9, 2023 · 7 comments
Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task.

Comments

@altonen
Copy link
Contributor

altonen commented Nov 9, 2023

While debugging something else, I looked into how reputation updates are handled in sc-network. Currently each protocol reports reputation updates over NetworkPeers::report_peer(). This sends a message over an unbounded channel to NetworkWorker which updates the reputation of the peer.

With 200 connections, the channel length peaked at 7000+ messages, most of them reputation updates coming from GRANDPA/transactions. This puts a lot of pressure on NetworkWorker which is already busy with notifications/requests. Converting NetworkService to update the reputation by directly calling PeerStore was also not optimal because there is a lot of contention on the mutex protecting PeerStore which caused the node to become visibly laggier as both GRANDPA and transactions are busy updating the reputation.

Code that reports peers should be refactored to be more aware of the cost of updating the reputation and it should try batching the updates if possible. Alternatively, introducing per-protocol reputations (side-stepping contention on the global mutex) which would be combined into one reputation by PeerStore at periodic intervals could help with the issue.

@altonen altonen added the I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. label Nov 9, 2023
@burdges
Copy link

burdges commented Nov 9, 2023

We should probably re-evaluate how critical reputation is. It's way easier to track negatives than positives.

Atomic increment operations sound faster than anything complex like Mutex here.

@altonen
Copy link
Contributor Author

altonen commented Nov 10, 2023

I agree that we should re-evaluate it because right now for example banning is not working as effectively as it could. But I believe this could be an issue short-term so we should have some fix for it, even if it's not perfect or is meant only as a temporary workaround before the system is redesigned.

I didn't understand your point about atomic increments though. Reputations are updated in multiple different protocols for different peers so the set of connected peers has to be synchronized between the protocols somehow.

@eskimor
Copy link
Member

eskimor commented Nov 10, 2023

Indeed. In parachain consensus we have peers in the reserved set. I wondered for quite some time how much point reputations really have and whether we should not just ensure that we are processing all (validator) connections fairly and be done with it. E.g. if a validator is flooding us, but we make sure that it only really affects our abilities to process messages from this very validator, then let him flood - who cares?

@burdges
Copy link

burdges commented Nov 10, 2023

Atomic increments need only a &Atomic<T> with T: Sync. They do block the writing thread on other writers, but never call lock explicitly. We do not necessarily care if readers get slightly outdated reputations, so we could maybe read form these atomic types without doing any synchronization. We do need to worry this results in only slightly outdated reputations though, which could be handled in other ways, but then complexity becomes a concern, so maybe a RwLock makes more sense. I'm not sure how a Mutex ever made sense here.

@burdges
Copy link

burdges commented Nov 19, 2023

Afaik it's premature to rip out reputation. It'd just be better if we really understood what reputation does.

Also, we know cheaper flavors which maybe suffice:

  • A reputation system could perhaps track only negative events locally, which makes it cheap or free normally, and cheaper than attacks when under attack.
  • A reputation system could pull positive events only from the chain state or our upcoming rewards system, not track them locally, which costs less than tracking every message.

@altonen
Copy link
Contributor Author

altonen commented Nov 20, 2023

I agree and I'm not suggesting we alter the behavior before we understand what we want from it. Like I mentioned in OP, if we wish to keep the current system, our protocols should be refactored to not update the reputation on every message. For example, the transaction protocol sends two reputation updates per transaction: one to initially decrease the reputation and one to refund/reward after the TX has been validated. I think on the Polkadot side reputations updates are already aggregated to some extent which helps with this situation.

I think we should find uses cases for reputations if we want to have that system. Banning is useful but other than that we only use the reputations for outbound connections for non-reserved peers. Maybe that's enough use for reputations?

bkontur pushed a commit that referenced this issue May 16, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue May 17, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue May 17, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue May 17, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue May 20, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue May 21, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue May 22, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue May 23, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
@nazar-pc
Copy link
Contributor

There is no channel after #2392 was merged, is this still a practical issue or can this be closed now?

bkontur pushed a commit that referenced this issue May 30, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue Jun 4, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue Jun 5, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue Jun 7, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue Jul 4, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue Jul 17, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue Jul 23, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue Jul 26, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue Jul 27, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue Jul 29, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue Jul 30, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
bkontur pushed a commit that referenced this issue Jul 31, 2024
* allow delivery confirmations on closed outbound lanes

* fix benchmarks compilation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task.
Projects
Status: Backlog 🗒
Development

No branches or pull requests

4 participants