0.17.0-rc1: ResourceMgr defaults clash with AcceleratedDHTClient #9405

Closed
Tracked by #8761
lidel opened this issue Nov 14, 2022 · 2 comments
Labels
kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization topic/resource-manager Issues related to Swarm.ResourceMgr (resource manager)

Comments

@lidel
Member

lidel commented Nov 14, 2022

Version

0.17.0-rc1

Config

$ ipfs config --json Swarm.ConnMgr # these are defaults from 0.17.0-rc1
{
  "GracePeriod": "20s",
  "HighWater": 900,
  "LowWater": 600,
  "Type": "basic"
}

$ ipfs config --json Experimental.AcceleratedDHTClient
true

Description

Enabling Experimental.AcceleratedDHTClient together with ResourceMgr does not work with the defaults from 0.17.0-rc1: the user gets a vague ERROR message, which is then drowned out by resourcemanager errors:

2022-11-14T17:17:03.911Z	ERROR	fullrtdht	fullrt/dht.go:309	Accelerated DHT client was unable to fully refresh its routing table due to Resource Manager limits, which may degrade content routing. Consider increasing resource limits. See debug logs for the "dht-crawler" subsystem for details.
2022-11-14T17:17:13.135Z	ERROR	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 168 times with error "transient: cannot reserve connection: resource limit exceeded".
2022-11-14T17:17:13.135Z	ERROR	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 177 times with error "transient: cannot reserve outbound connection: resource limit exceeded".
2022-11-14T17:17:13.135Z	ERROR	resourcemanager	libp2p/rcmgr_logging.go:57	Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-14T17:17:23.135Z	ERROR	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 42 times with error "system: cannot reserve connection: resource limit exceeded".
2022-11-14T17:17:23.135Z	ERROR	resourcemanager	libp2p/rcmgr_logging.go:57	Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-14T17:17:33.136Z	ERROR	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 80627 times with error "system: cannot reserve connection: resource limit exceeded".
2022-11-14T17:17:33.136Z	ERROR	resourcemanager	libp2p/rcmgr_logging.go:57	Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr

Given that we suggest enabling this setting for everyone running a Server, it feels like we should do more here, either in docs or in general UX.

Potential fix?

  • proper fix: make the accelerated DHT client adjust its work based on the limits and, if a limit is lower than X, print a one-time WARNING recommending that the connection limits be increased for best performance.
  • short term, or if the proper fix is not feasible: we could update https://github.com/ipfs/kubo/blob/master/docs/experimental-features.md#accelerated-dht-client with an example of how to raise the relevant limits, and link to it from the error message:
    2022-11-14T17:17:03.911Z	ERROR	fullrtdht	fullrt/dht.go:309	Accelerated DHT client was unable to fully refresh its routing table due to ResourceMgr limits, which may degrade content routing. Consider increasing resource limits. See debug logs for the "dht-crawler" subsystem for details, and
    
@lidel lidel added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization topic/resource-manager Issues related to Swarm.ResourceMgr (resource manager) labels Nov 14, 2022
@BigLep
Contributor

BigLep commented Nov 15, 2022

Thanks for reporting, @lidel. I assume the resulting 2*900=1800 System.Conns is not enough, and that the resource manager enforces hard limits, whereas the ConnMgr is more tolerant of being blown past this number.

@ajnavarro: can you please share the specific config you were using in your testing as part of #9338?

@ajnavarro
Member

I missed testing with the AcceleratedDHT client at some point during the last changes. I have a fix for it here: #9407
