Describe the bug
We have two Thruk servers both running 3.10
They both connect to the same set of Nagios servers, but recently we have been receiving timeouts when connecting to certain large Nagios backends.
I can see the issue is livestatus on the Nagios servers hitting its xinetd connection limits:
Aug 07 08:34:54 nagserver.domain.com xinetd[2040]: FAIL: livestatus per_source_limit from=thrukserver
Aug 07 08:34:54 nagserver.domain.com xinetd[2040]: FAIL: livestatus per_source_limit from=thrukserver
Aug 07 08:34:54 nagserver.domain.com xinetd[2040]: FAIL: livestatus per_source_limit from=thrukserver
Aug 07 08:34:54 nagserver.domain.com xinetd[2040]: FAIL: livestatus per_source_limit from=thrukserver
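To see how often the limit is hit, and from which Thruk host, the FAIL lines can be tallied. Below is a minimal Python sketch. It assumes the log format shown above: a syslog timestamp, the host, and then `xinetd[pid]: FAIL: <service> <reason> from=<source>`. The function name `count_failures` is hypothetical and is not part of Thruk or xinetd.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed format (taken from the log excerpt above):
# "Aug 07 08:34:54 <host> xinetd[<pid>]: FAIL: <service> <reason> from=<source>"
FAIL_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ xinetd\[\d+\]: "
    r"FAIL: (?P<service>\S+) (?P<reason>\S+) from=(?P<source>\S+)"
)


def count_failures(lines: Iterable[str]) -> Counter:
    """Tally xinetd FAIL events per (service, reason, source); other lines are ignored."""
    counts: Counter = Counter()
    for line in lines:
        match = FAIL_RE.match(line.strip())
        if match:
            counts[(match["service"], match["reason"], match["source"])] += 1
    return counts


# Hypothetical usage: feed it a saved copy of the xinetd log.
# with open("xinetd.log") as fh:
#     for (service, reason, source), n in count_failures(fh).most_common():
#         print(f"{n:6d}  {service} {reason} from={source}")
```

Grouping by source makes it easy to tell whether both Thruk servers hit the limit, or only one.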
However, when we increase these limits, the number of connections simply grows until it hits the new limits again.
The xinetd limits for livestatus are currently set to:

    # Define access restriction defaults
    cps        = 2000 3
    instances  = 500
    per_source = 500
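The three settings above are xinetd attributes:

- `cps = 2000 3` allows up to 2000 new connections per second. If that rate is exceeded, the service is disabled for 3 seconds.
- `instances` caps the number of simultaneously running servers for the service.
- `per_source` caps the number of simultaneous servers per source IP address.

Each Thruk server connects from a single address, so `per_source` effectively bounds how many livestatus connections that Thruk host can have open at once. The toy Python model below is a simplified sketch, not xinetd's actual code. It shows why raising the limit alone does not help when connections are opened faster than they are closed: the count simply climbs to the new ceiling. The following are all assumptions: the class names, the omission of `cps`, the order of the checks, and the `"instances_limit"` label. Only `per_source_limit` appears in the logs above.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class XinetdLimits:
    """Hypothetical model of two xinetd concurrency limits (cps is not modelled)."""
    instances: int = 500   # max simultaneously running servers for the service
    per_source: int = 500  # max simultaneous servers per source address


@dataclass
class ServiceState:
    limits: XinetdLimits
    active: Counter = field(default_factory=Counter)  # source -> open connections

    def try_accept(self, source: str) -> str:
        # Assumed check order: per_source first, matching the reason seen in the logs.
        if self.active[source] >= self.limits.per_source:
            return "per_source_limit"
        if sum(self.active.values()) >= self.limits.instances:
            return "instances_limit"  # hypothetical label, not a logged xinetd string
        self.active[source] += 1
        return "accepted"

    def close(self, source: str) -> None:
        # A finished livestatus query releases its slot.
        if self.active[source] > 0:
            self.active[source] -= 1


# One Thruk host that never closes its connections: the 501st attempt is refused.
svc = ServiceState(XinetdLimits(instances=500, per_source=500))
results = [svc.try_accept("thrukserver") for _ in range(501)]
```

Raising `per_source` only moves the point where `try_accept` starts failing. The number of open connections stays bounded only if each connection is closed before the next one arrives.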
Thruk Version
Thruk 3.10
Running on RHEL 7.6 / 7.7
Running as a VM with:
16 cores
31 GB of RAM
This started happening at the same time on both boxes. They are in two different datacentres and connect to multiple backends in different regions. No other changes have been made to the hardware, software, or network in this time.
Is there a timeout limit on livestatus somewhere that we are hitting?
Is there a maximum number of connections that Thruk can handle?
The affected Nagios servers are the largest on the platform.
When connecting to multiple large instances, I'd recommend using LMD anyway. It makes things faster, and it would also drastically reduce the number of connections to the remote backends.
Have you tried using LMD?