Consuming more CPU resources since upgraded to 2.7.1 #3928
Comments
I'd assume that 2.6 -> 2.7 lowered the resources needed, and with "Revert "Fix double query execution" #3905" they increased again.
Is anybody working on this issue?
No, not at the moment - vacation time. How was the performance with 2.6.x, for instance? Are there any slow queries being logged? Anything else that would indicate why CPU load is higher than before, e.g. specific process load monitoring & graphs? https://icinga.com/docs/icinga2/latest/doc/15-troubleshooting/#analyse-your-linuxunix-environment
I have the same problem with my instance since upgrading to 2.7.1. CPU usage increased a lot. If I click on "Overview => Services" it takes around 14 seconds to load all the objects, which was really fast before. Running
This is the content of my slow query log. (long_query_time = 4)
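For reference, the slow query log mentioned here can be enabled with a my.cnf fragment like the following (the log file path is an assumption; `long_query_time = 4` matches the value quoted above):

```ini
[mysqld]
# Log statements taking longer than long_query_time seconds
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time     = 4
```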
Last month CPU graph:
htop screenshot:
vmstat outputs:
iostat output:
sar outputs:
Ok, definitely MySQL. Apart from that, I can see Dashing consuming quite a few resources as well. So, your next bet will be MySQL performance analysis, e.g. with mysql_health queries and logs. I can only give hints here, as I would on https://community.icinga.com, since I'm busy with Icinga 2 Core.
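As a starting point for such an analysis, statements like these show whether slow queries pile up and what the server is currently busy with (a sketch; the exact counters available depend on the MySQL version):

```sql
-- Number of statements that exceeded long_query_time since server start
SHOW GLOBAL STATUS LIKE 'Slow_queries';

-- Currently running statements, including how long each has been executing
SHOW FULL PROCESSLIST;
```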
Maybe this info from MySQL could help:
mytop output:
It seems that nobody is working on this issue; at least it is not marked as a bug. I have downgraded to the old version 2.6.3 until this problem is fixed.
Affected version: 2.7.1. I rolled back to 2.6.3 because 2.7.0 has a bug with sorting.
I also skipped 2.7.0 because of a bug with permissions. 2.7.1 is affected; 2.6.3 is not. I have 1003 hosts and 3071 services.
Same here in a Docker Swarm environment:
In my case it is the hostgroup summary query (SELECT hostgroupsummary.hostgroup_alias, hostgroupsummary.hostgroup_name, SUM(CASE ...). EXPLAIN on the full statement:

```sql
EXPLAIN SELECT hostgroupsummary.hostgroup_alias, hostgroupsummary.hostgroup_name,
  SUM(CASE WHEN host_state = 1 AND host_handled = 1 THEN 1 ELSE 0 END) AS hosts_down_handled,
  SUM(CASE WHEN host_state = 1 AND host_handled = 0 THEN 1 ELSE 0 END) AS hosts_down_unhandled,
  SUM(CASE WHEN host_state = 99 THEN 1 ELSE 0 END) AS hosts_pending,
  SUM(CASE WHEN host_state IS NOT NULL THEN 1 ELSE 0 END) AS hosts_total,
  SUM(CASE WHEN host_state = 2 AND host_handled = 1 THEN 1 ELSE 0 END) AS hosts_unreachable_handled,
  SUM(CASE WHEN host_state = 2 AND host_handled = 0 THEN 1 ELSE 0 END) AS hosts_unreachable_unhandled,
  SUM(CASE WHEN host_state = 0 THEN 1 ELSE 0 END) AS hosts_up,
  SUM(CASE WHEN service_state = 2 AND service_handled = 1 THEN 1 ELSE 0 END) AS services_critical_handled,
  SUM(CASE WHEN service_state = 2 AND service_handled = 0 THEN 1 ELSE 0 END) AS services_critical_unhandled,
  SUM(CASE WHEN service_state = 0 THEN 1 ELSE 0 END) AS services_ok,
  SUM(CASE WHEN service_state = 99 THEN 1 ELSE 0 END) AS services_pending,
  SUM(CASE WHEN service_state IS NOT NULL THEN 1 ELSE 0 END) AS services_total,
  SUM(CASE WHEN service_state = 3 AND service_handled = 1 THEN 1 ELSE 0 END) AS services_unknown_handled,
  SUM(CASE WHEN service_state = 3 AND service_handled = 0 THEN 1 ELSE 0 END) AS services_unknown_unhandled,
  SUM(CASE WHEN service_state = 1 AND service_handled = 1 THEN 1 ELSE 0 END) AS services_warning_handled,
  SUM(CASE WHEN service_state = 1 AND service_handled = 0 THEN 1 ELSE 0 END) AS services_warning_unhandled
FROM (
  SELECT hg.alias COLLATE latin1_general_ci AS hostgroup_alias, hgo.name1 AS hostgroup_name,
    CASE WHEN (hs.problem_has_been_acknowledged + hs.scheduled_downtime_depth) > 0 THEN 1 ELSE 0 END AS host_handled,
    CASE WHEN hs.has_been_checked = 0 OR hs.has_been_checked IS NULL THEN 16
         ELSE CASE WHEN hs.current_state = 0 THEN 1
              ELSE CASE WHEN hs.current_state = 1 THEN 64 WHEN hs.current_state = 2 THEN 32 ELSE 256 END
                   + CASE WHEN hs.problem_has_been_acknowledged = 1 THEN 2 WHEN hs.scheduled_downtime_depth > 0 THEN 1 ELSE 256 END
              END
         END AS host_severity,
    CASE WHEN hs.has_been_checked = 0 OR (hs.has_been_checked IS NULL AND hs.hoststatus_id IS NOT NULL) THEN 99 ELSE hs.current_state END AS host_state,
    NULL AS service_handled, 0 AS service_severity, NULL AS service_state
  FROM icinga_objects AS hgo
  INNER JOIN icinga_hostgroups AS hg ON hg.hostgroup_object_id = hgo.object_id AND hgo.is_active = 1 AND hgo.objecttype_id = 3
  LEFT JOIN icinga_hostgroup_members AS hgm ON hgm.hostgroup_id = hg.hostgroup_id
  LEFT JOIN icinga_objects AS ho ON hgm.host_object_id = ho.object_id AND ho.is_active = 1 AND ho.objecttype_id = 1
  LEFT JOIN icinga_hoststatus AS hs ON hs.host_object_id = ho.object_id
  WHERE ( ( (EXISTS (
      SELECT 1 FROM icinga_objects AS sub_hgo
      INNER JOIN icinga_hostgroups AS sub_hg ON sub_hg.hostgroup_object_id = sub_hgo.object_id AND sub_hgo.is_active = 1 AND sub_hgo.objecttype_id = 3
      LEFT JOIN icinga_hostgroup_members AS sub_hgm ON sub_hgm.hostgroup_id = sub_hg.hostgroup_id
      LEFT JOIN icinga_objects AS sub_ho ON sub_hgm.host_object_id = sub_ho.object_id AND sub_ho.is_active = 1 AND sub_ho.objecttype_id = 1
      WHERE ( ((TRUE) AND sub_hgm.host_object_id = ho.object_id) OR ho.object_id IS NULL)
  )) AND (TRUE)) ) )
  GROUP BY hgo.object_id, hg.hostgroup_id, hs.hoststatus_id
  UNION ALL
  SELECT hg.alias COLLATE latin1_general_ci AS hostgroup_alias, hgo.name1 AS hostgroup_name,
    NULL AS host_handled, 0 AS host_severity, NULL AS host_state,
    CASE WHEN (ss.problem_has_been_acknowledged + ss.scheduled_downtime_depth + COALESCE(hs.current_state, 0)) > 0 THEN 1 ELSE 0 END AS service_handled,
    CASE WHEN ss.current_state = 0 THEN
           CASE WHEN ss.has_been_checked = 0 OR ss.has_been_checked IS NULL THEN 16 ELSE 0 END
           + CASE WHEN ss.problem_has_been_acknowledged = 1 THEN 2 ELSE CASE WHEN ss.scheduled_downtime_depth > 0 THEN 1 ELSE 4 END END
         ELSE
           CASE WHEN ss.has_been_checked = 0 OR ss.has_been_checked IS NULL THEN 16
                WHEN ss.current_state = 1 THEN 32 WHEN ss.current_state = 2 THEN 128 WHEN ss.current_state = 3 THEN 64 ELSE 256 END
           + CASE WHEN hs.current_state > 0 THEN 1024
                  ELSE CASE WHEN ss.problem_has_been_acknowledged = 1 THEN 512
                       ELSE CASE WHEN ss.scheduled_downtime_depth > 0 THEN 256 ELSE 2048 END END END
         END AS service_severity,
    CASE WHEN ss.has_been_checked = 0 OR (ss.has_been_checked IS NULL AND ss.servicestatus_id IS NOT NULL) THEN 99 ELSE ss.current_state END AS service_state
  FROM icinga_objects AS hgo
  INNER JOIN icinga_hostgroups AS hg ON hg.hostgroup_object_id = hgo.object_id AND hgo.is_active = 1 AND hgo.objecttype_id = 3
  LEFT JOIN icinga_hostgroup_members AS hgm ON hgm.hostgroup_id = hg.hostgroup_id
  LEFT JOIN icinga_objects AS ho ON hgm.host_object_id = ho.object_id AND ho.is_active = 1 AND ho.objecttype_id = 1
  LEFT JOIN icinga_hosts AS h ON h.host_object_id = ho.object_id
  LEFT JOIN icinga_services AS s ON s.host_object_id = h.host_object_id
  LEFT JOIN icinga_objects AS so ON so.object_id = s.service_object_id AND so.is_active = 1 AND so.objecttype_id = 2
  LEFT JOIN icinga_hoststatus AS hs ON hs.host_object_id = ho.object_id
  LEFT JOIN icinga_servicestatus AS ss ON ss.service_object_id = so.object_id
  WHERE ( ( (EXISTS (
      SELECT 1 FROM icinga_objects AS sub_hgo
      INNER JOIN icinga_hostgroups AS sub_hg ON sub_hg.hostgroup_object_id = sub_hgo.object_id AND sub_hgo.is_active = 1 AND sub_hgo.objecttype_id = 3
      LEFT JOIN icinga_hostgroup_members AS sub_hgm ON sub_hgm.hostgroup_id = sub_hg.hostgroup_id
      LEFT JOIN icinga_objects AS sub_ho ON sub_hgm.host_object_id = sub_ho.object_id AND sub_ho.is_active = 1 AND sub_ho.objecttype_id = 1
      WHERE ( ((TRUE) AND sub_hgm.host_object_id = ho.object_id) OR ho.object_id IS NULL)
  )) AND (TRUE)) ) )
  GROUP BY hgo.object_id, hg.hostgroup_id, hs.hoststatus_id, ss.servicestatus_id
) AS hostgroupsummary
GROUP BY hostgroup_name, hostgroup_alias
ORDER BY hostgroup_alias ASC
LIMIT 25;
```

JOIN/UNION/GROUP/SORT hell; in my case 8032040 rows to sort (Using temporary; Using filesort).
So I think I'm able to reproduce this. But to be really sure, please post what filters you've set up for the
I have two users. One has this as filter:
Thank you. This confirms what I've noticed.
Why was it removed from the 2.7.2 milestone?
I upgraded to 2.7.1 and some users complained that their dashboards stopped working. They got an error message:
I found one entry in the error log on the server:

[Mon Sep 23 09:52:04.852549 2019] [proxy_fcgi:error] [pid 8588] (70007)The timeout specified has expired: [client 159.216.207.30:30879] AH01075: Error dispatching request to :, referer:

Downgrading/reverting to 2.6.3 solved the issue for the users.
@bunghi By accident. I seem to have caught it during a bulk update 🤔 I'm now on it.
Hey @nilmerg
Is there any estimated date for version 2.7.2 to be released and available in the repositories?
No, we are still waiting for more feedback.
How can I apply the PR to my environment if I don't use git but the official repo instead?
Switch to the directory. Run the following. When it asks for the file to patch:
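Since the thread does not quote the exact commands, here is a minimal, self-contained sketch of the `patch(1)` workflow (the file name and diff content are made up for illustration; for the real PR you would feed `patch` the `.patch` file downloaded from GitHub and answer the "File to patch" prompt with the path suggested above):

```shell
cd "$(mktemp -d)"                    # stand-in for the Icinga Web 2 directory
printf 'old line\n' > DataView.php   # stand-in for the file the PR touches

# A unified diff, as GitHub serves it when ".patch" is appended to a PR URL
cat > fix.patch <<'EOF'
--- DataView.php
+++ DataView.php
@@ -1 +1 @@
-old line
+new line
EOF

patch -p0 < fix.patch                # apply; -p0 keeps the diff's paths as-is
cat DataView.php                     # → new line
```

In the real setup the `-p` strip level depends on how the paths in the PR's diff are prefixed (often `-p1` when they carry `a/` and `b/` prefixes).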
Patch output:
Should I worry about this message? "can't find file to patch at input line 487"
This went perfectly fine. The message was the prompt you correctly answered with the path I suggested. A restart is not necessary; the patch should be live now.
I was filtering by hostgroup_name and had very long loading times for specific searches.
This patch lowers the refresh time of my hostgroup filter view from 3-4 seconds to 1 second (1500 hosts, 8500 services).
Hello, have you ever seen this error after patching?
HostGroups:
Servicegroups:
@gbin2265 Is this a PostgreSQL database by chance?
Yes, Postgres 9.5 (Red Hat Software Collections). We also had the error "It seems that the PHP FPM service is not running...", which is why we did the upgrade. After the upgrade everything is much faster, except we now get SQL errors on hostgroups/servicegroups.
Thanks. Fixed it and updated the PR.
We have applied the fix and everything works perfectly and is faster!
I don't know if this still belongs to this bug, but I found the following example, which is still much slower with an active hostgroup_name filter set for this user (tested with the patch). Address string:
user with hostgroup_name filter:
user without filter:
Query via mysql:
without filter:
Query with hostgroup_name filter:
Query without filter:
Best regards,
The queries are fine. The benchmark with the hostgroup filter, however... Are you sure you've experienced this with the patch applied? Because both benchmarks should be identical in terms of what's measured. The first one, though, measures something that should not occur with the patch applied.
Sorry, you are right. I messed up a little bit with all the tests. ^^ Filter:
Address:
without patch - with hostgroup_name filter:
After that:
with patch - with hostgroup_name filter:
with patch - without hostgroup_name filter:
patched - with hostgroup_name filter - query via mysql client:
Well, about 70 seconds less with the patch. I'd call this an improvement. 🤔 And in case it's not faster than 50 seconds with v2.6.3, I'd say your database host is the issue. 😉
Please don't get me wrong. It's a very good patch and I'm thankful for your work. :) The only thing I don't understand is that the query via the mysql client needs ~10 seconds, but via the web UI it needs ~50 seconds.
Take a look at the benchmarks. The query alone is only the result-set iteration. That's what you'd do in the client, minus all the PHP code execution. On top of that comes the counting of all results and the query for the summary bar at the bottom.
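To illustrate the extra work: besides iterating the result set, the UI issues additional queries of roughly this shape for pagination, so the original statement's cost is paid more than once per page view (a sketch with a placeholder subquery, not the literal statement Icinga Web 2 generates):

```sql
-- Count query for the pagination widget: the original query is wrapped
-- and only the row count is fetched
SELECT COUNT(*) AS total
FROM ( /* full hostgroup summary query from above */ SELECT 1 ) AS q;
```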
Describe the bug
Since I upgraded Icinga Web 2 from version 2.7.0 to 2.7.1, CPU usage is very high and the web UI is slower than before. Checking with some monitoring tools, I can see that the mysqld process is consuming a lot.
I'm wondering if I'm the only one with this problem.
Here is a system graph. Upgraded on Monday morning, restarted on Tuesday to see if it helps.
To Reproduce
Provide a link to a live example, or an unambiguous set of steps to reproduce this issue. Include configuration, logs, etc. to reproduce, if relevant.
Expected behavior
A clear and concise description of what you expected to happen.
Screenshots
If applicable, add screenshots to help explain your problem.
Your Environment
Include as many relevant details about the environment you experienced the problem in
Icinga Web 2 version and modules (System - About):
Web browser used:
Google Chrome Version 76.0.3809.100 (Official Build) (64-bit)
Icinga 2 version used (icinga2 --version):
PHP version used (php --version):
Additional context
Add any other context about the problem here.