Tasklet IRQs
ksoftirqd using 100% CPU on RHEL 7.1 and 7.2
03/01/2016
The ksoftirqd kernel thread may be observed consuming a large amount of CPU time after upgrading to RHEL 7.1 or RHEL 7.2.
To collect additional detail about the softirqs, the LinuxKI Toolset was run with the irq subsystem added to the KI dump collection scripts (runki -s irq or runki -e all). The CPU/RunQ Analysis Report (kiinfo -kirunq) produced from the KI data shows CPU 0 spending most of its time processing softirqs:
Global CPU Counters
cpu node    Total Busy      sys      usr     idle  hardirq_sys  hardirq_user  hardirq_idle  softirq_sys  softirq_user  softirq_idle
  0 [ 0] :     100.00%    0.00%   27.32%    0.00%        0.00%         0.00%         0.00%        0.00%        72.68%         0.00%
  1 [ 0] :      29.42%   27.75%    1.00%   69.29%        0.07%         0.00%         0.11%        0.59%         0.01%         1.17%
  2 [ 0] :       3.13%    1.64%    1.43%   96.50%        0.01%         0.01%         0.03%        0.02%         0.03%         0.34%
  3 [ 0] :       3.92%    1.55%    2.30%   95.67%        0.01%         0.01%         0.03%        0.02%         0.03%         0.38%
  4 [ 0] :      12.57%    5.43%    7.09%   87.16%        0.00%         0.00%         0.03%        0.02%         0.02%         0.24%
  5 [ 0] :       3.68%    2.24%    1.34%   94.51%        0.01%         0.01%         0.35%        0.07%         0.03%         1.45%
Total           25.42%    6.44%    6.74%   73.89%        0.01%         0.00%         0.09%        0.12%        12.11%         0.60%
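On systems with many CPUs, scanning this table by eye gets tedious. The Python sketch below is a hypothetical helper, not part of LinuxKI: it parses per-CPU rows in the format shown above, assuming the ten percentage columns appear in the header's order and that the softirq_sys/softirq_user/softirq_idle columns together make up a CPU's softirq time. It then flags CPUs at or above a threshold (50% here, an arbitrary choice). For CPU 0 above, softirq_user alone is 72.68%.

```python
import re

# Column order assumed from the "Global CPU Counters" header (all percentages).
COLUMNS = [
    "total_busy", "sys", "usr", "idle",
    "hardirq_sys", "hardirq_user", "hardirq_idle",
    "softirq_sys", "softirq_user", "softirq_idle",
]
# Matches per-CPU rows such as "0 [ 0] : 100.00% 0.00% ..." (the ':' is optional).
ROW = re.compile(r"^\s*(\d+)\s*\[\s*(\d+)\s*\]\s*:?\s*(.*)$")


def parse_cpu_counters(report: str) -> dict:
    """Map CPU number -> {column: percent} for each per-CPU row in the report."""
    rows = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header line, "Total" row, or unrelated text
        try:
            values = [float(v.rstrip("%")) for v in m.group(3).split()]
        except ValueError:
            continue  # not a numeric counters row
        if len(values) == len(COLUMNS):
            rows[int(m.group(1))] = dict(zip(COLUMNS, values))
    return rows


def softirq_pct(row: dict) -> float:
    """Sum of the three softirq_* columns (assumed to split softirq time by prior CPU mode)."""
    return row["softirq_sys"] + row["softirq_user"] + row["softirq_idle"]


def softirq_hot_cpus(rows: dict, threshold: float = 50.0) -> list:
    """CPUs whose combined softirq percentage is at least `threshold`."""
    return sorted(cpu for cpu, row in rows.items() if softirq_pct(row) >= threshold)
```

Fed the two first rows of the report above, softirq_hot_cpus would return only CPU 0.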
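Further analysis of the CPU/RunQ Analysis Report shows that the TASKLET softirq is the primary cause of the high CPU usage, accounting for over 99% of softirq events and roughly 94% of the total softirq elapsed time (ElpTime):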
Soft IRQ events
===============
IRQ  Name         Count    ElpTime
  6  TASKLET   66396240  14.497936
  3  NET_RX      175353   0.659607
  4  BLOCK        41437   0.214926
  1  TIMER        28060   0.010501
  7  SCHED          566   0.002296
  9  RCU            526   0.000309
  8  HRTIMER        131   0.000159
  2  NET_TX          18   0.000046
Total:         66642331  15.385780
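For a quick live check without a KI dump, the kernel's /proc/softirqs file exposes cumulative per-CPU counts for each softirq type. The Python sketch below is an illustration, not a LinuxKI feature, and its helper names are hypothetical: it samples the file twice about one second apart and prints the TASKLET increase per CPU. A single CPU with a far higher rate than the others would match the CPU 0 pattern above. Note that these are event counts, not CPU time, so they complement rather than replace the kiinfo ElpTime figures.

```python
import os
import time

SOFTIRQS = "/proc/softirqs"


def parse_softirqs(text: str):
    """Parse /proc/softirqs text into (cpu_names, {softirq_name: [count per CPU]})."""
    lines = [line for line in text.splitlines() if line.strip()]
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        name, sep, rest = line.partition(":")
        if sep:
            counts[name.strip()] = [int(v) for v in rest.split()]
    return cpus, counts


def per_cpu_delta(before: dict, after: dict, name: str = "TASKLET") -> list:
    """Per-CPU increase in the `name` counter between two samples."""
    return [b - a for a, b in zip(before[name], after[name])]


def sample(path: str = SOFTIRQS):
    """Read and parse one snapshot of the softirq counters."""
    with open(path) as f:
        return parse_softirqs(f.read())


if __name__ == "__main__" and os.path.exists(SOFTIRQS):
    cpus, first = sample()
    time.sleep(1.0)
    _, second = sample()
    for cpu, delta in zip(cpus, per_cpu_delta(first, second)):
        print(f"{cpu}: {delta} TASKLET softirqs in ~1 s")
```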
This is likely caused by a defect in the ioatdma driver module in RHEL 7.1 and RHEL 7.2. For more information, please review the following Red Hat document:
RHEL 7.1 - ksoftirqd thread reports high CPU utilization due to bug in ioatdma driver
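Before upgrading, it may help to confirm that the ioatdma driver is actually loaded. The Python sketch below is an assumption-based check, not part of LinuxKI: it scans /proc/modules (the same file lsmod reads) for a line whose first field is ioatdma. It only detects the driver when it is built as a loadable module, and a loaded ioatdma module is consistent with, but does not prove, this defect. The helper name module_loaded is hypothetical.

```python
import os

MODULES = "/proc/modules"


def module_loaded(name: str, modules_text: str) -> bool:
    """Return True if `name` is listed as a module in /proc/modules-style text.

    Each /proc/modules line starts with the module name; later fields
    (size, use count, dependents, state, address) are ignored here.
    """
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__" and os.path.exists(MODULES):
    with open(MODULES) as f:
        loaded = module_loaded("ioatdma", f.read())
    print("ioatdma is loaded" if loaded else "ioatdma is not loaded as a module")
```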
To resolve the issue, upgrade the kernel to one of the following versions:
- Red Hat Enterprise Linux 7.1 : Upgrade to kernel-3.10.0-229.14.1.el7 from RHSA-2015-1778 or later
- Red Hat Enterprise Linux 7.2 : Upgrade to kernel-3.10.0-327.el7 from RHSA-2015-2152
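To check whether a host is already running one of these kernels, the release string (as printed by uname -r) can be compared against the fixed builds listed above. The Python sketch below assumes the usual RHEL 7 form such as 3.10.0-229.14.1.el7.x86_64; it treats any 229.x build at or above 229.14.1, and any build numbered 327 or higher, as containing the fix, which assumes later builds carry the fix forward. Other builds are reported as "unknown" because this note does not cover them. classify_kernel and build_numbers are hypothetical helper names.

```python
import platform

# Fixed builds listed above: RHEL 7.1 -> 3.10.0-229.14.1.el7, RHEL 7.2 -> 3.10.0-327.el7.
RHEL71_BASE = 229
RHEL71_FIXED = (229, 14, 1)
RHEL72_FIXED_BASE = 327


def build_numbers(release: str):
    """Numeric build components of a RHEL 7 kernel release string.

    '3.10.0-229.14.1.el7.x86_64' -> (229, 14, 1); None if not a 3.10.0 kernel.
    """
    upstream, _, rest = release.partition("-")
    if upstream != "3.10.0" or not rest:
        return None
    nums = []
    for part in rest.split("."):
        if not part.isdigit():
            break  # stop at the 'el7' dist tag
        nums.append(int(part))
    return tuple(nums) or None


def classify_kernel(release: str) -> str:
    """Return 'fixed', 'affected', or 'unknown' for this ioatdma issue (sketch)."""
    build = build_numbers(release)
    if build is None:
        return "unknown"
    if build[0] >= RHEL72_FIXED_BASE:
        return "fixed"  # assumes builds after 327.el7 keep the fix
    if build[0] == RHEL71_BASE:
        return "fixed" if build >= RHEL71_FIXED else "affected"
    return "unknown"  # builds not covered by this note


if __name__ == "__main__":
    release = platform.release()
    print(f"{release}: {classify_kernel(release)}")
```

Tuple comparison does the version ordering here: (229, 7, 2) sorts below (229, 14, 1), so an older RHEL 7.1 errata kernel is reported as affected.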