kworker processes consume CPU using kernel 5.10.x and wifi #4780

Open
Roland74 opened this issue Dec 18, 2021 · 23 comments

Roland74 commented Dec 18, 2021

Is this the right place for my bug report?
Presumably, because kworker is kernel related.

Describe the bug
In short: the kworker processes consume too much CPU with the newer 5.10.x kernels when using Wi-Fi.

After updating my Raspberry Pi 1B and 3B from kernel 5.4.83 to the newer 5.10.x kernels, the kworker processes continuously consume more than 1.5% CPU (on the Raspberry Pi 3B) and more than 3% CPU (on the Raspberry Pi 1B), and the load no longer drops to 0 even though the system has nothing to do (it should be idle).

I found out that the problem occurs only while using WLAN (Wi-Fi). When using the LAN via eth0, the load goes down to 0.00 again.
I found a workaround for the case where WLAN is used and the LAN is not connected. When I take eth0 down with
/sbin/ifconfig eth0 down
the kworker process calms down and the load goes to zero again.

To reproduce the problem, I can simply bring eth0 up again with
/sbin/ifconfig eth0 up

Then the kworker starts consuming CPU again. My Raspberry Pi 1B shows, for example:

top - 10:48:30 up 12 days, 13:39,  1 user,  load average: 0.73, 0.34, 0.19
Tasks:  76 total,   1 running,  75 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.7 us,  3.4 sy,  0.0 ni, 95.2 id,  0.0 wa,  0.0 hi,  0.7 si,  0.0 st
MiB Mem :    477.6 total,    146.7 free,     86.8 used,    244.1 buff/cache
MiB Swap:    100.0 total,    100.0 free,      0.0 used.    317.5 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 7829 root      20   0       0      0      0 D   3.9   0.0   0:04.90 kworker/0:2+events_power_efficient
 7831 roland    20   0   10188   3008   2524 R   1.6   0.6   0:00.16 top
 6905 root      20   0       0      0      0 I   0.3   0.0   0:14.43 kworker/u2:2-phy0
 7813 roland    20   0   12188   4188   3392 S   0.3   0.9   0:00.18 sshd
 7830 root      20   0       0      0      0 I   0.3   0.0   0:00.04 kworker/0:4-events
    1 root      20   0   33700   8072   6420 S   0.0   1.7   1:38.99 systemd
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.84 kthreadd

Then, after taking eth0 down, the kworker process no longer consumes CPU:
/sbin/ifconfig eth0 down
=> afterwards top shows, for example:

top - 10:52:46 up 12 days, 13:43,  1 user,  load average: 0.17, 0.24, 0.18
Tasks:  75 total,   1 running,  74 sleeping,   0 stopped,   0 zombie
%Cpu(s):   1.0/1.3     2[||                                                                                                  ]
MiB Mem :    477.6 total,    146.6 free,     86.9 used,    244.2 buff/cache
MiB Swap:    100.0 total,    100.0 free,      0.0 used.    317.4 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  476 mysql     20   0  718184  66124  15108 S   0.0  13.5  50:46.06 mysqld
    9 root      20   0       0      0      0 S   0.0   0.0   4:30.50 ksoftirqd/0
  477 root      20   0  211308  23776  19128 S   0.0   4.9   3:15.12 apache2
   18 root      20   0       0      0      0 S   0.0   0.0   2:11.88 kcompactd0
    1 root      20   0   33700   8072   6420 S   0.0   1.7   1:38.99 systemd
   97 root      20   0   35512   8452   7600 S   0.0   1.7   0:46.70 systemd-journal

To reproduce, you can switch the kworker CPU consumption on and off with
/sbin/ifconfig eth0 up
and
/sbin/ifconfig eth0 down

Actual behaviour
After booting the Raspberry Pis with Wi-Fi in use, the kworker processes always consume CPU.

Expected behaviour
It would be great if the kworker processes stayed calm without the need to call "/sbin/ifconfig eth0 down".

System
My systems are a Raspberry Pi 1B and a Raspberry Pi 3B (32-bit and 64-bit Raspbian) with 5.10.x kernels. The older 5.4 kernel did not have this problem.
Here is an example from the Raspberry Pi 1B:
cat /etc/rpi-issue
Raspberry Pi reference 2019-09-26
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 80d486687ea77d31fc3fc13cf3a2f8b464e129be, stage2

uname -a
Linux antares 5.10.63+ #1488 Thu Nov 18 16:14:04 GMT 2021 armv6l GNU/Linux

Firmware version (vcgencmd version)
Dec 1 2021 15:07:23
Copyright (c) 2012 Broadcom
version 71bd3109023a0c8575585ba87cbb374d2eeb038f (clean) (release) (start_cd)

CPU:
/proc/cpuinfo
=>
processor : 0
model name : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS : 697.95
Features : half thumb fastmult vfp edsp java tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xb76
CPU revision : 7

Hardware : BCM2835
Revision : 100000e
Serial : 00000000894b0b9f
Model : Raspberry Pi Model B Rev 2

Now the Raspberry Pi 3B, e.g. in 64-bit mode:
cat /etc/rpi-issue
Raspberry Pi reference 2021-10-30
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, c12b1df4ed6416fb0df33ba1731c5b13c1bdbdf8, stage2

kernel:
uname -a
Linux wega 5.10.63-v8+ #1488 SMP PREEMPT Thu Nov 18 16:16:16 GMT 2021 aarch64 GNU/Linux

firmware:
Nov 18 2021 16:18:09
Copyright (c) 2012 Broadcom
version d9b293558b4cef6aabedcc53c178e7604de90788 (clean) (release) (start)

CPU:
$ lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Vendor ID: ARM
Model: 4
Model name: Cortex-A53
Stepping: r0p4
CPU max MHz: 1200.0000
CPU min MHz: 600.0000
BogoMIPS: 38.40
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fp asimd evtstrm crc32 cpuid
$ cat /proc/cpuinfo
processor : 0
BogoMIPS : 38.40
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 1
BogoMIPS : 38.40
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 2
BogoMIPS : 38.40
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 3
BogoMIPS : 38.40
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

Hardware : BCM2835
Revision : a02082
Serial : 00000000c79dc44c
Model : Raspberry Pi 3 Model B Rev 1.2

I can provide more details from the Raspberry Pi 3B, too (e.g. in 32-bit mode), but I assume that all models from the Raspberry Pi 1 up to at least the 3B are affected. At least I observe these problems with 32-bit and 64-bit kernels on all kernel versions from 5.10 up to the latest version, 5.10.63.
I already opened a topic on https://forums.raspberrypi.com/viewtopic.php?p=1902788#p1902788

It would be great if you could help and fix this problem.
Kind regards,
Roland

bou6 commented Jan 11, 2022

Hi @Roland74,
I am not able to reproduce the problem on an RPi 3B+ with kernel 5.10.
Can you please do the following steps:

Roland74 commented Jan 18, 2022

Hi bou6,
thank you for your answer. It's strange, and a pity, that my RPi 1 and RPi 3B have the problem (on 32-bit and 64-bit) while your RPi 3B+ does not. So your load is 0.00 and top does not show the kworker processes consuming CPU on your RPi 3B+ when you are using WLAN?

Here is the output of dmesg -Hw while I called echo l > /proc/sysrq-trigger a few times.

dmesg.txt

I am not sure how many times I should call echo l > /proc/sysrq-trigger, or whether I have called it enough times.
Here is another attempt from this morning, captured with dmesg -Hw > dmesg_new.txt:
dmesg_new.txt

I installed perf 5.10. Here are the top lines of sudo perf report:
perf.txt

Should I call it more often or can I do something else?
Hope that helps to find the problem.

Kind regards,
Roland

bou6 commented Jan 19, 2022

Using echo l > /proc/sysrq-trigger, you are requesting a backtrace from the kernel, so you need to trigger it when there is load. In the backtrace you can see what the PC is executing and figure out the reason for the load. In the attached files the PC is always executing arch_cpu_idle, which is why I cannot understand the reason for the problem.

Roland74 commented Jan 20, 2022

Hello bou6,
thank you for your answer. As you can see in my examples, the load caused by the problem is not huge, but the kworker processes take a few percent of the CPU and the load does not drop to zero (the kworker processes continuously consume more than 1.5% CPU on the Raspberry Pi 3B and more than 3% CPU on the Raspberry Pi 1B, and the load no longer goes to 0 although the system has nothing to do and should be idle).

So my question is: does the load drop to zero on your Raspberry Pi 3B+ when you are using WLAN instead of LAN?

One of my problems is that I am not used to reading the dmesg backtrace information and the perf output. It seems that the output of perf could be the more helpful one.


+   24.64%     0.00%  kworker/0:2+eve  [kernel.kallsyms]                     [k] ret_from_fork                                                                                              ▒
+   24.64%     0.00%  kworker/0:2+eve  [kernel.kallsyms]                     [k] kthread                                                                                                    ▒
+   24.64%     0.00%  kworker/0:2+eve  [kernel.kallsyms]                     [k] worker_thread                                                                                              ▒
+   24.64%     0.00%  kworker/0:2+eve  [kernel.kallsyms]                     [k] process_one_work                                                                                           ▒
+   24.64%     0.00%  kworker/0:2+eve  [kernel.kallsyms]                     [k] phy_state_machine                                                                                          ▒
+   24.64%     0.00%  kworker/0:2+eve  [kernel.kallsyms]                     [k] phy_check_link_status                                                                                      ▒
+   24.64%     0.00%  kworker/0:2+eve  [kernel.kallsyms]                     [k] lan87xx_read_status                                                                                        ▒
+   24.35%    24.35%  swapper          [kernel.kallsyms]                     [k] _raw_read_unlock_irqrestore                                                                                ▒
+   23.60%     0.02%  kworker/0:2+eve  [kernel.kallsyms]                     [k] mdiobus_read                                                                                               ▒
+   23.58%     0.03%  kworker/0:2+eve  [kernel.kallsyms]                     [k] __mdiobus_read                                                                                             ▒
+   23.55%     0.00%  kworker/0:2+eve  [kernel.kallsyms]                     [k] smsc95xx_mdiobus_read                                                                                      ▒
+   23.55%     0.00%  kworker/0:2+eve  [kernel.kallsyms]                     [k] __smsc95xx_mdio_read                                                                                       ▒
+   22.74%     0.05%  kworker/0:2+eve  [kernel.kallsyms]                     [k] usb_control_msg                                                                                            ▒
+   22.26%     0.08%  kworker/0:2+eve  [kernel.kallsyms]                     [k] usb_start_wait_urb                                                                                         ▒
+   18.06%     0.02%  kworker/0:2+eve  [kernel.kallsyms]                     [k] __smsc95xx_read_reg                                                                                        ▒
+   18.04%     0.05%  kworker/0:2+eve  [kernel.kallsyms]                     [k] usbnet_read_cmd                                                                                            ▒
+   17.37%     0.07%  kworker/0:2+eve  [kernel.kallsyms]                     [k] __usbnet_read_cmd                                                                                          ▒
+   14.63%    11.75%  kworker/0:2+eve  [kernel.kallsyms]                     [k] __softirqentry_text_start                                                                                  ▒
+   14.63%     0.00%  kworker/0:2+eve  [kernel.kallsyms]                     [k] __irq_svc                                                                                                  ▒
+   14.63%     0.00%  kworker/0:2+eve  [kernel.kallsyms]                     [k] bcm2836_arm_irqchip_handle_irq                                                                             ▒
+   14.63%     0.00%  kworker/0:2+eve  [kernel.kallsyms]                     [k] __handle_domain_irq                                                                                        ▒
+   14.63%     0.00%  kworker/0:2+eve  [kernel.kallsyms]                     [k] irq_exit                                                                                                   ▒
+   14.47%     0.94%  kworker/0:2+eve  [kernel.kallsyms]                     [k] wait_for_completion_timeout                                                                                ▒
+   12.75%     0.24%  kworker/0:2+eve  [kernel.kallsyms]                     [k] schedule_timeout                                                                                           ▒
+   12.67%     0.00%  kworker/0:2+eve  [kernel.kallsyms]                     [k] __smsc95xx_phy_wait_not_busy

Hope that helps a little bit more.
Kind regards,
Roland

pelwell commented Jan 20, 2022

Which perf command did you run? I found the following gives sensible results:

$ sudo perf record -F 99 -a -g -- sleep 30
$ sudo perf report --stdio

Roland74 commented Feb 2, 2022

Hello pelwell,
thank you for your answer. I used the command described in https://askubuntu.com/questions/33640/kworker-what-is-it-and-why-is-it-hogging-so-much-cpu#answer-421916

$ sudo perf record -g -a sleep 10
$ sudo perf report

Then I copied the lines from the screen.
I will try out your perf commands soon. My problem is that I'm not an expert in kernel or kworker problems, so I expect it would be more helpful if a kernel expert could reproduce my problem.

Today I plan to try to reproduce it on a friend's Raspberry Pi together with him. He usually uses LAN, so we just need to turn on WLAN and use it. Then maybe we can get closer to the problem. Let's see.
Kind regards,
Roland

Roland74 commented Feb 3, 2022

Hello pelwell,
I scrolled through the tree output of "perf report --stdio" and found several lines with "---lan87xx_read_status". Maybe this gives a hint, but as I mentioned, I am completely new to the perf command. Here are some of the lines that could be interesting:


     6.31%     0.00%  kworker/3:1+eve  [kernel.kallsyms]         [k] phy_check_link_status
            |
            ---phy_check_link_status
               lan87xx_read_status
               |
               |--5.50%--mdiobus_read
               |          __mdiobus_read
               |          smsc95xx_mdiobus_read
               |          __smsc95xx_mdio_read
               |          |
               |          |--3.04%--__smsc95xx_phy_wait_not_busy
               |          |          __smsc95xx_read_reg
               |          |          usbnet_read_cmd
               |          |          |
               |          |          |--2.20%--__usbnet_read_cmd
               |          |          |          |
               |          |          |           --2.00%--usb_control_msg
               |          |          |                     usb_start_wait_urb
               |          |          |                     |
               |          |          |                     |--0.90%--wait_for_completion_timeout
               |          |          |                     |          schedule_timeout
               |          |          |                     |
               |          |          |                      --0.84%--usb_submit_urb

and below:

     6.31%     0.00%  kworker/3:1+eve  [kernel.kallsyms]         [k] lan87xx_read_status
            |
            ---lan87xx_read_status
               |
               |--5.50%--mdiobus_read
               |          __mdiobus_read
               |          smsc95xx_mdiobus_read
               |          __smsc95xx_mdio_read
               |          |
               |          |--3.04%--__smsc95xx_phy_wait_not_busy
               |          |          __smsc95xx_read_reg
               |          |          usbnet_read_cmd
               |          |          |
               |          |          |--2.20%--__usbnet_read_cmd
               |          |          |          |
               |          |          |           --2.00%--usb_control_msg
               |          |          |                     usb_start_wait_urb
               |          |          |                     |
               |          |          |                     |--0.90%--wait_for_completion_timeout
               |          |          |                     |          schedule_timeout
               |          |          |                     |
               |          |          |                      --0.84%--usb_submit_urb
               |          |          |                                |
               |          |          |                                 --0.69%--usb_hcd_submit_urb
               |          |          |                                           |
               |          |          |                                            --0.53%--dwc_otg_urb_enqueue
               |          |          |
               |          |           --0.59%--usb_autopm_get_interface
               |          |                     __pm_runtime_resume
               |          |                     _raw_spin_unlock_irqrestore
               |          |
               |          |--1.37%--__smsc95xx_read_reg
               |          |          usbnet_read_cmd
               |          |          |
               |          |           --1.24%--__usbnet_read_cmd
               |          |                     usb_control_msg
               |          |                     |
               |          |                      --0.83%--usb_start_wait_urb
               |          |
               |           --1.09%--__smsc95xx_write_reg
               |                     usbnet_write_cmd
               |                     |
               |                      --0.96%--__usbnet_write_cmd
               |                                usb_control_msg
               |                                |
               |                                 --0.77%--usb_start_wait_urb
               |
                --0.80%--usleep_range
                          schedule_hrtimeout_range
                          schedule_hrtimeout_range_clock
                          |
                           --0.65%--schedule

Here is a short output from top on my Raspberry Pi 3B:

top - 00:53:20 up 13 days,  5:02,  1 user,  load average: 0.65, 0.61, 0.63
Tasks: 116 total,   1 running, 115 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.4 sy,  0.0 ni, 98.5 id,  0.9 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem :    922.8 total,     46.2 free,     45.7 used,    831.0 buff/cache
MiB Swap:    100.0 total,     99.7 free,      0.3 used.    812.5 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
10698 root      20   0       0      0      0 D   1.3   0.0   3:35.03 kworker/3:1+events_power_efficient
11059 pi        20   0   11348   3156   2580 R   1.0   0.3   0:00.11 top
   86 root      20   0       0      0      0 S   0.3   0.0   0:14.25 usb-storage
11058 root      20   0       0      0      0 I   0.3   0.0   0:00.02 kworker/1:0-events_power_efficient
    1 root      20   0   33708   8688   6908 S   0.0   0.9   0:52.17 systemd

That's all for tonight.
Cheers,
Roland

pelwell commented Feb 3, 2022

That's useful output - it clearly points the finger at a large number of calls to phy_check_link_status. There are a number of code paths that can trigger this, but they aren't visible because of the asynchronous nature of the kworkers.

I'd like to add some extra kernel debug output to find out what is triggering the polling. Are you comfortable building a kernel yourself? If not, can you confirm the version string (uname -a) of the kernel you are running now?
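
The debug patch itself does not appear in this thread, so purely as an illustration of the kind of instrumentation meant here (the helper name and message format are placeholders, not the actual patch), a rate-limited trace called from phy_check_link_status() could show how often the link check runs and what state the PHY is in at the time:

#include <linux/phy.h>
#include <linux/printk.h>

/* Hypothetical helper, to be called from phy_check_link_status(); it only
 * logs how often the check fires and the PHY state at that moment. */
static void example_trace_link_check(struct phy_device *phydev)
{
	pr_info_ratelimited("phy %s: state=%d link=%d speed=%d\n",
			    phydev_name(phydev), phydev->state,
			    phydev->link, phydev->speed);
}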

pelwell commented Feb 3, 2022

And could you also state which WLAN dongle you are using - the output of lsusb should suffice.

Roland74 commented Feb 3, 2022

Hello Pelwell,
thank you for your answer. As written above in the description, the version string of the kernel is
Linux wega 5.10.63-v7+ #1488 SMP Thu Nov 18 16:14:44 GMT 2021 armv7l GNU/Linux

I would currently prefer not to build the kernel myself, and I hope that someone from the kernel team tries to reproduce the problem, which is really quite easy, and easier and less time-consuming than building a kernel.
By the way, my friend was able to reproduce the problem on his Raspberry Pi 2, too. He just needed to set up WLAN and disconnect the LAN cable. After a reboot the kworker was consuming CPU as explained above. On that Raspberry Pi the kworker takes about 1.7% to 2.2% CPU. (Maybe my friend will write later. He will try to reproduce it on a Raspberry Pi 4 today.)
After taking eth0 down with the following command ( /sbin/ifconfig eth0 down ), the kworker processes do not consume CPU anymore. Then the load goes down to 0.00.
When calling /sbin/ifconfig eth0 up, the kworker process starts to consume CPU again. So reproducing it normally takes only a few minutes. It would be great if someone from the kernel team could try to reproduce it.

Currently I mainly use my Raspberry Pi 3B (hostname wega), so I don't use a WLAN dongle (WLAN is built in). The only USB device connected is an external hard drive. Here is the output of lsusb:

$ lsusb
Bus 001 Device 004: ID 1058:25a2 Western Digital Technologies, Inc. Elements 25A2
Bus 001 Device 003: ID 0424:ec00 Microchip Technology, Inc. (formerly SMSC) SMSC9512/9514 Fast Ethernet Adapter
Bus 001 Device 002: ID 0424:9514 Microchip Technology, Inc. (formerly SMSC) SMC9514 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Kind regards,
Roland

pelwell added a commit that referenced this issue Feb 3, 2022
Polling at 100Hz for 1.5s consumes quite a bit of kworker time with no
obvious benefit. Reduce that polling rate to ~6Hz.

To further save CPU and power, defer the next poll if no energy is
detected.

See: #4780

Signed-off-by: Phil Elwell <[email protected]>

pelwell commented Feb 3, 2022

I've traced the increased CPU usage to 094e36e, which was a fix for #4393. Increasing the polling interval within the wait loop to 150us reduces the CPU used by that kworker to 0.3% (when shown on top - sometimes it doesn't appear) with no observable detriment. Increasing the interval to 500us doesn't reduce the overhead any further.

The link detection is hampered by a bug that requires energy efficiency to be turned off while polling, and it has to remain in this state for at least a second otherwise the link up event can be missed (#4393). A more effective CPU (and power) reduction would be to reduce the rate at which the PHY state machine runs in the event there is no link, but we can get nearly the same effect by inserting an extra 2s wait in the PHY status check if no energy is detected.

With b0272c6, perf shows that significantly less time is spent running phy_check_link_status - around 0.7% in total.
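
As a rough sketch of the two ideas described above - a longer sleep between polls of the MDIO busy bit, and an extra back-off before the next link check when no energy is detected - something along the following lines would have a similar effect. This is not the actual commit; example_read_reg(), example_energy_detected() and the numeric values are placeholders:

#include <linux/delay.h>
#include <linux/errno.h>
#include <linux/jiffies.h>
#include <linux/phy.h>

#define EXAMPLE_MII_BUSY	0x1	/* placeholder for the MDIO busy flag */

/* Hypothetical register accessor and energy-detect check standing in for
 * the driver's own helpers. */
static int example_read_reg(void *dev, u32 *val);
static bool example_energy_detected(struct phy_device *phydev);

/* 1. MDIO busy-wait: sleeping longer between polls of the busy bit means
 *    each link-status check issues far fewer USB control transfers. */
static int example_phy_wait_not_busy(void *dev)
{
	unsigned long timeout = jiffies + HZ;	/* ~1 s upper bound */
	u32 val;
	int ret;

	do {
		ret = example_read_reg(dev, &val);
		if (ret < 0)
			return ret;
		if (!(val & EXAMPLE_MII_BUSY))
			return 0;
		usleep_range(150, 300);		/* previously a much tighter poll */
	} while (time_before(jiffies, timeout));

	return -ETIMEDOUT;
}

/* 2. Link polling: if the PHY sees no signal energy on the cable, wait a
 *    couple of seconds before checking the link again.  (The real change
 *    defers the next poll rather than sleeping inline, but the effect on
 *    kworker CPU time is similar.) */
static int example_read_status(struct phy_device *phydev)
{
	int err = genphy_read_status(phydev);

	if (!err && !phydev->link && !example_energy_detected(phydev))
		msleep(2000);			/* illustrative back-off */

	return err;
}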

Roland74 commented Feb 3, 2022

Hi Pelwell,
thank you for your answers and for working on the problem. That sounds really good. :)
So you were able to reproduce it and fix it already, and the fix works for all kinds of Raspberry Pis, am I right?
Kind regards,
Roland

pelwell commented Feb 3, 2022

Yes, I was able to reproduce it. I've only tested the fix on a 3B, but it should work on all RPis that suffered the problem.

Roland74 commented Feb 3, 2022

Thank you very much Pelwell. That's great news. :)
So the next kernel update will probably include your fix?

pelwell commented Feb 3, 2022

Yes it will. The last rpi-update kernel release was yesterday so it might take a few days, but all future kernel builds should show the reduced overhead.

Roland74 commented Feb 3, 2022

Super 👍

Roland74 commented Feb 3, 2022

Hello Phil,
I know that the kernel update with the fix will take a while. Meanwhile I did the current update from kernel 5.10.63-v7+ #1488 SMP to 5.10.92-v7+ #1514 SMP and expected the kworker CPU consumption to stay the same, but I was wrong. With the current kernel 5.10.92-v7+ #1514 SMP, the kworker CPU consumption increased a little. So this may be a hint that some changes between the kernels can increase or decrease the CPU consumption.
Here is the current output from uname -a:
Linux wega 5.10.92-v7+ #1514 SMP Mon Jan 17 17:36:39 GMT 2022 armv7l GNU/Linux

Here is an example output from top:

top - 23:30:14 up 46 min,  1 user,  load average: 0.66, 0.62, 0.57
Tasks: 116 total,   1 running, 115 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.3 sy,  0.0 ni, 99.4 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
MiB Mem :    922.8 total,    763.1 free,     41.6 used,    118.1 buff/cache
MiB Swap:    100.0 total,    100.0 free,      0.0 used.    829.8 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   35 root      20   0       0      0      0 D   2.0   0.0   0:52.77 kworker/0:2+events_power_efficient
  690 pi        20   0   11348   3168   2596 R   0.7   0.3   0:00.23 top
  692 root      20   0       0      0      0 I   0.3   0.0   0:00.01 kworker/u8:3-brcmf_wq/mmc1:0001:1
    1 root      20   0   33700   8548   6816 S   0.0   0.9   0:05.67 systemd
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 kthreadd
    3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
    4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp

And here is the perf output:

    9.26%     0.00%  kworker/0:2-mm_  [kernel.kallsyms]   [k] process_one_work
            |
            ---process_one_work
               phy_state_machine
               phy_check_link_status
               lan87xx_read_status
               |
               |--8.61%--mdiobus_read

    9.26%     0.00%  kworker/0:2-mm_  [kernel.kallsyms]   [k] phy_check_link_status
            |
            ---phy_check_link_status
               lan87xx_read_status
               |
               |--8.61%--mdiobus_read
               |          __mdiobus_read
               |          smsc95xx_mdiobus_read
               |          __smsc95xx_mdio_read
 

     9.26%     0.00%  kworker/0:2-mm_  [kernel.kallsyms]   [k] lan87xx_read_status
            |
            ---lan87xx_read_status
               |
               |--8.61%--mdiobus_read
               |          __mdiobus_read

I don't know if this information is helpful but hopefully it's interesting.
Kind regards,
Roland

pelwell commented Feb 4, 2022

There is a build available including the fix for this issue. A minor complication is that we have just switched our main kernel releases to the 5.15 kernel, but you can install the last 5.10 build with your fix by running:

$ sudo rpi-update 9c6362635

The fix will come to 5.15 soon - it will be in the next build - but it missed the cut this time.

Roland74 commented Feb 5, 2022

Hello Phil,
thank you very much. I will try out your fix soon.

The newly observed ~50% increase in kworker CPU consumption could be a separate problem.
Were you able to reproduce it and find the reason for the roughly 50% increase in CPU consumption between
kernel 5.10.63-v7 and kernel 5.10.92-v7?
This is probably a different problem, and your new fix will probably help to reduce it, too. Nevertheless it would be interesting to investigate a little here as well, since we know which kernel introduced the increase in kworker CPU consumption. Hopefully the relevant kernel changes are small, so it is easy to understand which change causes the increase and whether the current fix is enough ...

Kind regards,
Roland

hdtodd commented Dec 10, 2022

There is a build available including the fix for this issue. A minor complication is that we have just switched our main kernel releases to the 5.15 kernel, but you can install the last 5.10 build with your fix by running:

$ sudo rpi-update 9c6362635

The fix will come to 5.15 soon - it will be in the next build - but it missed the cut this time.

I had this issue on a Pi-1B and Pi-3B running Buster at 5.10.103, so I applied the rpi-update 9c... and it updated my kernel to 5.15.80. That ended up running Buster over 5.15.80. It seemed to be functioning well. But I couldn't find kernel headers for 5.15.80 and so couldn't compile the driver for my TP-Link T2U+ WiFi dongle. So I reverted the kernel to 5.10.103.

The most recent kernel headers on the firmware site (https://archive.raspberrypi.org/debian/pool/main/r/raspberrypi-firmware/ I think) are for 5.15.76. As I understand it, that version wouldn't have the patch included in 9c...

Is there an rpi-update that would install a kernel >= .80 for which I could find the corresponding headers? And if so, where might I look?

clicube commented Feb 9, 2023

Hello, has this issue already been fixed?

On my Raspberry Pi 1B with a WLAN dongle, the load average is about 0.7 higher while using wlan0 for the network with eth0 up.
No CPU is actually being consumed.
While this issue appears, the process state of the kworker shown in ps is almost always D - uninterruptible sleep. (But I'm not sure whether this is related or not.)

$ ps aux
...
root      1319  0.2  0.0      0     0 ?        D    01:05   0:00 [kworker/0:2+events_power_efficient]
$ uname -r
5.15.84+

$ lsusb
Bus 001 Device 004: ID 056e:4008 Elecom Co., Ltd WDC-150SU2M
Bus 001 Device 003: ID 0424:ec00 Microchip Technology, Inc. (formerly SMSC) SMSC9512/9514 Fast Ethernet Adapter
Bus 001 Device 002: ID 0424:9514 Microchip Technology, Inc. (formerly SMSC) SMC9514 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

hdtodd commented Feb 9, 2023

clicube,

It's not clear that the problem I had and resolved is the one you're pointing to, but here's a description of mine in case it's helpful.

I used the rpi-update command from pelwell's comment of 4 Feb 2022 (above) to install that kernel (5.15.80, I think it was) over a 5.10.103 Raspbian. THAT kernel did, indeed, resolve the load-increase problem on my Pis (3B and 1). That is, with that 5.15.80 kernel installed, I could run with no Ethernet connection but with the WiFi connected, and the load was near zero; with the 5.10 kernel, the load was about 0.7 to 1.0 higher when using WiFi with no Ethernet connection.

I don't want to do a full OS upgrade to 5.15 because I run motion on those Pi's and I need raspistill and friends to be functioning. 5.15 replaces them and breaks motion.

I would be surprised if the patched kernel identified by pelwell above didn't carry into subsequent kernel versions. You might try installing the 5.15.80 kernel to see if that resolves the problem for you.

Unfortunately, I could never find a kernel+headers of a version > 5.15.76 to install. I needed the headers to compile the driver code for my WiFi dongle, and I didn't want to get into the business of downloading and creating my own kernel headers. So I've chosen, instead, to create an Ethernet loop-back plug and hope that it resolves the phantom load problem when the Pi is operating with just WiFi connected. I haven't tried that yet.

pelwell commented Feb 10, 2023

My downstream patch mentioned previously is no longer in the kernel (it was dropped in 5.19), but neither is the commit it modified that allowed more time for link-up. I believe that neither is necessary following @l1k's commit (3131a20), but please report if you believe you are still seeing excessive kworker loads.
