Memoize find_kmod_module_from_sysfs_node #408

Merged
merged 2 commits into from
Jun 29, 2024

Conversation

marcan
Contributor

@marcan marcan commented Jun 23, 2024

find_kmod_module_from_sysfs_node() is called for every platform device in the system via find_suppliers(). In turn, this calls kmod_module_new_from_lookup() for every device modalias. This is an expensive call that reads the modalias files every single time from scratch.

On many platforms, there are many identical platform devices (e.g. multiple serial ports, or dozens or hundreds of power domain devices). Therefore, it's worth memoizing this so we only perform the expensive lookup once per unique modalias.

This cuts down dracut generation time on an Apple M1 Pro MacBook Pro from 63 seconds to 24 seconds, give or take, after 80f2caf (in fact, this new code/behavior in dracut-ng was the root cause of the major perf regression that was improved in that commit).

Changes

Memoize find_kmod_module_from_sysfs_node() using a hashmap.
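
For readers unfamiliar with the technique, here is a minimal sketch of memoizing an expensive keyed lookup, not the code from this PR. Everything in it is an assumption made for illustration: `expensive_lookup()` stands in for the kmod call, the modalias strings are made up, and the tiny chained hash table replaces dracut's real hashmap. It also caches "nothing found" results, which is why the getter reports presence separately from the (possibly NULL) value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the expensive lookup. Counts its calls and
 * returns NULL when no module matches the modalias. */
int expensive_calls;

const char *expensive_lookup(const char *modalias)
{
    expensive_calls++;
    if (strcmp(modalias, "of:Nserial") == 0)
        return "serial_drv";
    return NULL;
}

/* Minimal chained hash table keyed by modalias string. */
struct cache_entry {
    char *key;
    const char *value; /* may be NULL: "looked up, nothing found" */
    struct cache_entry *next;
};

#define NBUCKETS 64
struct cache_entry *buckets[NBUCKETS];

size_t hash_str(const char *s)
{
    size_t h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Returns true if the key is cached; the cached value (possibly NULL)
 * is stored in *value. A NULL value is thus distinct from a miss. */
bool cache_get_exists(const char *key, const char **value)
{
    for (struct cache_entry *e = buckets[hash_str(key)]; e; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            *value = e->value;
            return true;
        }
    }
    return false;
}

/* Inserts a copy of the key at the head of its bucket. */
void cache_put(const char *key, const char *value)
{
    size_t b = hash_str(key);
    struct cache_entry *e = malloc(sizeof(*e));
    if (!e)
        abort();
    e->key = malloc(strlen(key) + 1);
    if (!e->key)
        abort();
    strcpy(e->key, key);
    e->value = value;
    e->next = buckets[b];
    buckets[b] = e;
}

/* Runs the expensive lookup at most once per unique modalias. */
const char *memoized_lookup(const char *modalias)
{
    const char *value;

    assert(modalias != NULL);
    if (cache_get_exists(modalias, &value))
        return value;
    value = expensive_lookup(modalias);
    cache_put(modalias, value);
    return value;
}
```

With hundreds of devices sharing one modalias, `memoized_lookup()` would reach `expensive_lookup()` once and serve the rest from the table, including the repeated "no module" answers.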

Checklist

  • I have tested it locally
  • I have reviewed and updated any documentation if relevant
  • I am providing new code and test(s) for it

@github-actions github-actions bot added the dracut-install Issues related to dracut install label Jun 23, 2024
This variant of hashmap_get() returns whether the item exists, which
allows distinguishing a NULL item from a nonexistent one.
@marcan
Contributor Author

marcan commented Jun 23, 2024

Addendum: That was testing on top of the Fedora downstream fork plus the mentioned commit. On main here, #328 (I think) improved generation time to 26 seconds, and then this PR on top brings it down to 17 seconds, almost as fast as before the regression.

find_kmod_module_from_sysfs_node() is called for every platform device
in the system via find_suppliers(). In turn, this calls
kmod_module_new_from_lookup() for every device modalias. This is an
expensive call that reads the modalias files every single time from
scratch.

On many platforms, there are many identical platform devices (e.g.
multiple serial ports, or dozens or hundreds of power domain devices).
Therefore, it's worth memoizing this so we only perform the expensive
lookup once per unique modalias.

This cuts down dracut generation time on an Apple M1 Pro MacBook Pro
from 26 seconds to 17 seconds, give or take (which is close to the
performance prior to 3de4c73, which introduced a major regression
which has been incrementally improved in prior commits already).
@LaszloGombos LaszloGombos merged commit 6500e95 into dracut-ng:main Jun 29, 2024
90 checks passed
@bdrung
Contributor

bdrung commented Jul 1, 2024

I tested this change on a Raspberry Pi Zero 2W. Ubuntu 24.04 (noble) with dracut-install 060+5-1ubuntu3.1 (with linux 6.8.0-1006.6 on 2024-07-01):

bdrung@zero2w:~$ sudo hyperfine --warmup 1 -r 10 "update-initramfs -u"
Benchmark 1: update-initramfs -u
  Time (mean ± σ): 248.054 s ± 5.569 s [User: 67.410 s, System: 169.412 s]
  Range (min … max): 238.909 s … 257.384 s 10 runs

With those two commits applied:

$ sudo hyperfine --warmup 1 -r 10 "update-initramfs -u"
Benchmark 1: update-initramfs -u
  Time (mean ± σ):     249.595 s ±  7.243 s    [User: 66.584 s, System: 170.342 s]
  Range (min … max):   240.879 s … 260.506 s    10 runs

So with this hardware and this setup there is no measurable performance improvement.

@marcan
Contributor Author

marcan commented Jul 5, 2024

This is a win for machines with many duplicate devices. On Apple machines the biggest offender here is the power domains, each of which is one device and there may be hundreds of them. Presumably on the rPi that is not the case. I think you need to do what I did and strace -tt the process and try to figure out where most of the time is being spent, and whether it can be improved.

@bdrung
Contributor

bdrung commented Jul 5, 2024

A quick test:

$ sudo strace -tt /usr/lib/dracut/dracut-install -D /tmp/foo/ -m raid1 2> measure
$ cut -d ' ' -f 2 measure | sed 's/(.*//' | sort | uniq -c | sort | tail -n 5
   1168 close
   1169 openat
   1428 getdents64
   1458 fstat
   3852 readlinkat

It looks like most of the time is spent traversing /sys:

$ grep -E '(readlinkat|openat)' measure | cut -d ' ' -f 3 | sort | uniq -c | sort -n | tail -n 8
     97 "/sys/devices/virtual",
     97 "/sys/devices/virtual/devlink",
    149 "power",
    359 "..",
    555 "/sys",
    555 "/sys/devices",
    589 "/sys/devices/platform/soc",
    653 "/sys/devices/platform",

@marcan
Contributor Author

marcan commented Jul 5, 2024

Syscall count is not a useful proxy for time spent. The point of -tt is to get timestamps so you can look through the log manually and see where time is being spent overall. We know the general time bloat is the /sys scan to find consumer/producer device relationships, and most of the time within that is spent calling into kmod. The question is whether there is more low-hanging fruit to speed that up (like this PR for machines with lots of identical devices) or whether dracut-install should consider a more drastic measure like caching things to a file in /run across invocations or something like that.
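
As a rough illustration of reading the timestamps rather than counting syscalls, the sketch below parses the leading `HH:MM:SS.uuuuuu` field that `strace -tt` prints and finds the log line followed by the longest gap. It is only a sketch under assumptions: `parse_ts_us()` and `slowest_line()` are hypothetical helper names, lines without a timestamp are skipped, and a run that crosses midnight is not handled. The gap after a line covers both the syscall itself and any userspace work before the next syscall.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a leading "HH:MM:SS.uuuuuu" into microseconds since midnight.
 * Returns -1 if the line does not start with such a timestamp. */
long long parse_ts_us(const char *line)
{
    int h, m, s;
    long us;

    if (sscanf(line, "%2d:%2d:%2d.%6ld", &h, &m, &s, &us) != 4)
        return -1;
    return (((long long)h * 60 + m) * 60 + s) * 1000000LL + us;
}

/* Returns the index of the timestamped line that is followed by the
 * longest gap before the next timestamped line, and stores that gap
 * in *gap_us. Returns -1 if fewer than two lines carry a timestamp. */
int slowest_line(const char *const *lines, int n, long long *gap_us)
{
    int best = -1, prev_idx = -1;
    long long prev = -1, best_gap = -1;

    assert(lines != NULL && gap_us != NULL);
    for (int i = 0; i < n; i++) {
        long long t = parse_ts_us(lines[i]);

        if (t < 0)
            continue; /* not a timestamped line */
        if (prev_idx >= 0 && t - prev > best_gap) {
            best_gap = t - prev;
            best = prev_idx;
        }
        prev = t;
        prev_idx = i;
    }
    if (best >= 0)
        *gap_us = best_gap;
    return best;
}
```

Summing the gaps per syscall name or per path, instead of taking only the maximum, would be a natural next step for locating where the wall-clock time goes.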

@bdrung
Contributor

bdrung commented Jul 5, 2024

The log output seems to be evenly spread over the timeline. I see access to /sys/devices/platform recurring:

$ grep '"/sys/devices/platform"' measure 
10:31:16.621897 newfstatat(AT_FDCWD, "/sys/devices/platform", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
10:31:16.623205 openat(AT_FDCWD, "/sys/devices/platform", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
10:31:16.707017 readlinkat(AT_FDCWD, "/sys/devices/platform", 0xfffffb2cfff0, 1023) = -1 EINVAL (Invalid argument)
10:31:16.711080 readlinkat(AT_FDCWD, "/sys/devices/platform", 0xfffffb2cfff0, 1023) = -1 EINVAL (Invalid argument)
[...]
10:31:22.067747 readlinkat(AT_FDCWD, "/sys/devices/platform", 0xfffffb2cfff0, 1023) = -1 EINVAL (Invalid argument)
10:31:22.071025 readlinkat(AT_FDCWD, "/sys/devices/platform", 0xfffffb2cfff0, 1023) = -1 EINVAL (Invalid argument)
10:31:22.073957 readlinkat(AT_FDCWD, "/sys/devices/platform", 0xfffffb2cfff0, 1023) = -1 EINVAL (Invalid argument)
10:31:22.078695 readlinkat(AT_FDCWD, "/sys/devices/platform", 0xfffffb2cfff0, 1023) = -1 EINVAL (Invalid argument)
