Resume does not work (potential #cee5853 regression?) #924

Open
davidgfnet opened this issue Sep 10, 2020 · 23 comments · Fixed by #2160
Labels: bug (Our bugs), resume (Issues related to the resume module)

Comments

@davidgfnet

davidgfnet commented Sep 10, 2020

Referencing this bug which did not get any love at all:
https://bugzilla.redhat.com/show_bug.cgi?id=1842279

Fedora 32, x64 fresh installation. Using LVM + LUKS (that is, LUKS encrypted LVM) that contains rootfs + swap partitions.
This might be potentially happening since #715

After a fresh Fedora install in the setup described above, /sys/power/resume is "0:0", which causes the dracut setup scripts to not install the resume-from-disk machinery required to recover from hibernation. This behaves like a loop, since the lack of a defined swap partition results in "dracut -f" generating ramdisks that do not contain the kernel arg "resume=foo" and so forth.

IMHO the script should gracefully handle the case where there's no swap partition defined for resume but one is available.

Please assume good intent. I'm not very familiar with Dracut or the Linux boot process, so I could be wrong. However, more users seem to experience this, which suggests there is indeed a bug and that it is relatively easy to solve (see Bugzilla).

@davidgfnet davidgfnet added the bug Our bugs label Sep 10, 2020
@davidgfnet davidgfnet changed the title Resume does not work ( Resume does not work (potential #cee5853 regression?) Sep 10, 2020
@stale

stale bot commented Dec 16, 2020

This issue is being marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. If this is still an issue in the latest release of Dracut and you would like to keep it open please comment on this issue within the next 7 days. Thank you for your contributions.

@stale stale bot added the stale communication is stuck label Dec 16, 2020
@davidgfnet
Author

Perhaps @danimo or @nabijaczleweli can comment on this?

@stale stale bot removed the stale communication is stuck label Dec 16, 2020
@nabijaczleweli
Contributor

nabijaczleweli commented Dec 16, 2020

I'd argue that the post-#715 behaviour is strictly more correct logically in that host-only means (initrd has resume) <=> (host has resume), but that's worth very little if it's confusing.

Does the diff below work for you? It should re-add the old behaviour.

diff --git a/modules.d/95resume/module-setup.sh b/modules.d/95resume/module-setup.sh
index 96c2573e..9f16537b 100755
--- a/modules.d/95resume/module-setup.sh
+++ b/modules.d/95resume/module-setup.sh
@@ -13,7 +13,7 @@ check() {
     # Only support resume if hibernation is currently on
     # and no swap is mounted on a net device
     [[ $hostonly ]] || [[ $mount_needs ]] && {
-        swap_on_netdevice || [[ "$(cat /sys/power/resume)" == "0:0" ]] && return 255
+        swap_on_netdevice || [[ "$(cat /sys/power/resume)" == "0:0" ]] || { echo "${host_fs_types[@]}" | grep -qwE 'swap|swsuspend|swsupend'; } && return 255
     }
 
     return 0

Alternatively, $swap_devs looks like it could maybe be checked for emptiness instead?

@davidgfnet
Author

Can't easily check it, but I'm assuming that should work.
I guess there's not much alternative to it, is there? It's a chicken and egg problem: you can't enable resume because the config checks whether it is already enabled before enabling it in the cmdline.
I'm curious why I seem to be the only one seeing this. There should be more users with the same issue, right?
I can say that this happened to both my computers after installing a fresh Fedora 32 system with a LUKS-encrypted fs (that is: LUKS + LVM (ext4 + swap)). Any ideas? Thanks!!

@nabijaczleweli
Contributor

nabijaczleweli commented Dec 17, 2020

I mean, you can either manually tell dracut to include the resume module, or write maj:min of your swap partition to /sys/power/resume from your normal system after you boot, so I wouldn't say it's a chicken-egg as much as it's just a gotcha that could be avoided.
As for the second point, no idea, I don't use LUKS or LVM
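
For readers following along, here is a minimal sketch of the second option above (writing the swap device's maj:min to /sys/power/resume). The helper name `devnum_of` and the `/dev/sdXN` placeholder are assumptions for illustration and are not part of dracut; the sketch also assumes a `stat` that supports `-c` with `%t`/`%T`, as GNU coreutils does.

```shell
# devnum_of (hypothetical helper): print the decimal MAJOR:MINOR of a device node.
devnum_of() {
    local dev hex
    dev=$1
    # stat prints the major (%t) and minor (%T) numbers in hex;
    # -L follows symlinks such as /dev/disk/by-uuid/... or /dev/mapper/...
    hex=$(stat -L -c '%t:%T' "$dev") || return 1
    # Convert both hex fields to decimal, which is what /sys/power/resume shows.
    printf '%d:%d\n' "0x${hex%%:*}" "0x${hex##*:}"
}

# Usage (as root), assuming /dev/sdXN is the swap partition:
#   devnum_of /dev/sdXN > /sys/power/resume
```

After that write, regenerating the initrd (for example with "dracut -f") should pick up the resume module again, since /sys/power/resume is no longer "0:0".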

@davidgfnet
Author

Lemme quote myself (I had to re-read my bugzilla bug to get more context since I forgot a bit why this is happening):

"[...]If I ever boot my kernel with resume disabled that means that if I rebuild my initramfs (or upgrade the kernel via yum) the support for resume will be disabled. Is this what we really want? And how does this work during the initial install? I'm assuming there's no swap so the generation of the initramfs during a fresh install has to be somehow different right?"

Does that make sense? I think it is indeed true that if one boots without resume support (let's say you wanna ignore your hibernation image and boot normally) and updates the machine (via dnf), initramfs will be rebuilt and resume support will be dropped. Not sure whether that's desirable, but I also understand it might make sense in other situations or use cases that I do not use (perhaps!).

Many thanks!!

@nabijaczleweli
Contributor

nabijaczleweli commented Jan 2, 2021

dracut.conf(5) lists add_dracutmodules+=" <dracut modules> ", if you add resume there you will have it unconditionally. Installer can pass --add resume as well, according to dracut(8), or just enable resume properly, if that's its domain (I wouldn't know, I've never installed Fedora).
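
As a concrete illustration of the first suggestion, a drop-in config fragment could look like the sketch below. The file name is an assumption chosen for the example; only the `add_dracutmodules` directive itself comes from dracut.conf(5).

```shell
# /etc/dracut.conf.d/99-resume.conf  (hypothetical file name)
# Always include the resume module, regardless of the current
# contents of /sys/power/resume.
add_dracutmodules+=" resume "
```

The one-off equivalent on the command line would be "dracut -f --add resume".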

I'm, personally, confused so as to what the issue is – hostonly exists specifically to avoid pulling in modules that aren't explicitly needed, and whether you support resume or not is host-specific configuration (even my patch upthread is, I'd argue, wrong from a purist's perspective, since I have at least one machine with more RAM than persistent storage, and it cannot be meaningfully hibernated (and, therefore, resumed), but has some swap).

If you boot with resume off, for one reason or another, you can turn it on later by echoing maj:min into /sys/power/resume to get your host to its normal state. Before then, your host is degraded (or, well, in maintenance, since you chose to do it), and, since you know what's different from the usual, you can take measures to work around the temporary misconfiguration.

@davidgfnet
Author

Well, let's try to simplify the issue if that makes it easier to tackle:
Suspend to disk does not work on a freshly installed fedora 32 system. The underlying issue is that resume is not present and therefore the swap contents are being ignored.

Also there's updates in https://bugzilla.redhat.com/show_bug.cgi?id=1842279 that indicate this might still be the case in Fedora 33.

@danimo danimo self-assigned this Jan 25, 2021
@javispedro

I have reproduced this without LUKS; if at any point you boot the system with the swap device missing AND run dracut (e.g. after a kernel update), then dracut will notice that /sys/power/resume is 0:0 and will not install the resume module in the next initrd. You then enter a vicious circle, since /sys/power/resume will never be set again.

openSUSE does pass resume=UUID=xxx via the bootloader cmdline, but this is not resolved by the kernel and is also ignored by the initrd when it has no resume module. The only way to break the cycle is to set /sys/power/resume manually and regenerate the initrd, or to force the resume module in dracut.conf. I guess distro policy should reflect this.

While debugging this I also noticed that I should probably also file a bug re 480aa96. The swap_on_netdevice check seems not to be doing anything, since the swap_devs array contains /dev/xxx-style names but block_is_netdevice expects MAJOR:MINOR format. :/

@Conan-Kudo
Member

This is afflicting Mageia too (mga#28528).

@danimo, have you had a chance to look at solving this issue?

@dtardon
Contributor

dtardon commented Apr 6, 2021

If you boot with resume off, for one reason or another, you can turn it on later by echoing maj:min into /sys/power/resume to get your host to its normal state. Before then, your host is degraded (or, well, in maintenance, since you chose to do it), and, since you know what's different from the usual, you can take measures to work around the temporary misconfiguration.

So, one installed a new machine, set up hibernation by writing to /sys/power/resume, rebooted to test it and... voila... resume has failed. And it's completely non-obvious why it's failed. There is no hint anywhere that initrd needs to be regenerated.

In addition, the expectation that hibernation does have to be configured explicitly is simply not true anymore. When one uses "systemctl hibernate", one doesn't have to configure anything. One doesn't even need to know about the existence of /sys/power/resume...

@bluewww

bluewww commented Apr 23, 2021

I also got hit by this problem. I managed to work around it by manually writing to /sys/power/resume and re-running dracut. Luckily I didn't waste too much time on this issue, because I somehow realized that the output of dracut didn't say anything about the resume= parameter. One could easily argue that this is not a bug, but it's definitely user-unfriendly.

@tblume
Collaborator

tblume commented Jul 14, 2021

Since commit 733c71c resume is exclusively done by systemd-hibernate-resume on systemd based systems.
The manpage of systemd-hibernate-resume shows:

-->
systemd-hibernate-resume only supports the in-kernel hibernation implementation, known as swsusp[1]. Internally, it works by writing the major:minor of specified device node to /sys/power/resume.
--<

The device node is handed over by systemd-hibernate-resume-generator which reads the resume parameter from /proc/cmdline.

So, the check for the device in /sys/power/resume is indeed wrong when systemd-hibernate-resume is used.
The question is how to better check whether hibernation is enabled on the host (that is what #715 tried to implement).

lnykryn added a commit to lnykryn/dracut-rhel9 that referenced this issue Aug 24, 2021
We can't always correctly decide if the resume module is needed.
So let's play safe and always include it.

see: dracutdevs/dracut#924

RHEL-only

Resolves:#1926544
lnykryn added a commit to redhat-plumbers/dracut-rhel9 that referenced this issue Aug 24, 2021
We can't always correctly decide if the resume module is needed.
So let's play safe and always include it.

see: dracutdevs/dracut#924

RHEL-only

Resolves:#1926544
@jmfernandez
Contributor

jmfernandez commented Sep 17, 2021

I'm a Gentoo user, and I have faced several issues related to resume not working with https://github.com/bircoph/suspend . As you can see in

resume="$(label_uuid_to_dev "$resume")"

function label_uuid_to_dev is called, but as you can see here:

label_uuid_to_dev() {
    local _dev
    _dev="${1#block:}"
    case "$_dev" in
        LABEL=*)
            echo "/dev/disk/by-label/$(echo "${_dev#LABEL=}" | sed 's,/,\\x2f,g;s, ,\\x20,g')"
            ;;
        PARTLABEL=*)
            echo "/dev/disk/by-partlabel/$(echo "${_dev#PARTLABEL=}" | sed 's,/,\\x2f,g;s, ,\\x20,g')"
            ;;
        UUID=*)
            echo "/dev/disk/by-uuid/$(echo "${_dev#UUID=}" | tr "[:upper:]" "[:lower:]")"
            ;;
        PARTUUID=*)
            echo "/dev/disk/by-partuuid/$(echo "${_dev#PARTUUID=}" | tr "[:upper:]" "[:lower:]")"
            ;;
    esac
}

when the resume device is already a path to a device, no condition is matched and it returns nothing, which leads to the else block at

else
    {
        if [ -x /usr/sbin/resume ]; then
            printf -- '%s' 'SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="suspend|swsuspend|swsupend",'
            # shellcheck disable=SC2016
            printf -- ' RUN+="/sbin/initqueue --finished --unique --name 00resume /usr/sbin/resume %s $tempnode"\n' "$a_splash"
        fi
        printf -- '%s' 'SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="suspend|swsuspend|swsupend",'
        printf -- '%s\n' ' RUN+="/sbin/initqueue --finished --unique --name 00resume echo %M:%m > /sys/power/resume"'
    } >> /etc/udev/rules.d/99-resume.rules

So the following never happens:

mv /lib/dracut/resume.sh /lib/dracut/hooks/pre-mount/10-resume.sh

Many parts in parse-resume.sh depend on having /usr/sbin/resume in place, which is detected at

for _bin in /usr/sbin/resume /usr/lib/suspend/resume /usr/lib/uswsusp/resume; do

but if your distro installs it in /usr/lib64 instead of /usr/lib, the resume binary is never detected and never included in the dracut-generated initramfs.
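
A minimal sketch of what a lib64-aware lookup could look like. The function name `find_resume_bin`, its optional root-prefix argument, and the two `/usr/lib64/...` candidate paths are assumptions for illustration; this is not necessarily what the module or any pull request does.

```shell
# find_resume_bin (hypothetical helper): print the first executable resume
# binary found among the candidate paths, optionally below a root prefix.
find_resume_bin() {
    local root _bin
    root=${1:-}
    for _bin in \
        /usr/sbin/resume \
        /usr/lib/suspend/resume /usr/lib64/suspend/resume \
        /usr/lib/uswsusp/resume /usr/lib64/uswsusp/resume; do
        if [ -x "$root$_bin" ]; then
            echo "$_bin"
            return 0
        fi
    done
    # Nothing found: signal failure so the caller can skip installing it.
    return 1
}
```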

IMHO

inst_script "$moddir/resume.sh" /lib/dracut/resume.sh
should be followed by an additional line like this one, to be more robust:

    inst_hook pre-mount 10 "$moddir/resume.sh"
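
Regarding the fall-through described earlier in this comment (a plain device path matching none of the branches), one possible fix is a default branch that passes the value through unchanged. The sketch below is the function as quoted above plus that single `*)` branch; the extra branch is an assumption for illustration, not a statement of what is merged in dracut.

```shell
# Variant of label_uuid_to_dev with a pass-through default branch (sketch).
label_uuid_to_dev() {
    local _dev
    _dev="${1#block:}"
    case "$_dev" in
        LABEL=*)
            echo "/dev/disk/by-label/$(echo "${_dev#LABEL=}" | sed 's,/,\\x2f,g;s, ,\\x20,g')"
            ;;
        PARTLABEL=*)
            echo "/dev/disk/by-partlabel/$(echo "${_dev#PARTLABEL=}" | sed 's,/,\\x2f,g;s, ,\\x20,g')"
            ;;
        UUID=*)
            echo "/dev/disk/by-uuid/$(echo "${_dev#UUID=}" | tr "[:upper:]" "[:lower:]")"
            ;;
        PARTUUID=*)
            echo "/dev/disk/by-partuuid/$(echo "${_dev#PARTUUID=}" | tr "[:upper:]" "[:lower:]")"
            ;;
        *)
            # Assumed addition: already a device path, so return it as-is
            # instead of printing nothing.
            echo "$_dev"
            ;;
    esac
}
```

With this variant, a value such as resume=/dev/sda2 yields a non-empty result, so the caller no longer ends up in the else block shown above.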

@jmfernandez
Contributor

I have just created pull request #1607 with the minor fixes I have added to my installation in order to have it working

@jmfernandez
Contributor

Something I'm experiencing using the changes in my pull request in order to use s2disk is that sometimes the resume process from hibernation does not work. But, as the whole boot process stalls, I only have to do a hard shutdown and boot in order to try again, and it can work. It is like some device or condition is not ready yet, but I still have to gather additional information.

lnykryn added a commit to redhat-plumbers/dracut-rhel9 that referenced this issue Aug 17, 2022
We can't always correctly decide if the resume module is needed.
So let's play safe and always include it.

see: dracutdevs/dracut#924

RHEL-only

Resolves: #1926544
@LaszloGombos LaszloGombos added the resume Issues related to the resume module label Oct 25, 2022
@johannbg
Collaborator

Is this still an issue with a dracut release that contains the patches from @jmfernandez ?

@aafeijoo-suse
Member

Is this still an issue with a dracut release that contains the patches from @jmfernandez ?

His patches were unrelated to the original issue.

So yes, it's still an issue, comments #924 (comment) and #924 (comment) reflect the problem we have here.

@johannbg
Collaborator

johannbg commented Dec 22, 2022

As far as I can gather, it's strictly a problem with resuming from hibernation ( not suspending or hibernating, and systemd has been hit by it as well ) because the resume kernel parameter is missing.

Now we are not responsible for resuming, and the kernel seems to be riddled with bugs related to this ( most recent I could find was this 1, which means some random hw works and others fail ).

Now whatever is adding that kernel parameter ( be it the user or some application ) it needs to enable the resume module and rebuild the initrd so I'm quite frankly failing to see how we are supposed to fix this issue o_O

What expectation are people having of us? How are we supposed to somehow fix this?

@dtardon
Contributor

dtardon commented Jan 2, 2023

As far as I can gather, it's strictly a problem with resuming from hibernation ( not suspending or hibernating, and systemd has been hit by it as well ) because the resume kernel parameter is missing.

No. The problem is that the condition for including the resume module is overly strict: it assumes that the resume partition in /sys/power/resume must be configured manually. That was true 10 years ago, but it's not needed anymore with systemctl hibernate. IMHO existence of a local swap should be sufficient reason for including the module.

E.g., Anaconda (the installer of Fedora/RHEL) automatically adds an appropriate resume= to the installed kernel's command line if a swap partition of a sufficient size is created (which it is in the default layout). Then, in the installed system, one can just run systemctl hibernate and the system hibernates and is resumed again after the machine is turned on. Except that it isn't, because the initrd doesn't contain the resume module for the aforementioned reason.

Yes, hibernate generally sucks, but that doesn't mean we shouldn't strive to make it suck a bit less if it's possible.

@aafeijoo-suse
Member

E.g., Anaconda (the installer of Fedora/RHEL) automatically adds an appropriate resume= to the installed kernel's command line if a swap partition of a sufficient size is created

YaST does the same thing on SUSE distros.

IMHO existence of a local swap should be sufficient reason for including the module.

I agree. Proposed patch:

--- a/modules.d/95resume/module-setup.sh
+++ b/modules.d/95resume/module-setup.sh
@@ -10,10 +10,11 @@ check() {
         return 1
     }
 
-    # Only support resume if hibernation is currently on
-    # and no swap is mounted on a net device
+    # Only support resume if there is any suitable swap and
+    # it is not mounted on a net device
     [[ $hostonly ]] || [[ $mount_needs ]] && {
-        swap_on_netdevice || [[ -f /sys/power/resume && "$(cat /sys/power/resume)" == "0:0" ]] && return 255
+        ((${#swap_devs[@]})) || return 255
+        swap_on_netdevice && return 255
     }
 
     return 0

@nabijaczleweli
Contributor

Then those installers should write to /sys/power/resume (and resume_offset, if the swap is on file) before generating an initrd, it's that simple.

By definition, hibernation is configured iff /sys/power/resume or resume= is set (the latter strictly only really applies if you manage to resume from the kernel in lateinit, but the initrd also consumes resume= to emulate that, so meh).

If we take "host-only" to mean "the minimal amount of stuff needed to boot the current host", then this violates that by including resume on all hosts with swap, which is all hosts, esp. since fedora started using systemd-zram-generator by default.
For some reason, broken installers that don't understand that "having swap" is largely unrelated to "having hibernation" mean that dracut should always include hibernation handling? Fix your installer (left as an exercise to the reader). Fix your system (ls -l $SWAPDEV | awk '{gsub(",", ""); print $5 ":" $6}' > /sys/power/resume).

nabijaczleweli added a commit to nabijaczleweli/dracut-upstream that referenced this issue Jan 2, 2023
…ation

Don't consider noresume to disable, that's a single-boot flag

Closes dracutdevs#924
nabijaczleweli added a commit to nabijaczleweli/dracut-upstream that referenced this issue Jan 5, 2023
…ation

Don't consider noresume to disable, that's a single-boot flag

Closes dracutdevs#924
nabijaczleweli added a commit to nabijaczleweli/dracut-upstream that referenced this issue Feb 5, 2023
…ation

Don't consider noresume to disable, that's a single-boot flag

Closes dracutdevs#924
johannbg pushed a commit that referenced this issue Feb 13, 2023
…ation

Don't consider noresume to disable, that's a single-boot flag

Closes #924
@aafeijoo-suse
Member

#2160 (comment):

  • the grep for "resume=" is only performed if hibernation is currently on and no swap is mounted on a net device, which is the opposite of what is wanted
  • the grep always matches because it matches either '^' or '[[:space:]]resume='
# cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-5.14.21-150400.24.55-default root=/dev/mapper/cr-auto-1 security=apparmor
# grep -rq '^\|[[:space:]]resume=' /proc/cmdline
# echo $?
0
  • if the grep matches, check() returns 255, which indicates failure, not success
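
To illustrate the second point, here is a minimal sketch of a pattern that only matches a real resume= argument. The helper name `has_resume_arg` is an assumption for illustration; this is not the patch referred to below. Grouping the alternation makes "start of line" and "whitespace" alternatives for what precedes "resume=", instead of making a bare '^' an alternative on its own.

```shell
# has_resume_arg (hypothetical helper): succeed only if the given kernel
# command line contains a resume= argument, either at the very start or
# preceded by whitespace.
has_resume_arg() {
    printf '%s\n' "$1" | grep -qE '(^|[[:space:]])resume='
}

# Usage on the running system:
#   has_resume_arg "$(cat /proc/cmdline)" && echo "resume= is set"
```

With the command line shown above (no resume= argument) this check fails, whereas the original pattern succeeds on any input.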

@martinwhitaker could you submit your patch as a PR?

@aafeijoo-suse aafeijoo-suse reopened this Apr 14, 2023
Henrik66 added a commit to Henrik66/dracut that referenced this issue Jun 4, 2023
…ation

Move getcmdline from dracut-lib.sh to dracut-dev-lib.sh to make it
available on the host as well.

Closes dracutdevs#924
Henrik66 added a commit to Henrik66/dracut that referenced this issue Jun 20, 2023
…ation

Move getcmdline from dracut-lib.sh to dracut-dev-lib.sh to make it
available on the host as well.

Closes dracutdevs#924
pvalena pushed a commit to pvalena/dracut that referenced this issue Jul 23, 2023
…ation

Don't consider noresume to disable, that's a single-boot flag

Closes dracutdevs#924
Henrik66 added a commit to Henrik66/dracut that referenced this issue Aug 20, 2023
…ation

Move getcmdline from dracut-lib.sh to dracut-dev-lib.sh to make it
available on the host as well.

Closes dracutdevs#924
Conan-Kudo pushed a commit to dracut-ng/dracut-ng that referenced this issue Mar 30, 2024
The grep introduced in commit e3a7112
does not work as intended. The resume module is always excluded in hostonly
mode.

Made this a bit more explicit with if/else so it is more clear what is going
on. The in-line ||/&& makes the line really long and makes it more difficult
to understand what is going on.

Bug: dracutdevs/dracut#924
Signed-off-by: Andrew Ammerlaan <[email protected]>