Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/zfs: introduce option to control hibernation #171680

Merged
merged 1 commit into from
Oct 29, 2022

Conversation

bryanasdev000
Copy link
Member

@bryanasdev000 bryanasdev000 commented May 5, 2022

Description of changes

This should close #106093.

Since ZFS is not hibernation friendly at the moment (see openzfs/zfs#260) I think disabling hibernation by default is a safe measure to prevent data loss.

Kernel doc for this param https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/kernel-parameters.txt#L3559.

Since this is a breaking change, I have added a observation on release notes for 22.11.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.05 Release Notes (or backporting 21.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@8573
Copy link
Contributor

8573 commented May 5, 2022

Also, I would like to message it to the user in some way besides the unofficial doc (https://nixos.wiki/wiki/NixOS_on_ZFS) any tip on how is welcome.

You could add a note to the docstring (description) of the boot.zfs.enabled option that will set the kernel parameter, and maybe even do the same with the options config.boot.supportedFilesystems and config.boot.initrd.supportedFilesystems that can enable that option.

@bryanasdev000 bryanasdev000 removed the request for review from thiagokokada May 6, 2022 19:42
@ElvishJerricco
Copy link
Contributor

Note that I believe that by now ZFS can be hibernated, as long as you do not import any pools before resuming. Until we can make assurances to that effect in nixos, we probably should have something like this though.

@bryanasdev000
Copy link
Member Author

Note that I believe that by now ZFS can be hibernated, as long as you do not import any pools before resuming. Until we can make assurances to that effect in nixos, we probably should have something like this though.

Yeah, I got this impression too reading the original ZFS issue mentioned in #106093, but I did not test further as I am using ZFS both on my desktop and laptop, and I do not use suspend neither hibernation.

As soon as I have some free time, will revisit this issue and open again the PR for further review, for now I am just using the notes from https://nixos.wiki/wiki/ZFS

@mightyiam
Copy link
Member

I have this option explicitly set precisely for the same reason.

@jakubgs
Copy link
Contributor

jakubgs commented Oct 23, 2022

Can we please get this merged? Every time I used - intentionally or unintentionally - the hibernate functionality I got ZFS pool corruption and could not mount the pool at boot. Running zpool import -f -FX rpool from a livecd does fix it, but it's not safe.

@bryanasdev000
Copy link
Member Author

Can we please get this merged? Every time I used - intentionally or unintentionally - the hibernate functionality I got ZFS pool corruption and could not mount the pool at boot. Running zpool import -f -FX rpool from a livecd does fix it, but it's not safe.

Hey there!

This is mostly stale because I did not have enough time to research more, so because of that is draft :|

Furthermore the cause seems to be more in case of hibernating with Swap on ZFS, see openzfs/zfs#12842.

I will check if I can update this soon (maybe this week)? And we should decide if we make this safe step for the user or just put a bigger note of this possibly problem.

@bryanasdev000
Copy link
Member Author

Ok, I have did a quick research, and the result is "better safe than sorry".

ZFS corruption seem caused by either using swap on ZFS (which is not a good idea, see openzfs/zfs#7734) or hibernating and importing a pool before resume/swap (openzfs/zfs#12842 (comment) openzfs/zfs#12842 (comment)).

I do not think that my first implementation is a good way to "fix" it because would require users to overwrite their kernelParams if they want to hibernate, so I created a new option (allowHibernation) that defaults to false and do the same job.

@bryanasdev000 bryanasdev000 changed the title nixos/zfs: add nohibernate to kernel params nixos/zfs: introduce option to control hibernation Oct 24, 2022
@bryanasdev000 bryanasdev000 marked this pull request as ready for review October 24, 2022 06:31
@mightyiam
Copy link
Member

This would make the explicit ["nohibernate"] in my config redundant.

@bryanasdev000
Copy link
Member Author

This would make the explicit ["nohibernate"] in my config redundant.

Indeed, since now, boot.zfs.allowHibernation will be setting this for you.

I think that's the nice and easy way, since we allow the user to opt-in if desired.

My first implementation probably would need a mkForceor similar, probably messing with any other module that does append to boot.kernelParams which will break things for the end user that will need to map the necessary kernel parameters for its system to work.

Copy link
Contributor

@klemensn klemensn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I lost track of this and have since switched to another system.

nixos/modules/tasks/filesystems/zfs.nix Outdated Show resolved Hide resolved
nixos/modules/tasks/filesystems/zfs.nix Outdated Show resolved Hide resolved
@jakubgs
Copy link
Contributor

jakubgs commented Oct 24, 2022

I think "better safe than sorry" is the correct approach.

@thiagokokada
Copy link
Contributor

@ofborg test zfs

@bryanasdev000
Copy link
Member Author

@ofborg test zfs

I did a few runs locally and no problems found.

@thiagokokada thiagokokada merged commit f83198a into NixOS:master Oct 29, 2022
@mightyiam
Copy link
Member

Thank you.

@infinisil
Copy link
Member

ZFS corruption seem caused by either using swap on ZFS (which is not a good idea, see openzfs/zfs#7734) or hibernating and importing a pool before resume/swap (openzfs/zfs#12842 (comment) openzfs/zfs#12842 (comment)).

Oh I might have inadvertently fixed swap with ZFS on NixOS then. I've patched my local nixpkgs to do the hibernation resuming before doing any zpool imports (which happens in postDeviceCommands): infinisil@448fe5d

Motivation was not having to enter my ZFS encryption password if I'm restoring from swap anyways. (notably not using swap encryption yet, so ZFS encryption doesn't add much right now), but as far as I can see, that fixes this issue too right?

# https://github.com/openzfs/zfs/issues/260
# https://github.com/openzfs/zfs/issues/12842
# https://github.com/NixOS/nixpkgs/issues/106093
kernelParams = lib.optionals (!config.boot.zfs.allowHibernation) [ "nohibernate" ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right workaround. Because it allows people to assign a swapDevices but that not taking any effect. I think it would be better to have an assertion like this:

{
  assertions = [{
    assertion = (config.boot.zfs.enable && config.swapDevices != []) -> config.config.boot.zfs.allowHibernation;
    message = "Using swap with ZFS can corrupt data. Either disable swap by removing all `swapDevices` entries, or if you know what you're doing, ignore this assertion by enabling `boot.zfs.allowHibernation`";
  }];

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also have another assertion which throws when ZFS is enabled and you have a swap file on a ZFS filesystem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using swap with ZFS can corrupt data.

Does this apply even for a swap partition that's not on a ZVOL or otherwise on ZFS?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right workaround. Because it allows people to assign a swapDevices but that not taking any effect. I think it would be better to have an assertion like this:

{
  assertions = [{
    assertion = (config.boot.zfs.enable && config.swapDevices != []) -> config.config.boot.zfs.allowHibernation;
    message = "Using swap with ZFS can corrupt data. Either disable swap by removing all `swapDevices` entries, or if you know what you're doing, ignore this assertion by enabling `boot.zfs.allowHibernation`";
  }];

I just got bit by this. I'm only using ZFS on a block file in an EXT4 filesystem so I'm not concerned with hibernating ZFS filesystems. I think this suggestion would aid discoverability.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a misunderstanding in @infinisil's year old comment. The issue is not that you cannot have swap devices while using ZFS; you totally can (though, for other unrelated reasons, those cannot be stored on ZFS). The issue is that you cannot hibernate while ZFS pools are imported, period, no matter where the swap is stored, or else those pools have a chance of being irreparably corrupted.

Point being: ZFS + swap on non-ZFS: Good!. ZFS + hibernation in any case whatsoever: Bad! Hence what we have now is the right fix. If you want to ignore it, well it is configurable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that you cannot hibernate while ZFS pools are imported, period, no matter where the swap is stored, or else those pools have a chance of being irreparably corrupted.

Makes sense! Though, the discoverability for hibernation being disabled is not great. I only realized ZFS was the problem after reading this discourse post. I'm really not using ZFS in anger, so I'm happy to risk data loss for being able to hibernate my laptop, and I think @infinisil's suggestion would make this clear to new users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

infinisil's suggestion would disable swap entirely with ZFS, which is certainly not desirable. We don't really have a way to know at eval-time whether a user intends to use hibernation, so it's difficult to warn about.

Actually, we should probably have an assertion that boot.resumeDevice cannot be set while allowHibernation is false, and that would cover some use cases. But NixOS has auto-detection of resume devices in stage 1 (if you're using EFI and scripted initrd exactly because... reasons), so this is very often not set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

infinisil's suggestion would disable swap entirely with ZFS, which is certainly not desirable. We don't really have a way to know at eval-time whether a user intends to use hibernation, so it's difficult to warn about.

Hmm, how about:

{
  assertions = [{
    assertion = (config.boot.zfs.enable && ! (builtins.elem "nohibernate" config.boot.kernelParams)) -> config.config.boot.zfs.allowHibernation;
    message = "Using hibernation with ZFS can corrupt data. Either disable hibernation with `boot.kernelParams = [ "nohibernate" ]` or if you know what you're doing, ignore this assertion by enabling `boot.zfs.allowHibernation`";
  }];

Copy link
Member

@infinisil infinisil Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As also hinted at in #171680 (comment), I've been using ZFS with swap (on a separate partition) with hibernation for a while now and nothing has corrupted 🤷. I am using a patch on top of Nixpkgs though: infinisil@448fe5d. This makes swap restore before ZFS imports.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@infinisil that patch helps. Certainly, it should be quite rare with that patch. But "quite a rare, but known, source of data loss" is not something we want people using by accident.

@RyanGibb That assertion doesn't really do anything though? allowHibernation implies that nohibernate is not in the cmdline. Unless the user manually added it themselves and enabled allowHibernation. While that would be a nonsensical thing for them to do, it would at least be safe, and nothing to worry about, and means that they explicitly added both configuration lines.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/hibernate-doesnt-work-anymore/24673/12

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using ZFS in combination with suspend to disk may cause filesystem corruption