This repository has been archived by the owner on Feb 27, 2024. It is now read-only.

Can't reformat machine when using LVM on RAID1 #10

Open
nh2 opened this issue Jul 24, 2017 · 10 comments

Comments
nh2 commented Jul 24, 2017

I'm using this on a Hetzner server with nixops:

        deployment.hetzner.partitions = ''
          clearpart --all --initlabel --drives=sda,sdb

          part raid.1 --grow --ondisk=sda
          part raid.2 --grow --ondisk=sdb

          raid pv.01 --level=1 --device=root --fstype=ext4 --label=root raid.1 raid.2

          volgroup vg0 pv.01
          logvol swap           --vgname=vg0 --recommended      --fstype swap --name=swap
          logvol /              --vgname=vg0 --size=400000      --fstype ext4 --name=root
          logvol /data --vgname=vg0 --size=1000 --grow --fstype xfs  --name=gluster-brick1
        '';

I get:

test-node-1> installing machine...
test-node-1> rebooting machine ‘test-node-1’ (1.2.3.4) into rescue system
test-node-1> sending reboot command... 
test-node-1> bash: warning: setlocale: LC_TIME: cannot change locale ()
test-node-1> done.
test-node-1> waiting for rescue system...Connection to 1.2.3.4 closed by remote host.
[down]..........................................................[up]
test-node-1> building Nix bootstrap installer... done. (/nix/store/4v16dw4gvm9ih3ki55gh8j1d6q6g7iaw-hetzner-nixops-installer/bin/hetzner-bootstrap)
test-node-1> creating nixbld group in rescue system... 
test-node-1> bash: warning: setlocale: LC_TIME: cannot change locale ()
test-node-1> done.
test-node-1> checking if tmpfs in rescue system is large enough... 
test-node-1> bash: warning: setlocale: LC_TIME: cannot change locale ()
test-node-1> yes: 15956 MB
test-node-1> copying bootstrap files to rescue system... 
test-node-1> bash: warning: setlocale: LC_TIME: cannot change locale ()
test-node-1> done.
test-node-1> partitioning disks... 
test-node-1> bash: warning: setlocale: LC_TIME: cannot change locale ()
test-node-1> Traceback (most recent call last):
test-node-1>   File "/nix/store/ni2js4lwp1w6l14azfcjlgwn2im38m1b-nixpart-0.4.1/bin/.nixpart-wrapped", line 166, in <module>
test-node-1>     main()
test-node-1>   File "/nix/store/ni2js4lwp1w6l14azfcjlgwn2im38m1b-nixpart-0.4.1/bin/.nixpart-wrapped", line 126, in main
test-node-1>     ks.initialize()
test-node-1>   File "/nix/store/ni2js4lwp1w6l14azfcjlgwn2im38m1b-nixpart-0.4.1/lib/python2.7/site-packages/nixkickstart.py", line 978, in initialize
test-node-1>     self.handler.clearpart.execute(self.storage, self.handler)
test-node-1>   File "/nix/store/ni2js4lwp1w6l14azfcjlgwn2im38m1b-nixpart-0.4.1/lib/python2.7/site-packages/nixkickstart.py", line 246, in execute
test-node-1>     storage.clearPartitions()
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/__init__.py", line 773, in clearPartitions
test-node-1>     self.recursiveRemove(part)
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/__init__.py", line 741, in recursiveRemove
test-node-1>     self.destroyDevice(leaf)
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/__init__.py", line 1178, in destroyDevice
test-node-1>     action = ActionDestroyDevice(device)
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/deviceaction.py", line 315, in __init__
test-node-1>     device.teardown()
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/devices.py", line 3193, in teardown
test-node-1>     mdraid.mddeactivate(self.path)
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/devicelibs/mdraid.py", line 225, in mddeactivate
test-node-1>     raise MDRaidError("mddeactivate failed for %s: %s" % (device, msg))
test-node-1> blivet.errors.MDRaidError: mddeactivate failed for /dev/md/0: running mdadm --stop /dev/md/0 failed

Going into rescue mode manually and running the failing command myself hints at the problem:

# mdadm --stop /dev/md/0
mdadm: Cannot get exclusive access to /dev/md/0:Perhaps a running process, mounted filesystem or active volume group?
# lsblk
NAME                      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sdb                         8:16   0   3.7T  0 disk  
|-sdb2                      8:18   0   3.7T  0 part  
| `-md0                     9:0    0   3.7T  0 raid1 
|   |-vg0-root            253:1    0 390.6G  0 lvm   
|   |-vg0-swap            253:2    0  15.7G  0 lvm   
|   `-vg0-gluster--brick1 253:0    0   3.2T  0 lvm   
`-sdb1                      8:17   0     1M  0 part  
loop0                       7:0    0   2.5G  1 loop  
sda                         8:0    0   3.7T  0 disk  
|-sda2                      8:2    0   3.7T  0 part  
| `-md0                     9:0    0   3.7T  0 raid1 
|   |-vg0-root            253:1    0 390.6G  0 lvm   
|   |-vg0-swap            253:2    0  15.7G  0 lvm   
|   `-vg0-gluster--brick1 253:0    0   3.2T  0 lvm   
`-sda1                      8:1    0     1M  0 part  

Indeed, the problem is that the volume group is active:

root@rescue ~ # vgchange -a n vg0
  0 logical volume(s) in volume group "vg0" now active

After this I can stop:

root@rescue ~ # mdadm --stop /dev/md/0
mdadm: stopped /dev/md/0

I think there's something wrong in blivet, nixpart, or nixops: it doesn't know that it should deactivate all VGs on the device before trying to --stop it.

I think this is because an earlier failed (or rather, successful?) deployment created the current state shown in lsblk; I suspect that when the rescue system boots, it immediately activates the mdadm array and the LVM volume group.

I could work around it by wiping the RAID superblocks (and with them the LVM setup on top) manually:

root@rescue ~ # mdadm --zero-superblock /dev/sda2 
root@rescue ~ # mdadm --zero-superblock /dev/sdb2 

But I think that nixops should be able to provision Hetzner machines, no matter what's on the disk.
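
For reference, the complete manual cleanup this boils down to in the rescue system (just the commands already shown above, collected in order; the device names are specific to my setup):

root@rescue ~ # vgchange -a n vg0                   # deactivate the VG so nothing holds md0 open
root@rescue ~ # mdadm --stop /dev/md/0              # now the array can actually be stopped
root@rescue ~ # mdadm --zero-superblock /dev/sda2   # wipe the RAID superblocks so the array
root@rescue ~ # mdadm --zero-superblock /dev/sdb2   # isn't re-assembled on the next boot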

nh2 commented Jul 24, 2017

Oops, I intended this to be a https://github.com/NixOS/nixops/ issue but filed it in the wrong (or maybe not?) repo.

Anyway, as mentioned, this might be a nixpart problem, because I guess nixpart should be able to provision Hetzner machines, no matter what's on the disk?

nh2 commented Jul 24, 2017

CC @aszlig

nh2 changed the title from "Can't deploy to Hetzner when using LVM on RAID1" to "Can't reformat machine when using LVM on RAID1" on Jul 24, 2017
nh2 commented Jul 25, 2017

A similar error can happen with existing non-LVM partitions:

test-node-1> partitioning disks... 
test-node-1> bash: warning: setlocale: LC_TIME: cannot change locale ()
test-node-1> /nix/store/yzzvqaa8y7x6xbsz3baik8gkz0qxisc1-pykickstart-1.99.39/lib/python2.7/site-packages/pykickstart/commands/raid.py:321: UserWarning: A RAID device with the name root has already been defined.
test-node-1>   warnings.warn(_("A RAID device with the name %s has already been defined.") % rd.device)
test-node-1> Traceback (most recent call last):
test-node-1>   File "/nix/store/ni2js4lwp1w6l14azfcjlgwn2im38m1b-nixpart-0.4.1/bin/.nixpart-wrapped", line 166, in <module>
test-node-1>     main()
test-node-1>   File "/nix/store/ni2js4lwp1w6l14azfcjlgwn2im38m1b-nixpart-0.4.1/bin/.nixpart-wrapped", line 126, in main
test-node-1>     ks.initialize()
test-node-1>   File "/nix/store/ni2js4lwp1w6l14azfcjlgwn2im38m1b-nixpart-0.4.1/lib/python2.7/site-packages/nixkickstart.py", line 984, in initialize
test-node-1>     self.handler.raid.execute(self.storage, self.handler)
test-node-1>   File "/nix/store/ni2js4lwp1w6l14azfcjlgwn2im38m1b-nixpart-0.4.1/lib/python2.7/site-packages/nixkickstart.py", line 673, in execute
test-node-1>     r.execute(storage, ksdata)
test-node-1>   File "/nix/store/ni2js4lwp1w6l14azfcjlgwn2im38m1b-nixpart-0.4.1/lib/python2.7/site-packages/nixkickstart.py", line 780, in execute
test-node-1>     raise KickstartValueError(formatErrorMsg(self.lineno, msg="The Software RAID array name \"%s\" is already in use." % devicename))
test-node-1> pykickstart.errors.KickstartValueError: The following problem occurred on line 13 of the kickstart file:
test-node-1> 
test-node-1> The Software RAID array name "root" is already in use.

cleverca22 commented Jul 25, 2017 via email

nh2 commented Jul 25, 2017

I'm also wondering if I could do the various "turn that off" steps inside that kickstart definition, or if it's necessary that nixops/nixpart do it for me.

After all, I do use clearpart to establish some base state.

Is there some equivalent to clearpart that can turn off VGs, and with which we could get rid of the 'The Software RAID array name "root" is already in use.' error?

nh2 commented Jul 28, 2017

Conversation between @cleverca22 and me on IRC on this:

clever:  https://github.com/NixOS/nixpart/issues/10#issuecomment-317600553
clever:  running "vgchange -a n" turns all LVM arrays off, freeing any names it might have been using, and freeing any locks it held on partitions
clever:  i think in both our cases, the PV was in use, so linux couldnt reload the new partition tables
clever:  so the partitioning changes failed to apply, and that broke everything else
nh2:     yes, I think the problem is that if there is a VG then Linux's (or Debian's or whatever) default behaviour is to turn it on
clever:  nixos also does that
clever:  which caused issues in my kexec stuff
clever:  technically, linux has zero support for LVM, its all done in userland
clever:  both luks and lvm use device-mapper to create virtual block devices, and configure the mappings between the virtual and real devices
clever:  and once that mapping is configured, the lvm code is no longer required, and leaves the picture entirely
nh2:     yeah I imagine. For Hetzner we don't have a way to turn it off "in the system" because their rescue mode is a fixed thing, so nixpart or nixops or in my case that kickstart script has to do that turning-off explicitly
clever:  yeah, which should be as simple as `vgchange -a n`
clever:  any device still open will block that, but the problematic ones shouldnt be mounted anyways
nh2:     yes and executing that manually in the Hetzner rescue mode confirms that it works
nh2:     I think there's a way to run arbitrary commands from that kickstart script thing
clever:  id say its a bug in the scripts nixops is running, and that they should run that always
clever:  but i'm also wanting to rewrite those scripts to use my kexec trick, so we control the state more
nh2:     but do you think the component that should turn it off is nixops or nixpart?
clever:  nixops i think
clever:  before it runs nixpart
clever:  a script one level up should close all open handles (including lvm) and wipefs the drive
clever:  so nixpart has a clean slate to work from
nh2:     OK that'd be here https://github.com/NixOS/nixops/blob/4a204e28daef51418f9116170dcead44849546ad/nixops/backends/hetzner.py#L286
clever:  yeah
nh2:     I'll post our conversation on https://github.com/NixOS/nixops/issues/708
clever:  oh, interesting, line 289/290
clever:  it looks like its supposed to detect the problem i'm giving a fix for
clever:  and then reboot the entire machine
clever:  but running `vgchange -a n` will prevent the problem from happening, and save you a reboot
nh2:     hmm in my case that clearly doesn't seem to happen with the reboot detection
clever:  yeah, so the error 100 could be broken
nh2:     also are you sure that this is supposed to handle our problem? I'd expect it to retry the formatting then, but it doesn't do that
clever:  https://github.com/NixOS/nixpart/commit/5c4ff7f20abd02088d80b170d62763cbf14c40b3
nh2:     I mean how does a reboot fix it?
clever:  the problem, is that if a partition is open when you delete it from the partition tables, linux cant delete&remake the device nodes in /dev/
clever:  so it just defers that update until you reboot
clever:  it cant be open at bootup, so it can make the right nodes while booting
clever:  but telling lvm to close all open handles, prevents that from being an issue
nh2:     but if blivet crashes with `mddeactivate failed`, and does nothing, how is rebooting going to fix it?
clever:  what i think happened, is that nixpart redid the partition tables, and linux failed to apply the new state
clever:  so the nodes in /dev/sda? pointed to the old partition layout
clever:  and mdadm then made things in weird places, or failed to make things entirely because, say, sda4 didnt exist
nh2:     but that suggests it managed to alter the disks and the only thing needed is a reboot. So then if I ran nixops again on the same machine (which always forces Hetzner to reboot it into recovery mode) it should work, right? But I'm quite sure I tried that and repeatedly got the same problem until I wiped the LVM blocks manually
clever:  one difference, is that your creating new lvm in your nixpart script
clever:  so upon rebooting, it might have the same problem again
clever:  ahh, i think i see part of the complication, maybe
clever:  nixpart returns 100, if it has to reboot to apply changes to the partition table
clever:  but the script you gave it, also has lvm and mdadm changes to apply, which require that reboot
clever:  so it would have to pause in the middle, reboot the machine, then re-run the same script
clever:  and i dont think the backends/hetzner.py can do that?
clever:  skipping the reboot entirely, would simplify the whole thing
clever:  test-node-1>     self.handler.clearpart.execute(self.storage, self.handler)
clever:  test-node-1>     mdraid.mddeactivate(self.path)
clever:  mdadm: Cannot get exclusive access to /dev/md/0:Perhaps a running process, mounted filesystem or active volume group?
clever:  aha, i think the bug is in nixpart now
clever:  the `clearpart` command is smart enough to know mdadm is using sdaX, and tells mdadm to stop it
clever:  but its not smart enough to know lvm is using md0!
clever:  so mdadm cant stop md0
clever:  so nixpart is trying to handle closing everything for you, and the lvm ontop of mdadm is triggering an edgecase
clever:  that sounds like the best explanation
nh2:     hmm sounds good
nh2:     but what you said before sounds like another problem, that theoretically I'd have to reboot between the `part` partition changes and the setup of LVM?
clever:  test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/deviceaction.py", line 315, in __init__
clever:  test-node-1>     device.teardown()
clever:  test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/devices.py", line 3193, in teardown
clever:  test-node-1>     mdraid.mddeactivate(self.path)
clever:  so in here, it has to recursively call device.teardown
clever:  so as it tears down sdaX, it has to also teardown md0
clever:  and then while tearing down md0, it has to discover lvm is using it, and run `vgchange -a n`
clever:  if this teardown code does its job, the reboot wont be required
nh2:     that makes sense to me
clever:  oh, and upon reading the above backtrace, i think its a bug in blivet
clever:  also, that reboot code is missing from nixpart
clever:  https://github.com/NixOS/nixpart/commit/5c4ff7f20abd02088d80b170d62763cbf14c40b3
clever:  it was added in here
clever:  https://github.com/NixOS/nixpart/commit/a1a765ca4e9baf855022f621eb4b2fea8a23f917
clever:  and it vanished in here
clever:  so that code in backends/hetzner.py is now useless/broken
nh2:     OK that explains why that bit did nothing for me
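
To make the "script one level up" idea above concrete, here is a minimal sketch of such a pre-cleanup, assuming it runs in the Hetzner rescue system before nixpart and that /dev/sda and /dev/sdb are the drives being reformatted. This is not what nixops currently runs, just an illustration of the approach clever describes:

#!/usr/bin/env bash
# Hypothetical "clean slate" step, run before nixpart (sketch only).
set -eu

# Deactivate all LVM volume groups so no device-mapper node keeps md0 open.
vgchange -a n

# Stop any assembled md arrays so their member partitions are released.
for md in /dev/md/* /dev/md[0-9]*; do
  if [ -b "$md" ]; then
    mdadm --stop "$md"
  fi
done

# Wipe old RAID superblocks and remaining signatures so nothing is re-assembled.
for part in /dev/sda[0-9]* /dev/sdb[0-9]*; do
  if [ -b "$part" ]; then
    mdadm --zero-superblock "$part" || true   # ignore partitions without a superblock
  fi
done
wipefs -a /dev/sda /dev/sdb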

aszlig commented Jul 30, 2017

clever: a1a765c
clever: and it vanished in here
clever: so that code in backends/hetzner.py is now useless/broken

This commit is only in master, which is currently unreleased and not in use for NixOps yet. The version in master currently won't work for NixOps at all and is part of NixOS/nixpkgs#21403.

@cleverca22

ah, didn't look into which version nixops was using

nh2 commented May 8, 2018

I'm implementing a feature deployment.hetzner.partitioningScript that, among other things, allows working around this issue by doing something like

test -b /dev/md0 && mdadm --stop /dev/md0

before the other actions.
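
A slightly fuller sketch of what that pre-partitioning script could contain for the vg0/md0 layout from this issue (the extra vgchange line is my assumption, mirroring the manual fix above; see the next comment for the actual nixops change):

# deactivate the VG first so the array isn't held open, then stop it
vgs vg0 >/dev/null 2>&1 && vgchange -a n vg0
test -b /dev/md0 && mdadm --stop /dev/md0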

nh2 commented May 8, 2018

I'm implementing a feature deployment.hetzner.partitioningScript

That is NixOS/nixops#948. This might be able to provide the workaround.
