
Can't deploy to Hetzner when using LVM on RAID1 #15

Open · nh2 opened this issue Jul 24, 2017 · 2 comments

nh2 (Contributor) commented Jul 24, 2017

(Note: I accidentally filed the corresponding nixpart issue first: NixOS/nixpart#10.)

I'm using this on Hetzner:

        deployment.hetzner.partitions = ''
          clearpart --all --initlabel --drives=sda,sdb

          part raid.1 --grow --ondisk=sda
          part raid.2 --grow --ondisk=sdb

          raid pv.01 --level=1 --device=root --fstype=ext4 --label=root raid.1 raid.2

          volgroup vg0 pv.01
          logvol swap           --vgname=vg0 --recommended      --fstype swap --name=swap
          logvol /              --vgname=vg0 --size=400000      --fstype ext4 --name=root
          logvol /data          --vgname=vg0 --size=1000 --grow --fstype xfs  --name=gluster-brick1
        '';

I get:

test-node-1> installing machine...
test-node-1> rebooting machine ‘test-node-1’ (1.2.3.4) into rescue system
test-node-1> sending reboot command... 
test-node-1> bash: warning: setlocale: LC_TIME: cannot change locale ()
test-node-1> done.
test-node-1> waiting for rescue system...Connection to 1.2.3.4 closed by remote host.
[down]..........................................................[up]
test-node-1> building Nix bootstrap installer... done. (/nix/store/4v16dw4gvm9ih3ki55gh8j1d6q6g7iaw-hetzner-nixops-installer/bin/hetzner-bootstrap)
test-node-1> creating nixbld group in rescue system... 
test-node-1> bash: warning: setlocale: LC_TIME: cannot change locale ()
test-node-1> done.
test-node-1> checking if tmpfs in rescue system is large enough... 
test-node-1> bash: warning: setlocale: LC_TIME: cannot change locale ()
test-node-1> yes: 15956 MB
test-node-1> copying bootstrap files to rescue system... 
test-node-1> bash: warning: setlocale: LC_TIME: cannot change locale ()
test-node-1> done.
test-node-1> partitioning disks... 
test-node-1> bash: warning: setlocale: LC_TIME: cannot change locale ()
test-node-1> Traceback (most recent call last):
test-node-1>   File "/nix/store/ni2js4lwp1w6l14azfcjlgwn2im38m1b-nixpart-0.4.1/bin/.nixpart-wrapped", line 166, in <module>
test-node-1>     main()
test-node-1>   File "/nix/store/ni2js4lwp1w6l14azfcjlgwn2im38m1b-nixpart-0.4.1/bin/.nixpart-wrapped", line 126, in main
test-node-1>     ks.initialize()
test-node-1>   File "/nix/store/ni2js4lwp1w6l14azfcjlgwn2im38m1b-nixpart-0.4.1/lib/python2.7/site-packages/nixkickstart.py", line 978, in initialize
test-node-1>     self.handler.clearpart.execute(self.storage, self.handler)
test-node-1>   File "/nix/store/ni2js4lwp1w6l14azfcjlgwn2im38m1b-nixpart-0.4.1/lib/python2.7/site-packages/nixkickstart.py", line 246, in execute
test-node-1>     storage.clearPartitions()
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/__init__.py", line 773, in clearPartitions
test-node-1>     self.recursiveRemove(part)
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/__init__.py", line 741, in recursiveRemove
test-node-1>     self.destroyDevice(leaf)
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/__init__.py", line 1178, in destroyDevice
test-node-1>     action = ActionDestroyDevice(device)
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/deviceaction.py", line 315, in __init__
test-node-1>     device.teardown()
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/devices.py", line 3193, in teardown
test-node-1>     mdraid.mddeactivate(self.path)
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/devicelibs/mdraid.py", line 225, in mddeactivate
test-node-1>     raise MDRaidError("mddeactivate failed for %s: %s" % (device, msg))
test-node-1> blivet.errors.MDRaidError: mddeactivate failed for /dev/md/0: running mdadm --stop /dev/md/0 failed

Going into rescue mode manually and running the failing command myself hints at the problem:

# mdadm --stop /dev/md/0
mdadm: Cannot get exclusive access to /dev/md/0:Perhaps a running process, mounted filesystem or active volume group?

# lsblk
NAME                      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sdb                         8:16   0   3.7T  0 disk  
|-sdb2                      8:18   0   3.7T  0 part  
| `-md0                     9:0    0   3.7T  0 raid1 
|   |-vg0-root            253:1    0 390.6G  0 lvm   
|   |-vg0-swap            253:2    0  15.7G  0 lvm   
|   `-vg0-gluster--brick1 253:0    0   3.2T  0 lvm   
`-sdb1                      8:17   0     1M  0 part  
loop0                       7:0    0   2.5G  1 loop  
sda                         8:0    0   3.7T  0 disk  
|-sda2                      8:2    0   3.7T  0 part  
| `-md0                     9:0    0   3.7T  0 raid1 
|   |-vg0-root            253:1    0 390.6G  0 lvm   
|   |-vg0-swap            253:2    0  15.7G  0 lvm   
|   `-vg0-gluster--brick1 253:0    0   3.2T  0 lvm   
`-sda1                      8:1    0     1M  0 part  

Indeed, the problem is that the volume group is active:

root@rescue ~ # vgchange -a n vg0
  0 logical volume(s) in volume group "vg0" now active

After this I can stop:

root@rescue ~ # mdadm --stop /dev/md/0
mdadm: stopped /dev/md/0

I think there's something wrong in blivet, nixpart, or nixops: it doesn't know that it should deactivate all VGs on a device before trying to --stop the array on it.
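
A generic teardown would look roughly like this (a sketch, not what nixpart currently does; the pvs flags are standard LVM2, device names as on this machine):

root@rescue ~ # vgchange -a n $(pvs --noheadings -o vg_name /dev/md/0)   # deactivate all VGs backed by the array
root@rescue ~ # mdadm --stop /dev/md/0                                   # only now can mdadm get exclusive access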

I think this is because an earlier failed (or rather, successful?) deployment created the current state shown in lsblk; I suspect that when the rescue system boots, it immediately assembles the mdadm array and activates the LVM volume group.
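
That is easy to check right after the rescue system comes up (a quick sanity check, not part of the failing run above):

root@rescue ~ # cat /proc/mdstat   # md0 shows up as an already-assembled raid1
root@rescue ~ # lvs                # the vg0 LVs are listed with "a" (active) in the attr column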

I could work around it by wiping the RAID superblocks (and with them the LVM setup on top) manually:

root@rescue ~ # mdadm --zero-superblock /dev/sda2 
root@rescue ~ # mdadm --zero-superblock /dev/sdb2 

But I think that nixops should be able to provision Hetzner machines, no matter what's on the disk.
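
For reference, the full cleanup that returns the disks to a "fresh" state would be roughly (a sketch based on the state above, run from the rescue system; device and VG names as in the lsblk output):

root@rescue ~ # vgchange -a n vg0                  # deactivate the volume group
root@rescue ~ # mdadm --stop /dev/md/0             # stop the array now that nothing uses it
root@rescue ~ # mdadm --zero-superblock /dev/sda2  # wipe the RAID metadata from both members
root@rescue ~ # mdadm --zero-superblock /dev/sdb2
root@rescue ~ # wipefs -a /dev/sda /dev/sdb        # optionally clear any remaining signatures

Something like this sequence is presumably what the clearpart step would have to perform before repartitioning.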

domenkozar (Member) commented

cc @aszlig

nh2 commented Jul 28, 2017

New insights at NixOS/nixpart#10 (comment)

grahamc transferred this issue from NixOS/nixops on Apr 20, 2020