
Can't deploy to Hetzner when using LVM on RAID1 #15

Open · nh2 opened this issue Jul 24, 2017 · 2 comments

nh2 (Contributor) commented Jul 24, 2017

(Note: I accidentally filed the corresponding nixpart issue first: NixOS/nixpart#10.)

I'm using this on Hetzner:

        deployment.hetzner.partitions = ''
          clearpart --all --initlabel --drives=sda,sdb

          part raid.1 --grow --ondisk=sda
          part raid.2 --grow --ondisk=sdb

          raid pv.01 --level=1 --device=root --fstype=ext4 --label=root raid.1 raid.2

          volgroup vg0 pv.01
          logvol swap           --vgname=vg0 --recommended      --fstype swap --name=swap
          logvol /              --vgname=vg0 --size=400000      --fstype ext4 --name=root
          logvol /data          --vgname=vg0 --size=1000 --grow --fstype xfs  --name=gluster-brick1
        '';

I get:

test-node-1> installing machine...
test-node-1> rebooting machine ‘test-node-1’ (1.2.3.4) into rescue system
test-node-1> sending reboot command... 
test-node-1> bash: warning: setlocale: LC_TIME: cannot change locale ()
test-node-1> done.
test-node-1> waiting for rescue system...Connection to 1.2.3.4 closed by remote host.
[down]..........................................................[up]
test-node-1> building Nix bootstrap installer... done. (/nix/store/4v16dw4gvm9ih3ki55gh8j1d6q6g7iaw-hetzner-nixops-installer/bin/hetzner-bootstrap)
test-node-1> creating nixbld group in rescue system... 
test-node-1> bash: warning: setlocale: LC_TIME: cannot change locale ()
test-node-1> done.
test-node-1> checking if tmpfs in rescue system is large enough... 
test-node-1> bash: warning: setlocale: LC_TIME: cannot change locale ()
test-node-1> yes: 15956 MB
test-node-1> copying bootstrap files to rescue system... 
test-node-1> bash: warning: setlocale: LC_TIME: cannot change locale ()
test-node-1> done.
test-node-1> partitioning disks... 
test-node-1> bash: warning: setlocale: LC_TIME: cannot change locale ()
test-node-1> Traceback (most recent call last):
test-node-1>   File "/nix/store/ni2js4lwp1w6l14azfcjlgwn2im38m1b-nixpart-0.4.1/bin/.nixpart-wrapped", line 166, in <module>
test-node-1>     main()
test-node-1>   File "/nix/store/ni2js4lwp1w6l14azfcjlgwn2im38m1b-nixpart-0.4.1/bin/.nixpart-wrapped", line 126, in main
test-node-1>     ks.initialize()
test-node-1>   File "/nix/store/ni2js4lwp1w6l14azfcjlgwn2im38m1b-nixpart-0.4.1/lib/python2.7/site-packages/nixkickstart.py", line 978, in initialize
test-node-1>     self.handler.clearpart.execute(self.storage, self.handler)
test-node-1>   File "/nix/store/ni2js4lwp1w6l14azfcjlgwn2im38m1b-nixpart-0.4.1/lib/python2.7/site-packages/nixkickstart.py", line 246, in execute
test-node-1>     storage.clearPartitions()
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/__init__.py", line 773, in clearPartitions
test-node-1>     self.recursiveRemove(part)
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/__init__.py", line 741, in recursiveRemove
test-node-1>     self.destroyDevice(leaf)
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/__init__.py", line 1178, in destroyDevice
test-node-1>     action = ActionDestroyDevice(device)
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/deviceaction.py", line 315, in __init__
test-node-1>     device.teardown()
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/devices.py", line 3193, in teardown
test-node-1>     mdraid.mddeactivate(self.path)
test-node-1>   File "/nix/store/fl2pafsa1c4y6z4hnqv8235jpbj6wja1-blivet-0.17-1/lib/python2.7/site-packages/blivet/devicelibs/mdraid.py", line 225, in mddeactivate
test-node-1>     raise MDRaidError("mddeactivate failed for %s: %s" % (device, msg))
test-node-1> blivet.errors.MDRaidError: mddeactivate failed for /dev/md/0: running mdadm --stop /dev/md/0 failed

Going into rescue mode manually and running the failing command myself hints at the problem:

# mdadm --stop /dev/md/0
mdadm: Cannot get exclusive access to /dev/md/0:Perhaps a running process, mounted filesystem or active volume group?

# lsblk
NAME                      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sdb                         8:16   0   3.7T  0 disk  
|-sdb2                      8:18   0   3.7T  0 part  
| `-md0                     9:0    0   3.7T  0 raid1 
|   |-vg0-root            253:1    0 390.6G  0 lvm   
|   |-vg0-swap            253:2    0  15.7G  0 lvm   
|   `-vg0-gluster--brick1 253:0    0   3.2T  0 lvm   
`-sdb1                      8:17   0     1M  0 part  
loop0                       7:0    0   2.5G  1 loop  
sda                         8:0    0   3.7T  0 disk  
|-sda2                      8:2    0   3.7T  0 part  
| `-md0                     9:0    0   3.7T  0 raid1 
|   |-vg0-root            253:1    0 390.6G  0 lvm   
|   |-vg0-swap            253:2    0  15.7G  0 lvm   
|   `-vg0-gluster--brick1 253:0    0   3.2T  0 lvm   
`-sda1                      8:1    0     1M  0 part  

Indeed, the problem is that the volume group is active:

root@rescue ~ # vgchange -a n vg0
  0 logical volume(s) in volume group "vg0" now active

After this I can stop:

root@rescue ~ # mdadm --stop /dev/md/0
mdadm: stopped /dev/md/0

I think there's something wrong in blivet, nixpart, or nixops: it doesn't know that it should deactivate all VGs on a device before trying to --stop the array on it.
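
A generic teardown would look roughly like this (a sketch, not what nixpart currently does; the pvs flags are standard LVM2, device names as on this machine):

root@rescue ~ # vgchange -a n $(pvs --noheadings -o vg_name /dev/md/0)   # deactivate all VGs backed by the array
root@rescue ~ # mdadm --stop /dev/md/0                                   # only now can mdadm get exclusive access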

I think this is because an earlier failed (or rather, successful?) deployment created the current state shown in lsblk; I suspect that when the rescue system boots, it immediately assembles the mdadm array and activates the LVM volume group.
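
That is easy to check right after the rescue system comes up (a quick sanity check, not part of the failing run above):

root@rescue ~ # cat /proc/mdstat   # md0 shows up as an already-assembled raid1
root@rescue ~ # lvs                # the vg0 LVs are listed with "a" (active) in the attr column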

I could work around it by wiping the RAID superblocks (and with them the LVM setup on top) manually:

root@rescue ~ # mdadm --zero-superblock /dev/sda2 
root@rescue ~ # mdadm --zero-superblock /dev/sdb2 

But I think that nixops should be able to provision Hetzner machines, no matter what's on the disk.
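
For reference, the full cleanup that returns the disks to a "fresh" state would be roughly (a sketch based on the state above, run from the rescue system; device and VG names as in the lsblk output):

root@rescue ~ # vgchange -a n vg0                  # deactivate the volume group
root@rescue ~ # mdadm --stop /dev/md/0             # stop the array now that nothing uses it
root@rescue ~ # mdadm --zero-superblock /dev/sda2  # wipe the RAID metadata from both members
root@rescue ~ # mdadm --zero-superblock /dev/sdb2
root@rescue ~ # wipefs -a /dev/sda /dev/sdb        # optionally clear any remaining signatures

Something like this sequence is presumably what the clearpart step would have to perform before repartitioning.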

domenkozar (Member) commented

cc @aszlig

nh2 commented Jul 28, 2017

New insights at NixOS/nixpart#10 (comment)

grahamc transferred this issue from NixOS/nixops on Apr 20, 2020