flatcar-install fails when using path disk devices with multiple copies #1506

Closed
jqueuniet opened this issue Jul 30, 2024 · 5 comments · Fixed by flatcar/init#125 · May be fixed by ader1990/init#1
Labels
kind/bug Something isn't working

Comments

@jqueuniet

Description

flatcar-install fails when the target disk is given as a /dev/disk/by-path device that has multiple alias symlinks.

For example:

# ls -l /dev/disk/by-path/
total 0
lrwxrwxrwx. 1 root root  9 Jul 18 14:01 pci-0000:44:00.0-ata-1 -> ../../sda
lrwxrwxrwx. 1 root root 10 Jul 18 14:01 pci-0000:44:00.0-ata-1-part1 -> ../../sda1
lrwxrwxrwx. 1 root root 10 Jul 18 14:01 pci-0000:44:00.0-ata-1-part2 -> ../../sda2
lrwxrwxrwx. 1 root root 10 Jul 18 14:01 pci-0000:44:00.0-ata-1-part3 -> ../../sda3
lrwxrwxrwx. 1 root root 10 Jul 18 14:01 pci-0000:44:00.0-ata-1-part4 -> ../../sda4
lrwxrwxrwx. 1 root root  9 Jul 18 14:01 pci-0000:44:00.0-ata-1.0 -> ../../sda
lrwxrwxrwx. 1 root root 10 Jul 18 14:01 pci-0000:44:00.0-ata-1.0-part1 -> ../../sda1
lrwxrwxrwx. 1 root root 10 Jul 18 14:01 pci-0000:44:00.0-ata-1.0-part2 -> ../../sda2
lrwxrwxrwx. 1 root root 10 Jul 18 14:01 pci-0000:44:00.0-ata-1.0-part3 -> ../../sda3
lrwxrwxrwx. 1 root root 10 Jul 18 14:01 pci-0000:44:00.0-ata-1.0-part4 -> ../../sda4
lrwxrwxrwx. 1 root root  9 Jul 18 14:01 pci-0000:44:00.0-ata-2 -> ../../sdb
lrwxrwxrwx. 1 root root  9 Jul 18 14:01 pci-0000:44:00.0-ata-2.0 -> ../../sdb
[...]

Impact

flatcar-install -d /dev/disk/by-path/pci-0000:44:00.0-ata-1 -i provider.ign

The install script fails to mount the OEM partition when it tries to write the Ignition file.

Expected behavior

The install script completes successfully.

Additional information

The following command does work:

flatcar-install -d /dev/disk/by-path/pci-0000:44:00.0-ata-1.0 -i provider.ign

It seems the root cause is the way the OEM partition is located using blkid to write the Ignition file.

local OEM_DEV=$(blkid -t "LABEL=OEM" -o device "${DEVICE}"*)

In our case this command returns multiple values, but the result is handled as a single path, so the subsequent mount fails.
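
To illustrate why the shorter alias is ambiguous, here is a minimal sketch of the glob expansion that happens before blkid runs. It uses empty stand-in files in a temporary directory instead of real block devices, and the helper name count_matches is hypothetical (it is not part of flatcar-install):

```shell
# Stand-in files that mimic the by-path aliases of one disk and its OEM partition.
tmp=$(mktemp -d)
touch "$tmp/pci-0000:44:00.0-ata-1" "$tmp/pci-0000:44:00.0-ata-1-part6" \
      "$tmp/pci-0000:44:00.0-ata-1.0" "$tmp/pci-0000:44:00.0-ata-1.0-part6"

count_matches() {
    # Expand "<device>*" the same way the shell expands "${DEVICE}"*.
    set -- "$1"*
    echo "$#"
}

short_alias_matches=$(count_matches "$tmp/pci-0000:44:00.0-ata-1")
long_alias_matches=$(count_matches "$tmp/pci-0000:44:00.0-ata-1.0")

echo "ata-1*   expands to $short_alias_matches paths"
echo "ata-1.0* expands to $long_alias_matches paths"
rm -rf "$tmp"
```

The "ata-1" prefix also matches every "ata-1.0" name, so the OEM partition is reachable through two of the expanded paths, whereas the "ata-1.0" prefix matches only its own partitions.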

@jqueuniet jqueuniet added the kind/bug Something isn't working label Jul 30, 2024
@ader1990

ader1990 commented Jul 30, 2024

Hello @jqueuniet,

The Flatcar install script can be found here: https://github.com/flatcar/init/blob/flatcar-master/bin/flatcar-install

According to the blkid man page (https://linux.die.net/man/8/blkid), adding the -l flag should solve the issue.

Can you please run this command on your server to confirm?

blkid -l -t "LABEL=OEM" -o device $DEVICE

Thank you.

@ader1990

I pushed a possible fix here; let me know if it resolves the issue: https://github.com/ader1990/init/blob/fix-multiple-devices-same-label-install/bin/flatcar-install

@jqueuniet
Author

It looks like it does solve my issue. The only side effect I can see is that it returns the canonical device path (/dev/sdXY) instead of keeping the alias that was passed in. Here is the output from a Flatcar PXE ramdisk:

root@localhost ~ # blkid -t "LABEL=OEM" -o device /dev/disk/by-path/pci-0000\:44\:00.0-ata-1*
/dev/disk/by-path/pci-0000:44:00.0-ata-1-part6
/dev/disk/by-path/pci-0000:44:00.0-ata-1.0-part6
root@localhost ~ # blkid -l -t "LABEL=OEM" -o device /dev/disk/by-path/pci-0000\:44\:00.0-ata-1
/dev/sdb6
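
The two aliases printed by the first command are symlinks to the same partition, which seems to be why collapsing them to one canonical node loses nothing. A minimal sketch of that relationship, assuming stand-in files and symlinks in a temporary directory rather than real block devices:

```shell
# Build one fake partition node and two alias symlinks that point at it.
tmp=$(mktemp -d)
touch "$tmp/sdb6"
ln -s "$tmp/sdb6" "$tmp/pci-0000:44:00.0-ata-1-part6"
ln -s "$tmp/sdb6" "$tmp/pci-0000:44:00.0-ata-1.0-part6"

# readlink -f follows the symlinks to the canonical path.
resolved_a=$(readlink -f "$tmp/pci-0000:44:00.0-ata-1-part6")
resolved_b=$(readlink -f "$tmp/pci-0000:44:00.0-ata-1.0-part6")

if [ "$resolved_a" = "$resolved_b" ]; then
    same_target=yes
else
    same_target=no
fi
echo "aliases resolve to the same node: $same_target"
rm -rf "$tmp"
```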

By the way, here are the SATA controller details, in case anyone needs them:

44:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51) (prog-if 01 [AHCI 1.0])
	Subsystem: Super Micro Computer Inc H12SSL-i [15d9:7901]
	Flags: bus master, fast devsel, latency 0, IRQ 91, IOMMU group 43
	Memory at b0600000 (32-bit, non-prefetchable) [size=2K]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
	Capabilities: [64] Express Endpoint, MSI 00
	Capabilities: [a0] MSI: Enable+ Count=16/16 Maskable- 64bit+
	Capabilities: [d0] SATA HBA v1.0
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150] Advanced Error Reporting
	Capabilities: [270] Secondary PCI Express
	Capabilities: [2a0] Access Control Services
	Capabilities: [400] Data Link Feature <?>
	Capabilities: [410] Physical Layer 16.0 GT/s <?>
	Capabilities: [440] Lane Margining at the Receiver <?>
	Kernel driver in use: ahci
	Kernel modules: ahci

@ader1990

Great to hear that it solves your issue. I will draft a PR to get some comments and see how I can improve it. For the moment, can you use the code from the fork, or do you need the patch in main ASAP?

Thanks.

@jqueuniet
Author

I'm not in a hurry. Using the device alias with the longest name, so that only a single match is returned, is a viable workaround for me until this fix lands in the next release.

I mostly wanted to report this because the error is a bit cryptic. With no filesystem hint, the default mount behavior ends up detecting this broken path as an NFS share, so it spends a long time trying to mount the OEM partition as such before hitting a timeout and returning a misleading error.
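
One way such a cryptic failure could be avoided is to validate the lookup result before it is handed to mount. The following is only a sketch under that assumption; require_single_device is a hypothetical helper and is not part of flatcar-install:

```shell
# Hypothetical guard: succeed only when the argument is exactly one path.
require_single_device() {
    if [ -z "$1" ]; then
        echo "no OEM partition found" >&2
        return 1
    fi
    case "$1" in
        *"
"*)
            # An embedded newline means the lookup returned several paths.
            echo "multiple OEM partition candidates found:" >&2
            echo "$1" >&2
            return 1
            ;;
    esac
    return 0
}

# Example input shaped like the two-line blkid output reported in this issue.
two_paths=$(printf '%s\n%s' \
    "/dev/disk/by-path/pci-0000:44:00.0-ata-1-part6" \
    "/dev/disk/by-path/pci-0000:44:00.0-ata-1.0-part6")

if require_single_device "$two_paths" 2>/dev/null; then
    guard_result=accepted
else
    guard_result=rejected
fi
echo "two-line result was $guard_result"
```

Failing early with an explicit message would surface the ambiguity directly instead of after a mount timeout.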

Anyway, thanks a lot for the quick solution.
