Kairos not installing on correct device #2243
Comments
I should add that these machines are being created from AuroraBoot, not sure that matters. |
can you do an |
Hi @jimmykarily Maybe you already knew all of this, but it's news to me. :) So, I changed my
#!/bin/bash
# Copyright (c) 2024 Schweitzer Engineering Laboratories, Inc.
# SEL Confidential
set -euo pipefail
IFS=$'\n\t'
# cSpell:ignore
# This script is run from cloud_init.yaml and not from the Dockerfile, so it must remain in the target container image.
make_directory() {
    local directory="${1-}"
    if [[ -d "$directory" ]]; then
        log " The $directory directory already exists, skipping creation" "cyan"
    else
        log " Creating the $directory directory" "green"
        mkdir -p "$directory"
    fi
}
# Format /dev/$disk if not already formatted.
format_disk() {
    local disk="${1-}"
    log " Status before format" "cyan"
    ls -l "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"
    lsblk -f "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"
    # Only format when lsblk does not already report an ext4 filesystem.
    if [[ ! $(lsblk -f "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0") = *"ext4"* ]]; then
        log " Formatting disk ${disk}" "green"
        mkfs.ext4 -L "SEL_${disk}" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"
        log " Status after format" "cyan"
        ls -l "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"
        lsblk -f "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"
    else
        log " Disk $disk is already formatted" "cyan"
    fi
}
# Mount /dev/$disk to /rke and create the subdirectories.
mount_disk() {
    local disk="${1-}"
    local directory="${2-}"
    local owner="${3-}"
    local extra_directories="${4-}"
    log " Status before mount" "cyan"
    ls -l "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"
    lsblk -f "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"
    log " Mounting disk $disk to $directory" "green"
    mount -o rw --source "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0" "$directory"
    log " Status after mount" "cyan"
    ls -l "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"
    lsblk -f "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:${disk}:0"
    if [[ -n "$extra_directories" ]]; then
        # Split the comma-separated list, then restore the strict IFS set at the top.
        IFS=","
        for new_directory in $extra_directories; do
            if [[ -d "$new_directory" ]]; then
                log " The $new_directory directory already exists, skipping creation" "cyan"
            else
                log " Creating the $new_directory directory" "green"
                mkdir -p "$new_directory"
            fi
        done
        IFS=$'\n\t'
    fi
    if [[ -n "$owner" ]]; then
        log " Setting ${owner} as the owner of ${directory} recursively" "green"
        chown -R "${owner}:${owner}" "${directory}"
    fi
}
main() {
    source "/usr/bin/lib/sh/log.sh"
    local option="${1-}"
    local disk="${2-}"
    local directory="${3-}"
    local owner="${4-}"
    local extra_directories="${5-}"
    log "Running mount_disk.sh with option $option for disk $disk in directory $directory" "blue"
    case "$option" in
        "make_directory")
            make_directory "$directory"
            ;;
        "format_disk")
            format_disk "$disk"
            ;;
        "mount_disk")
            mount_disk "$disk" "$directory" "$owner" "$extra_directories"
            ;;
    esac
}
# Run main only when the script is executed, not when it is sourced.
if ! (return 0 2> /dev/null); then
    (main "$@")
fi

As you can see I'm now referencing the

So now I should be able to control disks 1 and 2 with confidence, but I don't know if that will 100% solve the issue of Kairos not using disk 0. For reference, the disk paths are:

/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0 # First physical disk, Kairos should use this.
/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0 # Second physical disk, I mount this to /var/lib/rancher/rke2
/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0 # Third physical disk, I mount this to /var/lib/rancher/longhorn

Ohhh, I just realized that

install:
  device: "/dev/sda"

should accept

install:
  device: "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"

Yes/No? I ran out of time tonight so I'll work on testing this tomorrow and let you know what I find. |
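For anyone following along, here is a minimal sketch of how to check which kernel device node a stable by-path name currently points to. `resolve_disk` is a hypothetical helper, not part of the script above or of Kairos; it only wraps `readlink -f`. The `/dev/sdX` name it prints can differ between boots, which is presumably why the by-path names are preferable in the first place.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): print the canonical
# device node behind a stable symlink such as a /dev/disk/by-path entry.
resolve_disk() {
    local stable_path="${1-}"
    if [[ ! -e "$stable_path" ]]; then
        echo "missing: $stable_path" >&2
        return 1
    fi
    # readlink -f follows every symlink and prints the final target.
    readlink -f "$stable_path"
}

# Example with a path from this thread; the output depends on the
# enumeration order of the current boot, so none is shown here:
# resolve_disk "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
```

Running this on a working node and on a broken node would show whether disk 0 really resolves to a different `/dev/sdX` name on each.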
To answer the question as to whether AuroraBoot will allow

install:
  device: "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"

The answer is no. Kairos Version: 9-core-amd64-generic-v2.4.3

2024-02-14 17:51:45 Target OSs /etc/systemd/system/cloud_init.yaml does not pass validation. Quitting.
2024-02-14 17:51:45 jsonschema: '/install/device' does not validate with file:///schema.json#/properties/install/$ref/properties/device/pattern: does not match pattern '^(auto|/|(/[a-zA-Z0-9_-]+)+)$'

I set

Is this fixable? |
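The rejection can be reproduced outside Kairos by testing candidate values against the pattern quoted in the error message. The sketch below assumes bash's `=~` operator treats this pattern the same way the schema validator does, which holds for the three inputs shown; `matches_device_pattern` is a hypothetical name. The by-path value fails because `:` and `.` are not in the allowed character class `[a-zA-Z0-9_-]`.

```shell
#!/bin/bash
# Pattern copied verbatim from the jsonschema error above.
device_pattern='^(auto|/|(/[a-zA-Z0-9_-]+)+)$'

# Hypothetical helper: succeeds when the value matches the schema pattern.
matches_device_pattern() {
    [[ "${1-}" =~ $device_pattern ]]
}

for candidate in "auto" "/dev/sda" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"; do
    if matches_device_pattern "$candidate"; then
        echo "accepted: $candidate"
    else
        echo "rejected: $candidate"
    fi
done
```

Only the last candidate is rejected, so any by-path name containing a PCI address would trip the same validation.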
I've built two more test clusters and so far all the right physical disks ended up attached to the right directories: Kairos on disk 0, RKE2 on disk 1 and Longhorn on disk 2. I think it would still be nice to get

install:
  device: "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"

working too though. Telling kairos to install to |
My recommendations for a "fix" are the following.
|
@jimmykarily any thoughts on the above? |
Choosing disks by label/id/path/etc is not yet supported (it has been discussed before). What you are describing is a valid use case and I think the only workaround for now would be to use
and do the partitioning completely manually using some script in a cloud config. @kairos-io/maintainers what would be the right stage to do the partitioning? |
@jimmykarily I have an update on this and it is really, really weird. I'll try to keep it short.
I'm flummoxed. Is this a requirement of some kind or a bug? |
@jimmykarily I've stripped down our config as much as possible, including now mounting disks 2 and 3 and as little config in the |
I created the new issue after doing some more investigation. |
Kairos version:
/kairos/rockylinux:9-core-amd64-generic-v2.4.3
CPU architecture, OS, and Version:
Linux lpul-vault-k8s-agent-2.vault.ad.selinc.com 5.14.0-362.8.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov 8 17:36:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Describe the bug
Hello Kairos team. I'm running into an old issue again. I thought we got this solved by adding volume labels to my other disks but looks like not.
I have three disks in my VM, sda, sdb, sdc
The cloud_init.yaml is
The mount_disk.sh file is:
The log output of mount_disk.sh on the working nodes is
The log output of mount_disk.sh on the broken node is
This feels like a race condition.
What is the point of
If it's going to ignore it?
Any help?
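If the race is between device enumeration and the script starting (an assumption; nothing in this thread confirms it), one way to rule it out would be to wait for the by-path entry to appear before formatting or mounting. `wait_for_path` below is a hypothetical helper, not part of mount_disk.sh, and the 30 second default is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: poll once per second until a path exists,
# giving up after a timeout in seconds (default 30).
wait_for_path() {
    local path="${1-}"
    local timeout="${2-30}"
    local waited=0
    # -e accepts any file type; -b would be stricter for real block devices.
    until [[ -e "$path" ]]; do
        if (( waited >= timeout )); then
            return 1
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
}

# Possible use before format_disk or mount_disk (path from this thread):
# wait_for_path "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0" 60 || exit 1
```

This would not explain Kairos itself picking the wrong install device, but it would separate that problem from any ordering issue in the mount script.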
To Reproduce
See above config
Expected behavior
All nodes should use the volume specified in the 'cloud_init.yaml' file.