Correct way to add top-level mountpoints to disk images #814

achilleas-k · 2024-10-03T14:49:39Z

In bootc-image-builder we now support adding custom mountpoints to images, but we restrict it to mountpoints under /var/ (with some exceptions).

We're getting a lot of requests to support adding top-level mountpoints (e.g. /data) and I've been struggling to figure out the correct way to do this without breaking the system. The script below sort of replicates how BIB does disk image building, for reference:

```bash
#!/usr/bin/bash

set -euo pipefail

echo ":: Creating disk file"
truncate disk.raw -s "10G"

echo ":: Partitioning disk"
sfdisk disk.raw << EOF
label: gpt
label-id: D209C89E-EA5E-4FBD-B161-B461CCE297E0
start="2048", size="2048", type="21686148-6449-6E6F-744E-656564454649", uuid="FAC7F1FB-3E8D-4137-A512-961DE09A5549", bootable
start="4096", size="1026048", type="C12A7328-F81F-11D2-BA4B-00A0C93EC93B", uuid="68B2905B-DF3E-4FB3-80FA-49D1E773AA33"
start="1030144", size="2097152", type="0FC63DAF-8483-4772-8E79-3D69D8477DE4", uuid="CB07C243-BC44-4717-853E-28852021225B"
start="3127296", size="17844191", type="0FC63DAF-8483-4772-8E79-3D69D8477DE4", uuid="6264D520-3FB9-423F-8AB8-7A0A8E3D3562"
EOF

echo ":: Attaching loop device"
loop=$(sudo losetup -f -P --show ./disk.raw)
detach() {
    echo ":: Detaching ${loop}"
    sudo losetup --detach "${loop}"
}
cleanup() {
    detach
}
trap cleanup EXIT

espvolid="7B7795E7"
datauuid="359340b8-facf-4aeb-8a74-fed44bcfeecb"
rootuuid="3279ab06-6122-4c76-aade-692e8dc81985"

echo ":: Creating filesystems"
sudo mkfs.fat -i "${espvolid}" "${loop}p2"  # ESP

# orphan_file is enabled by default on newer mkfs.ext4 versions but not
# supported by the C9S kernel and tooling
sudo mkfs.ext4 -O "^orphan_file" -U "${datauuid}" "${loop}p3"  # data
sudo mkfs.ext4 -O "^orphan_file" -U "${rootuuid}" "${loop}p4"  # root

echo ":: Mounting root and ESP"
sudo mkdir -p ./mnt
sudo mount "${loop}p4" ./mnt
umountroot() {
    echo ":: Unmounting root"
    sudo umount ./mnt
}
cleanup() {
    umountroot
    detach
}
trap cleanup EXIT

sudo mkdir -p ./mnt/boot/efi
sudo mount "${loop}p2" ./mnt/boot/efi
umountesp() {
    echo ":: Unmounting ESP"
    sudo umount ./mnt/boot/efi
}
cleanup() {
    umountesp
    umountroot
    detach
}
trap cleanup EXIT

echo ":: Running bootc install to-filesystem"
sudo podman run --rm --privileged --pid=host \
    -v /var/lib/containers:/var/lib/containers \
    -v "${HOME}/.ssh/authorized_keys":/var/authorized_keys \
    -v ./mnt:/var/mnt \
    --security-opt label=type:unconfined_t \
    quay.io/centos-bootc/centos-bootc:stream9 bootc install to-filesystem \
    --root-ssh-authorized-keys /var/authorized_keys \
    --skip-fetch-check /var/mnt

echo ":: Writing fstab"
sudo mount -o remount,rw ./mnt
etcpaths=(./mnt/ostree/deploy/default/deploy/*.0/etc)
etcpath=${etcpaths[0]}
echo "UUID=${datauuid} /var/data ext4 defaults 0 0" | sudo tee -a "${etcpath}/fstab"
```

(forgive the jankiness of the script; it's just for illustrative purposes).

The /var/data mount works great, but if the fstab entry is changed to mount it to /data, the system fails to boot.

I partially understand the reasons why this is happening, with the live / mountpoint on the system being a read-only composefs overlay. I suspect this can be solved by generating a mount unit and carefully configuring it to run at the right time during boot (with After and Before directives). I wanted to ask here though to understand the details of the problem fully instead of just resorting to trial and error.
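For comparison, the `/var/data` entry that does work can also be written as an explicit systemd mount unit, which is where any `After=`/`Before=` ordering would go. This is a minimal sketch under stated assumptions: the unit directory is taken to be the deployment's `etc/systemd/system` (i.e. `${etcpath}/systemd/system` in the script above), and the helper `write_mount_unit` is hypothetical. It only handles simple paths made of letters, digits and underscores, for which systemd's unit-name escaping reduces to replacing `/` with `-`; use `systemd-escape --path --suffix=mount` for anything else. Ordering alone probably cannot rescue `/data`, though: per systemd.mount(5), systemd creates a missing mount point directory before mounting, and on the read-only composefs `/` that creation is presumably what fails.

```bash
#!/usr/bin/bash
set -euo pipefail

# Hypothetical helper: write_mount_unit UNITDIR UUID WHERE FSTYPE
# Writes a mount unit equivalent to the fstab line
#   UUID=<uuid> <where> <fstype> defaults 0 0
# and enables it by linking it into local-fs.target.wants.
write_mount_unit() {
    local unitdir="$1" uuid="$2" where="$3" fstype="$4"

    # Assumption: only simple absolute paths are supported. For these,
    # the unit name is the path without the leading "/" and with the
    # remaining "/" turned into "-" (e.g. /var/data -> var-data.mount).
    if [[ ! "${where}" =~ ^(/[A-Za-z0-9_]+)+$ ]]; then
        echo "unsupported path: ${where} (use systemd-escape)" >&2
        return 1
    fi
    local name="${where#/}"
    name="${name//\//-}.mount"

    mkdir -p "${unitdir}/local-fs.target.wants"
    cat > "${unitdir}/${name}" <<EOF
[Unit]
Description=Mount ${where}

[Mount]
What=/dev/disk/by-uuid/${uuid}
Where=${where}
Type=${fstype}
Options=defaults

[Install]
WantedBy=local-fs.target
EOF
    # Roughly what "systemctl enable" would do, without a running systemd.
    ln -sf "../${name}" "${unitdir}/local-fs.target.wants/${name}"
}
```

Called as root (the deployment is root-owned) with `write_mount_unit "${etcpath}/systemd/system" "${datauuid}" /var/data ext4`, it would take the place of the fstab line in the script.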


cgwalters · 2024-10-03T15:10:51Z

We talked about this before, I thought? We currently require the mountpoints to be created as part of the container image.
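A minimal sketch of what that means in practice: a derived image whose only change is creating the top-level directory, so that an fstab entry or mount unit for `/data` has an existing mountpoint on the read-only `/`. The helper `write_containerfile` is hypothetical; the base image is the one used in the script above.

```bash
#!/usr/bin/bash
set -euo pipefail

# Hypothetical helper: write_containerfile BASE_IMAGE MOUNTPOINT...
# Generates a Containerfile that derives from BASE_IMAGE and creates each
# top-level mountpoint directory at build time, because / is a read-only
# composefs mount on the deployed system.
write_containerfile() {
    local base="$1"; shift
    if (( $# == 0 )); then
        echo "usage: write_containerfile BASE_IMAGE MOUNTPOINT..." >&2
        return 1
    fi
    {
        echo "FROM ${base}"
        # A single RUN keeps this to one extra layer.
        echo "RUN mkdir -p $*"
    } > Containerfile
}

write_containerfile quay.io/centos-bootc/centos-bootc:stream9 /data
```

The result would then be built with something like `podman build -t localhost/centos-bootc-data .` (the tag is illustrative) and used in place of `quay.io/centos-bootc/centos-bootc:stream9`.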

achilleas-k · 2024-10-03T21:39:32Z

Yes, that works. Is there no way to do it with a base image that doesn't have the mountpoints, i.e. using the base image as-is, without deriving from it just to create a top-level directory? I'm wondering if we can find a way to create arbitrary mountpoints after the fact, without needing to modify the base image.

If not, that's fine, we'll just have to document it and maybe, if we want to be pro-active, we can check for the directories and validate the mountpoint configuration in bootc-image-builder ahead of time.
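The ahead-of-time check could start as simply as the sketch below. It assumes the image's root filesystem is available as a directory tree, and it applies a simplified policy that is an assumption, not bootc-image-builder's actual rules: mountpoints under `/var` are accepted because `/var` is writable at runtime and systemd can create the directory, while any other mountpoint must already exist as a directory in the image. `check_mountpoints` is a hypothetical name.

```bash
#!/usr/bin/bash
set -euo pipefail

# Hypothetical helper: check_mountpoints ROOTDIR MOUNTPOINT...
# ROOTDIR is the image's root filesystem as a directory tree.
# Returns non-zero if any mountpoint outside /var is missing from the image.
check_mountpoints() {
    local root="$1"; shift
    local mp rc=0
    for mp in "$@"; do
        case "${mp}" in
            /var/*)
                # Simplified policy: /var is writable state, so systemd
                # can create the directory at mount time.
                continue ;;
        esac
        if [[ ! -d "${root}${mp}" ]]; then
            echo "error: mountpoint ${mp} does not exist in the image;" \
                 "create it in the container build (e.g. RUN mkdir ${mp})" >&2
            rc=1
        fi
    done
    return "${rc}"
}
```

Run as root, something like `root=$(podman image mount IMAGE)`, then `check_mountpoints "$root" /data /var/data`, then `podman image unmount IMAGE` should flag `/data` for an image that does not create it.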

cgwalters · 2024-10-03T21:50:12Z

> without needing to modify the base image.

Why? We expect people to start by creating custom images in general. I think there's a conceptual mismatch here, in that the bootc expectation is that the center of gravity should be container builds as much as possible.

Certainly it does feel annoying to need to do something both in the container and in an external partitioning config...and I think as you know, I had really hoped to solve that by embedding the partitioning in the container image. We know that's actually what's required in many dynamic cases anyways - i.e. partitioning on firstboot or each boot via mount units, repart, custom LVM commands, etc. ultimately invoked by systemd units.

Now I would say a valid use case in general is to make a disk image with custom partitioning from a container image that one didn't own...but wanting to make a custom toplevel mountpoint for an image you don't own starts to become a big corner case I think. Such use cases can probably opt in to having / include a writable overlayfs upper...at the cost of some immutability and security.
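One existing form of that opt-in is ostree's transient root, configured in `/usr/lib/ostree/prepare-root.conf`: `/` gets a writable overlayfs upper whose contents are discarded on reboot, so systemd could create a `/data` mountpoint at each boot. This is a hedged sketch, not a tested recipe. It assumes an ostree/bootc version that supports `[root] transient`, a composefs-enabled base image whose `prepare-root.conf` has no existing `[root]` section, and it still requires a derived image. Because the file is read early in boot, the initramfs most likely needs to be regenerated as well; that step is not shown. `write_transient_root_containerfile` is a hypothetical name.

```bash
#!/usr/bin/bash
set -euo pipefail

# Hypothetical helper: write_transient_root_containerfile BASE_IMAGE
# Generates a Containerfile that enables ostree's transient root, i.e. a
# writable overlayfs upper on / that is discarded on every reboot.
# Assumption: the base prepare-root.conf has no [root] section yet.
# Not shown: regenerating the initramfs, which is likely also required.
write_transient_root_containerfile() {
    local base="$1"
    echo "FROM ${base}" > Containerfile
    cat >> Containerfile <<'EOF'
RUN printf '\n[root]\ntransient = true\n' >> /usr/lib/ostree/prepare-root.conf
EOF
}

write_transient_root_containerfile quay.io/centos-bootc/centos-bootc:stream9
```

This is exactly the trade-off described above: anything written to `/` outside of persistent state is lost on reboot, and `/` is no longer purely immutable.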

> If not, that's fine, we'll just have to document it and maybe, if we want to be pro-active, we can check for the directories and validate the mountpoint configuration in bootc-image-builder ahead of time.

Right.

So I think the short term deliverable here is just docs.

achilleas-k · 2024-10-04T09:14:10Z

> Now I would say a valid use case in general is to make a disk image with custom partitioning from a container image that one didn't own...but wanting to make a custom toplevel mountpoint for an image you don't own starts to become a big corner case I think.

I'm not sure it's as far off into the corner as this implies, based on some of the moves I'm seeing with base images being promoted as ready-made products.

But that's beside the point here, I suppose.

> I had really hoped to solve that by embedding the partitioning in the container image.

We've been circling this topic for a while now without much concrete progress. Let me experiment with some ideas and see if I can have something to show for this on Monday.

cgwalters · 2024-10-04T13:21:47Z

> I'm not sure it's as far off into the corner as this implies, based on some of the moves I'm seeing with base images being promoted as ready-made products.

I agree it's a real issue.

Ultimately this is rooted in a decision made long ago on the ostree side that / should be immutable too and owned by the person making the OS image. Before composefs, / was just "less immutable" and could be bypassed more easily; composefs gives one single immutability mechanism here.

And the problem is that people do put things outside of /usr and expect them to be updated and to be immutable too, so that conflicts with others modifying /, and "custom toplevel mountpoints for images owned by someone else" loses out by default.

Anyway, I'll see about updating the docs.

> We've been circling this topic for a while now without much concrete progress. Let me experiment with some ideas and see if I can have something to show for this on Monday.

Cool! It'd be nice if Fedora derivatives had...fewer...partitioning languages, but I guess something like /usr/lib/osbuild/blueprint.toml could be a nice start, and then we could document something different like /usr/lib/anaconda/inst.ks for Anaconda, etc.
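To make the idea concrete, here is a sketch of an image that carries its partitioning hint next to the directory it needs. The path `/usr/lib/osbuild/blueprint.toml` is only the hypothetical location floated here, and no tool is assumed to read it; the `[[customizations.filesystem]]` fragment reuses the syntax bootc-image-builder accepts in its `config.toml`. The helper name is hypothetical, and the 1 GiB size simply mirrors the data partition in the script above.

```bash
#!/usr/bin/bash
set -euo pipefail

# Hypothetical helper: write_self_describing_image BASE_IMAGE
# Creates the /data mountpoint in the image and embeds a blueprint
# fragment describing the matching partition at a hypothetical path.
write_self_describing_image() {
    local base="$1"
    echo "FROM ${base}" > Containerfile
    cat >> Containerfile <<'EOF'
RUN mkdir -p /data
COPY blueprint.toml /usr/lib/osbuild/blueprint.toml
EOF
    cat > blueprint.toml <<'EOF'
[[customizations.filesystem]]
mountpoint = "/data"
minsize = "1 GiB"
EOF
}

write_self_describing_image quay.io/centos-bootc/centos-bootc:stream9
```

The point of the design is that the mountpoint directory and the partitioning that uses it live in the same container build, so they cannot drift apart.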

cgwalters · 2024-10-04T14:56:18Z

> composefs gives one single immutability mechanism here.

To expand on this a bit more, see containers/composefs#360. Basically, by making things not writable we can lock things down much more strictly, and that possibility gets significantly harder if we have, for example, an arbitrarily writable tmpfs upper. Of course, we could get to a world where we generalize something like systemd-sysext such that we support booting into a union of merged (optionally signed) composefs images...it's strongly related to #22 too.

We want to be clear that toplevel directories for mountpoints need to be created in the container build. Also, this moves the transient root and stateoverlay sections to markdown level 2, where they should have been.

Closes: containers#814
Signed-off-by: Colin Walters

cgwalters added the area/install label (Issues related to `bootc install`) Oct 3, 2024

cgwalters mentioned this issue Oct 11, 2024

docs/filesystem: Mention toplevels and mountpoints #823 (merged)

cgwalters closed this as completed in #823 Oct 14, 2024
