Add better meta-layer approach #21

allisonkarlitskaya · 2024-11-01T20:25:28Z

Instead of returning the second-highest layer as the meta-layer, we can:

explicitly label the meta layer as a label on the container config
explicitly label the UKI as a label on the container config (probably by its fs-verity digest)
have some kind of a nicer install-kernel command or so that defaults to unpacking in /boot. We could use that during the OS image build, but it would also allow for easier "upgrades" on running systems.
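A rough sketch of the first idea, assuming a hypothetical label key `containers.composefs.meta-layer` whose value is the meta layer's diff-id (the issue proposes a label but does not name one). It prefers the label and falls back to today's "second-highest layer" heuristic.

```python
from typing import Any, Dict

# Hypothetical label key: the issue proposes labelling the meta layer on the
# container config but does not name the label.
META_LAYER_LABEL = "containers.composefs.meta-layer"


def find_meta_layer(config: Dict[str, Any]) -> int:
    """Return the index into rootfs.diff_ids of the meta layer.

    Prefers an explicit label that names the layer by its diff-id and falls
    back to the current heuristic: the second-last layer of an image with at
    least 3 layers.
    """
    diff_ids = config["rootfs"]["diff_ids"]
    labels = (config.get("config") or {}).get("Labels") or {}
    wanted = labels.get(META_LAYER_LABEL)
    if wanted is not None:
        if wanted not in diff_ids:
            raise ValueError(f"labelled meta layer {wanted} is not in rootfs.diff_ids")
        return diff_ids.index(wanted)
    if len(diff_ids) < 3:
        raise ValueError("no meta-layer label and fewer than 3 layers")
    return len(diff_ids) - 2
```

Pointing at the layer by diff-id (rather than by position) keeps working if someone derives from the image and adds more layers on top.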


cgwalters · 2024-11-04T16:06:26Z

It's good to have a spike for this, though I'd like to try to collaborate on a design doc for it.

explicitly label the UKI as a label on the container config (probably by its fs-verity digest)

Shouldn't we just follow https://uapi-group.org/specifications/specs/unified_kernel_image/#locations-for-distribution-built-ukis-installed-by-package-managers ?

allisonkarlitskaya · 2024-11-05T07:23:26Z

The issue with just putting it in the filesystem is the circular hash dependency thing. It needs to be out of band, either in an artifact or hidden layer. That's the thing that needs to be identified.

In the hidden layer, I was actually imagining something like following the bootloader spec.

I started to lean against the idea of hardcoding a UKI and having a verity digest for that one file. Following BLS actually gives us the ability to also do this without a UKI, which seems like something that someone might like to do.
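To illustrate why BLS covers both cases: a Type #1 entry is just `key value` lines, and it can boot either a single EFI binary (`efi` key) or a classic `linux` + `initrd` pair. A minimal parser sketch; the helper names are made up for illustration.

```python
from typing import Dict, List


def parse_bls_entry(text: str) -> Dict[str, List[str]]:
    """Parse a Boot Loader Specification Type #1 entry (key/value lines).

    Keys such as 'initrd' may appear more than once, so each key maps to a
    list of values in file order. Blank lines and '#' comments are skipped.
    """
    entry: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        entry.setdefault(key, []).append(value)
    return entry


def boots_uki(entry: Dict[str, List[str]]) -> bool:
    """True if the entry boots a single EFI binary (e.g. a UKI) via 'efi',
    False if it uses the classic 'linux' + 'initrd' pair."""
    return "efi" in entry and "linux" not in entry
```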

cgwalters · 2024-11-05T12:08:28Z

In what I'm thinking here, we do:

build rootfs (without /usr/lib/modules/$kver/uki.efi), get its fsverity digest, embed in uki
Add a new layer with that UKI
compute the standard fsverity digest (i.e. the non-bootable generic container one), add as label to the config as usual

When we boot it'd be into the rootfs without the uki, but it'd be in that regular place when we're run as a container image.
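A toy model of those three steps. `tree_digest` is a stand-in (plain SHA-256 over sorted path/content pairs), not the real composefs fs-verity digest, and `build_uki` is a hypothetical placeholder for producing a PE binary; the kernel version is made up. It only shows how the two digests relate.

```python
import hashlib
from typing import Dict, Tuple

Files = Dict[str, bytes]

# Hypothetical kernel version; the path layout follows the comment above.
KVER = "6.11.5"
UKI_PATH = f"/usr/lib/modules/{KVER}/uki.efi"


def tree_digest(files: Files) -> str:
    """Stand-in for the composefs fs-verity digest.

    Plain SHA-256 over sorted (path, content hash) pairs, NOT the real
    fs-verity Merkle-tree digest; it only models "any file change changes
    the digest".
    """
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode() + b"\0" + hashlib.sha256(files[path]).digest())
    return h.hexdigest()


def build_uki(cmdline: str) -> bytes:
    # Hypothetical placeholder: a real UKI is a PE binary with a .cmdline section.
    return b"UKI\0" + cmdline.encode()


def build_bootable_image(rootfs: Files) -> Tuple[str, str, Files]:
    # 1. Digest of the rootfs without the UKI, embedded in the UKI cmdline.
    boot_digest = tree_digest(rootfs)
    uki = build_uki(f"composefs={boot_digest}")
    # 2. A new layer adds the UKI at its regular location.
    final = dict(rootfs)
    final[UKI_PATH] = uki
    # 3. The standard (generic container) digest that goes into the config label.
    image_digest = tree_digest(final)
    return boot_digest, image_digest, final
```

The result is two digests per image: `boot_digest` for what gets mounted at boot and `image_digest` for what a container runtime sees.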

What I don't think we've talked about though is whether and how we want to try to create a chain that allows us to establish trust in the manifest/config starting from the rootfs. I think in the longer term we do need to do that in order to have things like bootc status as well as general bootc upgrade doing change detection (i.e. no-op upgrades if we're on a matching manifest digest already). We could start by saying that's just local unsigned state, but I think we can do it with something like containers/composefs#294 (comment)
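A sketch of the "just local unsigned state" starting point for change detection, assuming a hypothetical JSON state file that maps the booted composefs digest to the manifest digest it was deployed from. Nothing here is signed; that is exactly the gap a proper trust chain would close.

```python
import json
from pathlib import Path
from typing import Dict

# Hypothetical local, unsigned state: booted composefs digest -> manifest digest.
State = Dict[str, str]


def load_state(path: Path) -> State:
    return json.loads(path.read_text()) if path.exists() else {}


def save_state(path: Path, state: State) -> None:
    path.write_text(json.dumps(state, indent=2, sort_keys=True))


def record_deployment(state: State, composefs_digest: str, manifest_digest: str) -> None:
    """Remember which manifest a deployed composefs image came from."""
    state[composefs_digest] = manifest_digest


def upgrade_is_noop(state: State, booted_composefs: str, remote_manifest: str) -> bool:
    """An upgrade is a no-op if the booted image came from the same manifest.

    Only as trustworthy as the local state file itself.
    """
    return state.get(booted_composefs) == remote_manifest
```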

allisonkarlitskaya · 2024-11-06T09:50:33Z

In what I'm thinking here, we do:

Although I don't have anything against this system (where we effectively end up with two fsverity digests — one for the "base" image and one for the chained image) it would substantially complicate the deployment in practice. Here's why:

We download a container image that we want to boot into. That means that we need to create a composefs for that container image. This container image had the kernel inside of it, so the composefs that we generate will also have the kernel inside of it. That means that it's going to be the "wrong" fs-verity compared to the one that we actually want to mount.

So how do we resolve that? We could say that we also generate a secondary composefs with some files masked out, which is the one that we use strictly for booting? Perhaps anything called /usr/lib/modules/*/uki.efi?
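The "secondary composefs with some files masked out" option could be as simple as filtering the file list before building the boot-time image. A sketch using the pattern suggested above; `boot_view` is a hypothetical helper name.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable

# Pattern floated in the comment above; '*' matches exactly one path component.
MASK_PATTERNS = ("/usr/lib/modules/*/uki.efi",)


def boot_view(files: Dict[str, bytes], patterns: Iterable[str] = MASK_PATTERNS) -> Dict[str, bytes]:
    """Files that would go into the secondary, boot-only composefs image.

    Everything in the container image except paths matching a mask pattern.
    """
    patterns = tuple(patterns)
    return {
        path: data
        for path, data in files.items()
        if not any(PurePosixPath(path).match(p) for p in patterns)
    }
```

With an absolute pattern, `PurePosixPath.match` requires the whole path to match, so a `uki.efi` nested deeper under a module directory is not masked.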

Or do we refer to the original base image from our with-kernel image and that's the image that we use to build the composefs from?

This gigantic mess is kinda impenetrable... and it's sort of what sent us down this "ephemeral signing key" path in the first place... Something needs to give, and I think the thing needs to be that the UKI doesn't appear as part of the filesystem of the container.

The "hidden metadata layer" approach is a massively gigantic hack but it gets the job done. I'd like it if some day we could have a way of attaching the UKI as an OCI artifact instead, but it seems like the tooling is lagging a bit there...

As for linking the composefs back to the originating container image, this seems like it would be very difficult to do in a way that didn't introduce cyclic dependencies again. As part of the main thrust of what we're doing here, we want to sign the container image, which includes the fs-verity digest of the complete composefs. If that composefs contains a reference back to the container, we're in trouble again (unless we specify that the digest gets excluded in that computation). Another idea that I had to side-step that issue is to include it in the kernel command line or another PE section such that it's not part of the filesystem, but this doesn't help either, since (although it doesn't impact the fs-verity digest of the composefs) the entire content of the kernel is still (indirectly) hashed into the container image, affecting its ID.

So ya — maybe we invent a hash over a container config where the containers.composefs.fsverity label has been replaced with all-0s? But what does that get us? This wouldn't be the "normal" container ID anymore. You couldn't go looking for the container by that ID...
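What that "hash with the label zeroed" would look like, as a sketch. It assumes a canonical JSON re-serialization (sorted keys, compact separators), which is precisely why it is not the normal config ID: the real OCI config digest is over the exact blob bytes.

```python
import copy
import hashlib
import json
from typing import Any, Dict

FSVERITY_LABEL = "containers.composefs.fsverity"  # label name used in the discussion


def zeroed_config_digest(config: Dict[str, Any]) -> str:
    """Hash a config with the fs-verity label value replaced by zeros.

    Assumption: canonical re-serialization (sorted keys, compact separators).
    The input dict is not modified.
    """
    cfg = copy.deepcopy(config)
    labels = (cfg.get("config") or {}).get("Labels") or {}
    if FSVERITY_LABEL in labels:
        labels[FSVERITY_LABEL] = "0" * len(labels[FSVERITY_LABEL])
    canonical = json.dumps(cfg, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(canonical).hexdigest()
```

Two configs that differ only in the fs-verity label hash the same, which breaks the cycle, but nothing on a registry is addressable by this value.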

If we imagine a future where the UKI is an OCI artifact instead of part of the container image, then this could resolve the conflict. Artifacts refer to containers, not the other way around, and we can only find them via the referrers API. I find that sort of unsettling (for example, we could theoretically have multiple kernels for a given container image, depending on what the repository feels like serving us that day). But: the artifact for the UKI would be the thing that needs to be signed... and it also resolves the cyclic dependency, so we could in that case refer to the container ID from the command line of the kernel.
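For reference, an OCI 1.1-style artifact manifest points at the image through its `subject` field, and the referrers API answers with an image index whose descriptors carry `artifactType`. A sketch; the UKI artifact and blob media types are hypothetical, nothing registers them today.

```python
import hashlib
from typing import Any, Dict, List

# Hypothetical media/artifact types; nothing in the discussion defines them.
UKI_ARTIFACT_TYPE = "application/vnd.example.uki.v1"
UKI_BLOB_TYPE = "application/vnd.example.uki.efi"
MANIFEST_TYPE = "application/vnd.oci.image.manifest.v1+json"
EMPTY_TYPE = "application/vnd.oci.empty.v1+json"


def descriptor(media_type: str, blob: bytes) -> Dict[str, Any]:
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


def uki_artifact_manifest(uki: bytes, image_manifest: bytes) -> Dict[str, Any]:
    """Artifact manifest whose 'subject' is the container image manifest.

    The artifact points at the image, never the reverse, so the image digest
    does not depend on the UKI and the UKI cmdline can name the image.
    """
    return {
        "schemaVersion": 2,
        "mediaType": MANIFEST_TYPE,
        "artifactType": UKI_ARTIFACT_TYPE,
        "config": descriptor(EMPTY_TYPE, b"{}"),
        "layers": [descriptor(UKI_BLOB_TYPE, uki)],
        "subject": descriptor(MANIFEST_TYPE, image_manifest),
    }


def uki_referrers(referrers_index: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Filter a referrers-API response (an image index) down to UKI artifacts.

    More than one match is possible; the registry decides what it returns.
    """
    return [d for d in referrers_index.get("manifests", [])
            if d.get("artifactType") == UKI_ARTIFACT_TYPE]
```

`uki_referrers` returning a list rather than a single descriptor is the "multiple kernels for one image" problem made concrete.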

This all feels kinda "far away" though.

cgwalters · 2024-11-06T12:56:14Z

That means that it's going to be the "wrong" fs-verity compared to the one that we actually want to mount.

Yes for sure, we'd want a special annotation on the final kernel layer, so that tooling knows to extract and also make a composefs for everything before that layer as well.

I don't see this as weird at all - remember in a general image derivation case we'll often have around a composefs for layers 0..N for a base image and another for layers 0..(N+1) for a final derived image.
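A sketch of how tooling might decide which layer prefixes need a composefs, assuming a hypothetical layer annotation `org.example.composefs.kernel-layer` (the comment asks for a special annotation without naming one).

```python
from typing import Any, Dict, List

# Hypothetical annotation key: the comment asks for "a special annotation on
# the final kernel layer" without naming one.
KERNEL_LAYER_ANNOTATION = "org.example.composefs.kernel-layer"


def composefs_layer_ranges(manifest: Dict[str, Any]) -> List[range]:
    """Return the layer prefixes that each need their own composefs image.

    Always includes all layers (the ordinary container view). If the final
    layer carries the kernel annotation, also includes every layer before
    it, which is the view used for booting.
    """
    layers = manifest["layers"]
    ranges = [range(len(layers))]
    for i, desc in enumerate(layers):
        if (desc.get("annotations") or {}).get(KERNEL_LAYER_ANNOTATION) == "true":
            if i != len(layers) - 1:
                raise ValueError("kernel layer annotation is only expected on the final layer")
            ranges.insert(0, range(i))
    return ranges
```

This is the same shape as the base/derived case: layers 0..N for one composefs and 0..(N+1) for another.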

The "hidden metadata layer" approach is a massively gigantic hack

Can you write out a bit more what this is? There's not many comments in the code and it's not immediately obvious to me how it works in the code right now.

allisonkarlitskaya · 2024-11-06T18:41:43Z

Can you write out a bit more what this is? There's not many comments in the code and it's not immediately obvious to me how it works in the code right now.

Yes. This absolutely needs to be better documented, but here we go:

We need a way to include a UKI in the container image in such a way that the composefs fs-verity digest doesn't change. We can't attach large blobs of data directly to the config of the container. OCI artifacts aren't fully supported in podman yet (and even if they were, it's not clear if they're the right fit because they're not actually part of the image, but rather associated with it on the container repository).

So here's what we do. Take the base image (which we computed the composefs fs-verity digest against) and add two layers to it:

a layer containing a /composefs-meta/ directory with some metadata inside of it. For this use case it has a boot/ subdirectory containing boot loader entries as per the bootloader spec.
a layer containing /.wh.composefs-meta. This causes the contents of the previous layer not to appear in the filesystem of the container image.

Assuming the incoming container didn't already have a /composefs-meta/ directory in the root of the filesystem, the content of the base image and the final image will be identical: we added some files in a directory and then deleted them again. But: we can still access that "hidden" layer to get at those files. Right now this layer is assumed to be the second-last layer in a container image with at least 3 layers. In the future, I hope to set a label to point to the layer by its diff-id.
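The add-then-whiteout trick can be modelled with a toy layer applier that follows the OCI whiteout rule (`.wh.<name>` hides `<name>` and everything below it in lower layers only). Files are a flat path-to-bytes map with implicit directories, and opaque whiteouts are left out; this is a sketch, not how the code actually unpacks layers.

```python
from typing import Dict, List

Tree = Dict[str, bytes]  # path -> file content; directories are implicit
WHITEOUT = ".wh."


def apply_layer(lower: Tree, layer: Tree) -> Tree:
    """Apply one layer on top of 'lower', honouring OCI whiteout files.

    '.wh.<name>' removes '<name>' and everything beneath it from the lower
    layers only. Opaque whiteouts ('.wh..wh..opq') are not handled here.
    """
    out = dict(lower)
    # Whiteouts first: they only affect content from lower layers.
    for path in layer:
        parent, _, name = path.rpartition("/")
        if name.startswith(WHITEOUT):
            target = f"{parent}/{name[len(WHITEOUT):]}"
            out = {p: d for p, d in out.items()
                   if p != target and not p.startswith(target + "/")}
    # Then regular files from this layer.
    for path, data in layer.items():
        if not path.rpartition("/")[2].startswith(WHITEOUT):
            out[path] = data
    return out


def flatten(layers: List[Tree]) -> Tree:
    tree: Tree = {}
    for layer in layers:
        tree = apply_layer(tree, layer)
    return tree


if __name__ == "__main__":
    base = [{"/usr/bin/sh": b"sh"}, {"/etc/os-release": b"ID=example\n"}]
    meta = {"/composefs-meta/boot/example.conf": b"title Example\n"}
    final = base + [meta, {"/.wh.composefs-meta": b""}]
    # Same filesystem contents, hence the same composefs digest...
    assert flatten(final) == flatten(base)
    # ...but the hidden layer (currently: the second-last one) is still readable.
    print(sorted(final[-2]))
```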

Once we extract the UKI, it will contain a composefs= cmdline parameter that references the composefs of the final container image (which, as mentioned above, is equal to the composefs of the base image). This is how the cyclic hash dependency gets resolved.
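On the consuming side, the check is roughly "does the composefs= value baked into the UKI match the digest of the image we are about to mount". A sketch with hypothetical helper names; it splits on whitespace, ignores kernel-style quoting, and takes the last composefs= if the argument is repeated.

```python
from typing import Optional


def composefs_digest_from_cmdline(cmdline: str) -> Optional[str]:
    """Return the value of the last composefs= argument, or None.

    Simplification: splits on whitespace and ignores kernel-style quoting.
    """
    found = None
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        if key == "composefs" and sep:
            found = value
    return found


def check_boot(cmdline: str, final_image_digest: str) -> None:
    """Refuse to proceed unless the UKI names the digest we are about to mount."""
    wanted = composefs_digest_from_cmdline(cmdline)
    if wanted is None:
        raise ValueError("no composefs= argument on the kernel command line")
    if wanted != final_image_digest:
        raise ValueError(f"composefs digest mismatch: {wanted} != {final_image_digest}")
```

Because the base and final images flatten to the same tree, `final_image_digest` here is the same value that was computed on the base image before the UKI was built.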

cgwalters · 2024-11-06T19:12:18Z

OK, right I think I remember you describing this "add then whiteout". I need to think harder about the security properties of this versus my proposal of "just add a layer".

First, do you agree that "bootable verified OCI" should just be a special case of "verified OCI" for apps or other use cases? If so, then in the "add then whiteout" approach we would have a situation where the config digest == UKI digest, right?
But when we have the manifest layer annotations as I think we should, then we automatically will have a verifiable digest covering the "meta-layer" as you call it (the non-whiteout one) and given the config we can then find a verity digest of the UKI to install it.
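Concretely, "a verifiable digest covering the meta-layer" means checking the fetched layer blob against its descriptor in an already-verified manifest. A sketch (helper names are illustrative); note that manifest layer digests cover the blob as stored, possibly compressed, while the config's `diff_ids` cover the uncompressed tar.

```python
import hashlib
from typing import Any, Dict


def verify_blob(descriptor: Dict[str, Any], blob: bytes) -> None:
    """Raise unless 'blob' matches the descriptor's size and sha256 digest."""
    algo, _, expected = descriptor["digest"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported digest algorithm {algo}")
    if len(blob) != descriptor["size"]:
        raise ValueError("size mismatch")
    if hashlib.sha256(blob).hexdigest() != expected:
        raise ValueError("digest mismatch")


def verified_meta_layer(manifest: Dict[str, Any], blobs: Dict[str, bytes], index: int) -> bytes:
    """Fetch the layer at 'index' from a digest-keyed blob store and verify it."""
    desc = manifest["layers"][index]
    blob = blobs[desc["digest"]]
    verify_blob(desc, blob)
    return blob
```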

But in the end what's the advantage of having "config digest == UKI digest"? We can equally well access the UKI in a "just add a layer" approach - here config digest != UKI digest, but I don't see a problem with that; from the system again given a verified config, finding the UKI is just traversing into its standard place in /usr/lib/modules instead of finding the meta-layer.

So they seem equivalent from a security PoV. From an elegance/understandability PoV, wouldn't you agree it's just nicer to have e.g. podman|docker run <image> let you see the UKI?

cgwalters mentioned this issue Nov 13, 2024 in: Random thought: two composefs output formats #35 (Open)
