Structured /var #8016

Open
1 task done
Tracked by #9249
smira opened this issue Dec 1, 2023 · 1 comment

smira commented Dec 1, 2023

Tasks

  1. dsseng

Problem Statement

At the moment /var (also known as the EPHEMERAL partition) doesn’t have any specific structure: users can create mount points and put files at arbitrary locations under /var.

For a pod hostPath mount to work properly with all the features supported by the kubelet, the mount path must be available in the kubelet mount namespace in the same way as in the host namespace. This requires manual and non-obvious configuration.

For external mounts (e.g. NFS) to work properly when the mounting is done from the kubelet, the mount path must be set up in the kubelet as well.

Talos doesn’t offer a way to keep user files on truly ephemeral storage (i.e. tmpfs), where a reboot is enough to clean up the state.

Talos doesn’t support full reconciliation for the machine.files key, as the contents of /var are not known, and the effect of removing a value from machine.files is not clear.

There’s no way to remove parts of /var (e.g. if some directory was created by mistake).

Some critical or system-important parts of /var are not protected from simple mistakes (e.g. creating a wrong hostPath mount under the etcd data directory).

What’s in /var?

  • lib/etcd - etcd data directory (only controlplane nodes)
  • system/overlays - Talos internal path for overlayfs mounts (probably almost all can be migrated to tmpfs with small exceptions)
  • log - pod logs, API server audit logs, etc.
  • lib/containerd - CRI containerd state (container workspaces)
  • lib/kubelet - kubelet state
  • lib/cni - CNI state (???)
  • run - various ephemeral things (should be in tmpfs)

Proposal

etcd

Make sure the etcd data directory is only accessible by etcd itself (and by Talos for the purposes of backup/restore). No other workload should ever be able to access the etcd data.

E.g. we could use SELinux, which would protect etcd from other workloads and could also keep etcd itself from accessing data it shouldn’t have access to.

kubelet

Mostly the same as etcd: we should look into protecting the kubelet data directory from other workloads. As the kubelet accesses many different paths, it’s hard to restrict the kubelet itself from accessing other directories.

Logs

We can look into making sure other workloads have read-only access to the logs, while kubelet (?) can write the logs.

run

Should we make this a tmpfs (if it isn’t already)?

containerd

Not much we can offer, as workloads write to the container scratch space.

overlays

This is a Talos-specific location, and we shouldn’t allow random writes there (overlayfs upperdir, workdir). We should look into minimizing the overlays on /var (we could replace with overlays on tmpfs when it makes sense).

/var/mnt

Introduce a new directory (naming TBD) which serves as the root mount point for:

  • hostPath mounts, local-path-provisioner default path, etc.
  • mounts done by the kubelet container (e.g. NFS mounts)

This path is mounted as rshared into the kubelet container, so that mounts propagate both ways (from the kubelet to the host and from the host to the kubelet).
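
To illustrate what rshared means at the kernel level, here is a minimal Go sketch of the equivalent of `mount --make-rshared`. It is only an illustration, not how Talos sets up the kubelet container mounts; the function name `makeRShared` is hypothetical, and the code is Linux-only and needs CAP_SYS_ADMIN.

```go
package main

import "syscall"

// makeRShared marks an existing mount point as recursively shared,
// the equivalent of `mount --make-rshared <path>`. Mounts created below
// a shared mount propagate to every peer of that mount, which is what
// makes them visible on both the host and the kubelet side.
//
// Hypothetical helper for illustration; path must already be a mount point.
func makeRShared(path string) error {
	// Source, filesystem type and data are ignored when only the
	// propagation type is changed.
	return syscall.Mount("", path, "", syscall.MS_SHARED|syscall.MS_REC, "")
}
```

If the directory is not a mount point yet, it would first have to be bind-mounted onto itself before its propagation type can be changed.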

Users are expected to use only this path for such mounts.

Questions:

  • how do we enforce it? (e.g. SELinux)
  • how can we get third-party software to use this path? (e.g. OpenEBS, Ceph/Rook, etc.)
  • this should be documented clearly in the Talos docs

machine.files

We need to split it into the use cases for this feature:

  • static pods (deprecated now, better done via .machine.pods)
  • creating random directories (I guess we don’t need it now, given that /var/mnt is mounted to the kubelet?)
  • dropping configuration for static pods, system extensions - it would be better to use a tmpfs location (easier to prune with a reboot)
  • CRI custom configuration (doesn’t go into /var)

In general, machine.files should be implemented on top of a controller.
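
A rough sketch of what controller-style reconciliation could look like: if the controller tracks which files it wrote, removing an entry from machine.files has a well-defined effect. Everything here (the `plan` type, `reconcile`, the path-to-content maps) is hypothetical and not an existing Talos API.

```go
package main

import "sort"

// plan lists the actions a hypothetical machine.files controller would take.
type plan struct {
	Write  []string // paths to create or overwrite
	Remove []string // paths written earlier that are no longer desired
}

// reconcile compares the desired files (path -> content, from the machine
// config) with the files the controller wrote previously (path -> content).
func reconcile(desired, managed map[string]string) plan {
	var p plan

	for path, content := range desired {
		// Write files that are new or whose content has changed.
		if old, ok := managed[path]; !ok || old != content {
			p.Write = append(p.Write, path)
		}
	}

	for path := range managed {
		// Remove files that were dropped from the config.
		if _, ok := desired[path]; !ok {
			p.Remove = append(p.Remove, path)
		}
	}

	// Map iteration order is random, so sort for a stable result.
	sort.Strings(p.Write)
	sort.Strings(p.Remove)

	return p
}
```

Only files the controller created itself end up in Remove, so unknown contents of /var are never touched.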

API to Remove File(s)

Should be restricted to work under /var/mnt only.
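
The restriction could be checked roughly like this. This is a sketch only: `userRoot` assumes the directory ends up being called /var/mnt (naming is still TBD above), `removable` is a hypothetical helper, and a real implementation would also have to deal with symlinks.

```go
package main

import (
	"path"
	"strings"
)

// userRoot is the proposed user-managed directory (name assumed, still TBD).
const userRoot = "/var/mnt"

// removable reports whether p points strictly below userRoot after lexical
// cleanup, so that ".." segments cannot escape into e.g. the etcd data
// directory. It does not resolve symlinks.
func removable(p string) bool {
	if !path.IsAbs(p) {
		return false
	}

	clean := path.Clean(p)

	// The root itself is not removable, only entries below it.
	return strings.HasPrefix(clean, userRoot+"/")
}
```

With this check, a request for /var/mnt/../lib/etcd is rejected because it cleans to /var/lib/etcd.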

Benefits

  • Better security model, protecting services from other workloads (e.g. etcd), and also making sure a service doesn’t have access to the data it shouldn’t have access to.
  • Better structure for the users (/var/mnt)
  • Controller-based machine.files
  • Pruning the node can be selective - e.g. if the containerd state gets corrupted on a single controlplane node, it can be pruned without touching etcd data.
  • Selective pruning of the node, e.g. on upgrade we can prune containerd state and overlay mounts, but keep /var/mnt for hostPath persistence.
smira changed the title from “structured /var” to “Structured /var” on Dec 1, 2023
@runningman84

Protecting stuff is one thing; another problem right now is that they all share the same filesystem. That means if we use the local path provisioner and consume all the space, etcd will crash due to out-of-disk-space errors. In my old k3s setup I used to have LVM volumes for each of the consumers, like etcd, Longhorn, local-path and so on.
