
using custom path for storing container images and state #2068

Closed
benp20 opened this issue Jul 27, 2020 · 35 comments

Comments

@benp20

benp20 commented Jul 27, 2020

Environmental Info:
K3s Version:
k3s version v1.18.4+k3s1 (97b7a0e)

Node(s) CPU architecture, OS, and Version:
Linux nvidia-desktop 4.9.140-tegra #1 SMP PREEMPT Wed Apr 8 18:15:20 PDT 2020 aarch64 aarch64 aarch64 GNU/Linux

Cluster Configuration:
1 master

Describe the bug:
I'd like to change the default path where containerd under k3s stores container-related images, state, etc. (by default /run/k3s/containerd/), since my root partition does not have enough spare space.
I'd like to use my data partition instead.
What is the recommended procedure for doing so with k3s?

I tried referring to https://rancher.com/docs/k3s/latest/en/advanced/#configuring-containerd, but did not find the information there.

Steps To Reproduce:

  • Installed K3s:

Expected behavior:

Actual behavior:

Additional context / logs:

@brandond
Member

The easiest thing to do would probably be to bind-mount your data partition as /run/k3s
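
A minimal sketch of that, assuming the data partition is already mounted at a hypothetical /data:

# one-off bind mount
sudo mkdir -p /data/k3s /run/k3s
sudo mount --bind /data/k3s /run/k3s

# or persist it across reboots via /etc/fstab
echo '/data/k3s  /run/k3s  none  bind  0 0' | sudo tee -a /etc/fstab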

@benp20
Author

benp20 commented Jul 29, 2020

Thanks for the suggestion. I mounted the data partition at /run/k3s and now I see overlays for containers getting created at that path (and using the data partition).

Earlier, I was seeing an issue where the node is tainted due to disk pressure and hence the pods are left in a pending state:
Warning FailedScheduling <unknown> default-scheduler 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate.

I had to reduce the thresholds to get around this issue (see my k3s launch command below), and I don't see the issue anymore. I am guessing that it (somehow) recognizes that more space is available on the new partition?

sudo /usr/local/bin/k3s server --kubelet-arg='eviction-soft=nodefs.available<15%' --kubelet-arg='eviction-soft-grace-period=nodefs.available=60m' --kubelet-arg='eviction-hard=nodefs.available<5%' --kubelet-arg='eviction-soft=nodefs.inodesFree<5%' --kubelet-arg='eviction-soft-grace-period=nodefs.inodesFree=120m' --kubelet-arg='eviction-hard=nodefs.inodesFree<5%'
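
For reference, on K3s releases new enough to support the config file, the same kubelet flags can live in /etc/rancher/k3s/config.yaml instead of the command line; a sketch, with the two eviction-soft entries merged so the second doesn't override the first:

# /etc/rancher/k3s/config.yaml
kubelet-arg:
  - "eviction-soft=nodefs.available<15%,nodefs.inodesFree<5%"
  - "eviction-soft-grace-period=nodefs.available=60m,nodefs.inodesFree=120m"
  - "eviction-hard=nodefs.available<5%,nodefs.inodesFree<5%"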

@benp20
Author

benp20 commented Jul 29, 2020

Update: I still see the node reporting disk pressure when I run pods with larger containers, even though /run/k3s is already mapped to the data partition.
Any advice on how to address this problem through use of the data partition? (Seemingly it is still using my root partition somewhere, which is quite full.)

Based on kubernetes documentation (https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/) seems like the issue is with nodefs.

The kubelet supports only two filesystem partitions:

The nodefs filesystem that the kubelet uses for volumes, daemon logs, etc.
The imagefs filesystem that container runtimes use for storing images and container writable layers.

Does this have to be mounted to use the partition, and if so how?

Thanks!

@brandond
Member

Have you considered just throwing a larger SD card or alternative partition layout at the problem? I run k3s on a couple Pi4b nodes with 32GB SD cards without any special configuration.

@jeroenjacobs79

jeroenjacobs79 commented Aug 19, 2020

I'm not sure this solves anything, but on my CentOS server, k3s container images are stored in /var/run/k3s/containerd/, not /run/k3s/containerd/.
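
(On most systemd-based distros /var/run is just a symlink to /run, so the two paths are the same directory; easy to confirm:)

ls -ld /var/run
# lrwxrwxrwx 1 root root 6 ... /var/run -> /run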

@stale

stale bot commented Jul 31, 2021

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Jul 31, 2021
@stale stale bot closed this as completed Aug 14, 2021
@sourcehawk

Is there any solution for this as of today?

@mdrakiburrahman

K3s uses containerd in a close to vanilla state.

  • containerd root: /var/lib/rancher/k3s/agent/containerd (all your images, container files, etc.)
  • containerd state: /run/k3s/containerd (scratch space that is blown away when containerd restarts)

Practically, what we want is for everything that can grow to be backed by some large drive.

So say you have a large drive mounted at /mnt; the script below does the trick (the Kubernetes nodefs and imagefs will both be backed by /mnt, and so will the kubelet and the local-path persistent volumes), so your machine's OS partition stays unbloated:

# =======================================
# Storage prep to "/mnt" drive (~500 GB+)
# =======================================
MNT_DIR="/mnt"
K3S_VERSION="v1.25.4+k3s1"

# nodefs: relocate the kubelet root directory
#
KUBELET_DIR="${MNT_DIR}/kubelet"
sudo mkdir -p "${KUBELET_DIR}"

# imagefs: containerd has a root and state directory
#
# - https://github.com/containerd/containerd/blob/main/docs/ops.md#base-configuration
#
# Note: each "ln -s" below points at an existing parent directory, so the
# symlink is created *inside* that directory, named after the last path
# component of the new location.
#
# containerd root -> /var/lib/rancher/k3s/agent/containerd
#
CONTAINERD_ROOT_DIR_OLD="/var/lib/rancher/k3s/agent"
CONTAINERD_ROOT_DIR_NEW="${MNT_DIR}/containerd-root/containerd"
sudo mkdir -p "${CONTAINERD_ROOT_DIR_OLD}"
sudo mkdir -p "${CONTAINERD_ROOT_DIR_NEW}"
sudo ln -s "${CONTAINERD_ROOT_DIR_NEW}" "${CONTAINERD_ROOT_DIR_OLD}"

# containerd state -> /run/k3s/containerd
#
CONTAINERD_STATE_DIR_OLD="/run/k3s"
CONTAINERD_STATE_DIR_NEW="${MNT_DIR}/containerd-state/containerd"
sudo mkdir -p "${CONTAINERD_STATE_DIR_OLD}"
sudo mkdir -p "${CONTAINERD_STATE_DIR_NEW}"
sudo ln -s "${CONTAINERD_STATE_DIR_NEW}" "${CONTAINERD_STATE_DIR_OLD}"

# local-path PVs -> /var/lib/rancher/k3s/storage
#
PV_DIR_OLD="/var/lib/rancher/k3s"
PV_DIR_NEW="${MNT_DIR}/local-path-provisioner/storage"
sudo mkdir -p "${PV_DIR_OLD}"
sudo mkdir -p "${PV_DIR_NEW}"
sudo ln -s "${PV_DIR_NEW}" "${PV_DIR_OLD}"

# =======
# Install
# =======
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="${K3S_VERSION}" INSTALL_K3S_EXEC="--kubelet-arg root-dir=${KUBELET_DIR}" sh -
sudo chmod 644 /etc/rancher/k3s/k3s.yaml
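
After the install completes, it's worth sanity-checking that the symlinks and backing filesystem resolved as intended; a quick sketch (directory names match the script above):

# symlinks should point into /mnt
ls -l /var/lib/rancher/k3s/agent/containerd /run/k3s/containerd /var/lib/rancher/k3s/storage

# kubelet and containerd data should now grow on the big drive
df -h /mnt
sudo du -sh /mnt/kubelet /mnt/containerd-root /mnt/containerd-state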

When Kubernetes comes up, you can see everything is backed with a lot of space (in my case, /mnt has 600 GB), so nodefs and imagefs have plenty of headroom:

[screenshot: node filesystems backed by /mnt]

And nodefs and imagefs are at 99% - meaning Eviction Manager will not fire under normal circumstances:
[screenshot: configured eviction thresholds]

@LarsBingBong

@mdrakiburrahman your way of doing things looks quite cool. However, why not use the documented approach from the K3s project: https://docs.k3s.io/advanced#configuring-containerd - if nothing else, now you have it for reference.

Have a great day

@mdrakiburrahman

@mdrakiburrahman your way of doing things looks quite cool. However, why not use the documented approach from the K3s project: https://docs.k3s.io/advanced#configuring-containerd - if nothing else, now you have it for reference.

Have a great day

Seemed easier to use ln; it's simple and effective tech 😄

Besides K3s, my team manages a bunch of other Kubernetes flavors that don't have Rancher's toml file. ln works everywhere.

@LarsBingBong

@mdrakiburrahman totally fair! I agree with your points. I'm reaching out on Rancher Slack, on the K3s channel, to see if someone from the K3s project can elaborate more on the somewhat documented approach.

Thanks

@brandond
Member

brandond commented Jan 30, 2023

I personally would probably just set up another mount point and symlink things into place (as @mdrakiburrahman has done) instead of modifying the containerd config template. If you provide your own config template, then you're responsible for keeping it up to date with any changes we make to the default template. We don't do that often, but I feel like it's more fragile than a couple symlinks.

@LarsBingBong

LarsBingBong commented Jan 30, 2023

@brandond and @mdrakiburrahman with your input I'm going with the symlink approach. Thank you very much. Low-key Linux conf. FTW once again 👍🏿 .... have a great day.

@sourcehawk

Does the --data-dir flag on installation not set the storage path for all k3s resources, including container images?

@LarsBingBong

@hauks96 I would be eager to know this as well, whether or not this is the case. @brandond are we going off the beaten path here on this one, taking the prolonged symlink approach, when we (maybe) could just use the --data-dir argument on the K3s worker/agent process?

Thank you to you both.

@LarsBingBong

LarsBingBong commented Feb 1, 2023

Tried it and I'm getting

Feb 01 16:12:31 test-test-worker-29 k3s[1936]: E0201 16:12:31.551646    1936 cri_stats_provider.go:452] "Failed to get the info of the filesystem with mountpoint" err="failed to get device for dir \"/k3s-worker-data/agent/containerd/io.containerd.snapshotter.v1.overlayfs\": stat failed on /k3s-worker-data/agent/containerd/io.containerd.snapshotter.v1.overlayfs with error: no such file or directory" mountpoint="/k3s-worker-data/agent/containerd/io.containerd.snapshotter.v1.overlayfs"

The above might be because there's a timing issue: the LVM2 volume on which /k3s-worker-data is mounted is being created while K3s is being installed.

Verifying whether or not that's the case.

@brandond
Member

brandond commented Feb 1, 2023

data-dir just relocates /var/lib/rancher/k3s. Other things like the runtime directories for the kubelet, containerd, CNI, and pod logs are essentially hardcoded; changing them would break other things in the ecosystem, so we do not.
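
For completeness, a sketch of what relocating that directory looks like (with /mnt/k3s-data as a hypothetical target):

# command-line form
k3s server --data-dir /mnt/k3s-data

# or the equivalent in /etc/rancher/k3s/config.yaml
data-dir: /mnt/k3s-data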

@LarsBingBong

LarsBingBong commented Feb 1, 2023

@brandond thank you. So the symlink approach is clearly the way to go. Thank you.

@LarsBingBong

@mdrakiburrahman do you use Longhorn or perhaps see the below issue with whatever CSI you use?

When we use the --kubelet-arg root-dir option on the K3s binary, kubelet data goes into the defined path. However, /var/lib/kubelet/plugins/ still contains the driver.longhorn.io folder, which then causes the following error on StatefulSet workloads: AttachVolume.Attach failed for volume "pvc-ID" : CSINode NODE_NAME does not contain driver driver.longhorn.io.

Any idea? Thank you.

@larssb

larssb commented Feb 2, 2023

It was the Longhorn CSI that needed to have csi.kubeletRootDir set in the values.yaml Helm file, and the longhorn-csi-plugin DaemonSet had to be re-deployed for that to take full effect. Then Longhorn was able to register the Longhorn driver in the correct kubelet folder ...
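
A sketch of that setting, assuming the kubelet root was moved to /mnt/kubelet as in the script above:

# values.yaml for the longhorn Helm chart
csi:
  kubeletRootDir: /mnt/kubelet

# then re-deploy so the longhorn-csi-plugin DaemonSet picks it up
helm upgrade longhorn longhorn/longhorn -n longhorn-system -f values.yaml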

@VladoPortos

Is there a proper solution to this without using ln?
I can see from systemctl status k3s-agent that it runs containerd with the parameters --state and --root:

Group: /system.slice/k3s-agent.service
           ├─19541 /usr/local/bin/k3s agent
           ├─19571 containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd
           └─19721 /sbin/iptables -t nat -S FLANNEL-POSTRTG 1 --wait

But where is this defined? It's not in /etc/systemd/system/k3s-agent.service.

@brandond
Member

brandond commented Feb 3, 2023

It's hardcoded, sorry.

The root dir will be relocated as part of setting --data-dir but the state dir cannot be changed.

@sourcehawk

The root dir will be relocated as part of setting --data-dir but the state dir cannot be changed.

The documentation on k3s.io specifically states that --data-dir changes the location where the state is kept. You might want to have that changed to reflect in more detail what is actually kept there.

So what exactly is being kept there, if not the containers, state, CNI, or logs? I have my worker nodes configured with a --data-dir flag. The nodes have a 16GB root volume, so I was really banking on the volume-mount approach, mainly for the ease of resizing it and adding a new one.

# --data-dir /data/docker/rancher/k3s

$ ls /data/docker/rancher/k3s
agent  data

$ sudo du -sh /data/docker/rancher/k3s/agent
9.5G    /data/docker/rancher/k3s/agent

$ sudo du -sh /data/docker/rancher/k3s/data
208M    /data/docker/rancher/k3s/data

What I really want is to prevent data from being stored on the root volume, because I do not want a volume that cannot be reconfigured to fill up, forcing me to create a new VM. What are the options here?

@LarsBingBong

So what you need, @hauks96, is to use the approach outlined by @mdrakiburrahman:

  • use his script to symlink the containerd root and state dirs to e.g. an LVM2 or ZFS disk that you can expand dynamically when needed later on
  • use --kubelet-arg root-dir to move the kubelet dir to the dedicated LVM2/ZFS mount
  • if you're using Longhorn (likely also for other out-of-tree CSI plugins), you need to tell it where the kubelet dir now lives, since the kubelet is no longer at the default /var/lib/kubelet

You can reach out to me on Slack, either the Kubernetes community or the Rancher one, where I'm: Lars Bingchong / Lead DevOps Engineer.

N.B. --data-dir seems to hold K3s-specific data: the metadata it needs for generated certs, the certs themselves, and the like.

@sourcehawk

Although it seems to be working in general after this change, I am now getting permission errors when resources within the cluster try to create or access certain files.

Should I just chmod 777 my mount directory?

@LarsBingBong

I don't think so @hauks96 ... that should indeed not be necessary. Are you bumping into some inherited permissions causing this? Is it stateful workloads seeing this, or emptyDir-consuming ones - or just in general?

@sourcehawk

Yeah, I figured. The problem was existing PVCs in the cluster that had to be deleted. Thanks again

@vanniszsu

Adding root = and state = in /var/lib/rancher/k3s/agent/etc/containerd/config.toml should set a custom path for the k3s-integrated containerd's images and state.

@predictablemiracle

Adding root = and state = in /var/lib/rancher/k3s/agent/etc/containerd/config.toml should set a custom path for the k3s-integrated containerd's images and state.

This won't work as the file is overwritten when the k3s service starts.
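
What does survive restarts is the documented template mechanism: if a config.toml.tmpl exists in the same directory, K3s renders config.toml from it instead of from its built-in template. A minimal sketch (start from a copy of the generated config.toml and set containerd's top-level root and state keys; per brandond's caveat above, you then own keeping the rest in sync with K3s's default template):

# /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
root = "/mnt/containerd-root/containerd"
state = "/mnt/containerd-state/containerd"
# ... remainder copied from the generated config.toml ...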

@ianb-mp

ianb-mp commented Dec 7, 2023

Be warned that modifying containerd storage location as suggested by @mdrakiburrahman (and others) can break Kubevirt - see kubevirt/kubevirt#10703 (comment)

EDIT: I also tried using bind mounts rather than symlinks, but still had issues

@larssb

larssb commented Dec 7, 2023

Thank you for pointing that out @ianb-mp. We don't use KubeVirt, so we have no issues. Isn't it also possible to configure KubeVirt so that it "knows" where the containerd data and config are? At least it should be.

@mansoncui

How was your problem solved?

@codeReaper2001

(quoting @mdrakiburrahman's full storage-prep comment from earlier in the thread, which ends:)

And nodefs and imagefs are at 99% - meaning Eviction Manager will not fire under normal circumstances: [screenshot: configured eviction thresholds]

Hello, I would like to understand how the Eviction Thresholds in the second image were derived. I would like to verify if my configuration is taking effect. Could you please provide more details on this? Thanks a lot!

@mdrakiburrahman

@codeReaper2001 - full writeup and script here: https://www.rakirahman.me/conquering-eviction-manager-k8s/

@codeReaper2001

codeReaper2001 commented Mar 25, 2024

@codeReaper2001 - full writeup and script here: https://www.rakirahman.me/conquering-eviction-manager-k8s/

Thanks a lot, it worked!
