k3s rootless does not work when run from docker container #2054

Closed
pmahoney opened this issue Jul 23, 2020 · 19 comments

@pmahoney

pmahoney commented Jul 23, 2020

Environmental Info:
K3s Version: k3s version v1.18.6+k3s1 (6f56fa1)

Node(s) CPU architecture, OS, and Version: Linux docker-desktop 4.19.76-linuxkit #1 SMP Tue May 26 11:42:35 UTC 2020 x86_64 GNU/Linux

Cluster Configuration: k3s docker container, server and agent running together in same container

Describe the bug:

In rootless mode, the cluster fails with:

failed to setup network &{binary:slirp4netns mtu:65520 ipnet:0xc00089b980 disableHostLoopback:true apiSocketPath: enableSandbox:false enableSeccomp:false}: setting up tap tap0: executing [[nsenter -t 15 -n -m -U --preserve-credentials ip tuntap add name tap0 mode tap] [nsenter -t 15 -n -m -U --preserve-credentials ip link set tap0 up]]: exit status 1

Steps To Reproduce:

  • customized k3s docker image configured with uidmap and configured to run as non-root user:
FROM alpine:3.12 AS alpine
RUN apk -u --no-cache add shadow-uidmap

FROM rancher/k3s:v1.18.6-k3s1
COPY --from=alpine /etc/passwd /etc/group /etc/shadow /etc/subgid /etc/subuid /etc/
COPY --from=alpine /usr/bin/newgidmap /usr/bin/newuidmap /usr/bin/
COPY --from=alpine /lib/ld-musl-x86_64.so.1 /lib/

RUN adduser -g k3s -s /bin/false -D -u 1001 k3s \
        && mkdir -p /home/k3s && chown k3s /home/k3s \
        && echo k3s:165536:65536 >> /etc/subuid \
        && echo k3s:165536:65536 >> /etc/subgid

USER k3s
  • started like this:
docker run --rm --privileged -p 80:80 customized-k3s-image server --rootless
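
The /etc/subuid and /etc/subgid entries in the Dockerfile above use the name:start:count format. As a minimal sketch, the range such an entry grants can be printed like this. subid_range is a hypothetical helper written for illustration; it is not part of k3s or the shadow tools.

```shell
# Print the first and last subordinate ID granted to a user in a
# subuid/subgid-style file (lines of the form name:start:count).
# subid_range is a hypothetical helper, not part of k3s or shadow.
subid_range() {
    file=$1
    user=$2
    awk -F: -v u="$user" '$1 == u { print $2, $2 + $3 - 1 }' "$file"
}

# Example usage inside the image:
#   subid_range /etc/subuid k3s
```

For the entry k3s:165536:65536 this prints "165536 231071", i.e. 65536 IDs starting at 165536.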

Expected behavior:

Well I'm not sure. I'd like to be able to run rootless k3s in a docker container, but I'm not sure if this is possible.

Actual behavior:

$ docker run --rm --privileged -p 80:80 customized-k3s-image server --rootless
open: Permission denied
time="2020-07-23T13:51:52.542185940Z" level=fatal msg="failed to setup network &{binary:slirp4netns mtu:65520 ipnet:0xc0007c0990 disableHostLoopback:true apiSocketPath: enableSandbox:false enableSeccomp:false}: setting up tap tap0: executing [[nsenter -t 16 -n -m -U --preserve-credentials ip tuntap add name tap0 mode tap] [nsenter -t 16 -n -m -U --preserve-credentials ip link set tap0 up]]: exit status 1"

Additional context / logs:

@pmahoney
Author

I got a little farther by setting the group of the k3s user to root so it has write access to /dev/net/tun.
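
A quick way to confirm the permission change took effect is to test the device node from inside the container as the k3s user. This is only a sketch: check_rw is a hypothetical helper, and it checks file permissions only, so other restrictions on the device could still apply.

```shell
# Report whether the current user may read and write a device node.
# check_rw is a hypothetical helper; the default path is the TUN/TAP
# control device mentioned above.
check_rw() {
    dev=${1:-/dev/net/tun}
    if [ -r "$dev" ] && [ -w "$dev" ]; then
        echo "ok: uid $(id -u) can read and write $dev"
    else
        echo "denied: uid $(id -u) cannot read and write $dev"
        return 1
    fi
}

# Example usage:
#   check_rw            # checks /dev/net/tun
#   check_rw /dev/kmsg  # checks another device
```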

Using strace, I see the invocation of slirp4netns, which writes an error (that I do not see on the terminal) and exits 1:

execve("/bin/slirp4netns", ["slirp4netns", "--mtu", "65520", "-r", "3", "--disable-host-loopback", "--cidr", "10.41.0.0/16", "221", "tap0"], 0xc0000eb3e0 /* 11 vars */) = 0

...

writev(2, [{iov_base="", iov_len=0}, {iov_base="the option -r FD requires -c\n", iov_len=29}], 2) = 29

@pmahoney
Author

pmahoney commented Jul 23, 2020

And farther still after upgrading the slirp4netns binary as per #1949

But still no dice (even with chmod a+rwx /dev/kmsg)

F0723 15:48:16.213402      25 server.go:270] failed to run Kubelet: failed to create kubelet: open /dev/kmsg: operation not permitted
FATA[2020-07-23T15:48:16.361997795Z] child diedcommand [/proc/self/exe server --rootless] exited: waitid: no child processes
FATA[2020-07-23T15:48:16.384687943Z] child exited: exit status 1

@pmahoney
Author

On the host system, running sysctl kernel.dmesg_restrict=0 fixed that issue (was set to 1, though this varies depending on the host configuration of course). See https://unix.stackexchange.com/questions/390184/dmesg-read-kernel-buffer-failed-permission-denied
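
For anyone checking their own host, here is a small sketch that reads the current setting before changing anything. dmesg_restricted is a hypothetical helper; the path argument exists only so the function can be pointed at a test file.

```shell
# Succeed when kernel.dmesg_restrict is set to 1.
# dmesg_restricted is a hypothetical helper; the optional argument
# overrides the procfs path (useful for testing).
dmesg_restricted() {
    f=${1:-/proc/sys/kernel/dmesg_restrict}
    [ "$(cat "$f" 2>/dev/null)" = "1" ]
}

if dmesg_restricted; then
    echo "kernel.dmesg_restrict=1: unprivileged reads of the kernel log are blocked"
else
    echo "kernel.dmesg_restrict is 0 or could not be read"
fi
```

If it reports 1, the sysctl command above (run as root on the host) is the change that worked here.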

@brandond
Member

brandond commented Jul 23, 2020

There have been several recent fixes for rootless mode, including the slirp4netns update and --snapshotter flag that allows you to use the native snapshotter instead of overlayfs. These are only available on master though - can you try a recent master build and see if it works better?

@pmahoney
Author

pmahoney commented Jul 23, 2020

@brandond Sure. Using rancher/k3s:v1.18.6-k3s-4eb88a2f-amd64 getting a little farther still. Had to do a chmod g+w /bin/aux so something could create a bunch of iptables symlinks there.

Currently stuck on:

E0723 17:08:39.311738      25 summary_sys_containers.go:47] Failed to get system container stats for "/systemd": failed to get cgroup stats for "/systemd": failed to get container info for "/systemd": unknown container "/systemd"

W0723 17:10:30.019027      25 container_manager_linux.go:641] [ContainerManager] Failed to ensure state of "/systemd": [failed to move PID 25 (in "/docker/6d92b9cb206a8306c2bac301942583f05318f71f10dc4e8816680d8823342537") to "/systemd": mkdir /sys/fs/cgroup/cpuset/systemd: permission denied, failed to apply oom score -999 to PID 25: write /proc/25/oom_score_adj: permission denied]

Also, what should the snapshotter be? I'm using --snapshotter native at the moment, but is that correct? What are the possible values?

@brandond
Member

Just out of curiosity, have you tried k3d, or is there some reason in particular you want to manually run rootless in docker?

@pmahoney
Author

I've not looked at k3d in a long time. Started there, but moved to our own scripts to run k3s in docker for various reasons. But if k3d supports a rootless mode, then I'll give it another look.

@brandond
Member

I don't know that it specifically supports rootless, but I think it's the best way to run k3s in docker. k3s in docker while also rootless is not something that's seen much testing, as far as I'm aware.

@pmahoney
Author

pmahoney commented Jul 23, 2020

Well, the errors above are not fatal, it seems. I'm able to run kubectl and start pods, etc.

Curiously, I cannot run ctr, for example:

$ docker exec -ti $container sh

## user "k3s" in the container

$ ctr image ls
ctr: failed to dial "/run/k3s/containerd/containerd.sock": context deadline exceeded

$ ls -l /run/k3s/containerd
ls: cannot access '/run/k3s/containerd': No such file or directory

$ ps -e -o pid,ppid,args | grep containerd
   66    49 containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd
  830    17 /bin/containerd-shim-runc-v2 -namespace k8s.io -id 79b5f37335148f0dc36a13404685ca24b049a4f8da080172c0917c85954a097c -address /run/k3s/containerd/containerd.sock
...

$ nsenter -t 66 -a -- ls /run/k3s/containerd
nsenter: reassociate to namespace 'ns/cgroup' failed: Operation not permitted

And I've also not figured out how to connect over the network to any of my pods, but that's a semi-known issue with rootless.

@pmahoney
Author

pmahoney commented Jul 23, 2020

Entering the container as root (which sort of defeats the whole exercise) does allow me to run ctr:

$ docker exec --user root -ti $container sh

### root in the container

# ls -l /run/k3s/containerd
ls: cannot access '/run/k3s/containerd': No such file or directory

## root can use nsenter though...
# nsenter -t 66 -a -- ctr images ls
<... expected image list ...>

@brandond
Member

I'm a little confused by the fact that you are getting '/run/k3s/containerd': No such file or directory when non-root but clearly it exists and is accessible when you exec in as root. These are the exact same containers?

@pmahoney
Author

pmahoney commented Jul 23, 2020

Same container. Note it's not root vs. non-root, but whatever the default container's namespace is vs. entering containerd's namespace (both of those shell sessions were run from inside a docker exec ... $container; the one run as root allowed using nsenter). I'm definitely confused on how all the namespaces fit together, so I could be missing some things.

(I updated the shell listings to clarify a little)
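
One way to see which namespaces actually differ between the docker exec shell and containerd is to compare the links under /proc/<pid>/ns. A minimal sketch, assuming a Linux procfs; ns_id and same_ns are hypothetical helpers, and reading another process's links may itself be refused without sufficient privileges.

```shell
# Print the namespace identifier of a process, e.g. "mnt:[4026531840]".
# ns_id and same_ns are hypothetical helpers for illustration.
ns_id() {
    readlink "/proc/$1/ns/$2"
}

# Succeed when two PIDs share the given namespace type (mnt, user, net, ...).
# Returns 2 when either link cannot be read.
same_ns() {
    a=$(ns_id "$1" "$3") || return 2
    b=$(ns_id "$2" "$3") || return 2
    [ "$a" = "$b" ]
}

# Example usage: compare this shell with PID 66 (containerd in the listing above).
#   for t in user mnt net cgroup; do
#       if same_ns $$ 66 "$t"; then echo "$t: shared"
#       else echo "$t: different or unreadable"; fi
#   done
```

A differing mnt entry would be consistent with /run/k3s/containerd existing for containerd but not for the exec'd shell. The failing command used nsenter -a, which includes the cgroup namespace named in the error, whereas the log at the top of this issue shows k3s itself calling nsenter with only -n -m -U --preserve-credentials. Trying that narrower set as the k3s user may be worth a test; it has not been verified in this thread.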

@pmahoney
Author

To summarize current status:

abstract

I can launch k3s, run kubectl, deploy my app, and connect to it from the external host (the host running docker), all with only non-root docker containers.

There are various warnings and errors in the logs. I cannot run ctr as non-root user.

details

Here's the image that I'm running. It's k3s built from current master plus the uidmap tools and a "k3s" non-root user.

FROM alpine:3.12 AS alpine
RUN apk -u --no-cache add shadow-uidmap

# custom built from k3s source
FROM rancher/k3s:v1.18.6-k3s-4eb88a2f-amd64

COPY --from=alpine /etc/passwd /etc/group /etc/shadow /etc/subgid /etc/subuid /etc/
COPY --from=alpine /usr/bin/newgidmap /usr/bin/newuidmap /usr/bin/
COPY --from=alpine /lib/ld-musl-x86_64.so.1 /lib/

RUN mkdir -p /var/lib/rancher/k3s \
        && adduser -h /var/lib/rancher/k3s -g k3s -s /bin/false -D -u 1001 -G root k3s \
        && echo k3s:165536:65536 >> /etc/subuid \
        && echo k3s:165536:65536 >> /etc/subgid

RUN chmod g+w /bin/aux
RUN echo F3F79821-80EE-4B43-A4DD-E3DA712CA2BC >/etc/machine-id

USER k3s:root

I'm not sure the machine id is necessary or best practice, but having one eliminated an error message that got logged repeatedly.
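
For reference, machine-id(5) describes the file as 32 lowercase hexadecimal characters plus a newline, without dashes, so the UUID-style value above does not match that format. Whether k3s cares about the exact format is not established in this thread. A sketch of generating a value in the documented format follows; new_machine_id is a hypothetical helper.

```shell
# Generate a random ID in the format machine-id(5) describes:
# 32 lowercase hexadecimal characters followed by a newline.
# new_machine_id is a hypothetical helper for illustration.
new_machine_id() {
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
    echo
}

# Example usage (as root, e.g. in a Dockerfile RUN step):
#   new_machine_id > /etc/machine-id
```

Note that generating the ID at image build time gives every container started from that image the same ID, just like the hard-coded value.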

I start this container like:

$ docker run --privileged -p 6443:6443 -p 80:10080 \
  server --rootless --snapshotter native --data-dir /var/lib/rancher/k3s

At this point, I can deploy and run my app. I can connect to it from the host machine via port 80 (which goes to 10080 inside the container, which is k3s' network as described in https://rancher.com/docs/k3s/latest/en/advanced/#known-issues-with-rootlesskit).
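
As I read the linked known-issues page, rootless k3s exposes service ports below 1024 with an offset of 10000 and leaves higher ports unchanged, which is why the docker mapping above targets 10080. A small sketch of that rule, for working out the right side of a -p mapping; rootless_port is a hypothetical helper, not a k3s command.

```shell
# Map a service port to the port it is reachable on outside the rootless
# network namespace: ports below 1024 are offset by 10000, others are unchanged.
# rootless_port is a hypothetical helper, not part of k3s.
rootless_port() {
    if [ "$1" -lt 1024 ]; then
        echo $(( $1 + 10000 ))
    else
        echo "$1"
    fi
}

# Example usage:
#   docker run ... -p 80:"$(rootless_port 80)" ...
```

With this rule, rootless_port 80 gives 10080 and rootless_port 8080 gives 8080.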

From inside the container (docker exec -ti $container sh) I can run kubectl.

From inside the container, I cannot run ctr:

$ ctr image ls
ctr: failed to dial "/run/k3s/containerd/containerd.sock": context deadline exceeded

In fact I cannot even see that socket:

$ ls /run/k3s
ls: cannot access '/run/k3s': No such file or directory

Nor can I enter the namespaces of containerd:

$ pid=$(ps -e -o comm,pid,ppid,args | grep '^containerd ' | awk '{print $2}')
$ nsenter -t $pid -a -- ls /run/k3s
nsenter: reassociate to namespace 'ns/cgroup' failed: Operation not permitted

From inside the container as root, I can enter the namespaces of containerd and run ctr:

$ docker exec --user=root -ti $container sh

# pid=$(ps -e -o comm,pid,ppid,args | grep '^containerd ' | awk '{print $2}')
# nsenter -t $pid -a -- ls /run/k3s
containerd

# nsenter -t $pid -a -- ctr images ls | head -n1
REF                                           TYPE                                                      DIGEST                                                                  SIZE      PLATFORMS                                                                                                            LABELS
docker.io/rancher/coredns-coredns:1.6.9    application/vnd.docker.distribution.manifest.list.v2+json sha256:e70c936deab8efed89db66f04847fec137dbb81d5b456e8068b6e71cb770f6c0 12.8 MiB  linux/amd64,linux/arm,linux/arm64,linux/ppc64le,linux/s390x                                                          io.cri-containerd.image=managed
...

In the logs of the k3s container, these are repeated over and over but seem mostly harmless:

E0724 16:57:28.869540      27 summary_sys_containers.go:47] Failed to get system container stats for "/systemd": failed to get cgroup stats for "/systemd": failed to get container info for "/systemd": unknown container "/systemd"

W0724 16:58:15.857572      27 container_manager_linux.go:641] [ContainerManager] Failed to ensure state of "/systemd": [failed to move PID 27 (in "/docker/4d3fb7cbac317f8c622cf011c34d10917e4b5d65220233c9db22cf9739e0cb29") to "/systemd": mkdir /sys/fs/cgroup/cpuset/systemd: permission denied, failed to apply oom score -999 to PID 27: write /proc/27/oom_score_adj: permission denied]

@pmahoney
Author

Well it's not perfect, but it does run at this point, so I'm closing this issue. Thanks for your help @brandond. Would appreciate if anyone has tips/suggestions for getting past the error and warning log messages, and/or explaining the namespaces or a way to run ctr without root.

@offlinehacker

I needed to update the Dockerfile to be able to run the latest k3s. The code is available here: https://github.com/xtruder/docker-images/tree/master/k3s-rootless and the image is available here: https://hub.docker.com/repository/docker/xtruder/k3s-rootless
I am successfully running this as part of a vscode remote container, so I can develop in a safer, more secure development environment.

I don't understand why this issue is closed. I think it would be better to fix the k3s image rather than hacking around to make it work.

@brandond
Member

brandond commented Mar 5, 2021

Rootless is experimental, and not intended to be the default state, either on bare metal or in a container. That's why the default image doesn't run rootless.

@ddomnik

ddomnik commented May 19, 2021

> @brandond Sure. Using rancher/k3s:v1.18.6-k3s-4eb88a2f-amd64 getting a little farther still. Had to do a chmod g+w /bin/aux so something could create a bunch of iptables symlinks there.
>
> Currently stuck on:
>
> E0723 17:08:39.311738      25 summary_sys_containers.go:47] Failed to get system container stats for "/systemd": failed to get cgroup stats for "/systemd": failed to get container info for "/systemd": unknown container "/systemd"
>
> W0723 17:10:30.019027      25 container_manager_linux.go:641] [ContainerManager] Failed to ensure state of "/systemd": [failed to move PID 25 (in "/docker/6d92b9cb206a8306c2bac301942583f05318f71f10dc4e8816680d8823342537") to "/systemd": mkdir /sys/fs/cgroup/cpuset/systemd: permission denied, failed to apply oom score -999 to PID 25: write /proc/25/oom_score_adj: permission denied]
>
> Also, what should the snapshotter be? I'm using --snapshotter native at the moment, but is that correct? What are the possible values?

I am probably at the same stage as described here, but my log messages are a bit different. It basically repeats these two messages every few minutes.

W0519 12:12:26.941710      21 sysinfo.go:203] Nodes topology is not available, providing CPU topology
E0519 12:12:27.989099      21 container_manager_linux.go:572] "Failed to ensure process in container with oom score" err="failed to apply oom score -999 to PID 21: write /proc/21/oom_score_adj: permission denied"

I'm using a Raspberry Pi 4B (4 GB) with Ubuntu 20.04 LTS. I downloaded the k3s-arm64 binary from https://github.com/k3s-io/k3s/releases/tag/v1.21.0+k3s1 and ran it with ./k3s-arm64 server --rootless. Before that I had to run sysctl -w net.ipv4.ip_forward=1 and install uidmap because of some other errors.
@pmahoney how did you proceed at that point?

@brandond
Member

The two messages you're getting are expected. The first is because Pis do not have CPU topology, the second is because non-root users are not allowed to adjust their OOM score. Are you seeing anything not work?
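
To see the OOM score restriction independently of k3s, the value can be read and a change attempted from any unprivileged shell. A minimal sketch, assuming a Linux procfs; oom_adj and set_oom_adj are hypothetical helpers. Lowering the value normally requires CAP_SYS_RESOURCE, so the attempt is expected to fail for a non-root user.

```shell
# Print a process's current OOM score adjustment (range -1000..1000).
# Defaults to the calling process. Hypothetical helper for illustration.
oom_adj() {
    cat "/proc/${1:-self}/oom_score_adj"
}

# Try to set this shell's adjustment; returns non-zero when the kernel refuses.
# Hypothetical helper; error output is suppressed on purpose.
set_oom_adj() {
    { echo "$1" > "/proc/$$/oom_score_adj"; } 2>/dev/null
}

# Example usage as a non-root user:
#   oom_adj
#   set_oom_adj -999 || echo "not permitted, same as the kubelet message"
```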

@ddomnik

ddomnik commented May 20, 2021

> The two messages you're getting are expected. The first is because Pis do not have CPU topology, the second is because non-root users are not allowed to adjust their OOM score. Are you seeing anything not work?

Thanks for the reply. For now it looks like everything is working so far despite the various warnings and errors.
