Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s6-overlay-preinit: fatal: unable to chown /var/run/s6: Operation not permitted #309

Closed
jinnko opened this issue Nov 8, 2020 · 17 comments
Closed

Comments

@jinnko
Copy link

jinnko commented Nov 8, 2020

Raising this issue here as I suspect a recent change may have changed the behaviour. Originally raised this in linuxserver/docker-unifi-controller#62

When running dockerd with userns-remap, then starting up a container with --userns=host, I get the following error message:

s6-overlay-preinit: fatal: unable to chown /var/run/s6: Operation not permitted

Inspecting the contents of the container I found:

root@nexter:/etc/s6# ls -l /bin/s6-overlay-preinit
-rwsr-xr-x 1 493216 493216 9120 Oct 19 20:32 /bin/s6-overlay-preinit

I found an old ticket for dockerd (moby) that seems to indicate this is to be expected, however this wasn't a problem until just this past week.

So the issue is that when the container is started the ownership of /bin/s6-overlay-preinit is the remapped UID, and given it's SUID, it doesn't run as the intended user. This is also confirmed in the docker documentation: https://docs.docker.com/engine/security/userns-remap/#disable-namespace-remapping-for-a-container

Has anything changed in this or a related repo that may be related?

Would you be open to a PR that ensures the ownership of /bin/s6-overlay-preinit is correct at runtime?

@jprjr
Copy link
Member

jprjr commented Nov 12, 2020

Always open for PRs!

I think the fix is to try and chown + chmod s6-overlay-preinit before running it. I'm guessing this would fail if somebody's running a container with a read-only root filesystem or using the --user switch, it would need to be wrapped in foreground blocks.

A hard requirement is we only use tools that ship in the tarball, so you'll need to use s6-chown, s6-chmod, execline, etc.

@Crinisen
Copy link

I have now spent exactly 25 minutes using execlineb so I wouldn't want to submit this as a full PR because I suspect there is a much cleaner method, and I honestly didn't work through all the security implications. As a reference however, I just forced a ownership change by changing the last line of /init from: /etc/s6/init/init-stage1 $@ to:

backtick -n euid { id -u } importas -u euid euid
backtick -n egid { id -g } importas -u egid egid
foreground { /bin/s6-chown -u ${euid} -g ${egid} /bin/s6-overlay-preinit } /etc/s6/init/init-stage1 $@

@jprjr
Copy link
Member

jprjr commented Nov 29, 2020

It would probably be good to run the whole thing in a foreground block, right now this would be introducing the euid and egid variables into the whole supervision tree. If it's in a foreground block, it runs as a forked process and doesn't change the environment.

s6-chown has the interesting property of being able to use -U to read UID and GID environment variables - importas really just reads the rest of the script and substitutes variables into the CLI (this is a huge oversimplification), but if we use -U in s6-chown we can skip that. So this could be simplified a bit:

foreground {
    backtick -n UID { id -u }
    backtick -n GID { id -g }
    foreground { /bin/s6-chown -U /bin/s6-overlay-preinit }
} /etc/s6/init/init-stage1 $@

Only issue here is id is a binary on the system, we don't rely on any pre-installed binaries, just the ones we include in the tarball, I don't think there's anything in the s6-ecosystem that can grab the current UID/GID.

@skarnet would it make sense for s6-portable-utils to get an id replacement, or for s6 to get a command like s6-getuidgid that exports the current UID/GID into environment variables and exec's into a program? If those don't really have much use outside of this project I can just try writing my own.

@skarnet
Copy link
Contributor

skarnet commented Nov 30, 2020

@jprjr Well it wouldn't really make sense outside of suid binaries. If you want to s6-chown a file, it means you're root, so you know your uid and gid...
If the user you want to chown to is listed in /etc/passwd, then s6-envuidgid user will put the correct value in the UID and GID environment variables. If it's not, I'm afraid you'll have to rely on id or bundle your own.

@jinnko
Copy link
Author

jinnko commented Nov 30, 2020

At a minimum we want to check the ownership of /bin/s6-overlay-preinit as that should always be root, but in the scenario that triggers this it will be a UID above 64k. If it isn't owned by UID 0, we'd want to chown root.

I'm not familiar enough with the s6 tools like exceline, but based on the suggestion above, how about something like this..

foreground {
    backtick -n REMAPPED_ROOT_UID { /usr/bin/stat -c '%u' /bin/s6-overlay-preinit }
    if { s6-test $REMAPPED_ROOT_UID -ne 0 } /bin/s6-chown -u 0 /bin/s6-overlay-preinit
} /etc/s6/init/init-stage1 $@

Unfortunately this does end up depending on stat.

I don't think using the id binary is achieving the correct goal as the runtime context will always return UID 0, but the container may have been built on a dockerd running with userns-remap enabled..

UPDATE: changed the check from / to the specific script as this is explicitly where the issue lies.

jinnko added a commit to jinnko/s6-overlay that referenced this issue Dec 1, 2020
When the image was built on or pulled by a dockerd running with userns-remap the root
filesystem will be owned by the remapped UID.  When that image is run explicitly in the
host's user namespace the ownership of the files on disk will be based on the userns-remapped
UIDs.  This is generally fine, except where an executable is SUID, in which case, which is the
case for the s6-overlay-preinit tool, so we must detect the situation and set the ownership of the
relevant tools.

See:
- just-containers#309
- moby/moby#28986
- https://docs.docker.com/engine/security/userns-remap/#disable-namespace-remapping-for-a-container
@robertbaker
Copy link

This might be similar to the permission issues I get when running docker 20.10 as rootless.

@jprjr
Copy link
Member

jprjr commented Jan 14, 2021

Are we overcomplicating this? The idea is s6-overlay-preinit should run as root no matter what the --user flag was for the container.

Should this just be:

foreground {
    /bin/s6-chown -u 0 /bin/s6-overlay-preinit
} /etc/s6/init/init-stage1 $@

If you're running with --user, you'd get an error from s6-chown, but it's not fatal since it's all in a foreground block anyway.

@jinnko
Copy link
Author

jinnko commented Jan 15, 2021

Thanks for looking at this. Perhaps just doing a blind chown may simplify things. I tried all sorts of variations to the execlineb init script a couple of weeks ago without success, though possibly not a blind chown. It did seem that only changing ownership of s6-overlay-preinit wasn't sufficient though, and I had run out of other ideas on how to make this work otherwise.

The issue isn't about the --user flag, but rather about the docker daemon configured with userns-remap, see user namespace isolation. Long story short, add these configs then restart the docker daemon (note - don't do this on an active docker daemon as the structures of /var/lib/docker change - revert config to go back, but you won't see docker daemon state migrated in either direction).

$ cat /etc/docker/daemon.json
{
  "userns-remap": "dockremap",
}
$ grep dockremap /etc/subuid /etc/subgid
/etc/subuid:dockremap:493216:65536
/etc/subgid:dockremap:493216:65536

When the docker daemon is running as such the pulled images and builds will inherit the remapped UIDs. The problem arises when you run a container on the daemon with the --userns=host argument which results in the UIDs going back to the standard host UIDs. The reason you'd do this is if you want to use user namespace remapping primarily which is only possibly by configuring the docker daemon - i.e. it's not possible to opt in to userns remapping at runtime.

Unfortunately one of the reason it is necessary to run a container in the host user namespace like this is when that container must also be on the host network namespace - i.e. you can't have the host network namespace without the host user namespace. This is the case for the linuxserver/docker-unifi-controller container as it doesn't function correctly behind a NAT.

A few weeks ago I did try again to make this all work and after a few hours of tinkering unsuccessfully I fell back to patching the upstream container like this.

$ cat Dockerfile
FROM linuxserver/unifi-controller:latest

ADD init-wrapper /init-wrapper
ENTRYPOINT ["/init-wrapper"]
$ cat init-wrapper            
#!/bin/sh

REMAPPED_ROOT_UID="$(stat -c '%u' /)"
if [ "$REMAPPED_ROOT_UID" -ne 0 ]; then
  find / -xdev -perm /4000 -uid "$REMAPPED_ROOT_UID" -exec chown root '{}' \;
fi

echo "Executing /init $@"
exec /init $@

@jprjr
Copy link
Member

jprjr commented Jan 15, 2021

Right, sorry, I just brought up --user in terms of how my proposed fix would affect that use-case - that it would still work (maybe just toss up an non-fatal error message).

So what I'm wondering is, it seems like the core issue is - SUID binaries need to have their UID set to 0.

I know I got hung up on trying to identify at runtime whether this is running in a userns-remap scenario, and only running chown if it's needed. But the only SUID binary we ship is the s6-overlay-preinit, and since it should always be owned by root anyway - would it be sufficient to just run s6-chown on it no matter what? That saves us having to write some kind of stat-like utility to include.

@jinnko
Copy link
Author

jinnko commented Jan 15, 2021

Good question - I believe it wasn't sufficient, but I will have another go using your proposed approach over the weekend and let you know my findings.

@skarnet
Copy link
Contributor

skarnet commented Jan 15, 2021

Tbh it's not surprising that suid binaries don't play nice with user namespaces and uid squashing and whatever mutant Linux feature is at play here.
@jprjr : why does s6-overlay-preinit need to be suid exactly? what does it do and why does it need root powers while being run as an unprivileged user?

@jprjr
Copy link
Member

jprjr commented Jan 15, 2021

@skarnet s6-overlay-preinit really takes care of two things.

Thing 1: /var/run/s6 needs to exist. If the container is started with a read-only root filesystem with /var set as a tmpfs, s6-overlay-preinit takes care of making sure /var/run and /var/run/s6 exist.

Thing 2: /var/run/s6 needs to be writable by the process tree user. Whether or not you're using a read-only root filesystem, if you start a container with the --user switch, you'll run the whole shebang as Not Root. s6-overlay-preinit will chown /var/run/s6 to whatever user the init system is running as, so operations like, copying service directories in, etc work.

@jprjr
Copy link
Member

jprjr commented Jan 15, 2021

Now that I've written that out, maybe s6-overlay-preinit could try only calling chown if the euid is root.

If the euid isn't 0, the chown is likely to fail anyway, and I think in this issue's case, the rest of the process tree is actually running as root, it's just this particular process not running as root.

jprjr added a commit to just-containers/s6-overlay-preinit that referenced this issue Jan 15, 2021
jprjr added a commit to just-containers/s6-overlay-preinit that referenced this issue Jan 15, 2021
@jprjr
Copy link
Member

jprjr commented Jan 15, 2021

@jinnko If you could try this build of s6-overlay-preinit, that would be a huge help: https://github.com/just-containers/s6-overlay-preinit/suites/1837529552/artifacts/35531088 - I don't have a Docker setup that I can easily switch to userns remapping.

Here's a Dockerfile that will download, unzip, etc automatically - this is a build of overlay-preinit that only tries to chown the /var/run/s6 folder if the EUID is 0.

FROM linuxserver/unifi-controller:latest

RUN apt-get update && apt-get install -y unzip

RUN curl -R -L -o /tmp/dist.zip \
  https://nightly.link/just-containers/s6-overlay-preinit/workflows/all/issue-309/dist.zip

RUN cd /tmp && unzip -d dist dist.zip
RUN tar xzf /tmp/dist/s6-overlay-preinit-1.0.4-linux-amd64.tar.gz -C /

@jinnko
Copy link
Author

jinnko commented Jan 17, 2021

@jprjr The build worked perfectly on a docker daemon running with userns-remap enabled and a container launched with --userns_mode=host --network_mode=host. The resulting file modes for relevant files in the container are:

# ls -sl /bin/s6-overlay-preinit
12 -rwsr-xr-x 1 493216 493216 9120 Jan 15 16:44 /bin/s6-overlay-preinit

# ls -al /var/run/s6
total 0
drwxr-xr-x 1 493216 root   124 Jan 17 16:32 .
drwxr-xr-x 1 493216 493216  54 Jan 17 16:32 ..
drwxr-xr-x 1 root   root    86 Jan 17 16:32 container_environment
drwxr-xr-x 1 493216 493216   8 Jan 17 16:32 env-stage1
drwxr-xr-x 1 root   root     0 Jan 17 16:32 env-stage2
drwxr-xr-x 1 root   root     0 Jan 17 16:32 env-stage3
drwxr-xr-x 1 root   root    90 Jan 17 16:32 etc
drwxr-xr-x 1 493216 493216  54 Jan 17 16:32 services

Thinking through the various launch scenarios, these are the outcomes I would expect:

  1. ✔️ - Regular docker daemon with container launched as root inside
  2. ✔️ - Regular docker daemon with container launched as another user (docker run --user ...)
  3. ✔️ - Remapped docker daemon (userns_remap), with container launched as root inside
  4. ❓ - Remapped docker daemon (userns_remap), with container launched as another user (docker run --user ...)

I think the 4th scenario would fail currently, and I think it would still fail with the update provided. Having said that this is still a step forward and I'd suggest it's worth rolling with.

To give an idea of why it's important to support userns_remap, consider all the people out there running docker on their workstations such that their regular user can launch containers. Without userns_remap it's trivial for a launched container to have root access to the whole system, whereas with remapping enabled any launched container would appear to have root inside the container, but not on the whole filesystem externally. So really if we add our regular user to the docker group we should generally also be enabling remapping. Obviously the benefits aren't limited to workstations.

Thanks for helping solve this - very much appreciated. The specific scenario I hit is resolved so I'm happy with the outcome. I'll leave the decision about closing this issue to you as there's still one more scenario that probably needs attention, but it does seem a bit of an edge case.

@jprjr
Copy link
Member

jprjr commented Jan 20, 2021

Latest version v2.2.0.0 includes the updated justc-envdir: https://github.com/just-containers/s6-overlay/releases/tag/v2.2.0.0

@jprjr
Copy link
Member

jprjr commented Jan 22, 2021

Going to close this out, that fourth use-case is still an issue, I think the only really good workaround for that is to create /var/run/s6 at build-time and chown it to whoever the eventual USER is. I suspect this case will be really, really niche.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants