Adds rootless containers support #318
The `Init: true` flag is needed due to changes introduced in libcontainer between versions 1.0.0rc5 and 1.0.0rc6. If we do not provide it, containers cannot find their entrypoint.
Signed-off-by: ncordon <[email protected]>
Signed-off-by: Denys Smirnov <[email protected]>
`Init: true` is needed for the first process spawned in a container.
Signed-off-by: ncordon <[email protected]>
Special ping to @dennwc (since I completed his work) to test this branch and suggest possible changes, or discard it completely.
I think we can proceed with a PR assuming all drivers still work with all those changes.
Reviewed 3 of 3 files at r1, 2 of 2 files at r3.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ncordon)
a discussion (no related file):
Let's update the README as well with all the things you mentioned. Maybe adding a rootless.md and linking to it would be a good idea.
runtime/runtime.go, line 160 at r3 (raw file):
}, { Source: "sysfs",
Can you please verify that all existing drivers work with this change?
runtime/runtime.go, line 160 at r3 (raw file): Previously, dennwc (Denys Smirnov) wrote…
They seem to be working, yes (at least …
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ncordon)
runtime/runtime.go, line 160 at r3 (raw file):
Previously, ncordon (Nacho Cordón) wrote…
They seem to be working, yes (at least `bblfsh-cli` can parse files for each language we support in `recommended` drivers). Do you suggest testing anything else?
No, thanks :)
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ncordon)
daemon/driver.go, line 77 at r3 (raw file):
Stdout: os.Stdout, Stderr: os.Stderr, Init: true,
Some of the files in this PR will need a gofmt before merging.
a discussion (no related file): Previously, dennwc (Denys Smirnov) wrote…
Done!
daemon/driver.go, line 77 at r3 (raw file): Previously, creachadair (M. J. Fromberger) wrote…
Done!
runtime/container.go, line 89 at r4 (raw file):
If I use …
Reviewed 5 of 6 files at r4, 1 of 1 files at r5.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @creachadair and @ncordon)
a discussion (no related file):
Previously, ncordon (Nacho Cordón) wrote…
Done!
Docs look great, thanks a lot!
bblfshd-seccomp.json, line 55 at r5 (raw file):
{ "names": [ "accept",
Hmm, interesting. We should review this and narrow down the list (in future PRs).
runtime/container.go, line 89 at r4 (raw file):
Previously, ncordon (Nacho Cordón) wrote…
If I use `SIGTERM` instead of `SIGKILL` here, we see errors in the `bblfshd` logs about drivers not being stopped when running without the `--privileged` flag. In other words, in rootless containers the container processes I spawn ignore the `SIGTERM` signal.
Killing sounds like a good idea in general. We don't care about any state in the driver's memory.
Can you please add a comment, so no one reverts it by accident?
runtime/container.go, line 89 at r4 (raw file): Previously, dennwc (Denys Smirnov) wrote…
Done!
bblfshd-seccomp.json, line 55 at r5 (raw file): Previously, dennwc (Denys Smirnov) wrote…
Nothing against it of course 👌. Just to clarify, these syscalls are the ones Docker allows by default (default.json) and are considered secure, plus …
Reviewed 1 of 1 files at r6.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @creachadair)
bblfshd-seccomp.json, line 55 at r5 (raw file):
Previously, ncordon (Nacho Cordón) wrote…
Nothing against it of course 👌. Just to clarify, these syscalls are the ones Docker allows by default (default.json) and are considered secure, plus `mount`, `unshare`, `pivot_root`, `keyctl`, `umount2` and `sethostname` (these last ones are definitely needed: I tested removing each of them and we could no longer spawn containers inside). It is better than using `seccomp=unconfined` anyway, which does not restrict syscalls at all. And using `--privileged` is even less restrictive than `seccomp=unconfined`, because you can do a lot of dangerous things to the host from inside the containers.
Sure, I was not arguing about it in any way :)
In general, Babelfish has assumed that drivers won't use anything except a few basic syscalls. So my point was: if we introduce seccomp, we can just as well go one step further and deny syscalls that won't be used for parsing and serving the API. Plus basic FS access, of course.
But that's for another time, if it isn't superseded by the bblfshd redesign.
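dennwc's idea of a narrower allowlist can be sketched in Go. This is a minimal illustration, assuming Docker's seccomp profile JSON format (a `defaultAction` plus a `syscalls` array of `names`/`action` entries, as in Docker's `default.json`). Only the six extra syscalls come from this thread; the `base` list in `main` is a hypothetical placeholder, far too short to run a real driver.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// syscallRule mirrors one entry of the "syscalls" array in a Docker seccomp profile.
type syscallRule struct {
	Names  []string `json:"names"`
	Action string   `json:"action"`
}

// profile is a minimal allowlist profile: every syscall not listed is denied.
type profile struct {
	DefaultAction string        `json:"defaultAction"`
	Syscalls      []syscallRule `json:"syscalls"`
}

// nestedContainerExtras are the syscalls reported in this thread as required,
// on top of Docker's defaults, to spawn driver containers inside bblfshd.
var nestedContainerExtras = []string{"mount", "unshare", "pivot_root", "keyctl", "umount2", "sethostname"}

// buildProfile merges base and extras, removes duplicates and sorts the names
// so the generated JSON is stable across runs.
func buildProfile(base, extras []string) profile {
	seen := map[string]bool{}
	var names []string
	for _, n := range append(append([]string{}, base...), extras...) {
		if !seen[n] {
			seen[n] = true
			names = append(names, n)
		}
	}
	sort.Strings(names)
	return profile{
		DefaultAction: "SCMP_ACT_ERRNO", // deny anything not allowed below
		Syscalls:      []syscallRule{{Names: names, Action: "SCMP_ACT_ALLOW"}},
	}
}

func main() {
	// Hypothetical placeholder for "parsing-only" syscalls; not a vetted list.
	base := []string{"read", "write", "openat", "close", "exit_group", "accept"}
	out, err := json.MarshalIndent(buildProfile(base, nestedContainerExtras), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Generating the profile from a short, reviewed list keeps the "why is this syscall allowed" question answerable per entry, instead of starting from Docker's full default list.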
Reviewable status: complete! all files reviewed, all discussions resolved
runtime/container.go, line 91 at r6 (raw file):
// Running bblfshd as a rootless container requires to use
// SIGKILL instead of SIGTERM or SIGINT to kill the process.
// Otherwise it ignores the order
Hmm, that seems odd. Not a blocker, but I would expect that either the container process owner or root should be able to kill child processes cleanly (e.g., via SIGTERM).
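The thread does not establish why `SIGTERM` is ignored. One plausible explanation (an assumption, not verified here) is how Linux treats PID 1 of a PID namespace: it only receives signals for which it has installed a handler, except `SIGKILL`/`SIGSTOP` sent from an ancestor namespace, and with `Init: true` the driver runs as that init process. The Go sketch below does not create namespaces; it only reproduces the symptom with a child that ignores `SIGTERM` (via `trap "" TERM`), and uses a hypothetical `stop` helper that tries `SIGTERM` first and escalates to `SIGKILL` after a grace period. That escalation is a common pattern, not what this PR implements (the PR sends `SIGKILL` directly). It assumes `sh` and `sleep` are on the `PATH`.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// gracePeriod is how long stop waits after SIGTERM before escalating (hypothetical value).
const gracePeriod = 500 * time.Millisecond

// startChild runs `sleep 30` through sh. With ignoreTERM, the shell first sets
// SIGTERM to "ignored"; that disposition survives exec, so sleep ignores it too.
// It returns only after the child printed "ready", i.e. after the trap is set.
func startChild(ignoreTERM bool) (*exec.Cmd, error) {
	script := `echo ready; exec sleep 30`
	if ignoreTERM {
		script = `trap "" TERM; ` + script
	}
	cmd := exec.Command("sh", "-c", script)
	out, err := cmd.StdoutPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	if _, err := bufio.NewReader(out).ReadString('\n'); err != nil {
		_ = cmd.Process.Kill()
		_ = cmd.Wait()
		return nil, err
	}
	return cmd, nil
}

// stop sends SIGTERM, waits up to gracePeriod, then falls back to SIGKILL,
// which cannot be caught or ignored. It returns the signal that worked.
func stop(cmd *exec.Cmd) (string, error) {
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	_ = cmd.Process.Signal(syscall.SIGTERM)
	select {
	case <-done:
		return "SIGTERM", nil
	case <-time.After(gracePeriod):
	}

	_ = cmd.Process.Signal(syscall.SIGKILL)
	select {
	case <-done:
		return "SIGKILL", nil
	case <-time.After(5 * time.Second):
		return "", errors.New("process still running after SIGKILL")
	}
}

func main() {
	for _, ignore := range []bool{false, true} {
		cmd, err := startChild(ignore)
		if err != nil {
			panic(err)
		}
		sig, err := stop(cmd)
		if err != nil {
			panic(err)
		}
		fmt.Printf("ignoreTERM=%v: stopped by %s\n", ignore, sig)
	}
}
```

Since the driver keeps no state worth flushing, skipping the grace period and sending `SIGKILL` right away, as the PR does, is the simpler choice; the escalation variant only matters if a clean shutdown is ever needed.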
README.md, line 40 at r6 (raw file):
I am realizing, would this line …
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @kuba--)
README.md, line 40 at r6 (raw file):
Previously, ncordon (Nacho Cordón) wrote…
I am realizing, would this line `-v /proc:/newproc` work under macOS? @kuba-- @creachadair
Ah, no: macOS does not have a `/proc` filesystem. Sorry, I missed that.
README.md, line 40 at r6 (raw file): Previously, creachadair (M. J. Fromberger) wrote…
So can we assume it is safe to omit that part? I don't think we have a way of testing it, since I don't have a Mac and people with macOS cannot do …
README.md, line 40 at r6 (raw file): Previously, ncordon (Nacho Cordón) wrote…
I think it is safe to leave it there, because the folder would be looked up inside the Moby VM that runs Docker on macOS, according to this: https://docs.docker.com/docker-for-mac/osxfs/#namespaces
Also reindents files with gofmt. In rootless mode, if we killed the process with SIGTERM instead of SIGKILL, the containers ignored the signal. Also adds a rootless.md file explaining the rootless configuration for running bblfshd.
Signed-off-by: ncordon <[email protected]>
Reviewed 1 of 1 files at r7.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @kuba--)
README.md, line 40 at r6 (raw file):
Previously, ncordon (Nacho Cordón) wrote…
I think it is safe to leave it there, because the folder would be looked for inside the Moby VM that runs docker on macOS, according to this: https://docs.docker.com/docker-for-mac/osxfs/#namespaces
👍
Adds rootless containers support and also bumps the version of `libcontainer` from `1.0.0rc5` to the current `1.0.0rc9`.

Supersedes #153. It takes over previous @dennwc work. #153 was failing because of problems with version rc5 itself, which were solved in rc6 as documented in libcontainer#1759 and libcontainer#1806.
Note that bumping the `libcontainer` version (a problem unrelated to rootless) requires using `Init: true` for the first process, as documented in libcontainer#2089 and libcontainer#1957. (I am not sure whether a later process arriving in the container with that flag set would cause any damage; probably the flag would be ignored.) This change was introduced in rc6 and was not documented until recently (rc8 or even rc9 🙄). I had to actually navigate and debug the `libcontainer` code to find out why the containers were not starting correctly, and later I searched the project's bug tracker for known issues related to the `Init` flag.

Also, enabling unprivileged containers may require `sudo` access on some OSs (not on Ubuntu, for example), as documented in the buildah documentation or in the usernetes documentation. If we proceed with this PR, it may be worth linking this information in the `README`, or adding a hint of what may be needed to enable such unprivileged containers:

```
# For only the current session
sudo sysctl kernel.unprivileged_userns_clone=1
```
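Before asking users for `sudo`, bblfshd (or a helper script) could check the current value. The Go sketch below is an assumption-laden illustration, not part of this PR: the sysctl is exposed as `/proc/sys/kernel/unprivileged_userns_clone`, but it comes from a Debian/Ubuntu kernel patch, so a missing file is treated as "not restricted by this knob". Other settings, such as `user.max_user_namespaces`, can still disable user namespaces and are not checked here.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"strings"
)

// usernsClonePath is the procfs file behind kernel.unprivileged_userns_clone.
const usernsClonePath = "/proc/sys/kernel/unprivileged_userns_clone"

// parseUsernsClone interprets the sysctl contents: "1" means unprivileged
// users may create user namespaces; anything else means they may not.
func parseUsernsClone(contents string) bool {
	return strings.TrimSpace(contents) == "1"
}

// unprivilegedUsernsAllowed reads the sysctl at path. A missing file is taken
// to mean this kernel has no such knob, so it is not treated as a restriction.
func unprivilegedUsernsAllowed(path string) (bool, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return true, nil
	}
	if err != nil {
		return false, err
	}
	return parseUsernsClone(string(data)), nil
}

func main() {
	ok, err := unprivilegedUsernsAllowed(usernsClonePath)
	if err != nil {
		fmt.Println("could not read", usernsClonePath+":", err)
		return
	}
	if !ok {
		fmt.Println("unprivileged user namespaces are disabled; try:")
		fmt.Println("  sudo sysctl kernel.unprivileged_userns_clone=1")
		return
	}
	fmt.Println("kernel.unprivileged_userns_clone does not block rootless containers")
}
```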
With all that being said, as documented in the Docker docs, the default seccomp security profile blocks the `unshare` syscall inside containers (it is needed to create a user namespace and also to give the container a `Hostname` by which the driver can be found). Therefore we need to run `bblfshd` with `--security-opt seccomp=unconfined`. Also, as documented in libcontainer#1658, there is a problem with spawning rootless containers inside another non-root container, related to the `/proc` mount / masking. Adding a volume `-v /proc:/newproc` solves that problem. I am going to try to come up with a nicer default config to squash the two aforementioned bugs. Right now this works:

```
# Builds bblfsh/bblfshd:dev-924c14e
make build
docker run \
  --name bblfshd -p 9432:9432 \
  -v /var/lib/bblfshd:/var/lib/bblfshd \
  -v /proc:/newproc \
  --security-opt seccomp=unconfined \
  bblfsh/bblfshd:dev-924c14e
```