Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds rootless containers support #318

Merged
merged 4 commits into from
Oct 27, 2019
Merged

Adds rootless containers support #318

merged 4 commits into from
Oct 27, 2019

Conversation

ncordon
Copy link
Member

@ncordon ncordon commented Oct 23, 2019

Adds rootless containers support and also bumps the version of libcontainer from 1.0.0rc5 to current 1.0.0rc9.

Supersedes #153. It takes over previous @dennwc work. The reason why #153 was failing was due to problems with version rc5 itself, that were solved in rc6 as documented in libcontainer#1759 and licontainer#1806.

Note that bumping the libcontainer version (it is a problem unrelated to rootless) requires to use Init: true for the first process (not sure whether if some process arrives to the container after that with that flag set would cause any damage, probably it would ignored), as documented in libcontainer#2089 and libcontainer#1957. Note this change was introduced in rc6, and was not documented until recently (rc8 or even rc9 🙄). I had to actually navigate and debug the libcontainer code to look why the containers were not starting correctly, and later I searched for known issues related to Init flag in the bug tracker of the project.

Also, this may require sudo access in some OSs to enable unprivileged containers support (not Ubuntu for example), as documented in the buildah documentation or in the usernetes documentation. It may we worth linking this information in the README if we proceed forward with this PR or a hint of what it may be needed to enable such unprivileged containers:

# For only the current session
sudo sysctl kernel.unprivileged_userns_clone=1
# For a permanent change
echo "kernel.unprivileged_userns_clone=1" >> /etc/sysctl.conf
sudo sysctl -p

With all that being said, as documented in the Docker docs, the security policy by default disables unshare command inside the containers (which is needed to create a user namespace and also to give a Hostname to the container as a searchable driver). Therefore need to run bblfshd with --security-opt seccomp=unconfined. Also, as documented in libcontainer#1658, there is a problem with spawning rootless containers inside another non-root container and the /proc mount / masking. Adding a volume -v /proc:/newproc would solve that problem. I am going to try to come up with a nicer default config to squash the aforementioned two bugs. Right now this works:

# Builds bblfsh/bblfshd:dev-924c14e
make build

docker run \
  --name bblfshd -p 9432:9432 \
  -v /var/lib/bblfshd:/var/lib/bblfshd \
  -v /proc:/newproc \
  --security-opt seccomp=unconfined \
  bblfsh/bblfshd:dev-924c14e
# Managing drivers works
docker exec -it bblfshd bblfshctl driver remove --all
docker exec -it bblfshd bblfshctl driver install --recommended
# Parsing from inside the container works
docker exec -it bblfshd bblfshctl parse /opt/bblfsh/etc/examples/python.py
# Parsing from outside the container works
curl https://raw.githubusercontent.com/bblfsh/sdk/master/uast/types.go -o types.go
bblfsh-cli types.go

This change is Reviewable

ncordon and others added 3 commits October 23, 2019 11:16
The Init: true flag is needed due to changes introduced from 1.0.0rc5 version of libcontainer to 1.0.0rc6. If we do not provide it, containers cannot find their entrypoint

Signed-off-by: ncordon <[email protected]>
Signed-off-by: Denys Smirnov <[email protected]>
Init: true is needed for the first process spawning in a container

Signed-off-by: ncordon <[email protected]>
@ncordon ncordon requested a review from a team as a code owner October 23, 2019 11:24
@ncordon
Copy link
Member Author

ncordon commented Oct 23, 2019

Special ping to @dennwc (since I completed his work) to test this branch and suggest possible changes or discard this completely

@ncordon ncordon self-assigned this Oct 23, 2019
Copy link
Member

@dennwc dennwc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can proceed with a PR assuming all drivers still work with all those changes.

Reviewed 3 of 3 files at r1, 2 of 2 files at r3.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ncordon)

a discussion (no related file):
Let's update the README as well with all the things you mentioned. Maybe adding a rootless.md and linking to it will be a good idea.



runtime/runtime.go, line 160 at r3 (raw file):

			},
			{
				Source:      "sysfs",

Can you please verify that all existing drivers work with this change?

@ncordon
Copy link
Member Author

ncordon commented Oct 23, 2019


runtime/runtime.go, line 160 at r3 (raw file):

Previously, dennwc (Denys Smirnov) wrote…

Can you please verify that all existing drivers work with this change?

They seem to be working, yes (at least bblfsh-cli can parse files for each language we support in recommended drivers). Do you suggest testing anything else?

Copy link
Member

@dennwc dennwc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ncordon)


runtime/runtime.go, line 160 at r3 (raw file):

Previously, ncordon (Nacho Cordón) wrote…

They seem to be working, yes (at least bblfsh-cli can parse files for each language we support in recommended drivers). Do you suggest testing anything else?

No, thanks :)

Copy link
Contributor

@creachadair creachadair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ncordon)


daemon/driver.go, line 77 at r3 (raw file):

		Stdout: os.Stdout,
		Stderr: os.Stderr,
		Init: true,

Some of the files in this PR will need a gofmt before merging.

@ncordon
Copy link
Member Author

ncordon commented Oct 24, 2019

a discussion (no related file):

Previously, dennwc (Denys Smirnov) wrote…

Let's update the README as well with all the things you mentioned. Maybe adding a rootless.md and linking to it will be a good idea.

Done!


@ncordon
Copy link
Member Author

ncordon commented Oct 24, 2019


daemon/driver.go, line 77 at r3 (raw file):

Previously, creachadair (M. J. Fromberger) wrote…

Some of the files in this PR will need a gofmt before merging.

Done!

@ncordon
Copy link
Member Author

ncordon commented Oct 24, 2019


runtime/container.go, line 89 at r4 (raw file):

func (c *container) Stop() error {
	if err := c.process.Signal(syscall.SIGKILL); err != nil {

If I use SIGTERM instead of SIGKILL here, we see errors in bblfshd logs when running without --privileged flags about drivers not being stopped. In other words, in rootless containers the containers processes I spin ignore the SIGTERM signal.

Copy link
Member

@dennwc dennwc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 6 files at r4, 1 of 1 files at r5.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @creachadair and @ncordon)

a discussion (no related file):

Previously, ncordon (Nacho Cordón) wrote…

Done!

Docs looks great, thanks a lot!



bblfshd-seccomp.json, line 55 at r5 (raw file):

		{
			"names": [
				"accept",

Hmm, interesting. We should review this and narrow down the list (in future PRs).


runtime/container.go, line 89 at r4 (raw file):

Previously, ncordon (Nacho Cordón) wrote…

If I use SIGTERM instead of SIGKILL here, we see errors in bblfshd logs when running without --privileged flags about drivers not being stopped. In other words, in rootless containers the containers processes I spin ignore the SIGTERM signal.

Killing sounds like a good idea in general. We don't care about any state in the driver's memory.

Can you please add a comment, so no one reverts it by accident?

@ncordon
Copy link
Member Author

ncordon commented Oct 24, 2019


runtime/container.go, line 89 at r4 (raw file):

Previously, dennwc (Denys Smirnov) wrote…

Killing sounds like a good idea in general. We don't care about any state in the driver's memory.

Can you please add a comment, so no one reverts it by accident?

Done!

@ncordon
Copy link
Member Author

ncordon commented Oct 24, 2019


bblfshd-seccomp.json, line 55 at r5 (raw file):

Previously, dennwc (Denys Smirnov) wrote…

Hmm, interesting. We should review this and narrow down the list (in future PRs).

Nothing against it of course 👌 . Just to clarify, this syscalls are the ones docker is allowed by default (default.json) and are considered secure, plus mount, unshare, pivot_root, keyctl, umount2and sethostname (those last ones are needed for sure because I tested removing any of them an we cannot spawn containers inside anymore. It is better than using seccomp=unconfined anyway, which does not restrict syscalls. And using --privileged is even less restrictive than seccomp=unconfined because you can do a lot of dangerous stuff with the host inside the containers.

Copy link
Member

@dennwc dennwc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r6.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @creachadair)


bblfshd-seccomp.json, line 55 at r5 (raw file):

Previously, ncordon (Nacho Cordón) wrote…

Nothing against it of course 👌 . Just to clarify, this syscalls are the ones docker is allowed by default (default.json) and are considered secure, plus mount, unshare, pivot_root, keyctl, umount2and sethostname (those last ones are needed for sure because I tested removing any of them an we cannot spawn containers inside anymore. It is better than using seccomp=unconfined anyway, which does not restrict syscalls. And using --privileged is even less restrictive than seccomp=unconfined because you can do a lot of dangerous stuff with the host inside the containers.

Sure, I was not arguing abouit it in any way :)

In general, Babelfish usually assumed that drivers won't use anything except a few basic syscalls. So my point was, if we introduce seccomp, we can as well go one step further and deny syscalls that won't be used for parsing and serving the API. Plus basic FS access, of course.

But that's for another time. If it won't be superseded by the bblfshd redesign.

Copy link
Contributor

@creachadair creachadair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved


runtime/container.go, line 91 at r6 (raw file):

	// Running bblfshd as a rootless container requires to use
	// SIGKILL instead of SIGTERM or SIGINT to kill the process.
	// Otherwise it ignores the order

Hmm, that seems odd. Not a blocker, but I would expect that either the container process owner or root should be able to kill child processes cleanly (e.g., via SIGTERM).

@ncordon
Copy link
Member Author

ncordon commented Oct 24, 2019


README.md, line 40 at r6 (raw file):

  -p 9432:9432 \
  -v bblfsh-storage:/var/lib/bblfshd bblfsh/bblfshd \
  -v /proc:/newproc \

I am realizing, would this line -v/proc:/newproc work under macOS? @kuba-- @creachadair

Copy link
Contributor

@creachadair creachadair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @kuba--)


README.md, line 40 at r6 (raw file):

Previously, ncordon (Nacho Cordón) wrote…

I am realizing, would this line -v/proc:/newproc work under macOS? @kuba-- @creachadair

Ah, no—macOS does not have a /proc filesystem. Sorry, I missed that.

@ncordon
Copy link
Member Author

ncordon commented Oct 25, 2019


README.md, line 40 at r6 (raw file):

Previously, creachadair (M. J. Fromberger) wrote…

Ah, no—macOS does not have a /proc filesystem. Sorry, I missed that.

So can we suppose it is safe to omit that part? I think we do not have way of testing it, since I do not have a macOs and people with macOS cannot do make build and test the new container 🙄

@ncordon
Copy link
Member Author

ncordon commented Oct 25, 2019


README.md, line 40 at r6 (raw file):

Previously, ncordon (Nacho Cordón) wrote…

So can we suppose it is safe to omit that part? I think we do not have way of testing it, since I do not have a macOs and people with macOS cannot do make build and test the new container 🙄

I think it is safe to leave it there, because the folder would be looked for inside the Moby VM that runs docker on macOS, according to this: https://docs.docker.com/docker-for-mac/osxfs/#namespaces

Also reindents files with gofmt.

If we killed the process with SIGTERM instead of SIGKILL in rootless mode, the containers ignored our order.

Also added rootless.md file to explain the rootless configuration for running bblfshd

Signed-off-by: ncordon <[email protected]>
Copy link
Contributor

@creachadair creachadair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r7.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @kuba--)


README.md, line 40 at r6 (raw file):

Previously, ncordon (Nacho Cordón) wrote…

I think it is safe to leave it there, because the folder would be looked for inside the Moby VM that runs docker on macOS, according to this: https://docs.docker.com/docker-for-mac/osxfs/#namespaces

👍

@ncordon ncordon merged commit 76a10a4 into master Oct 27, 2019
@ncordon ncordon deleted the rootless branch October 28, 2019 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants