Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgroup v2: support rootless systemd #2281

Merged
merged 1 commit into from
May 8, 2020

Conversation

AkihiroSuda
Copy link
Member

@AkihiroSuda AkihiroSuda commented Apr 1, 2020

Tested with both Podman (master) and Moby (master), on Ubuntu 19.10 .

$ podman --cgroup-manager=systemd run -it --rm --runtime=runc \
  --cgroupns=host --memory 42m --cpus 0.42 --pids-limit 42 alpine
/ # cat /proc/self/cgroup
0::/user.slice/user-1001.slice/[email protected]/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope
/ # cat
/sys/fs/cgroup/user.slice/user-1001.slice/[email protected]/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/memory.max
44040192
/ # cat
/sys/fs/cgroup/user.slice/user-1001.slice/[email protected]/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/cpu.max
42000 100000
/ # cat
/sys/fs/cgroup/user.slice/user-1001.slice/[email protected]/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/pids.max
42

The behavior corresponds to crun v0.13.

To test with Moby, launch dockerd-rootless.sh with --exec-opt native.cgroupdriver=systemd.

Fix #2163

@AkihiroSuda
Copy link
Member Author

@giuseppe @filbranden @mrunalp PTAL

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe make detectuid sligthly more forgiving to the value of XDG_RUNTIME_DIR

Copy link
Contributor

@filbranden filbranden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @AkihiroSuda !

I have some comments, I hope you find them useful. Overall, looks really good!

Cheers,
Filipe

libcontainer/cgroups/systemd/unified_hierarchy.go Outdated Show resolved Hide resolved
libcontainer/cgroups/systemd/unified_hierarchy.go Outdated Show resolved Hide resolved
libcontainer/cgroups/systemd/user.go Outdated Show resolved Hide resolved

// detectUID detects UID from $XDG_RUNTIME_DIR if running in userNS.
// Otherwise returns os.Getuid() .
func detectUID() (int, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hoping you wouldn't need this... But looks like you still do for the external auth of the user D-Bus connection... Let me think about this one see if there's a better solution than this.

Having said that, see my comment below on requiring that the environment variables are properly set. So you could say that this is a hard requirement here. Looks like your code will fail if it doesn't match /run/user/%d which I think is a good thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libsystemd (which is used by crun) seems extracting UID from the "0::/user.slice/user-%d.slice/..." string in /proc/self/cgroup:
https://github.com/systemd/systemd/blob/6829d8ce6927c151e4391c6a5e69b76906fbd2be/src/libsystemd/sd-bus/sd-bus.c#L104
https://github.com/systemd/systemd/blob/ce4c73eb5d3c9a23628c450db0e25a529620adbe/src/basic/cgroup-util.c#L1361-L1395

I feel this seems less robust than $XDG_RUNTIME_DIR, but maybe we should do this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm 👍 for path.Base(xdr) proposed by @kolyshkin #2281 (comment)

Copy link
Member Author

@AkihiroSuda AkihiroSuda Apr 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet another way is to exec busctl --user status and grep ^OwnerUID=

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

busctl --user status ends up resolving it from getsockopt(fd, SOL_SOCKET, SO_PEERCRED, ...), see here.

I believe you can do the same in Golang with unix.GetsockoptUcred(int(fd), unix.SOL_SOCKET, unix.SO_PEERCRED) from the "golang.org/x/sys/unix" package. Does that work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libsystemd (which is used by crun) seems extracting UID from the "0::/user.slice/user-%d.slice/..." string in /proc/self/cgroup:

Yes, I think that's possibly reasonable. It's still reimplementing C code in Golang but I think that's preferrable than some of the alternatives.

But see my message above about getting the real UID from a SO_PEERCRED call...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

busctl --user status ends up resolving it from getsockopt(fd, SOL_SOCKET, SO_PEERCRED, ...), see here.

Seems not true for OwnerUID. OwnerUID seems extracted from cgroup.
https://github.com/systemd/systemd/blob/v245/src/libsystemd/sd-bus/bus-dump.c#L375-L377
https://github.com/systemd/systemd/blob/v245/src/libsystemd/sd-bus/bus-creds.c#L509-L526

Anyway, SO_PEERCRED isn't what we want here, because the kernel automatically translates the UID in the ucred struct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated PR to use busctl --user status. I think this is the most robust way.

I guess eventually systemd may switch away from parsing cgroup and adopt more robust logic , but we won't be affected as long as we use busctl.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes good point sorry my search was too shallow for it...

I think I just worry a bit about busctl not being available, it was first introduced as a debugging tool so I can see how distributions might not want to ship it or not ship it by default and it might go missing in the future...

Anyways, it's a fallback and we can iterate on it. Looks good.

@AkihiroSuda AkihiroSuda force-pushed the rootless-systemd branch 5 times, most recently from d5e4e38 to 159eecf Compare April 2, 2020 08:28
@filbranden
Copy link
Contributor

LGTM

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kolyshkin
Copy link
Contributor

@AkihiroSuda I have a WIP progress patchset to remove Paths map from the systemd UnifiedManager, and simplify its initialization (meaning, for example, that you won't have to add Rootless as exported field). I'm not ready to create a PR yet but it looks like adding rootless support on top of my patch will be very easy. Can you please take a look?

Again, it's not ready yet but you can get the idea of what I'm trying to do:
https://github.com/kolyshkin/runc/commits/fs2-init-ctrl

@AkihiroSuda
Copy link
Member Author

@kolyshkin SGTM, do you want your commits to be merged before this PR or after?

@kolyshkin
Copy link
Contributor

@AkihiroSuda I think yours will be easier to port on top of mine than vise versa

@AkihiroSuda
Copy link
Member Author

If #2299 takes longer, can we merge this PR first?

@kolyshkin
Copy link
Contributor

If #2299 takes longer, can we merge this PR first?

Sorry for the delay, it is ready now, and it will definitely be easier to work with the updated/refactored codebase.

@kolyshkin
Copy link
Contributor

Need a rebase

@AkihiroSuda
Copy link
Member Author

Needs #2340

@AkihiroSuda
Copy link
Member Author

CI green. Ready to review/merge.

@opencontainers/runc-maintainers

tests/rootless.sh Outdated Show resolved Hide resolved
@AkihiroSuda AkihiroSuda force-pushed the rootless-systemd branch 2 times, most recently from 41faeb4 to 7e37c19 Compare April 29, 2020 05:07
@AkihiroSuda AkihiroSuda force-pushed the rootless-systemd branch 2 times, most recently from fbc0cc7 to a251537 Compare April 29, 2020 07:25
@h-vetinari h-vetinari mentioned this pull request Apr 29, 2020
@AkihiroSuda
Copy link
Member Author

@kolyshkin @mrunalp PTAL

@AkihiroSuda
Copy link
Member Author

rebased

@AkihiroSuda
Copy link
Member Author

@kolyshkin LGTY?

@kolyshkin
Copy link
Contributor

kolyshkin commented May 8, 2020

Left a few nits, but can be addressed as a followup

LGTM

Approved with PullApprove

@AkihiroSuda AkihiroSuda force-pushed the rootless-systemd branch 2 times, most recently from 5552602 to e8389d3 Compare May 8, 2020 03:10
@AkihiroSuda
Copy link
Member Author

addressed comments

@kolyshkin
Copy link
Contributor

still LGTM

@kolyshkin
Copy link
Contributor

kolyshkin commented May 8, 2020

LGTM (for pullapprove)

Approved with PullApprove

Tested with both Podman (master) and Moby (master), on Ubuntu 19.10 .

$ podman --cgroup-manager=systemd run -it --rm --runtime=runc \
  --cgroupns=host --memory 42m --cpus 0.42 --pids-limit 42 alpine
/ # cat /proc/self/cgroup
0::/user.slice/user-1001.slice/[email protected]/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope
/ # cat /sys/fs/cgroup/user.slice/user-1001.slice/[email protected]/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/memory.max
44040192
/ # cat /sys/fs/cgroup/user.slice/user-1001.slice/[email protected]/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/cpu.max
42000 100000
/ # cat /sys/fs/cgroup/user.slice/user-1001.slice/[email protected]/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/pids.max
42

Signed-off-by: Akihiro Suda <[email protected]>
@AkihiroSuda
Copy link
Member Author

rebased

@mrunalp
Copy link
Contributor

mrunalp commented May 8, 2020

LGTM

Approved with PullApprove

1 similar comment
@kolyshkin
Copy link
Contributor

kolyshkin commented May 8, 2020

LGTM

Approved with PullApprove

@kolyshkin kolyshkin merged commit 2b31437 into opencontainers:master May 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cgroup2: does not work with rootless podman
7 participants