Fix a regression when killing and deleting a container with shared(host) pid namespace #4048

lifubang · 2023-10-02T09:25:22Z

If we create a container with a shared or host PID namespace, its state becomes Stopped once the init process has died. However, since #3825 was merged, we can no longer send any signal to a stopped container, because the logic that checked for this condition was removed in f8ad20f. We should restore it.

Fixes #4047.
This also fixes #4040.

fuweid · 2023-10-02T12:16:30Z

libcontainer/container_linux.go

@@ -371,6 +371,10 @@ func (c *Container) Signal(s os.Signal) error {
 	// To avoid a PID reuse attack, don't kill non-running container.
 	switch status {
 	case Running, Created, Paused:
+	case Stopped:


Even if the PID namespace is private, and all processes should already have received the kill signal by the time the container is stopped, I think we can still kill if the cgroup exists.

If the PID namespace is private, the kernel automatically kills all processes in that namespace once the init process dies.

Yes. So we can simplify the logic by checking only the cgroup.

I don't think so. With a private PID namespace and without the systemd driver, the cgroup path always exists even when it contains no processes, so checking only the cgroup would cause an unnecessary function call.
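The following hypothetical sketch contrasts the two proposals. fakeManager stands in for cgroups.Manager, and killAll stands in for signalAllProcesses. It only counts calls. The sketch assumes, as stated above, that the cgroup directory can outlive its processes with the non-systemd driver.

```go
package main

// fakeManager is a hypothetical stand-in for cgroups.Manager. Exists may
// report true even for an empty cgroup (fs driver, per the comment above).
type fakeManager struct {
	exists    bool
	killCalls int
}

func (m *fakeManager) Exists() bool { return m.exists }

// killAll stands in for signalAllProcesses; it only counts invocations.
func (m *fakeManager) killAll() { m.killCalls++ }

// cleanupCgroupOnly checks only the cgroup, ignoring the PID namespace.
func cleanupCgroupOnly(_ bool, m *fakeManager) {
	if m.Exists() {
		m.killAll()
	}
}

// cleanupPidNSFirst skips the kill for a private PID namespace, where the
// kernel has already killed every process when init exited.
func cleanupPidNSFirst(privatePidNS bool, m *fakeManager) {
	if !privatePidNS && m.Exists() {
		m.killAll()
	}
}
```

With a private PID namespace and a leftover empty cgroup, the cgroup-only variant still calls killAll, while the PID-namespace-first variant does not.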

lifubang · 2023-10-02T16:15:07Z

libcontainer/state_linux.go

+		if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
+			return err
+		}
+	}
 	err := c.cgroupManager.Destroy()


Furthermore, I think that if we can't destroy the cgroup, we should not proceed to the next step. Otherwise the state directory gets deleted while processes from the old container are still running, and we can no longer kill them with runc. WDYT, @kolyshkin?

Currently, this error is silently ignored.

kolyshkin · 2023-10-02T22:02:09Z

libcontainer/state_linux.go

+	if !c.config.Namespaces.IsPrivate(configs.NEWPID) && c.cgroupManager.Exists() {
+		if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
+			return err
+		}
+	}


This is wrong -- it should only be done if runc delete --force is used. So, I guess, this logic should be in delete.go.

I think this is right. We only get here because the container is already in the "Stopped" state, so it would be better if delete worked without "-f".

I'm a bit unsure if runc delete should really kill any processes. That is such a mess 😩

OTOH, the runtime-spec says:

Attempting to delete a container that is not stopped MUST have no effect on the container and MUST generate an error. Deleting a container MUST delete the resources that were created during the create step. Note that resources associated with the container, but not created by this container, MUST NOT be deleted.

and we treat a running container without init process as stopped (which btw may be wrong, too). If that is so, should we treat those leftover processes as "resources that were created during the create step"? Hmm.

we treat a running container without init process as stopped (which btw may be wrong, too).

What if we treat such a container as running? See
#4040 (comment)

kolyshkin · 2023-10-02T23:49:21Z

libcontainer/container_linux.go

@@ -371,6 +371,10 @@ func (c *Container) Signal(s os.Signal) error {
 	// To avoid a PID reuse attack, don't kill non-running container.
 	switch status {
 	case Running, Created, Paused:
+	case Stopped:
+		if c.config.Namespaces.IsPrivate(configs.NEWPID) || !c.cgroupManager.Exists() {


The problem here is that we now have two different places where we check IsPrivate. Also note that these two checks are slightly different (one also checks for cgroup existence, the other checks whether s is SIGKILL). This is probably wrong.

Another (minor) problem with commit 9583b3d is that it now calls Thaw even when we use signalAllProcesses, which is unnecessary.

the other checks if s is SIGKILL. This is probably wrong.

This is because we removed the "-a" option, so we also need to check for SIGKILL here.

What I meant is that there should be only a single check, deciding whether we should use the usual logic (kill init) or the special-case logic (kill all processes).

Something like:

if s == unix.SIGKILL && !c.config.Namespaces.IsPrivate(configs.NEWPID) {
	if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
		return fmt.Errorf("unable to kill all processes: %w", err)
	}
	return nil
}
// ... normal kill code goes here

plus

--- a/libcontainer/init_linux.go
+++ b/libcontainer/init_linux.go
@@ -633,6 +633,9 @@ func setupRlimits(limits []configs.Rlimit, pid int) error {
 // signalAllProcesses freezes then iterates over all the processes inside the
 // manager's cgroups sending the signal s to them.
 func signalAllProcesses(m cgroups.Manager, s unix.Signal) error {
+	if !m.Exists() {
+		return ErrNotRunning
+	}
 	// Use cgroup.kill, if available.
 	if s == unix.SIGKILL {
 		if p := m.Path(""); p != "" { // Either cgroup v2 or hybrid.

@kolyshkin, I think maybe we should bring back runc kill -a?
It is reasonable to drop -a for the KILL signal, but runc kill can send other signals besides KILL. If we remove -a from runc kill, there will be no way to send such a signal to all the processes in the container.

Signed-off-by: lifubang <[email protected]>

lifubang · 2023-10-10T04:56:28Z

Although I would like to bring back runc kill -a, it may not be easy to push that forward in opencontainers/runtime-spec#1234.
Still, the bug in the main branch should be fixed, and some of these commits should be backported to release 1.1.

kolyshkin · 2023-11-04T01:56:08Z

libcontainer/container_linux.go

@@ -873,6 +873,26 @@ func (c *Container) newInitConfig(process *Process) *initConfig {
 func (c *Container) Destroy() error {
 	c.m.Lock()
 	defer c.m.Unlock()
+	if !c.config.Namespaces.IsPrivate(configs.NEWPID) {


I am very much against doing this. Kill should do all the killing, and Destroy should remove the files. This was all mixed up before, but I got it untangled in #3825 (alas, with a couple of regressions which you have reported in #4047).

An alternative that avoids mixing kill and destroy again is 7de61c4 (PTAL 🙏🏻).

But I think we can't force users to run runc kill before runc delete when the container is in the Stopped state.

Signed-off-by: lifubang <[email protected]>

lifubang · 2023-11-28T10:56:55Z

As #4102 has been merged, I'm closing this one.

lifubang force-pushed the fix-NoPIDNsKill branch 2 times, most recently from f5e1a7d to e17f88c on October 2, 2023 09:35

lifubang added the regression label Oct 2, 2023

lifubang added this to the 1.2.0 milestone Oct 2, 2023

lifubang force-pushed the fix-NoPIDNsKill branch 3 times, most recently from ee898ab to 6b04d85 on October 2, 2023 10:04

lifubang requested review from kolyshkin and cyphar October 2, 2023 10:34

fuweid reviewed Oct 2, 2023

lifubang changed the title ~~Fix a regression when killing and deleting a container with shared(hosted) pid namespace~~ Fix a regression when killing and deleting a container with shared(host) pid namespace Oct 2, 2023

lifubang commented Oct 2, 2023

kolyshkin reviewed Oct 2, 2023

kolyshkin mentioned this pull request Oct 3, 2023

RFC: treat host pidns container with no init process as running if some processes exist in cgroup #4049

Closed

lifubang force-pushed the fix-NoPIDNsKill branch 2 times, most recently from 9b7b013 to 7c20ac3 on October 10, 2023 03:47

lifubang added 4 commits October 10, 2023 12:25

kill all processes in container without private pid ns

810c5f0

Signed-off-by: lifubang <[email protected]>

kill all processes in container without private pid ns before destory

120dad4

Signed-off-by: lifubang <[email protected]>

increase the retry times from 5 to 10 when removing cgroup paths

04fc3fe

Signed-off-by: lifubang <[email protected]>

never ignore cgroup destroy error for runc-delete

48ff07c

Signed-off-by: lifubang <[email protected]>

lifubang force-pushed the fix-NoPIDNsKill branch 2 times, most recently from 575bba9 to 48ff07c on October 10, 2023 04:53

kolyshkin mentioned this pull request Oct 31, 2023

Fix runc kill and runc delete for containers with no init and no private PID namespace #4102

Merged

kolyshkin reviewed Nov 4, 2023

add a testcase for runc delete the container with host pid ns

9225eaa

Signed-off-by: lifubang <[email protected]>

lifubang force-pushed the fix-NoPIDNsKill branch from cc9addd to 9225eaa on November 6, 2023 11:43

lifubang closed this Nov 28, 2023
