Skip to content

Commit

Permalink
libct: move killing logic to kill
Browse files Browse the repository at this point in the history
By default, the container has its own PID namespace, and killing (with
SIGKILL) its init process from the parent PID namespace also kills all
the other processes.

Obviously, it does not work that way when the container is sharing its
PID namespace with the host or another container, since init is no
longer special (it's not PID 1). In this case, killing container's init
will result in a bunch of other processes left running (and thus the
inability to remove the cgroup).

The solution to the above problem is killing all the container
processes, not just init.

The problem with the current implementation is, the killing logic is
implemented in libcontainer's initProcess.wait (rather than kill), and
therefore was only available to libcontainer users, but not the runc
kill command (which does not call wait as it's not a parent of
container's init).

To untangle this, the patch moves killing all processes from
initProcess.wait to initProcess.kill, and documents the new behavior.

In essence, this also makes `runc kill` to automatically kill all container
processes when the container does not have its own PID namespace.
Document that as well.

Signed-off-by: Kir Kolyshkin <[email protected]>
  • Loading branch information
kolyshkin committed May 12, 2023
1 parent 49edadd commit f1a50f1
Show file tree
Hide file tree
Showing 4 changed files with 9 additions and 14 deletions.
2 changes: 1 addition & 1 deletion libcontainer/container_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -360,7 +360,7 @@ func (c *Container) start(process *Process) (retErr error) {
// Signal sends a specified signal to container's init, or, if all is true,
// true, to all container's processes (as determined by container's cgroup).
//
// Setting all=true is useful when s is SIGKILL and the container does not have
// Note all=true is implied when s is SIGKILL and the container does not have
// its own PID namespace. In this scenario, the libcontainer user may be required
// to implement a proper child reaper.
func (c *Container) Signal(s os.Signal, all bool) error {
Expand Down
10 changes: 6 additions & 4 deletions libcontainer/process_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -616,10 +616,6 @@ func (p *initProcess) start() (retErr error) {

func (p *initProcess) wait() (*os.ProcessState, error) {
err := p.cmd.Wait()
// we should kill all processes in cgroup when init is died if we use host PID namespace
if p.sharePidns {
_ = signalAllProcesses(p.manager, unix.SIGKILL)
}
return p.cmd.ProcessState, err
}

Expand Down Expand Up @@ -678,6 +674,12 @@ func (p *initProcess) signal(sig os.Signal) error {
if !ok {
return errors.New("os: unsupported signal type")
}
// When a container has its own pidns, init pid is 1, and killing
// init with SIGKILL will kill all processes in its PID namespace.
// Otherwise, we should kill all pids to avoid leftover processes.
if p.sharePidns && s == unix.SIGKILL {
return signalAllProcesses(p.manager, unix.SIGKILL)
}
return unix.Kill(p.pid(), s)
}

Expand Down
7 changes: 0 additions & 7 deletions libcontainer/state_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,6 @@ import (

"github.com/opencontainers/runc/libcontainer/configs"
"github.com/opencontainers/runtime-spec/specs-go"
"github.com/sirupsen/logrus"
"golang.org/x/sys/unix"
)

Expand Down Expand Up @@ -36,12 +35,6 @@ type containerState interface {
}

func destroy(c *Container) error {
if !c.config.Namespaces.Contains(configs.NEWPID) ||
c.config.Namespaces.PathOf(configs.NEWPID) != "" {
if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
logrus.Warn(err)
}
}
err := c.cgroupManager.Destroy()
if c.intelRdtManager != nil {
if ierr := c.intelRdtManager.Destroy(); err == nil {
Expand Down
4 changes: 2 additions & 2 deletions man/runc-kill.8.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,8 +18,8 @@ to list available signals.
# OPTIONS
**--all**|**-a**
: Send the signal to all processes inside the container, rather than
the container's init only. This option is useful when the container does not
have its own PID namespace.
the container's init only. This option is implied when the _signal_ is **KILL**
and the container does not have its own PID namespace.

: When this option is set, no error is returned if the container is stopped
or does not exist.
Expand Down

0 comments on commit f1a50f1

Please sign in to comment.