Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SELinux denial when checkpointing a container #2334

Closed
adrianreber opened this issue Feb 14, 2019 · 14 comments
Closed

SELinux denial when checkpointing a container #2334

adrianreber opened this issue Feb 14, 2019 · 14 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@adrianreber
Copy link
Collaborator

When I try to checkpoint a container on Fedora 29 or RHEL8 with the latest available podman package I get the following SELinux denial:

# podman run -d docker.io/library/alpine:latest top
# podman --log-level debug container checkpoint -l
criu failed: type NOTIFY errno 0
log file: /var/lib/containers/storage/overlay-containers/c84bb3f5307c63f2df5a441fa599b9517cde456bd5b0c9820e0e69c45da1c005/userdata/dump.log
failed to checkpoint container c84bb3f5307c63f2df5a441fa599b9517cde456bd5b0c9820e0e69c45da1c005: `/usr/bin/runc checkpoint --image-path /var/lib/containers/storage/overlay-containers/c84bb3f5307c63f2df5a441fa599b9517cde456bd5b0c9820e0e69c45da1c005/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/c84bb3f5307c63f2df5a441fa599b9517cde456bd5b0c9820e0e69c45da1c005/userdata c84bb3f5307c63f2df5a441fa599b9517cde456bd5b0c9820e0e69c45da1c005` failed: exit status 1
# ausearch -m AVC -m USER_AVC -ts recent | grep top
type=AVC msg=audit(1550142323.799:1167): avc:  denied  { connectto } for  pid=23569 comm="top" path=002F6372746F6F6C732D70722D3233363139 scontext=system_u:system_r:container_t:s0:c245,c463 tcontext=unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=0

The same steps are running in CI every time successfully, so something is different during CI and when actually using it. The connectto is coming from CRIU's parasite code which is injected in the destination process trying to talk to the main CRIU binary.
To be able to checkpoint a container I need following two SELinux rules:

allow container_t container_runtime_t:unix_stream_socket connectto;
allow container_t container_var_lib_t:file append;

What is the right place or package to add these two rules.

@mheon mheon added bug kind/bug Categorizes issue or PR as related to a bug. labels Feb 14, 2019
@mheon
Copy link
Member

mheon commented Feb 14, 2019

@rhatdan PTAL

@rhatdan
Copy link
Member

rhatdan commented Feb 15, 2019

We removed this service, because it is a huge security hole. Allowing this access would allow a container process to gain access to the docker.sock and create a --privileged root running container.

The way to fix this is for the tool creating the socket to label it with the containers label or at least
container_t:s0

label.SetSockLabel($CONTAINERPROCESLABEL)
CreateSocket()

This would then require

allow container_t container_t:unix_stream_socket connectto;

Which is allowed in the current policy.

Is this socket being created in runc?

Now container_t appending to a file labeled container_var_lib_t, is another matter? Is this debugging code or something else?

@adrianreber
Copy link
Collaborator Author

We removed this service, because it is a huge security hole. Allowing this access would allow a container process to gain access to the docker.sock and create a --privileged root running container.

The way to fix this is for the tool creating the socket to label it with the containers label or at least
container_t:s0

label.SetSockLabel($CONTAINERPROCESLABEL)
CreateSocket()

This would then require

allow container_t container_t:unix_stream_socket connectto;

Which is allowed in the current policy.

Is this socket being created in runc?

It is created directly from CRIU. I started a discussion upstream CRIU to be able to use an existing socket or to tell CRIU which label the socket should have. Then we could tell runc which label it should have and runc can pass it to CRIU.

Now container_t appending to a file labeled container_var_lib_t, is another matter? Is this debugging code or something else?

This is for debug output which is currently always written. It is hard-coded in runc. If we could add a policy to allow this, this would be great.

@rhatdan
Copy link
Member

rhatdan commented Feb 17, 2019

What is the path to the file. I would rather we label it something like container_log_t so that we would not be able to append to any file in /var/lib/containers.

Does CRIU create the socket or runc? Could you explain the process for the socket to be created?

@adrianreber
Copy link
Collaborator Author

What is the path to the file. I would rather we label it something like container_log_t so that we would not be able to append to any file in /var/lib/containers.

The path to the file is: filepath.Join(c.bundlePath(), "dump.log")

So something like this could be used:

diff --git a/libpod/container_internal_linux.go b/libpod/container_internal_linux.go
index 65cb47c8..b98392fc 100644
--- a/libpod/container_internal_linux.go
+++ b/libpod/container_internal_linux.go
@@ -476,6 +476,18 @@ func (c *Container) checkpoint(ctx context.Context, options ContainerCheckpointO
        if c.state.State != ContainerStateRunning {
                return errors.Wrapf(ErrCtrStateInvalid, "%q is not running, cannot checkpoint", c.state.State)
        }
+
+       // Create the CRIU log file and label it
+       logFile, err := os.Create(filepath.Join(c.bundlePath(), "dump.log"))
+       if err != nil {
+               return errors.Wrapf(err, "failed to create CRIU log file %q", filepath.Join(c.bundlePath(), "dump.log"))
+       }
+       logFile.Close()
+       err = label.SetFileLabel(filepath.Join(c.bundlePath(), "dump.log"), the_correct_container_log_t_label)
+       if err != nil {
+               return errors.Wrapf(err, "failed to label CRIU log file %q", filepath.Join(c.bundlePath(), "dump.log"))
+       }
+
        if err := c.runtime.ociRuntime.checkpointContainer(c, options); err != nil {
                return err
        }

Not sure if that is the correct way to do it. But creating an empty log file with the right label should fix this. Then the processes in the container would need to be able to write that log file.

Does CRIU create the socket or runc? Could you explain the process for the socket to be created?

CRIU creates the socket in the destination network namespace https://github.com/checkpoint-restore/criu/blob/master/compel/src/lib/infect.c#L1030 , once CRIU infects the destination process with the parasite code, the parasite code tries to connect to that socket and that fails (avc: denied { connectto } for pid=23569 comm="top"). CCing @avagin in case this is not correct.

@rhatdan
Copy link
Member

rhatdan commented Feb 18, 2019

The code for libpod/container_internal/linux.go looks pretty good.

Is this code "infect.c" called by runc or podman?
If runc then we need to call setsockcreate(processlabel) before calling it

@adrianreber
Copy link
Collaborator Author

If I do err = label.SetFileLabel(filepath.Join(c.bundlePath(), "dump.log"), c.MountLabel()) writing to the log file works. So this seems solved.

For the socket I have to use setsockcreate(processlabel) in CRIU. If I do it in runc it seems to fail creating runc sockets. For the socket my plan is to add a CRIU feature to specify the SELinux label of the socket. So Podman calls runc checkpoint --criu-socket-selinux-label "processlabel" and runc then tells CRIU which SELinux label to use. This will require changes in Podman, runc and CRIU.

Is the c.ProcessLabel() the right label for the socket, or should I use something else. With that label it already just works.

@rhatdan
Copy link
Member

rhatdan commented Feb 19, 2019

Does runc fail in permissive mode?

@adrianreber
Copy link
Collaborator Author

If I add label.SetSocketLabel(processlabel) to runc just before calling CRIU and if I do setenforce 0 I get the following denials, but runc does not fail:

type=AVC msg=audit(1550644128.094:4176): avc:  denied  { create } for  pid=28779 comm="runc" scontext=unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 tcontext=system_u:object_r:container_file_t:s0:c289,c751 tclass=unix_stream_socket permissive=1
type=AVC msg=audit(1550644128.094:4177): avc:  denied  { getopt } for  pid=28779 comm="runc" scontext=unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 tcontext=system_u:object_r:container_file_t:s0:c289,c751 tclass=unix_stream_socket permissive=1
type=AVC msg=audit(1550644128.094:4178): avc:  denied  { getattr } for  pid=28779 comm="runc" scontext=unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 tcontext=system_u:object_r:container_file_t:s0:c289,c751 tclass=unix_stream_socket permissive=1
type=AVC msg=audit(1550644128.095:4179): avc:  denied  { write } for  pid=28779 comm="runc" path="socket:[1909154]" dev="sockfs" ino=1909154 scontext=unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 tcontext=system_u:object_r:container_file_t:s0:c289,c751 tclass=unix_stream_socket permissive=1
type=AVC msg=audit(1550644128.095:4180): avc:  denied  { read } for  pid=28779 comm="runc" scontext=unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 tcontext=system_u:object_r:container_file_t:s0:c289,c751 tclass=unix_stream_socket permissive=1
type=AVC msg=audit(1550644128.322:4186): avc:  denied  { shutdown } for  pid=28779 comm="runc" scontext=unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 tcontext=system_u:object_r:container_file_t:s0:c289,c751 tclass=unix_stream_socket permissive=1

But if I reset the socket label to default in runc with err = label.SetSocketLabel("") just after calling CRIU it works. So I actually can solve this with a simple runc patch.

I will prepare a Podman pull request for the CRIU log file labelling and a runc pull request for the CRIU socket labelling and CC you. Thanks for your help.

@rhatdan
Copy link
Member

rhatdan commented Feb 20, 2019

Excellent.

@adrianreber
Copy link
Collaborator Author

runc changes for CRIU socket labelling available at: opencontainers/runc#1992

Podman changes to fix CRIU log file labelling and to use new runc features available at: #2382

@adrianreber
Copy link
Collaborator Author

The log file denial is solved with #2382. For the connectto denial I need to work on a solution in CRIU.

adrianreber added a commit to adrianreber/libpod that referenced this issue Feb 27, 2019
There is currently still one SELinux related checkpoint/restore problem:
containers#2334

To avoid unnecessary CI failures the checkpoint/restore tests are
temporarily disabled on Fedora.

It is not necessary to disable the tests on Ubuntu as it is running
without SELinux and it is also not necessary to disable the RHEL 7 tests
as RHEL's CRIU is too old to run the checkpoint/restore tests at all.

Signed-off-by: Adrian Reber <[email protected]>
@rhatdan
Copy link
Member

rhatdan commented Mar 8, 2019

@adrianreber Is this complete, can we close this issue now?

@adrianreber
Copy link
Collaborator Author

@rhatdan with your changes to container-selinux and once my CRIU PR (checkpoint-restore/criu#648) gets merged, all should be resolved.

@rhatdan rhatdan closed this as completed Apr 13, 2019
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 24, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

No branches or pull requests

3 participants