Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRI-O CI broken due to SELinux AVC Denials with latest runc (main branch) build #274

Closed
sohankunkerkar opened this issue Sep 29, 2023 · 20 comments · Fixed by #280
Closed

Comments

@sohankunkerkar
Copy link
Member

Problem Statement:

For the past few days, the CRI-O CI has been encountering severe issues on both Fedora 36 and RHEL 9.2. Upon investigation, SELinux AVC denials have been identified as the root cause.

----
type=AVC msg=audit(09/29/2023 18:15:50.768:23214) : avc:  denied  { map } for  pid=246169 comm=4 path=/memfd:runc_cloned:runc-dmz (deleted) dev="tmpfs" ino=420 scontext=system_u:system_r:container_t:s0:c4,c5 tcontext=unconfined_u:object_r:container_runtime_tmpfs_t:s0 tclass=file permissive=1 
----
type=AVC msg=audit(09/29/2023 18:15:50.770:23215) : avc:  denied  { read } for  pid=246169 comm=4 path=/memfd:runc_cloned:runc-dmz (deleted) dev="tmpfs" ino=420 scontext=system_u:system_r:container_t:s0:c4,c5 tcontext=unconfined_u:object_r:container_runtime_tmpfs_t:s0 tclass=file permissive=1 
----
type=AVC msg=audit(09/29/2023 18:15:50.770:23216) : avc:  denied  { execute } for  pid=246169 comm=4 path=/memfd:runc_cloned:runc-dmz (deleted) dev="tmpfs" ino=420 scontext=system_u:system_r:container_t:s0:c4,c5 tcontext=unconfined_u:object_r:container_runtime_tmpfs_t:s0 tclass=file permissive=1 
----
type=AVC msg=audit(09/29/2023 18:16:15.850:23337) : avc:  denied  { entrypoint } for  pid=247148 comm=runc:[2:INIT] path=/memfd:runc_cloned:runc-dmz (deleted) dev="tmpfs" ino=421 scontext=system_u:system_r:container_t:s0:c4,c5 tcontext=unconfined_u:object_r:container_runtime_tmpfs_t:s0 tclass=file permissive=1 
----

Setting SELinux to permissive mode resolves the issue.

The following command was used to identify the necessary permissions.

$ sudo grep AVC /var/log/audit/audit.log | audit2allow


#============= container_t ==============
allow container_t container_runtime_tmpfs_t:file { entrypoint execute read };

#!!!! This avc can be allowed using the boolean 'domain_can_mmap_files'
allow container_t container_runtime_tmpfs_t:file map;
allow container_t proc_t:filesystem associate;

#!!!! This avc is a constraint violation.  You would need to modify the attributes of either the source or target types to allow this access.
#Constraint rule: 
#	mlsconstrain file { ioctl read lock execute execute_no_trans } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED
mlsconstrain file { write setattr append unlink link rename } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED

#	Possible cause is the source level (s0) and target level (s0:c31,c1005) are different.
allow container_t self:file read;

The problem appears to be related to a recent update in the main branch of runc, specifically in the pull request: opencontainers/runc#3987. According to the runc maintainer, an intermediate binary was added which requires additional permissions to fix this issue.

Expected Behavior:

CRI-O CI tests should run successfully without encountering SELinux AVC denials.

Additional Information:

Operating System: Fedora 36, RHEL 9.2

@kolyshkin
Copy link

Alas, I was unable to repro this with a separate runc binary and runc integration tests.

@rhatdan
Copy link
Member

rhatdan commented Oct 2, 2023

Is this still happening? Does runc write out a binary file and then execute it?

@rhatdan
Copy link
Member

rhatdan commented Oct 2, 2023

This looks like runc creates the binary, and then sets the SELinux label on the container at which point it starts executing the binary. Can we set the SELinux label later, only when it executes PID1 process?

@kolyshkin
Copy link

@rhatdan this is an intermediate init binary (which does nothing else except executing the container init process); the binary is created in container's state directory (/run/runc by default). This directory is not being labeled by runc.

Alas, I can't repro this without cri-o.

@kolyshkin
Copy link

I have a repro now: opencontainers/runc#4053.

I also have a workaround (disable dmz if selinux is in enforcing mode and process.selinuxLabel is set in config), but I am not a huge fan of doing things that way.

@kolyshkin
Copy link

Filed runc issue: opencontainers/runc#4057 which for some reason shows slightly different avc denials than this issue.

@cyphar
Copy link

cyphar commented Oct 8, 2023

I suggested setting the security.selinux xattr of the memfd to the container's label, but you get -EACCES from setxattr. I guess it's possible this would not solve the issue even if you could set the label, but it seems like it should.

@rhatdan Is this something that should be possible or something we can enable, or is there something I'm doing wrong? Since we set the process label to the container label with the same privileges, it seems strange to be unable to set a file to have the same label. The only other alternative is to not use runc-dmz on SELinux-enforcing systems, but I'd prefer not to do that if possible.

@rhatdan
Copy link
Member

rhatdan commented Oct 9, 2023

If you call label.SetFSCreateLabel(mount_label) before creating the object, then the container_t should be able to use it.

@kolyshkin
Copy link

If you call label.SetFSCreateLabel(mount_label) before creating the object, then the container_t should be able to use it.

For some reason it doesn't work. Maybe it's not working with memfd_create?

(will provide logs a bit later)

@kolyshkin
Copy link

I checked the logs and they are the same with and without the patch that adds SetFSCreateLabel (except for obvious differences in pids, inode etc).

Here's the log from a run on CentOS 9, before the proposed fix:

runc run tst (status=1):
writing sync procError: write sync: file already closed
execveat: permission denied
----
type=PROCTITLE msg=audit(10/10/2023 00:23:28.967:10933) : proctitle=/tmp/bats-run-6AjJKB/runc.9vKatN/bundle/runc init
type=SYSCALL msg=audit(10/10/2023 00:23:28.967:10933) : arch=x86_64 syscall=execveat success=no exit=EACCES(Permission denied) a0=0x6 a1=0xc0000c511a a2=0xc0000575c0 a3=0xc000024520 items=0 ppid=105366 pid=105377 auid=root uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=pts0 ses=8 comm=runc:[2:INIT] exe=/tmp/bats-run-6AjJKB/runc.9vKatN/bundle/runc subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023 key=(null)
type=AVC msg=audit(10/10/2023 00:23:28.967:10933) : avc:  denied  { entrypoint } for  pid=105377 comm=runc:[2:INIT] path=/memfd:runc_cloned:runc-dmz (deleted) dev="tmpfs" ino=271 scontext=system_u:system_r:container_t:s0:c4,c5 tcontext=unconfined_u:object_r:container_runtime_tmpfs_t:s0 tclass=file permissive=0

Here is one after the fix (i.e. with SetFSCreateLabel added):

runc run tst (status=1):
writing sync procError: write sync: file already closed
execveat: permission denied
----
type=PROCTITLE msg=audit(10/09/2023 23:44:03.603:10929) : proctitle=/tmp/bats-run-FSYIY5/runc.HCqujT/bundle/runc init
type=SYSCALL msg=audit(10/09/2023 23:44:03.603:10929) : arch=x86_64 syscall=execveat success=no exit=EACCES(Permission denied) a0=0x6 a1=0xc00012b0fa a2=0xc0001146a0 a3=0xc000024660 items=0 ppid=105403 pid=105414 auid=root uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=pts0 ses=8 comm=runc:[2:INIT] exe=/tmp/bats-run-FSYIY5/runc.HCqujT/bundle/runc subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023 key=(null)
type=AVC msg=audit(10/09/2023 23:44:03.603:10929) : avc:  denied  { entrypoint } for  pid=105414 comm=runc:[2:INIT] path=/memfd:runc_cloned:runc-dmz (deleted) dev="tmpfs" ino=3341 scontext=system_u:system_r:container_t:s0:c4,c5 tcontext=unconfined_u:object_r:container_runtime_tmpfs_t:s0 tclass=file permissive=0

I assume that it just doesn't work for memfd_create.

@kolyshkin
Copy link

Indeed, it appears that the fd created by memfd_create does not take fscreatecon into account. I added a check in the code to see if the memfd label is as expected, and it is not:

time="2023-10-10T07:10:57Z" level=error msg="runc run failed: unable to create new parent process: runc-dmz file label mismatch: want \"system_u:system_r:container_t:s0:c4,c5\", got \"unconfined_u:object_r:container_runtime_tmpfs_t:s0\""

Here's the code that does it: opencontainers/runc@650a51b

@rhatdan
Copy link
Member

rhatdan commented Oct 10, 2023

That quite possibly is a kernel issue. I will ping the SELinux kernel developers on this. For now we can add allow rules to container-selinux policy, and then over time remove the rules, once we have a fixed kernel.

@rhatdan
Copy link
Member

rhatdan commented Oct 10, 2023

@rhatdan
Copy link
Member

rhatdan commented Oct 10, 2023

@kolyshkin Could you allow the entrpoint rule and see what AVC's get created?

#279

@kolyshkin
Copy link

@kolyshkin Could you allow the entrpoint rule and see what AVC's get created?

With the entrypoint allowed (in opencontainers/runc@48336b6), I get this on CentOS 9:

type=PROCTITLE msg=audit(10/10/2023 19:38:49.181:10952) : proctitle=(null)
type=SYSCALL msg=audit(10/10/2023 19:38:49.181:10952) : arch=x86_64 syscall=execveat success=no exit=EACCES(Permission denied) a0=0x6 a1=0xc00014f11a a2=0xc0000575c0 a3=0xc000024520 items=0 ppid=105361 pid=105372 auid=root uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=pts0 ses=8 comm=6 exe=/memfd:runc_cloned:runc-dmz (deleted) subj=system_u:system_r:container_t:s0:c4,c5 key=(null)
type=AVC msg=audit(10/10/2023 19:38:49.181:10952) : avc:  denied  { map } for  pid=105372 comm=6 path=/memfd:runc_cloned:runc-dmz (deleted) dev="tmpfs" ino=3312 scontext=system_u:system_r:container_t:s0:c4,c5 tcontext=unconfined_u:object_r:container_runtime_tmpfs_t:s0 tclass=file permissive=0

It's similar for CentOS 8 and Fedora 38.

On CentOS 7 I see:

# type=PROCTITLE msg=audit(10/10/2023 19:37:00.763:712) : proctitle=(null)
# type=SYSCALL msg=audit(10/10/2023 19:37:00.763:712) : arch=x86_64 syscall=execve success=no exit=EACCES(Permission denied) a0=0xc0000b9730 a1=0xc0000a28d0 a2=0xc0000247e0 a3=0x0 items=0 ppid=550 pid=563 auid=root uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=pts0 ses=7 comm=6 exe=/memfd:runc_cloned:runc-dmz (deleted) subj=system_u:system_r:container_t:s0:c4,c5 key=(null)
# type=AVC msg=audit(10/10/2023 19:37:00.763:712) : avc:  denied  { read execute } for  pid=563 comm=6 path=/memfd:runc_cloned:runc-dmz (deleted) dev="tmpfs" ino=146083 scontext=system_u:system_r:container_t:s0:c4,c5 tcontext=unconfined_u:object_r:container_runtime_tmpfs_t:s0 tclass=file permissive=0

@kolyshkin
Copy link

With these rules

allow container_t container_runtime_tmpfs_t:file entrypoint;
allow container_t container_runtime_tmpfs_t:file map;
allow container_t container_runtime_tmpfs_t:file { execute read };

it works on all distros we test with selinux (centos 7,8,9 and latest fedora).

I am going to also test it with cri-o ci (where the problem was initially discovered), as I'm not sure my tests cover all the bases.

@kolyshkin
Copy link

I am going to also test it with cri-o ci (where the problem was initially discovered), as I'm not sure my tests cover all the bases.

Tested; looks OK (cri-o/cri-o#7359)

@cyphar
Copy link

cyphar commented Oct 11, 2023

I would prefer if we allowed the label to be changed by the runtime (setting the security.selinux xattr to the container's label), rather than allowing the container label to execute stuff outside of its label -- unless that is less safe than just allowing execution (each container gets its own label, I don't know if you can just allow container_runtime_t -> container_t relabelto operations).

We can set fscreatecon as well, so kernels with this behaviour fixed don't need us to change the label.

@rhatdan
Copy link
Member

rhatdan commented Oct 13, 2023

It is not a huge loosening of security to allow containers to read/execute content created in a tmpfs by the container runtimes. Although I would like to remove it at some point.

@kolyshkin
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants