
iscsiadm blocked by SELinux from mounting OpenEBS PVs #1438

Closed

ceagan opened this issue Dec 12, 2022 · 21 comments

@ceagan

ceagan commented Dec 12, 2022

Describe the bug
During the upgrade from 4.11.0-0.okd-2022-11-19-050030 to 4.11.0-0.okd-2022-12-02-145640, we started having problems with OpenEBS PVs failing to mount. This blocked the upgrade from completing for us because it affected the image-registry. We traced the problem down to SELinux blocking iscsiadm from performing dac_override. Disabling SELinux on the host node allowed the mounts and the upgrade to complete. We had to do this on each node that had a PV, including nodes not involved in the upgrade, in order to mount all of the OpenEBS PVs used by worker pods. We then re-enabled SELinux on each node.
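
A minimal sketch of that temporary workaround, assuming standard oc debug access to the nodes (the node name is a placeholder; setenforce only changes the mode until the node reboots):

# Put SELinux into permissive mode on the affected node (temporary)
oc debug node/<node-name> -- chroot /host setenforce 0

# ... let the stuck PVs mount and the upgrade proceed, then re-enable enforcing mode
oc debug node/<node-name> -- chroot /host setenforce 1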

Version
4.11.0-0.okd-2022-12-02-145640

How reproducible
Unknown

Log bundle
https://drive.google.com/file/d/1PgUlirAJMVFmbdim9QdMXq-HpEkHB-4i/view?usp=share_link

Relevant Logs

Dec 09 18:05:00.451300 okd-node-02.okd.example.com hyperkube[1888]: I1209 18:05:00.451136 1888 reconciler.go:254] "operationExecutor.MountVolume started for volume \"pvc-1ad97794-f713-453e-8044-3b6605abd75c\" (UniqueName: \"kubernetes.io/csi/cstor.csi.openebs.io^pvc-1ad97794-f713-453e-8044-3b6605abd75c\") pod \"example-fcos-moderate-infra-rs-76b58ff799-ntwg5\" (UID: \"bbca7b2b-f3ec-4ffb-9616-fb675357e935\") " pod="openshift-compliance/example-fcos-moderate-infra-rs-76b58ff799-ntwg5"
Dec 09 18:05:01.789000 okd-node-02.okd.example.com audit[216728]: AVC avc: denied { dac_override } for pid=216728 comm="iscsiadm" capability=1 scontext=system_u:system_r:iscsid_t:s0 tcontext=system_u:system_r:iscsid_t:s0 tclass=capability permissive=0
Dec 09 18:05:01.791499 okd-node-02.okd.example.com kernel: audit: type=1400 audit(1670609101.789:6001): avc: denied { dac_override } for pid=216728 comm="iscsiadm" capability=1 scontext=system_u:system_r:iscsid_t:s0 tcontext=system_u:system_r:iscsid_t:s0 tclass=capability permissive=0
Dec 09 18:05:01.791818 okd-node-02.okd.example.com kernel: audit: type=1300 audit(1670609101.789:6001): arch=c000003e syscall=83 success=no exit=-13 a0=559752146390 a1=1f8 a2=ffffffffffffff00 a3=0 items=0 ppid=216727 pid=216728 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iscsiadm" exe="/usr/sbin/iscsiadm" subj=system_u:system_r:iscsid_t:s0 key=(null)
Dec 09 18:05:01.792035 okd-node-02.okd.example.com kernel: audit: type=1327 audit(1670609101.789:6001): proctitle=2F7362696E2F697363736961646D002D6D00646973636F766572796462002D740073656E6474617267657473002D70003137322E33302E3136302E3232350033323630002D490064656661756C74002D2D646973636F766572
Dec 09 18:05:01.789000 okd-node-02.okd.example.com audit[216728]: SYSCALL arch=c000003e syscall=83 success=no exit=-13 a0=559752146390 a1=1f8 a2=ffffffffffffff00 a3=0 items=0 ppid=216727 pid=216728 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iscsiadm" exe="/usr/sbin/iscsiadm" subj=system_u:system_r:iscsid_t:s0 key=(null)
Dec 09 18:05:01.789000 okd-node-02.okd.example.com audit: PROCTITLE proctitle=2F7362696E2F697363736961646D002D6D00646973636F766572796462002D740073656E6474617267657473002D70003137322E33302E3136302E3232350033323630002D490064656661756C74002D2D646973636F766572
Dec 09 18:05:01.875494 okd-node-02.okd.example.com hyperkube[1888]: E1209 18:05:01.875357 1888 csi_attacher.go:344] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = Internal desc = failed to find device path: [], last error seen: failed to sendtargets to portal 172.30.160.225:3260, err: iscsiadm error: iscsiadm: No records found (exit status 21)
Dec 09 18:05:01.877426 okd-node-02.okd.example.com hyperkube[1888]: E1209 18:05:01.876036 1888 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/cstor.csi.openebs.io^pvc-1ad97794-f713-453e-8044-3b6605abd75c podName: nodeName:}" failed. No retries permitted until 2022-12-09 18:07:03.875972643 +0000 UTC m=+9091.515017073 (durationBeforeRetry 2m2s). Error: MountVolume.MountDevice failed for volume "pvc-1ad97794-f713-453e-8044-3b6605abd75c" (UniqueName: "kubernetes.io/csi/cstor.csi.openebs.io^pvc-1ad97794-f713-453e-8044-3b6605abd75c") pod "example-fcos-moderate-infra-rs-76b58ff799-ntwg5" (UID: "bbca7b2b-f3ec-4ffb-9616-fb675357e935") : rpc error: code = Internal desc = failed to find device path: [], last error seen: failed to sendtargets to portal 172.30.160.225:3260, err: iscsiadm error: iscsiadm: No records found (exit status 21)
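
To confirm that the same denial is the cause on a given node, a quick sketch using the host audit tools (reached e.g. via oc debug node/<node-name> and chroot /host; xxd may need to be installed, any hex decoder works):

# List recent AVC denials raised by iscsiadm
ausearch -m AVC -c iscsiadm --start recent

# Decode the hex-encoded PROCTITLE value from the audit record to see the exact denied command line
echo '<proctitle-hex>' | xxd -r -p; echo
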
@vrutkovs
Member

Package diff:

Upgraded:
  aardvark-dns 1.2.0-6.fc36 -> 1.3.0-1.fc36
  amd-gpu-firmware 20221012-141.fc36 -> 20221109-144.fc36
  avahi-libs 0.8-15.fc36 -> 0.8-16.fc36
  bash 5.2.2-2.fc36 -> 5.2.9-2.fc36
  btrfs-progs 6.0-1.fc36 -> 6.0.2-1.fc36
  conmon 2:2.1.4-3.fc36 -> 2:2.1.5-1.fc36
  container-selinux 2:2.191.0-1.fc36 -> 2:2.193.0-1.fc36
  curl 7.82.0-9.fc36 -> 7.82.0-11.fc36
  gnutls 3.7.8-2.fc36 -> 3.7.8-3.fc36
  grub2-common 1:2.06-54.fc36 -> 1:2.06-57.fc36
  grub2-efi-x64 1:2.06-54.fc36 -> 1:2.06-57.fc36
  grub2-pc 1:2.06-54.fc36 -> 1:2.06-57.fc36
  grub2-pc-modules 1:2.06-54.fc36 -> 1:2.06-57.fc36
  grub2-tools 1:2.06-54.fc36 -> 1:2.06-57.fc36
  grub2-tools-minimal 1:2.06-54.fc36 -> 1:2.06-57.fc36
  intel-gpu-firmware 20221012-141.fc36 -> 20221109-144.fc36
  kernel 6.0.8-200.fc36 -> 6.0.10-200.fc36
  kernel-core 6.0.8-200.fc36 -> 6.0.10-200.fc36
  kernel-modules 6.0.8-200.fc36 -> 6.0.10-200.fc36
  krb5-libs 1.19.2-11.fc36 -> 1.19.2-12.fc36
  libatomic 12.2.1-2.fc36 -> 12.2.1-4.fc36
  libcurl 7.82.0-9.fc36 -> 7.82.0-11.fc36
  libgcc 12.2.1-2.fc36 -> 12.2.1-4.fc36
  libgomp 12.2.1-2.fc36 -> 12.2.1-4.fc36
  libnghttp2 1.46.0-2.fc36 -> 1.51.0-1.fc36
  libsmbclient 2:4.16.6-0.fc36 -> 2:4.16.7-0.fc36
  libstdc++ 12.2.1-2.fc36 -> 12.2.1-4.fc36
  libwbclient 2:4.16.6-0.fc36 -> 2:4.16.7-0.fc36
  libxcrypt 4.4.30-1.fc36 -> 4.4.33-1.fc36
  linux-firmware 20221012-141.fc36 -> 20221109-144.fc36
  linux-firmware-whence 20221012-141.fc36 -> 20221109-144.fc36
  netavark 1.2.0-5.fc36 -> 1.3.0-1.fc36
  nvidia-gpu-firmware 20221012-141.fc36 -> 20221109-144.fc36
  podman 4:4.3.0-2.fc36 -> 4:4.3.1-1.fc36
  podman-plugins 4:4.3.0-2.fc36 -> 4:4.3.1-1.fc36
  python-pip-wheel 21.3.1-3.fc36 -> 21.3.1-4.fc36
  python-setuptools-wheel 59.6.0-2.fc36 -> 59.6.0-3.fc36
  python3-libs 3.10.8-1.fc36 -> 3.10.8-3.fc36
  rpm-ostree 2022.15-3.fc36 -> 2022.16-1.fc36
  rpm-ostree-libs 2022.15-3.fc36 -> 2022.16-1.fc36
  samba-client-libs 2:4.16.6-0.fc36 -> 2:4.16.7-0.fc36
  samba-common 2:4.16.6-0.fc36 -> 2:4.16.7-0.fc36
  samba-common-libs 2:4.16.6-0.fc36 -> 2:4.16.7-0.fc36
  vim-data 2:9.0.828-1.fc36 -> 2:9.0.963-1.fc36
  vim-minimal 2:9.0.828-1.fc36 -> 2:9.0.963-1.fc36

Most likely it's either

  container-selinux 2:2.191.0-1.fc36 -> 2:2.193.0-1.fc36

or

  rpm-ostree 2022.15-3.fc36 -> 2022.16-1.fc36

@cgwalters could you check whether this is an rpm-ostree regression?

@ArthurVardevanyan

Similar behavior using Longhorn: longhorn/longhorn#4988

@AlexanderWurz

Similar issue when using Istio: istio/istio#42485 - for some reason SELinux now behaves differently

@netwarex

Does 4.11.0-0.okd-2023-01-14-152430 fix that?

@vrutkovs
Member

Workaround from the Longhorn bug: longhorn/longhorn#4988 (comment) (apparently it's applicable to iSCSI too).

Not sure if it's due to the app not requesting dac_override or a genuine Fedora bug - let's report it against the container-selinux package in Fedora?
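
One workaround along these lines is a small local SELinux policy module that grants iscsid_t the dac_override capability. A hedged sketch of generating and installing it directly from the recorded denials (run on the affected node, e.g. via oc debug; requires the audit2allow tooling, which may not be present in the base image; the module name is arbitrary):

# Build a local policy module from the recorded AVC denials and install it
ausearch -m AVC -c iscsiadm | audit2allow -M iscsid_dac_override
semodule -i iscsid_dac_override.pp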

@sfritze

sfritze commented Jan 19, 2023

We experience the same issue on 4.11.0-0.okd-2022-12-02-145640 using NetApp Trident v22.10 as the storage backend.
The event message from a Pod trying to use an iSCSI-backed PVC shows "iSCSI Login failed".

What I don't understand: the security context of the working directory in /var/lib/iscsi/nodes is the same as that of the non-working directories.
The filesystem looks like this on the node:

sudo ls -al -Z /var/lib/iscsi/nodes/
total 4
drwxr-xr-x. 6 root root system_u:object_r:iscsi_var_lib_t:s0 4096 Jan 18 15:28 .
drwxr-xr-x. 8 root root system_u:object_r:iscsi_var_lib_t:s0   90 Nov 14 14:07 ..
drw-------. 6 root root system_u:object_r:iscsi_var_lib_t:s0  130 Jan 18 15:38 iqn.1992-08.com.netapp:sn.6f75e51c7a2411ed9b05d039ea43322c:vs.73
drw-------. 2 root root system_u:object_r:iscsi_var_lib_t:s0    6 Jan 18 15:24 iqn.1992-08.com.netapp:sn.70b0eecd967c11ed9b05d039ea43322c:vs.74
drw-------. 2 root root system_u:object_r:iscsi_var_lib_t:s0    6 Jan 18 15:12 iqn.1992-08.com.netapp:sn.c978523d674c11ed9b05d039ea43322c:vs.71

The working directory is the one ending in vs.73; it was manually created via

iscsiadm -m discoverydb -t st -p 10.32.148.206:3260 -I default -D

After creating the directory via that command, everything works fine.
Error messages for iscsiadm from SELinux:

[ 1452.470079] audit: type=1400 audit(1674055739.214:3822): avc:  denied  { dac_override } for  pid=44518 comm="iscsiadm" capability=1  scontext=system_u:system_r:iscsid_t:s0 tcontext=system_u:system_r:iscsid_t:s0 tclass=capability permissive=0
[ 1452.470082] audit: type=1300 audit(1674055739.214:3822): arch=c000003e syscall=83 success=no exit=-13 a0=55c4daf2b400 a1=1f8 a2=ffffffffffffff00 a3=0 items=0 ppid=3030 pid=44518 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iscsiadm" exe="/usr/sbin/iscsiadm" subj=system_u:system_r:iscsid_t:s0 key=(null)

@ArthurVardevanyan

ArthurVardevanyan commented Jan 21, 2023

Here is more testing information: the 4.12 CI branch was working up until the release was cut for 4-stable.

I didn't test CI 4.12.0-0.okd-2023-01-20-161603,
but it looks like the same build as stable 4.12.0-0.okd-2023-01-21-055900.

REF: https://amd64.origin.releases.ci.openshift.org

@vrutkovs
Member

vrutkovs commented Jan 21, 2023

Thanks! Right before the release we switched the base from FCOS next-devel to FCOS stable (see the
4.12.0-0.okd-2023-01-21-055900 -> 4.12.0-0.okd-2023-01-20-101927 changelog). Most likely it's container-selinux 2:2.193.0-1.fc37.noarch → 2:2.198.0-1.fc37.noarch.
That means the fix should be coming in the next FCOS stable bump.

Also, in 4.12 you can now create your own OS image and include FCOS testing fixes sooner

@AlexanderWurz

But will there be a fix for OKD 4.11 that does not require setting SELinux to permissive, or will this only be tackled in 4.12?

@vrutkovs
Member

I can build another machine-os-content for OKD 4.11, but we can't push it to the stable channel anymore

@sfritze

sfritze commented Jan 24, 2023

But will there be a fix for OKD 4.11 that does not require setting SELinux to permissive, or will this only be tackled in 4.12?

This may only help for external iSCSI targets, but if you know the portal IP you can do a discovery on all relevant nodes via:
iscsiadm -m discoverydb -t st -p <portal-ip>:3260 -I default -D
This creates the folder correctly and you do not need to set SELinux to permissive.
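
A sketch of running that discovery on every node from a machine with oc access (the portal IP stays a placeholder; oc debug starts a debug pod per node, so this is slow but avoids SSH):

for node in $(oc get nodes -o name); do
  oc debug "$node" -- chroot /host \
    iscsiadm -m discoverydb -t st -p <portal-ip>:3260 -I default -D
done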

@AlexanderWurz

I can build another machine-os-content for OKD 4.11, but we can't push it to the stable channel anymore

Thanks, in that case we will take a 4.12 release from the stable channel once it is out - we tested the first 4.12 stable release, which still has the SELinux issue, so I guess it will be solved in one of the upcoming ones.

@ceagan
Author

ceagan commented Feb 12, 2023

This issue is still present for us in Fedora CoreOS 37.20230110.3.1, which is shipped with OKD 4.12.0-0.okd-2023-02-04-212953.

@netwarex

netwarex commented Feb 22, 2023

For a temporary fix, I have written an article (this fix is specifically for iscsiadm, where dac_override is not enabled); with small changes it can be used to fix other permissions without disabling SELinux:

https://ioflair.com/blog/fix-longhorn-volumes-stuck-in-attach-detach-loop-on-openshift-okd/
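
For completeness, a hand-written equivalent of the audit2allow module sketched above, for anyone who prefers an explicit type-enforcement file (file and module names are mine, purely illustrative):

cat > iscsid_dac_override.te <<'EOF'
module iscsid_dac_override 1.0;

require {
    type iscsid_t;
    class capability dac_override;
}

allow iscsid_t self:capability dac_override;
EOF

checkmodule -M -m -o iscsid_dac_override.mod iscsid_dac_override.te
semodule_package -o iscsid_dac_override.pp -m iscsid_dac_override.mod
semodule -i iscsid_dac_override.pp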

@vrutkovs
Member

Merged @netwarex's fix (openshift/okd-machine-os#541); it should be available in the next 4.12 release

@netwarex

netwarex commented Mar 1, 2023

@vrutkovs will this not be fixed in 4.11, or are no more 4.11 OKD releases coming?

@vrutkovs
Member

vrutkovs commented Mar 1, 2023

No more 4.11 stable releases are coming (nightlies will still be released, of course). I don't mind cherry-picking it to 4.11, but we'd need confirmation that it's fixed in 4.12 first.

@vrutkovs
Member

vrutkovs commented Mar 6, 2023

Fix available in amd64.origin.releases.ci.openshift.org/releasestream/4-stable/release/4.12.0-0.okd-2023-03-05-022504

Keeping this open to confirm it's fixed before cherry-picking to 4.11 nightlies

@AlexanderWurz

This fix may only solve the volume issues, not the network-related issues when using the Istio service mesh, as indicated in istio/istio#42485

Unfortunately we still cannot test 4.12, as we first need to migrate our APIs for Kubernetes 1.25.

@vrutkovs
Member

Reopened #1450 to track the Istio exception, let's continue there

@JaimeMagiera
Contributor

Hi,

We are not working on FCOS builds of OKD any more. Please see these documents...

https://okd.io/blog/2024/06/01/okd-future-statement
https://okd.io/blog/2024/07/30/okd-pre-release-testing

Please test with the OKD SCOS nightlies and file a new issue as needed.

Many thanks,

Jaime
