[4.10] CephFS seems to be broken with FCOS 35 upgrade. image registry writes fail. #1160
Comments
I have the same problem, and not only with the image registry: creating more than one new file on a CephFS volume leads to permission-denied errors. Waiting more than a minute between requests seems to help (but is obviously not a solution).
No problems with RBD block volumes. No hints in the logs or on the Ceph status dashboard. |
Same problem here. OKD 4.9 worked, OKD 4.10 doesn't. Version details: |
Reset the bugzilla as a Fedora/FCOS and Ceph issue rather than an image registry issue |
I had some time tonight to look into this a bit further.

First, I tried to reproduce the issue from a bare FCOS VM. Since my CephFS can be mounted from outside my OKD cluster, I was able to mount the image registry's CephFS volume and run some tests. I could not reproduce the issue from a bare FCOS VM: I tried creating single files one at a time, creating many thousands of files in a loop from the shell, etc. Nothing triggered the failure. Okay, that puts this firmly back into the realm of OKD.

The next step I took was to start up a standalone pod with the registry's CephFS volume mounted into it and try to reproduce the issue there. I was careful to ensure the security context in my standalone pod matched the image registry pod, just to be extra sure. I could not reproduce the issue from the standalone pod either.

As a last resort, I swapped my image registry pod back to using the CephFS mount, and the issue was immediately reproducible. While poking around at that, I found something rather interesting. Check this out.

A broken directory, as seen from inside the image registry pod:
That same directory, as seen from my standalone debug pod:
Note that from the standalone pod, the SELinux context is absolutely fine! I can touch the file, edit it as usual; everything works. However, you know what's really interesting? If I use the standalone pod to delete and re-create the …

I am unsure where to go from here. SELinux contexts are stored as xattrs, so perhaps the Ceph MDS is messing something up? But in that case, I would have seen this regardless of OKD version, because I haven't upgraded my version of Ceph in a while. This very clearly started with OKD 4.10, though. So perhaps the image registry is doing something new, and that new something is interacting badly with Ceph? |
From the perspective of the host, here is what a failing …
Why it has a proper label outside of the pod but question marks inside, I don't know. I'm also wondering whether an xattr of all zeroes corresponds to …
Another note for debugging and/or theorycrafting: my debug pod and my registry pod are on two different hosts. |
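One way to dig into this further is to read the security.selinux xattr directly from the host side; if the xattr is absent or empty, `ls -Z` typically shows `?` or an unlabeled context. This is a generic sketch, and the file path is just a placeholder, not the actual registry volume path:

```console
# Read the raw SELinux label stored as an extended attribute (path is illustrative)
getfattr -n security.selinux /var/mnt/cephfs/docker/registry/v2/some-broken-file

# Compare with what ls reports for the same file
ls -Z /var/mnt/cephfs/docker/registry/v2/some-broken-file
```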
@SriRamanujam Take a look at this: coreos/fedora-coreos-tracker#1167. I can recreate it without using the image registry... What I find odd is that I am running the exact same container image and the exact same CephFS code; the only difference is OKD 4.10 and FCOS 35. Unfortunately, I don't have an external Ceph cluster :( |
I saw that issue, and in fact I was all ready to write up reproduction steps for that issue until I couldn't reproduce it with a bare FCOS VM :(
What were your reproduction steps? I was doing
Same :(
If you are using Rook, you can set |
I wonder if it could be a cri-o/SELinux issue on FCOS 35. Can you try mounting the CephFS volume in a container on the standalone FCOS node? |
Hi, |
Okay, so I have made a breakthrough of sorts: I am able to reproduce the issue from within my registry container. The important thing is that I have to make a new directory beforehand, then … When I do this, I can see that exactly one of the 1000 files gets a proper context and the rest are question marks, which exactly matches the behavior I see in the registry-managed folders. I am still unable to reproduce from outside of the container, either from my laptop or from a bare FCOS install (both running Fedora 35). |
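For reference, a rough sketch of the in-container reproduction described above (the directory path is illustrative): create a fresh directory on the CephFS-backed mount, write a large batch of files in quick succession, then inspect their labels.

```console
# Inside the registry container, on the CephFS-backed mount (path is illustrative)
mkdir /registry/docker/registry/v2/repro && cd /registry/docker/registry/v2/repro
for i in $(seq 1 1000); do echo "testing123" > "test$i.txt"; done
# Most files end up with '?' / unlabeled contexts; typically only one gets a proper label
ls -lashZ | head
```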
This is from my workstation. I have mounted the registry's CephFS volume on my workstation, using the same options the CSI driver does. When I touch a bunch of files on this mount, the file contexts are different from what I expect. Unmounting and re-mounting the CephFS volume magically makes the file contexts line up with the expected value.

```console
[email protected]@hapes /tmp/cephfs/docker/registry/v2/test/testing2
❯ for i in $(seq 1 1000); do echo "testing123" > test$i.txt; done
[email protected]@hapes /tmp/cephfs/docker/registry/v2/test/testing2
❯ ls -lashZ | head
total 500K
0 drwxr-xr-x. 2 [email protected] domain [email protected] unconfined_u:object_r:container_file_t:s0:c12,c18 1000 Apr 21 22:05 .
0 drwxrwxrwx. 3 root root unconfined_u:object_r:container_file_t:s0:c12,c18 1 Apr 21 22:05 ..
512 -rw-r--r--. 1 [email protected] domain [email protected] system_u:object_r:unlabeled_t:s0 11 Apr 21 22:05 test1000.txt
512 -rw-r--r--. 1 [email protected] domain [email protected] system_u:object_r:unlabeled_t:s0 11 Apr 21 22:11 test100.txt
512 -rw-r--r--. 1 [email protected] domain [email protected] system_u:object_r:unlabeled_t:s0 11 Apr 21 22:11 test101.txt
512 -rw-r--r--. 1 [email protected] domain [email protected] system_u:object_r:unlabeled_t:s0 11 Apr 21 22:11 test102.txt
512 -rw-r--r--. 1 [email protected] domain [email protected] system_u:object_r:unlabeled_t:s0 11 Apr 21 22:11 test103.txt
512 -rw-r--r--. 1 [email protected] domain [email protected] system_u:object_r:unlabeled_t:s0 11 Apr 21 22:11 test104.txt
512 -rw-r--r--. 1 [email protected] domain [email protected] system_u:object_r:unlabeled_t:s0 11 Apr 21 22:11 test105.txt
[email protected]@hapes /tmp/cephfs/docker/registry/v2/test/testing2
❯ cd /tmp
[email protected]@hapes /tmp
❯ sudo umount cephfs/
[email protected]@hapes /tmp
❯ sudo mount -t ceph "$(oc -n rook-ceph get configmap/rook-ceph-mon-endpoints -o jsonpath={.data.csi-cluster-config-json} | jq -r '.[0].monitors | @csv' | sed 's/"//g')":/volumes/csi/csi-vol-4decf389-d18c-11eb-94dd-0a580a81040a/fb85b670-c8c0-44b2-98db-1c093796145e -o name="$(oc -n rook-ceph get secret/rook-csi-cephfs-node -o jsonpath={.data.adminID} | base64 -d)",secret="$(oc -n rook-ceph get secret/rook-csi-cephfs-node -o jsonpath={.data.adminKey} | base64 -d)",mds_namespace=library-cephfs ./cephfs
[email protected]@hapes /tmp
❯ cd /tmp/cephfs/docker/registry/v2/test/testing2
[email protected]@hapes /tmp/cephfs/docker/registry/v2/test/testing2
❯ ls -lashZ | head
total 500K
0 drwxr-xr-x. 2 [email protected] domain [email protected] unconfined_u:object_r:container_file_t:s0:c12,c18 1000 Apr 21 22:05 .
0 drwxrwxrwx. 3 root root unconfined_u:object_r:container_file_t:s0:c12,c18 1 Apr 21 22:05 ..
512 -rw-r--r--. 1 [email protected] domain [email protected] unconfined_u:object_r:container_file_t:s0:c12,c18 11 Apr 21 22:05 test1000.txt
512 -rw-r--r--. 1 [email protected] domain [email protected] unconfined_u:object_r:container_file_t:s0:c12,c18 11 Apr 21 22:11 test100.txt
512 -rw-r--r--. 1 [email protected] domain [email protected] unconfined_u:object_r:container_file_t:s0:c12,c18 11 Apr 21 22:11 test101.txt
512 -rw-r--r--. 1 [email protected] domain [email protected] unconfined_u:object_r:container_file_t:s0:c12,c18 11 Apr 21 22:11 test102.txt
512 -rw-r--r--. 1 [email protected] domain [email protected] unconfined_u:object_r:container_file_t:s0:c12,c18 11 Apr 21 22:11 test103.txt
512 -rw-r--r--. 1 [email protected] domain [email protected] unconfined_u:object_r:container_file_t:s0:c12,c18 11 Apr 21 22:11 test104.txt
512 -rw-r--r--. 1 [email protected] domain [email protected] unconfined_u:object_r:container_file_t:s0:c12,c18 11 Apr 21 22:11 test105.txt
```

I think this could go some way towards explaining why I am unable to reproduce the denials outside of the container. The security context … |
Okay, I figured it out. This commit was merged into the kernel during the 5.16 release cycle, changing the ceph driver's default to asynchronous dirops. This asynchronicity seems to be the root cause of the missing/incorrect labels. Setting the wsync mount option restores the old synchronous behavior. However, getting your StorageClass to actually use that option is another matter. I also think this might be worth reporting as a bug against the kernel itself, since at heart this probably affects propagation of extended attributes and SELinux contexts for everyone, not just OKD+Ceph users. |
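If you want to test the wsync behaviour outside of Kubernetes first, you can force synchronous dirops on a manual kernel mount. A sketch with placeholder monitor address, credentials, and subvolume path:

```console
# Mount CephFS by hand with synchronous directory operations forced back on.
# <mon-host>, <cephx-user>, <cephx-key> and the subvolume path are placeholders.
sudo mount -t ceph <mon-host>:6789:/volumes/csi/<subvolume>/<uuid> /mnt/cephfs \
    -o name=<cephx-user>,secret=<cephx-key>,wsync
```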
@SriRamanujam When installing ceph-csi-cephfs, there's an option 'kernelMountOptions' which allows passing kernel mount options. When I set it to 'wsync', it does indeed appear to remediate the issue! |
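If you deploy the CSI driver with the ceph-csi-cephfs Helm chart, the option can presumably be set through the chart values; the exact key used here (storageClass.kernelMountOptions) is an assumption and may differ between chart versions:

```console
# Hedged sketch: set kernelMountOptions on the chart-managed StorageClass
# (the values key is assumed; check your chart version's values.yaml)
helm upgrade --install ceph-csi-cephfs ceph-csi/ceph-csi-cephfs \
    --namespace ceph-csi-cephfs \
    --set storageClass.create=true \
    --set storageClass.kernelMountOptions=wsync
```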
@SriRamanujam great investigation, would you mind reporting this to Fedora / upstream? Bonus points for making a KNOWN_ISSUE.md update PR. |
@SriRamanujam I can confirm that the issue can be solved by passing the wsync mount option. |
Great work on tracking down the problem, @SriRamanujam. I have delayed updating to 4.10 to avoid running into this issue, but when I delete the CephFS StorageClass and add it back again with kernelMountOptions: wsync in the parameters stanza, the option is ignored. What am I doing wrong? I am trying to use the Rook/Ceph installed as part of the OpenShift Data Foundation operator (formerly OpenShift Container Storage), if that makes a difference. |
@vrutkovs I have updated the existing bug ticket @fortinj66 filed with this information; will that be sufficient?
@darren-oxford You have to delete and re-create your PV once you've added the option to your StorageClass. |
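To spell that out: the mount options are captured when the PV is provisioned, so an existing volume will not pick up wsync just because the StorageClass changed. A rough sketch with illustrative resource names:

```console
# Delete the PVC so the CSI driver provisions a fresh PV from the updated StorageClass
oc delete pvc <registry-pvc> -n openshift-image-registry
# If the old PV was Retained rather than Deleted, clean it up too
oc delete pv <old-pv-name>
# Then re-create the PVC (or let the owning operator do it) and restart the consuming pod
```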
@SriRamanujam Unfortunately recreating the StorageClass as follows...
results in kernelMountOptions: wsync being ignored when creating the StorageClass, so it is missing from the StorageClass when it is re-created. |
@darren-oxford You might be missing kind: StorageClass in your manifest. |
Thanks @SriRamanujam, kind: StorageClass is there, I just missed pasting it here. I tried it again, and the OpenShift Data Foundation operator is overwriting it. Thanks for confirming that I am doing it right, though; it seems this workaround may not work with ODF. I guess I should switch to standard Rook Ceph, as I have found ODF to be incredibly frustrating. |
@darren-oxford if your StorageClass is controlled/maintained by some operator, you need to find a way of configuring the StorageClass in your operator config. That would be your only option, aside from not using the ODF operator at all. |
I can also confirm that the fix works on ceph-rook |
Hi, I've got the same problem with my 4.10 test cluster (bare-metal UPI on KVM with rook-ceph). Here's my fix:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: rook-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com # driver:namespace:operator
parameters:
### THIS is the new option
kernelMountOptions: wsync
###
clusterID: rook-ceph # namespace:cluster
fsName: myfs
pool: myfs-replicated
csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
```

After this, everything is working like before 4.10, ... |
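One way to check whether the option actually reached the node's kernel mount (the exact flags shown vary by kernel version; nowsync in the output would suggest async dirops are still in effect):

```console
# Inspect the CephFS mount options on the node running the pod (node name is illustrative)
oc debug node/<node-name> -- chroot /host grep ' ceph ' /proc/mounts
```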
I'm not sure it works every time. Mount options are not modified for me. |
It works for me too, but it is a little bit hard to reconstruct each PV. |
openshift/okd-machine-os#364 should have a new kernel with a fix; a new nightly is on the way. |
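Once a nightly with the patched kernel lands, a quick way to confirm which kernel your nodes are actually running (see the KERNEL-VERSION column) is:

```console
# Show the kernel version of every node after updating to the new nightly
oc get nodes -o wide
```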
Maybe it's a stupid question, but what happens to the volumes to which we applied the previous fix (kernelMountOptions: wsync)? Thank you. |
AFAIK, nothing: the problem was caused by a decision to change the kernel's default behaviour to wsync off, and the fix is to have it default to on again. Specifying kernelMountOptions: wsync is now essentially superfluous and will not change the behaviour. |
perfect thank you very much |
Describe the bug
Writing to the image registry fails after upgrading from OKD 4.9 to 4.10 when using CephFS filesystems.
Version
OKD 4.10 VMWare IPI
How reproducible
100%
See #1153 for details