
[OKD 4.10] CephFS seems to be broken with FCOS 35 upgrade. image registry writes fail. #1167

Closed
fortinj66 opened this issue Apr 12, 2022 · 20 comments

@fortinj66

Describe the bug

Writing to the image registry fails after upgrading from OKD 4.9 to 4.10 when the registry storage is backed by a CephFS filesystem. Ceph block storage (RBD) is fine.

See
okd-project/okd#1160
okd-project/okd#1160 (comment)
okd-project/okd#1153

Reproduction steps
Steps to reproduce the behavior:

  1. Install OKD cluster
  2. Install Ceph and create a CephFS filesystem, mounted into the image registry container
  3. Run a process that writes to the registry (usually a build)
  4. See the following:
error: build error: Failed to push image: writing blob: initiating layer upload to /v2/dev-shop-micro/artifact-micro-addressbook-jdk11/blobs/uploads/ in 
image-registry.openshift-image-registry.svc:5000: received unexpected HTTP status: 500 Internal Server Error
time="2022-03-14T15:44:27.02764624Z" 
level=error 
msg="response completed with error" 
err.code=unknown 
err.detail="filesystem: open /registry/docker/registry/v2/repositories/dev-shop-micro/artifact-micro-addressbook-jdk11/_uploads/16726db4-2946-4ce6-9e83-28ce1980fcd0/data: permission denied" 
err.message="unknown error" 
go.version=go1.17.5 
http.request.host="image-registry.openshift-image-registry.svc:5000" 
http.request.id=a3d79156-6c59-4355-b54f-1b30d409bba1 
http.request.method=POST 
http.request.remoteaddr="10.131.0.142:41420" 
http.request.uri=/v2/dev-shop-micro/artifact-micro-addressbook-jdk11/blobs/uploads/ 
http.request.useragent="containers/5.16.1 (github.com/containers/image)" 
http.response.contenttype="application/json; charset=utf-8" 
http.response.duration=21.812669ms 
http.response.status=500 
http.response.written=279 
openshift.auth.user="system:serviceaccount:dev-shop-micro:builder" 
vars.name=dev-shop-micro/artifact-micro-addressbook-jdk11
sh-4.4$ ls -al /registry/docker/registry/v2/repositories/dev-shop-micro/artifact-micro-addressbook-jdk11/_uploads/16726db4-2946-4ce6-9e83-28ce1980fcd0/
ls: cannot access '/registry/docker/registry/v2/repositories/dev-shop-micro/artifact-micro-addressbook-jdk11/_uploads/16726db4-2946-4ce6-9e83-28ce1980fcd0/data': Permission denied
total 1
drwxr-xr-x. 2 1000330000 root  2 Mar 14 15:44 .
drwxr-xr-x. 5 1000330000 root  3 Mar 14 16:32 ..
-?????????? ? ?          ?     ?            ? data
-rw-r--r--. 1 1000330000 root 20 Mar 14 15:44 startedat

Expected behavior
CephFS accepts writes properly

Actual behavior
See logs above

System details
OKD with FCOS 35

I'm hoping to reproduce this in a simpler setup soon. My initial thought is that this is some kind of SELinux permissions issue.
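For reference, a quick way to look for SELinux denials on the affected worker node might be something like the following (a minimal sketch; the filters are generic, not specific to this cluster):

# on the worker node: show recent AVC denial messages from the kernel log
sudo dmesg | grep -i 'avc:.*denied' | tail -n 20
# or query the audit log directly
sudo ausearch -m avc -ts recent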


@dustymabe
Member

I'm hoping to be able to reproduce this in an easier manner soon. My initial thought is that this is some kind of selinux permissions issue.

That would be extremely useful. If we could reproduce it easily (i.e., on a single node without OKD), then we could bisect the history and find where it stopped working.
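As a rough illustration of what such a single-node reproducer might look like (a hedged sketch only; the monitor address, secret, and image are placeholders, not taken from this issue):

# hypothetical single-node reproducer on a plain FCOS host, no OKD involved:
# mount CephFS directly, then write many small files from a container
sudo mount -t ceph <mon-host>:6789:/ /mnt/cephfs -o name=admin,secret=<key>
sudo podman run --rm -v /mnt/cephfs:/data:Z registry.fedoraproject.org/fedora:35 \
  sh -c 'for i in $(seq 1 200); do echo $i > /data/file-$i || echo "write $i failed"; done'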

@fortinj66
Author

fortinj66 commented Apr 13, 2022

Here is the error:

From pod with CephFS mounted at /var/www/html

sh-4.2# pwd
/var/www/html/test
sh-4.2# echo 5 > test5
sh-4.2# echo 6 > test6
sh: test6: Permission denied

From worker node:

[2064441.030758] audit: type=1400 audit(1649872616.967:79081): avc:  denied  { write open } for  pid=1460274 comm="sh" path="/var/www/html/test/test6" dev="ceph" ino=1099511738089 scontext=system_u:system_r:container_t:s0:c5,c26 tcontext=system_u:object_r:unlabeled_t:s0 tclass=file permissive=0
[2064441.041769] audit: type=1327 audit(1649872616.967:79081): proctitle="sh"
[2064441.043655] audit: type=1400 audit(1649872616.967:79082): avc:  denied  { write } for  pid=1460274 comm="sh" name="test6" dev="ceph" ino=1099511738089 scontext=system_u:system_r:container_t:s0:c5,c26 tcontext=system_u:object_r:unlabeled_t:s0 tclass=file permissive=0

What is odd is that the test5 line worked... If I wait 60 seconds, I can create a new file.
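To pin down that roughly 60-second window, a small loop along these lines might help (a sketch only, assuming the same CephFS mount at /var/www/html/test as above):

# hypothetical timing probe run inside the pod: write a file every 10 seconds
# and note which writes are denied
for i in $(seq 1 12); do
  date +%T
  echo "$i" > /var/www/html/test/probe-$i || echo "write $i failed"
  sleep 10
done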

Unfortunately, FCOS doesn't seem to have the SELinux management tools like sealert.

@fortinj66
Author

Here are the file permissions inside the container:

sh-4.2# ls -aZ
ls: cannot access test6: Permission denied
ls: cannot access test2: Permission denied
ls: cannot access test4: Permission denied
ls: cannot access test1: Permission denied
drwxr-xr-x. 1001 1001 system_u:object_r:container_file_t:s0:c5,c26 .
drwxrwxrwx. 1001 1001 system_u:object_r:container_file_t:s0:c5,c26 ..
-rw-r--r--. root root system_u:object_r:container_file_t:s0:c5,c26 .index.php
?---------  ?    ?                                     test1
?---------  ?    ?                                     test2
?---------  ?    ?                                     test4
-rw-r--r--. root root system_u:object_r:container_file_t:s0:c5,c26 test5
?---------  ?    ?                                     test6

@dustymabe
Member

@fortinj66 any luck on getting a smaller reproducer?

@fortinj66
Author

Not yet, unfortunately, but @SriRamanujam tried something and was unable to make it break... okd-project/okd#1160 (comment)

More research is needed...

I've stumbled across a new issue which may have higher priority :(

@depouill

depouill commented Apr 20, 2022

Hi,
I have a similar problem: on various users' PVCs, some files end up unlabeled ("system_u:object_r:unlabeled_t:s0") and become unreadable for the pod. A "chcon" on the mount point on the node resolves the problem temporarily.
It is not the same set of files when the PVC is mounted across different nodes.
Restarting the pod changes the result (files may then have the correct label, or other files are affected).
We are using Rook with an external Ceph cluster. The Rook operator only manages the CSI provisioners (I updated to the latest version of the CSI provisioner, but it had no effect).
It is reproducible when creating a lot of small files on the PVC.
We have rebuilt the SELinux modules on the nodes (with semodule -B).
The problem appeared after 4.10/FCOS 35, on bare metal and on VMs (with KVM).
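For illustration, the temporary relabel mentioned above might look roughly like this on the node (a sketch with placeholder paths; the real kubelet volume path depends on the pod and PVC):

# hypothetical temporary workaround: relabel the CephFS mount point on the node
# so the pod can read the files again; <pod-uid> and <volume> are placeholders
sudo chcon -R -t container_file_t \
  /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<volume>/mount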

@dustymabe
Member

I'm wondering if this is somehow SELinux policy drift. See #701 (comment) for a description and workaround.

@dustymabe
Member

I'm wondering if this is somehow SELinux policy drift. See #701 (comment) for a description and workaround.

Another question to ask (similar to the above) is: does this issue happen on a brand new cluster, or only on one that was upgraded?

@SriRamanujam

SriRamanujam commented Apr 20, 2022

@dustymabe Over on the OKD issue, @schuemann has reported that this is reproducible on both brand new and upgraded clusters: okd-project/okd#1160 (comment)

Let me try the workaround in the issue you linked and see how that goes.

@SriRamanujam

Running sudo ostree admin config-diff | grep selinux/targeted/policy yielded no output on all my nodes. I also experimentally ran semodule -B on all my nodes, and that didn't help either. I guess that means the policy drift described in #701 might not be the issue?

@depouill

I'm wondering if this is somehow SELinux policy drift. See #701 (comment) for a description and workaround.

We removed all custom policies on all nodes (after finding more rules with sudo ostree admin config-diff | grep selinux). The problem still persists.

@dustymabe
Member

Thanks for the info @SriRamanujam @depouill.

@SriRamanujam

Figured it out - details in okd-project/okd#1160 (comment).

tl;dr: the kernel changed the CephFS default from synchronous to asynchronous directory operations, and that seems to be breaking SELinux contexts on CephFS mounts across the board. Probably xattrs in general, if I had to guess, though I didn't explicitly test that. So it's nothing to do with FCOS or SELinux specifically.
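If that explanation is right, one way to test the theory might be to force synchronous directory operations again via the CephFS kernel client's wsync mount option (a hedged sketch based on the kernel client's documented wsync/nowsync options; the monitor address and secret are placeholders, not from this thread):

# hypothetical test mount with asynchronous directory operations disabled
sudo mount -t ceph <mon-host>:6789:/ /mnt/cephfs-test -o name=admin,secret=<key>,wsync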

@dustymabe
Member

Looks like we can call this closed when the 5.17.9 kernel lands in FCOS. See https://bugzilla.redhat.com/show_bug.cgi?id=2063929#c15

@dustymabe dustymabe added the status/pending-upstream-release label (Fixed upstream. Waiting on an upstream component source code release.) May 18, 2022
@fortinj66
Author

So it's not likely to land in a 5.16.x kernel that we would see in FCOS 35? So probably not in OKD 4.10...

@dustymabe
Member

@fortinj66 - Fedora CoreOS is in the process of moving to Fedora 36 (next week's stable will complete the process). I'm not sure exactly what the OKD plans are, so maybe ask there?

As an aside, there is a kernel build of 5.17.9 for F36 and F35, so you could pick it up and use it if you're motivated.
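For anyone wanting to try that, overriding the kernel on an FCOS node usually looks something like the following (a sketch only; the package URLs are placeholders, not links from this thread):

# hypothetical kernel override on an FCOS node; replace the URLs with the
# actual 5.17.9 kernel, kernel-core, and kernel-modules RPMs for your release
sudo rpm-ostree override replace \
  https://kojipkgs.fedoraproject.org/.../kernel-5.17.9-<release>.x86_64.rpm \
  https://kojipkgs.fedoraproject.org/.../kernel-core-5.17.9-<release>.x86_64.rpm \
  https://kojipkgs.fedoraproject.org/.../kernel-modules-5.17.9-<release>.x86_64.rpm
sudo systemctl reboot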

@fortinj66
Author

That’s not a bad idea…. And I can test if the kernel panic is fixed too…

@dustymabe dustymabe added the status/pending-testing-release label (Fixed upstream. Waiting on a testing release.) and removed the status/pending-upstream-release label (Fixed upstream. Waiting on an upstream component source code release.) May 19, 2022
dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue May 19, 2022
@dustymabe
Member

The fix for this went into testing stream release 36.20220522.2.1. Please try out the new release and report issues.
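One way to confirm a node has picked up the fixed release and kernel after updating might be (a sketch; the same check works for the testing and stable streams):

# on the node: check the booted Fedora CoreOS version and kernel
rpm-ostree status --booted
uname -r
# or, from an OKD/OpenShift cluster, list kernel versions per node
oc get nodes -o wide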

@dustymabe dustymabe added the status/pending-stable-release label (Fixed upstream and in testing. Waiting on stable release.) and removed the status/pending-testing-release label (Fixed upstream. Waiting on a testing release.) May 25, 2022
@dustymabe
Member

The fix for this went into stable stream release 36.20220522.3.0.

@dustymabe dustymabe removed the status/pending-stable-release label (Fixed upstream and in testing. Waiting on stable release.) Jun 7, 2022
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023