RHCOS 8.6 failing on ext.config.rebuild-selinux-policy #1036

Closed
jlebon opened this issue Oct 25, 2022 · 17 comments · Fixed by coreos/rpm-ostree#4122

Comments

@jlebon
Member

jlebon commented Oct 25, 2022

Latest 8.6 composes are failing on:

=== RUN   ext.config.rebuild-selinux-policy
systemctl status kola-runext.service:
● kola-runext.service
   Loaded: loaded (/etc/systemd/system/kola-runext.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2022-10-25 16:01:14 UTC; 1s ago
  Process: 2219 ExecStart=/usr/local/bin/kola-runext-test.sh (code=exited, status=1/FAILURE)
 Main PID: 2219 (code=exited, status=1/FAILURE)

Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ ID_LIKE='rhel fedora'
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ VERSION=412.86.202210251535-0
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ VERSION_ID=4.12
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ PLATFORM_ID=platform:el8
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ PRETTY_NAME='Red Hat Enterprise Linux CoreOS 412.86.202210251535-0 (Ootpa)'
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ ANSI_COLOR='0;31'
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ CPE_NAME=cpe:/o:redhat:enterprise_linux:8::coreos
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ HOME_URL=https://www.redhat.com/
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ DOCUMENTATION_URL=https://docs.openshift.com/container-platform/4.12/
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ BUG_REPORT_URL=https://access.redhat.com/labs/rhir/
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ REDHAT_BUGZILLA_PRODUCT='OpenShift Container Platform'
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ REDHAT_BUGZILLA_PRODUCT_VERSION=4.12
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ REDHAT_SUPPORT_PRODUCT='OpenShift Container Platform'
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ REDHAT_SUPPORT_PRODUCT_VERSION=4.12
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ OPENSHIFT_VERSION=4.12
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ RHEL_VERSION=8.6
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ OSTREE_VERSION=412.86.202210251535-0
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: ++ echo 8.6
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + RHEL_VERSION=8.6
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + echo RHEL_VERSION=8.6
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: RHEL_VERSION=8.6
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + service_should_start=0
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + case "${RHEL_VERSION:-}" in
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + service_should_start=1
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + case "${AUTOPKGTEST_REBOOT_MARK:-}" in
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + grep -qFe 'Recompiling policy' logs.txt
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + cat logs.txt
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2225]: -- Logs begin at Tue 2022-10-25 16:00:32 UTC, end at Tue 2022-10-25 16:01:14 UTC. --
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2225]: Oct 25 16:00:44 localhost systemd[1]: Starting RHEL CoreOS Rebuild SELinux Policy If Necessary...
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2225]: Oct 25 16:00:44 localhost rhcos-rebuild-selinux-policy[1481]: RHEL_VERSION=8.6Checking for policy recompilation
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2225]: Oct 25 16:00:44 localhost rhcos-rebuild-selinux-policy[1486]: -rw-r--r--. 1 root root 8912471 Oct 25 15:43 /etc/selinux/targeted/policy/policy.31
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2225]: Oct 25 16:00:44 localhost rhcos-rebuild-selinux-policy[1486]: -rw-r--r--. 2 root root 8912471 Jan  1  1970 /usr/etc/selinux/targeted/policy/policy.31
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2225]: Oct 25 16:00:44 localhost rhcos-rebuild-selinux-policy[1481]: Recompiling policy due to local modifications as workaround for https://bugzilla.redhat.com/2057497
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2225]: Oct 25 16:00:59 localhost systemd[1]: Started RHEL CoreOS Rebuild SELinux Policy If Necessary.
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + fatal 'Recompiled policy on first boot'
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + echo 'Recompiled policy on first boot'
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: Recompiled policy on first boot
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + exit 1
Oct 25 16:01:14 qemu0 systemd[1]: kola-runext.service: Main process exited, code=exited, status=1/FAILURE
Oct 25 16:01:14 qemu0 systemd[1]: kola-runext.service: Failed with result 'exit-code'.
--- FAIL: ext.config.rebuild-selinux-policy (53.85s)
        cluster.go:162: Error: Unit kola-runext.service exited with code 1
        cluster.go:162: 2022-10-25T16:01:15Z cli: Unit kola-runext.service exited with code 1
        harness.go:1093: kolet failed: : kolet run-test-unit failed: Process exited with status 1

I.e. it seems like we're recompiling the policy on first boot.

@jlebon
Member Author

jlebon commented Oct 25, 2022

Digging into this, I think it's caused by libsemanage-2.9-9.el8_6, which has this patch: https://pkgs.devel.redhat.com/cgit/rpms/libsemanage/commit/?h=rhel-8.6.0&id=7b7f71ce7cdd6187b33b738bb6866a00f2149772.

Current theory is that ostree admin deploy at create_disk.sh time is recompiling the policy (via ostreedev/ostree#2569). Need to investigate why semodule -N --rebuild-if-modules-changed thinks this isn't a no-op.

@jlebon
Member Author

jlebon commented Oct 25, 2022

FYI @WOnder93

jlebon added a commit to jlebon/os that referenced this issue Oct 25, 2022
With -9.el8, `ext.config.rebuild-selinux-policy` fails:
openshift#1036

We need to debug this, but for now let's unblock CI and dev pipelines.
@WOnder93

Hm... can you point me to the code behind kola-runext-test.sh?

There is a known quirk that after the linked libsemanage patch, the "no-op" path (i.e. when there are no changes in the modules and only the rest of the content is refreshed) produces a different binary policy than the full rebuild "from scratch". The policies are semantically equal, but some things get ordered differently and the resulting policies don't match byte-to-byte. I suppose this might be confusing the test.

I know this is not ideal, but it would be technically very difficult to make both paths produce an equal result :/

@lucab
Contributor

lucab commented Oct 26, 2022

@WOnder93 this is the test: https://github.com/openshift/os/blob/master/tests/kola/rebuild-selinux-policy/test.sh
The underlying service logic is at https://github.com/openshift/os/blob/master/overlay.d/05rhcos/usr/libexec/rhcos-rebuild-selinux-policy.

Overall, the "recompile on boot" logic is gated by a cmp --quiet /{usr/,}etc/selinux/targeted/policy/policy.31.
The two files seem to have the exact same size, but their contents don't match byte-to-byte.
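
For illustration, the gating check described above can be sketched roughly as follows. This is a minimal sketch and not the actual `rhcos-rebuild-selinux-policy` script; the function name `policy_differs` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper, not the real service script.
# Returns 0 (true) when the two files are NOT byte-identical.
# `cmp --quiet` prints nothing and exits 0 only for identical files, so two
# files of equal size with different contents still count as "different".
# A missing file also makes cmp fail, so it is treated as "different" too.
policy_differs() {
    local shipped="$1" deployed="$2"
    ! cmp --quiet "$shipped" "$deployed"
}
```

Called as `policy_differs /usr/etc/selinux/targeted/policy/policy.31 /etc/selinux/targeted/policy/policy.31`, a check like this would report a difference for the two same-sized files listed in the logs above, assuming their bytes differ as described.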

@lucab
Contributor

lucab commented Oct 26, 2022

For reference, all of this comes from #962 as a workaround for https://issues.redhat.com/browse/OCPBUGS-595.

@cgwalters
Member

OK, we just need to patch ostree to turn off this logic on the initial deployment.

@cgwalters
Member

> I know this is not ideal, but it would be technically very difficult to make both paths produce an equal result :/

I understand. But longer term, driving binary-level reproducibility into everything we do is important for reproducible builds, binary verification etc.

@jlebon
Member Author

jlebon commented Oct 26, 2022

> OK, we just need to patch ostree to turn off this logic on the initial deployment.

That will fix the first boot issue, but I think what we want is to make sure we don't regenerate at all even on the next new deployment, no? So then, maybe a better fix is to run semodule -N --rebuild-if-modules-changed right after we do a full policy build.
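
As a rough sketch of that idea (not the eventual rpm-ostree change, which lives in rpm-ostree's own code), the sequence might look like the following. The wrapper name `build_policy_then_refresh` and the `SEMODULE` override are hypothetical, and combining `-N` with `-B` for the full build is an assumption; only `semodule -B` and `semodule -N --rebuild-if-modules-changed` are named in this thread.

```bash
#!/bin/bash
# Hypothetical wrapper. SEMODULE can be overridden (e.g. SEMODULE=echo)
# to print the commands instead of touching the policy store.
SEMODULE="${SEMODULE:-semodule}"

build_policy_then_refresh() {
    # Full policy build; -N is assumed here so the running policy is not reloaded.
    "$SEMODULE" -N -B || return 1
    # Immediately go through the "no module changes" path once, so the policy
    # that ships is the one this path writes. The hope is that a later
    # invocation at deploy time then has nothing left to change.
    "$SEMODULE" -N --rebuild-if-modules-changed
}
```

Running `SEMODULE=echo build_policy_then_refresh` gives a dry run that only prints the two argument lists.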

@jlebon
Member Author

jlebon commented Oct 26, 2022

> There is a known quirk that after the linked libsemanage patch, the "no-op" path (i.e. when there are no changes in the modules and only the rest of the content is refreshed) produces a different binary policy than the full rebuild "from scratch". The policies are semantically equal, but some things get ordered differently and the resulting policies don't match byte-to-byte. I suppose this might be confusing the test.
>
> I know this is not ideal, but it would be technically very difficult to make both paths produce an equal result :/

Is there a ticket somewhere tracking this? Then we could reference it in our code and that way also be able to know when we don't need to work around this issue anymore.

HuijingHei added a commit to HuijingHei/os that referenced this issue Oct 27, 2022
```
With -9.el8, ext.config.rebuild-selinux-policy fails:
openshift#1036

We need to debug this, but for now let's unblock CI and dev pipelines.
```
cherry-pick openshift@247e64a
HuijingHei added a commit to HuijingHei/os that referenced this issue Oct 27, 2022
```
[jlebon]
  With -9.el8, ext.config.rebuild-selinux-policy fails:
  openshift#1036

  We need to debug this, but for now let's unblock CI and dev pipelines.
```
cherry-pick openshift@247e64a
@WOnder93

> The underlying service logic is at https://github.com/openshift/os/blob/master/overlay.d/05rhcos/usr/libexec/rhcos-rebuild-selinux-policy.

Looking at that logic, shouldn't the RHEL version match pattern be 8.[0-5] instead of 8.[0-6]? I thought the plan was to get the patched ostree & libsemanage & policycoreutils backported/tagged to RHEL-8.6 - if that has been achieved, then the above workaround shouldn't need to be activated on RHEL-8.6. Or am I misunderstanding something?
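
To make the difference between the two patterns concrete, here is a small sketch of how a `case` glob would treat the version from the failing run. The helper name `workaround_active` is hypothetical and this is not the code from the service script.

```bash
#!/bin/bash
# Hypothetical helper: prints 1 if the workaround would activate for the given
# RHEL version under the given glob pattern, and 0 otherwise.
workaround_active() {
    local version="$1" pattern="$2"
    # $pattern is left unquoted on purpose so that `case` treats it as a glob.
    case "$version" in
        $pattern) echo 1 ;;
        *)        echo 0 ;;
    esac
}

workaround_active 8.6 '8.[0-6]'   # prints 1: workaround runs on 8.6
workaround_active 8.6 '8.[0-5]'   # prints 0: workaround skipped on 8.6
```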

@travier
Member

travier commented Oct 28, 2022

From memory, we did not fully have things ready in 8.6 at the time we made the workaround. This might have changed. We would have to take another look.

@travier
Member

travier commented Oct 28, 2022

If someone else agrees with my assessment, then we can try a revert of this workaround. It's slightly late in 4.12 to do that now but should be good.

@cgwalters
Member

Ugh wait so there's a corollary to this - it looks like for quite some time now we've actually been rebuilding the policy by default on newer systems (current FCOS e.g.). On a stock FCOS I see:

[root@cosa-devsh ~]# rpm-ostree status
State: idle
AutomaticUpdatesDriver: Zincati
  DriverState: active; periodically polling for updates (last checked Tue 2022-11-01 21:02:30 UTC)
Deployments:
● fedora:fedora/x86_64/coreos/next
                  Version: 37.20221021.1.0 (2022-10-24T18:12:48Z)
                   Commit: 5d50e945e2a3aa5bedadb998bfc3d611cfc628412a52346218575d5733c0407a
             GPGSignature: Valid signature by ACB5EE4E831C74BB7C168D27F55AD3FB5323552A

  fedora:fedora/x86_64/coreos/next
                  Version: 37.20220918.1.1 (2022-09-21T21:05:43Z)
                   Commit: 9f38af9a6fc0d38acfbd496199b495482266ba0cad3012410b539916001d36e6
             GPGSignature: Valid signature by ACB5EE4E831C74BB7C168D27F55AD3FB5323552A
[root@cosa-devsh ~]# ostree admin config-diff|grep -i selin
M    selinux/targeted/policy/policy.33
M    selinux/targeted/active/commit_num
M    selinux/targeted/active/policy.kern
A    selinux/targeted/semanage.read.LOCK
A    selinux/targeted/semanage.trans.LOCK
[root@cosa-devsh ~]# 

That's quite unfortunate.

Hmm, we have a kola test that verifies the set of files in ostree admin config-diff doesn't grow unexpectedly.

Why is it that we're getting this behavior? We're compiling policy at build time via semodule -B, do we need to also invoke semodule --refresh right after that? If so that'd at least avoid having "pointless" policy modifications for new systems going forward.
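
A check for this kind of drift could look roughly like the sketch below. It is only an illustration, not an existing kola test; the function name `binary_policy_modified` is hypothetical, and it assumes the `M`/`A` plus path layout shown in the `ostree admin config-diff` output above.

```bash
#!/bin/bash
# Hypothetical filter: reads `ostree admin config-diff` output on stdin and
# prints any line reporting an added (A) or modified (M) binary policy file
# such as selinux/targeted/policy/policy.33.
# Exit status follows grep: 0 if such a line was found, 1 if none was.
binary_policy_modified() {
    grep -E '^[MA][[:space:]]+selinux/[^/]+/policy/policy\.[0-9]+$'
}
```

On a real host it would be used as `ostree admin config-diff | binary_policy_modified`, with a zero exit status meaning that the deployed binary policy no longer matches the shipped one.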

@jlebon
Member Author

jlebon commented Nov 2, 2022

> Ugh wait so there's a corollary to this - it looks like for quite some time now we've actually been rebuilding the policy by default on newer systems (current FCOS e.g.).

Ouch. 😢

I mean, at least we have policy recompilation now, so users won't be missing out on policy updates. But nodes by default not using the canonical policy is very unfortunate indeed.

It'd be nice if we could get those machines back on the canonical policy. I think that's possible and would require implementing some of the follow-up bits we discussed in coreos/fedora-coreos-tracker#701.

> Why is it that we're getting this behavior?

I think it's the same issue hitting RHCOS (see #1036 (comment)). Locally inspecting the vanilla qcow2 using guestmount, we can see the policy is already different.

> We're compiling policy at build time via semodule -B, do we need to also invoke semodule --refresh right after that? If so that'd at least avoid having "pointless" policy modifications for new systems going forward.

Yeah, I suggested this higher up too. I'll try it out and see if it fixes it, but would be good to have @WOnder93 confirm that's a sane strategy.


OK more information on this. Fedora 36 is not affected, only Fedora 37 (i.e. only next currently). So that leads me to believe the issue was introduced in policycoreutils v3.4 (f36 is on v3.3). So we have a chance to fix this before this hits testing in two weeks when we GA, and stable two weeks after that.

Currently testing the rpm-ostree --refresh workaround.

jlebon added a commit to jlebon/rpm-ostree that referenced this issue Nov 2, 2022
There is a bug in the latest semanage code which causes an invocation of
`semodule --rebuild-if-modules-changed` to still write a policy even
though nothing changed since a full policy build. On FCOS and RHCOS,
this bug is triggered as early as `ostree admin deploy` in cosa when
creating the disk images. This results in shipping images with a policy
diff baked in.

Hack around this by immediately rerunning
`semodule --rebuild-if-modules-changed` after building the policy.

Fixes: openshift/os#1036
jlebon added a commit to jlebon/fedora-coreos-config that referenced this issue Nov 2, 2022
This is a test for openshift/os#1036. It also
exists in the rpm-ostree CI, but let's have it here too since other
packages can break this.
@jlebon
Member Author

jlebon commented Nov 2, 2022

> Currently testing the rpm-ostree --refresh workaround.

OK yup, that does work: coreos/rpm-ostree#4122. Also added an f-c-c test in coreos/fedora-coreos-config#2056.

@jlebon
Member Author

jlebon commented Nov 2, 2022

> Hmm, we have a kola test that verifies the set of files in ostree admin config-diff doesn't grow unexpectedly.

I was looking for that and couldn't find it. I filed coreos/fedora-coreos-tracker#1335.

jlebon added a commit to jlebon/rpm-ostree that referenced this issue Nov 2, 2022
There is a bug in the latest semanage code which causes an invocation of
`semodule --rebuild-if-modules-changed` to still write a policy even
though nothing changed since a full policy build. On FCOS and RHCOS,
this bug is triggered as early as `ostree admin deploy` in cosa when
creating the disk images. This results in shipping images with a policy
diff baked in.

Hack around this by immediately rerunning
`semodule --rebuild-if-modules-changed` after building the policy.

Fixes: openshift/os#1036
(cherry picked from commit 479050e)
cgwalters pushed a commit to coreos/rpm-ostree that referenced this issue Nov 2, 2022
There is a bug in the latest semanage code which causes an invocation of
`semodule --rebuild-if-modules-changed` to still write a policy even
though nothing changed since a full policy build. On FCOS and RHCOS,
this bug is triggered as early as `ostree admin deploy` in cosa when
creating the disk images. This results in shipping images with a policy
diff baked in.

Hack around this by immediately rerunning
`semodule --rebuild-if-modules-changed` after building the policy.

Fixes: openshift/os#1036
(cherry picked from commit 479050e)
jlebon added a commit to jlebon/fedora-coreos-config that referenced this issue Nov 3, 2022
This is a test for openshift/os#1036. It also
exists in the rpm-ostree CI, but let's have it here too since other
packages can break this.
dustymabe pushed a commit to coreos/fedora-coreos-config that referenced this issue Nov 3, 2022
This is a test for openshift/os#1036. It also
exists in the rpm-ostree CI, but let's have it here too since other
packages can break this.
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
This is a test for openshift/os#1036. It also
exists in the rpm-ostree CI, but let's have it here too since other
packages can break this.
5 participants