Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support mounting /sysroot (and /boot) read-only #1767

Merged

Conversation

cgwalters
Copy link
Member

Add a magic file and an API for consumers to tell us it's OK
for us to remount read-write (since it'll just affect our
mount namespace).

@cgwalters
Copy link
Member Author

WIP for #1265

@cgwalters cgwalters force-pushed the sysroot-mnt-namespace branch 2 times, most recently from 6ef8928 to e3e9e90 Compare December 16, 2018 19:43
@cgwalters cgwalters force-pushed the sysroot-mnt-namespace branch 6 times, most recently from 035751f to e5c5e2b Compare December 29, 2018 18:47
@rh-atomic-bot
Copy link

☔ The latest upstream changes (presumably 3ca1035) made this pull request unmergeable. Please resolve the merge conflicts.

@cgwalters
Copy link
Member Author

OK did the trivial rebase 🏄‍♂️ for this, and was testing out this patch to cosa:

diff --git a/src/create_disk.sh b/src/create_disk.sh
index 2f9fd779..3f01644c 100755
--- a/src/create_disk.sh
+++ b/src/create_disk.sh
@@ -111,6 +111,9 @@ fi
 
 # we use pure BLS, so don't need grub2-mkconfig
 ostree config --repo rootfs/ostree/repo set sysroot.bootloader none
+# Opt-in to https://github.com/ostreedev/ostree/pull/1767 AKA
+# https://github.com/ostreedev/ostree/issues/1265
+ostree config --repo rootfs/ostree/repo set sysroot.readonly true
 
 if [ "$arch" != "s390x" ]; then
 	# install uefi grub

And FCOS goes into emergency mode since systemd-tmpfiles-setup.service fails due to /var being read-only. I guess the pile of hacks in ostree-remount.c isn't working right.

@cgwalters
Copy link
Member Author

Toying with

diff --git a/src/switchroot/ostree-remount.c b/src/switchroot/ostree-remount.c
index 4b4f5356..f36ecc48 100644
--- a/src/switchroot/ostree-remount.c
+++ b/src/switchroot/ostree-remount.c
@@ -120,47 +120,25 @@ main(int argc, char *argv[])
    */
   const bool sysroot_readonly = sysroot_is_configured_ro ();
   if (!sysroot_readonly)
-    do_remount ("/sysroot", !sysroot_readonly);
+    do_remount ("/sysroot", TRUE);
   else
     {
-      /* Since /sysroot is the real physical root, we can't simply remount it
-       * read-only here, as that'd affect e.g. /etc in and also /var if it's not
-       * a separate mount. Instead, we make new read-only bind mount to it,
-       * unmount the original, then move the bind mount to /sysroot.
+      do_remount ("/sysroot", FALSE);
+
+      /* Now, /etc is not normally a bind mount, but remounting the
+       * sysroot above made it read-only since it's on the same filesystem.
+       * Make it a self-bind mount, so we can then mount it read-write.
        */
-      static const char tmp_sysroot[] = "/etc/ostree-sysroot.tmp";
-      static const char sysroot[] = "/sysroot";
-
-      /* This temporary lives in /etc since it needs to be on the same mount. */
-      if (mkdir (tmp_sysroot, 0755) < 0)
-        err (EXIT_FAILURE, "mkdir(%s)", tmp_sysroot);
-      /* Make it a read-only bind mount to /sysroot */
-      if (mount (sysroot, tmp_sysroot, NULL, MS_BIND | MS_PRIVATE, NULL) < 0)
-        err (EXIT_FAILURE, "failed to bind mount %s %s", sysroot, tmp_sysroot);
-      if (mount (tmp_sysroot, tmp_sysroot, NULL, MS_BIND | MS_RDONLY | MS_REMOUNT, NULL) < 0)
-        err (EXIT_FAILURE, "failed to remount ro %s", tmp_sysroot);
-      if (umount (sysroot) < 0)
-        err (EXIT_FAILURE, "while remounting %s read-only: umount", sysroot);
-
-      /* HACK: We can't move a mount that's under a shared namespace. So we
-       * briefly make the sysroot private so that we can move the mount. This
-       * does introduce a race condition where if e.g. another process mounted
-       * something in / it wouldn't be visible in other mount namespaces.  But
-       * we're running quite early, before e.g. any container runtimes should
-       * be starting.
+      if (mount ("/etc", "/etc", NULL, MS_BIND, NULL) < 0)
+        err (EXIT_FAILURE, "failed to make /etc a bind mount");
+      do_remount ("/etc", TRUE);
+      /* If /var was created as as an OSTree default bind mount (instead of being a separate filesystem)
+       * then remounting the root mount read-only also remounted it.
+       * So just like /etc, we need to make it read-write by default.
+       * If it was a separate filesystem, we expect it to be writable anyways,
+       * so it doesn't hurt to remount it if so.
        */
-      if (mount ("none", "/", NULL, MS_PRIVATE, NULL) < 0)
-        err (EXIT_FAILURE, "making / private temporarily");
-      /* Do the move */
-      if (mount (tmp_sysroot, sysroot, NULL, MS_MOVE, NULL) < 0)
-        err (EXIT_FAILURE, "failed to move read-only %s mount", sysroot);
-      /* Make / shared again */
-      if (mount ("none", "/", NULL, MS_SHARED, NULL) < 0)
-        err (EXIT_FAILURE, "making / shared again");
-
-      /* And clean up */
-      if (rmdir (tmp_sysroot) < 0)
-        err (EXIT_FAILURE, "rmdir(%s)", tmp_sysroot);
+      do_remount ("/var", TRUE);
     }
 
   exit (EXIT_SUCCESS);

instead.

@cgwalters cgwalters force-pushed the sysroot-mnt-namespace branch 2 times, most recently from dd8b336 to 822b364 Compare September 3, 2019 01:53
@cgwalters
Copy link
Member Author

OK, the problem was ordering around var.mount; fixed that and things seem to work. Needs more testing around upgrades, and we now also have the cases of:

  • FCOS but not configured this way
  • FAH/Silverblue which don't start out with rw on kernel cmdline

@cgwalters cgwalters changed the title WIP: Support mounting /sysroot (and /boot) read-only Support mounting /sysroot (and /boot) read-only Sep 5, 2019
cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Sep 5, 2019
This is all we need to tell libostree that we support a read-only
`/sysroot` and `/boot`.

See ostreedev/ostree#1265
PR in ostreedev/ostree#1767
@cgwalters
Copy link
Member Author

OK, lifting WIP on this! I still need more testing, but it's ready for review.

I have a working FCOS with this and:

Including e.g. package layering, etc.

@rh-atomic-bot
Copy link

☔ The latest upstream changes (presumably bdbce9d) made this pull request unmergeable. Please resolve the merge conflicts.

src/libostree/ostree-sysroot-private.h Outdated Show resolved Hide resolved
src/libostree/ostree-sysroot.c Outdated Show resolved Hide resolved
src/libostree/ostree-sysroot.c Show resolved Hide resolved
src/libostree/ostree-sysroot.c Outdated Show resolved Hide resolved
src/ostree/ot-main.c Outdated Show resolved Hide resolved
src/switchroot/ostree-remount.c Outdated Show resolved Hide resolved
src/switchroot/ostree-remount.c Show resolved Hide resolved
src/switchroot/ostree-remount.c Outdated Show resolved Hide resolved
src/libostree/ostree-sysroot.c Show resolved Hide resolved
if (!g_key_file_load_from_file (keyfile, config_path, 0, NULL))
return false;

return g_key_file_get_boolean (keyfile, "sysroot", "readonly", NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to introduce a new key here instead of just handling it based on ro being in /etc/fstab? IIUC, for FCOS at least, it'll still be rw at initrd time to avoid the machine-id mess, and then will be re-mounted ro by systemd right before ostree-remount.service (though we could still recheck & remount here as we do right now).

Might've missed discussions around this, though keeping the same API as on other traditional systems I think is worth a lot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now, we don't have an /etc/fstab in FCOS anymore.

The other thing is...we're trying to create a "middle ground" around "read-only". Here the physical root is writable - it has /etc and /var by default. Wouldn't you expect having ro in /etc/fstab would be truly read-only?

This is really a magical ostree-specific flag I think to opt-in to new behavior for systems builders, not really something admins should see/change directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other thing is...we're trying to create a "middle ground" around "read-only". Here the physical root is writable - it has /etc and /var by default. Wouldn't you expect having ro in /etc/fstab would be truly read-only?

OK, I follow. Hmm, I think one confusion here is that "read-only root" can mean a lot of things. Particularly /etc could be interpreted as being covered (and I think it'd be really cool to support this too). I wonder if sysroot.readonly shouldn't be an enum instead. E.g. none (status quo), base (everything except /var and /etc) and full (just /var is read-write). WDYT? (Not suggesting to implement full here, just keeping the option open so we leverage the same knob).

cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Dec 11, 2019
This is all we need to tell libostree that we support a read-only
`/sysroot` and `/boot`.

See ostreedev/ostree#1265
PR in ostreedev/ostree#1767
cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Dec 11, 2019
This is all we need to tell libostree that we support a read-only
`/sysroot` and `/boot`.

See ostreedev/ostree#1265
PR in ostreedev/ostree#1767
cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Dec 12, 2019
This is all we need to tell libostree that we support a read-only
`/sysroot` and `/boot`.

See ostreedev/ostree#1265
PR in ostreedev/ostree#1767
cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Dec 12, 2019
This is all we need to tell libostree that we support a read-only
`/sysroot` and `/boot`.

See ostreedev/ostree#1265
PR in ostreedev/ostree#1767
cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Dec 12, 2019
This is all we need to tell libostree that we support a read-only
`/sysroot` and `/boot`.

See ostreedev/ostree#1265
PR in ostreedev/ostree#1767
jlebon pushed a commit to cgwalters/rpm-ostree that referenced this pull request Dec 12, 2019
This is all we need to tell libostree that we support a read-only
`/sysroot` and `/boot`.

See ostreedev/ostree#1265
PR in ostreedev/ostree#1767
openshift-merge-robot pushed a commit to coreos/rpm-ostree that referenced this pull request Dec 13, 2019
This is all we need to tell libostree that we support a read-only
`/sysroot` and `/boot`.

See ostreedev/ostree#1265
PR in ostreedev/ostree#1767
cgwalters added a commit to cgwalters/fedora-coreos-config that referenced this pull request May 18, 2020
I am seeing this service fail in RHCOS when rebasing ostree
to a version that supports ostreedev/ostree#1767

When using `DefaultDependencies=no`, one really wants to specify *some*
dependency - very few units have no dependencies at all.

In this case, let's run "early" which means after `sysinit.target`
but before `basic.target`, which will ensure we run after `ostree-prepare-root.service`.
cgwalters added a commit to cgwalters/fedora-coreos-config that referenced this pull request May 18, 2020
I am seeing this service fail in RHCOS when rebasing ostree
to a version that supports ostreedev/ostree#1767

When using `DefaultDependencies=no`, one really wants to specify *some*
dependency - very few units have no dependencies at all.

In this case, let's run "early" which means after `sysinit.target`
but before `basic.target`, which will ensure we run after `ostree-prepare-root.service`.
cgwalters added a commit to coreos/fedora-coreos-config that referenced this pull request May 18, 2020
I am seeing this service fail in RHCOS when rebasing ostree
to a version that supports ostreedev/ostree#1767

When using `DefaultDependencies=no`, one really wants to specify *some*
dependency - very few units have no dependencies at all.

In this case, let's run "early" which means after `sysinit.target`
but before `basic.target`, which will ensure we run after `ostree-prepare-root.service`.
cgwalters added a commit to cgwalters/fedora-coreos-config that referenced this pull request Oct 2, 2020
ostree has had support for leaving `/boot` mounted read-only
for a long time: ostreedev/ostree#1767
(And then later extended to `/sysroot`)

Particularly for CoreOS, only a few things should be touching
`/boot`, and we control all of them.  Those projects should
create a new mount namespace and remount these partitions
writable just while they need it.

The main thing we're accomplishing here is making the system
more resilient against accidental damage from a sysadmin
root shell as well as configuration management tools like
Puppet/Ansible.  None of those should be directly manipulating
files on these partitions, they should go through the API
of one of our projects (e.g. `rpm-ostree kargs`, `bootupctl`) etc.
cgwalters added a commit to cgwalters/fedora-coreos-config that referenced this pull request Oct 2, 2020
ostree has had support for leaving `/boot` mounted read-only
for a long time: ostreedev/ostree#1767
(And then later extended to `/sysroot`)

Particularly for CoreOS, only a few things should be touching
`/boot`, and we control all of them.  Those projects should
create a new mount namespace and remount these partitions
writable just while they need it.

The main thing we're accomplishing here is making the system
more resilient against accidental damage from a sysadmin
root shell as well as configuration management tools like
Puppet/Ansible.  None of those should be directly manipulating
files on these partitions, they should go through the API
of one of our projects (e.g. `rpm-ostree kargs`, `bootupctl`) etc.

While we're here, also andd `nodev,nosuid` because some
OS hardening scanners like to see this.  IMO it's of minimal
value, but hey, might as well.
dustymabe added a commit to dustymabe/ostree-releng-scripts that referenced this pull request Nov 6, 2020
On newer rpm-ostree systems /sysroot can be mounted read-only [1]
which means we can't pull commit metadata into the repo. Let's
remount it read-write if that's the case.

[1] ostreedev/ostree#1767
jlebon pushed a commit to ostreedev/ostree-releng-scripts that referenced this pull request Nov 9, 2020
On newer rpm-ostree systems /sysroot can be mounted read-only [1]
which means we can't pull commit metadata into the repo. Let's
remount it read-write if that's the case.

[1] ostreedev/ostree#1767
kelvinfan001 pushed a commit to cgwalters/fedora-coreos-config that referenced this pull request Dec 4, 2020
ostree has had support for leaving `/boot` mounted read-only
for a long time: ostreedev/ostree#1767
(And then later extended to `/sysroot`)

Particularly for CoreOS, only a few things should be touching
`/boot`, and we control all of them.  Those projects should
create a new mount namespace and remount these partitions
writable just while they need it.

The main thing we're accomplishing here is making the system
more resilient against accidental damage from a sysadmin
root shell as well as configuration management tools like
Puppet/Ansible.  None of those should be directly manipulating
files on these partitions, they should go through the API
of one of our projects (e.g. `rpm-ostree kargs`, `bootupctl`) etc.

While we're here, also andd `nodev,nosuid` because some
OS hardening scanners like to see this.  IMO it's of minimal
value, but hey, might as well.
kelvinfan001 pushed a commit to cgwalters/fedora-coreos-config that referenced this pull request Jan 6, 2021
ostree has had support for leaving `/boot` mounted read-only
for a long time: ostreedev/ostree#1767
(And then later extended to `/sysroot`)

Particularly for CoreOS, only a few things should be touching
`/boot`, and we control all of them.  Those projects should
create a new mount namespace and remount these partitions
writable just while they need it.

The main thing we're accomplishing here is making the system
more resilient against accidental damage from a sysadmin
root shell as well as configuration management tools like
Puppet/Ansible.  None of those should be directly manipulating
files on these partitions, they should go through the API
of one of our projects (e.g. `rpm-ostree kargs`, `bootupctl`) etc.

While we're here, also andd `nodev,nosuid` because some
OS hardening scanners like to see this.  IMO it's of minimal
value, but hey, might as well.
kelvinfan001 pushed a commit to cgwalters/fedora-coreos-config that referenced this pull request Jan 6, 2021
ostree has had support for leaving `/boot` mounted read-only
for a long time: ostreedev/ostree#1767
(And then later extended to `/sysroot`)

Particularly for CoreOS, only a few things should be touching
`/boot`, and we control all of them.  Those projects should
create a new mount namespace and remount these partitions
writable just while they need it.

The main thing we're accomplishing here is making the system
more resilient against accidental damage from a sysadmin
root shell as well as configuration management tools like
Puppet/Ansible.  None of those should be directly manipulating
files on these partitions, they should go through the API
of one of our projects (e.g. `rpm-ostree kargs`, `bootupctl`) etc.

While we're here, also andd `nodev,nosuid` because some
OS hardening scanners like to see this.  IMO it's of minimal
value, but hey, might as well.
kelvinfan001 pushed a commit to cgwalters/fedora-coreos-config that referenced this pull request Jan 6, 2021
ostree has had support for leaving `/boot` mounted read-only
for a long time: ostreedev/ostree#1767
(And then later extended to `/sysroot`)

Particularly for CoreOS, only a few things should be touching
`/boot`, and we control all of them.  Those projects should
create a new mount namespace and remount these partitions
writable just while they need it.

The main thing we're accomplishing here is making the system
more resilient against accidental damage from a sysadmin
root shell as well as configuration management tools like
Puppet/Ansible.  None of those should be directly manipulating
files on these partitions, they should go through the API
of one of our projects (e.g. `rpm-ostree kargs`, `bootupctl`) etc.

While we're here, also andd `nodev,nosuid` because some
OS hardening scanners like to see this.  IMO it's of minimal
value, but hey, might as well.
kelvinfan001 pushed a commit to cgwalters/fedora-coreos-config that referenced this pull request Jan 7, 2021
ostree has had support for leaving `/boot` mounted read-only
for a long time: ostreedev/ostree#1767
(And then later extended to `/sysroot`)

Particularly for CoreOS, only a few things should be touching
`/boot`, and we control all of them.  Those projects should
create a new mount namespace and remount these partitions
writable just while they need it.

The main thing we're accomplishing here is making the system
more resilient against accidental damage from a sysadmin
root shell as well as configuration management tools like
Puppet/Ansible.  None of those should be directly manipulating
files on these partitions, they should go through the API
of one of our projects (e.g. `rpm-ostree kargs`, `bootupctl`) etc.

While we're here, also add `nodev,nosuid` because some
OS hardening scanners like to see this.  IMO it's of minimal
value, but hey, might as well.
kelvinfan001 pushed a commit to coreos/fedora-coreos-config that referenced this pull request Jan 7, 2021
ostree has had support for leaving `/boot` mounted read-only
for a long time: ostreedev/ostree#1767
(And then later extended to `/sysroot`)

Particularly for CoreOS, only a few things should be touching
`/boot`, and we control all of them.  Those projects should
create a new mount namespace and remount these partitions
writable just while they need it.

The main thing we're accomplishing here is making the system
more resilient against accidental damage from a sysadmin
root shell as well as configuration management tools like
Puppet/Ansible.  None of those should be directly manipulating
files on these partitions, they should go through the API
of one of our projects (e.g. `rpm-ostree kargs`, `bootupctl`) etc.

While we're here, also add `nodev,nosuid` because some
OS hardening scanners like to see this.  IMO it's of minimal
value, but hey, might as well.
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this pull request Oct 10, 2023
ostree has had support for leaving `/boot` mounted read-only
for a long time: ostreedev/ostree#1767
(And then later extended to `/sysroot`)

Particularly for CoreOS, only a few things should be touching
`/boot`, and we control all of them.  Those projects should
create a new mount namespace and remount these partitions
writable just while they need it.

The main thing we're accomplishing here is making the system
more resilient against accidental damage from a sysadmin
root shell as well as configuration management tools like
Puppet/Ansible.  None of those should be directly manipulating
files on these partitions, they should go through the API
of one of our projects (e.g. `rpm-ostree kargs`, `bootupctl`) etc.

While we're here, also add `nodev,nosuid` because some
OS hardening scanners like to see this.  IMO it's of minimal
value, but hey, might as well.
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this pull request Oct 10, 2023
ostree has had support for leaving `/boot` mounted read-only
for a long time: ostreedev/ostree#1767
(And then later extended to `/sysroot`)

Particularly for CoreOS, only a few things should be touching
`/boot`, and we control all of them.  Those projects should
create a new mount namespace and remount these partitions
writable just while they need it.

The main thing we're accomplishing here is making the system
more resilient against accidental damage from a sysadmin
root shell as well as configuration management tools like
Puppet/Ansible.  None of those should be directly manipulating
files on these partitions, they should go through the API
of one of our projects (e.g. `rpm-ostree kargs`, `bootupctl`) etc.

While we're here, also add `nodev,nosuid` because some
OS hardening scanners like to see this.  IMO it's of minimal
value, but hey, might as well.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants