overlay/live: support booting from live ISO without networking #326
Force-pushed from e38115d to c1262f0.
This looks sane, but it heavily overlaps with #321, right?
I guess #321 is trying to be more ambitious. But, you're missing an important case - this needs to support Something like
(Edit, the |
Is there a case where someone would provide both |
I can't think of a good use case for that; feels like we can leave it as unspecified behavior.
Ignition prioritizes the karg over |
Makes sense to me. Given the
|
So in this scenario you are assuming that |
WDYT about coreos/ignition#956 instead of this? This would solve the generic conditional networking problem. Though we could get something like this in as a short-term fix too if there isn't consensus on the approach taken there.
Cross-linking coreos/ignition#956 (comment). I sanity checked that the live ISO now boots fully offline. (Haven't tried with an embedded Ignition config that pulls in networking yet, but that should Just Work.)
That's what |
OK i'm going to revive this PR and move it forward. One thing I'm thinking about doing here is leave Thoughts? |
I think the argument for doing this is in case someone provided |
Doesn't seem like a strong argument to me. Using |
A couple of us stayed longer in the open discussion today and came up with a way to get the behavior we wanted without having leaving |
Force-pushed from c1262f0 to b1cc742.
ok - marking this as ready for review. The logic and the reasoning is captured in this description:
In this PR I also added a unit that reconfigures NetworkManager-wait-online.service in the real root so that it does not report a failure if it can't get the network, and so that it times out after 5 seconds instead of 30. This improves the experience for users who boot the Live ISO without networking.
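A minimal sketch of what such a reconfiguration could look like, written as a systemd drop-in generated from shell. This is not the unit from this PR: the helper name `write_wait_online_dropin`, the drop-in path, and the file name are assumptions for illustration, and it assumes the packaged service runs `nm-online -s -q --timeout=30`.

```shell
# Hypothetical helper: write a drop-in for NetworkManager-wait-online.service
# under the given root directory. The drop-in location and file name are
# assumptions made for this sketch, not taken from the PR.
write_wait_online_dropin() {
    local root="$1"
    local dropin_dir="${root}/etc/systemd/system/NetworkManager-wait-online.service.d"
    mkdir -p "${dropin_dir}"
    # The empty ExecStart= clears the packaged command before replacing it.
    # The leading "-" tells systemd not to treat a non-zero exit as a failure.
    printf '%s\n' \
        '[Service]' \
        'ExecStart=' \
        'ExecStart=-/usr/bin/nm-online -s -q --timeout=5' \
        > "${dropin_dir}/10-liveiso-timeout.conf"
}
```

Calling `write_wait_online_dropin /sysroot` from the initrd would, under these assumptions, shorten the wait and keep the boot log free of a failed unit when no network is available.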
@@ -0,0 +1,23 @@
# Configure NetworkManager-wait-online in the real root for the
# Live ISO timeout quicker and also not explicitly fail since
# booting the Live ISO without network is a valid use case.
I think we should support an explicit API to turn off networking instead of this.
Basically all we need is something like:
`coreos-installer iso embed --no-initramfs-network`
which would also create `/etc/coreos-install-nonet` in the cpio archive along with `/config.ign`.
Then we'd just have a:
`ConditionPathExists=!/etc/coreos-install-nonet`
in the other service above.
Note this configures the service in the real root, not in the initrd.
I think we should match what the Fedora live ISO does here, which is to opportunistically set up networking if available, but otherwise not fail. Was playing around with that in the Fedora 32 Workstation live ISO in both a VM without any network adapters and one connected to an isolated network, and `NetworkManager-wait-online.service` worked just fine either way. I don't see any timeout overrides or configuration tweaks. But clearly, something is different.
> Note this configures the service in the real root, not in the initrd.
Oh, right.
> I think we should match what the Fedora live ISO does here, which is to opportunistically set up networking if available, but otherwise not fail.
Hmm. OK yeah I guess so.
Force-pushed from b1cc742 to 34e790c.
OK, this looks good to me! I assume you tested it manually?
It shouldn't be terribly hard to add `kola testiso --offline` that starts the VM with no NICs at all; I actually did this recent PR with that in mind because we can use virtio-channels to talk to a VM with no NIC.
thanks @cgwalters! Yep, I've been testing this over and over today. Note that I'm testing this by setting up libvirt network with no DHCP and then running nmtui (future PR) once I get the system up to configure a static IP. Here's the libvirt network XML I'm using:
I'll have to checkout @jlebon if you have any more suggestions on what I can change to make it more like the Live ISO let me know. When you boot the Fedora live ISO does it timeout before you're able to get to a console? Maybe you just don't see the timeout/failure because it's a GUI interface? |
Since I know he had voiced earlier concerns about this I talked with @arithx earlier today and he said he was good with the following logic:
|
> @jlebon if you have any more suggestions on what I can change to make it more like the Live ISO let me know. When you boot the Fedora live ISO does it timeout before you're able to get to a console? Maybe you just don't see the timeout/failure because it's a GUI interface?
I add `console=ttyS0` and attach to the VM console so I can see the logs. And yeah it's weird, it doesn't time out, it just... succeeds. Doesn't even seem to take more than a few seconds either. I'll have to poke around more on this, but anyway, I don't think it's a blocker!
install_and_enable_unit "coreos-liveiso-network-kargs.service" \
    "initrd.target"

install_and_enable_unit "coreos-liveiso-reconfigure-nm-wait-online.service" \
    "initrd.target"
One suggestion here: there's a generator already for the live ISO, so we could instead dynamically enable this. An advantage of doing that is that systemd doesn't even bother with the service unit and doesn't spam the "Skipped ..." message on every boot on all other platforms. Definitely not a blocker of course. :)
I hadn't thought of that. If it's just an extra message in the log I think I'd like to keep the unit separate and prevent yet another heredoc in the generator and also retesting this all again.
Right, definitely don't want heredocs either. I meant more doing the `ln -s` in the generator. See e.g. https://github.com/coreos/ignition-dracut/blob/6136be3d9d38d7926a61cd4d1b4ba5f9baf0892f/dracut/30ignition/ignition-generator#L39-L40. Anyway, as is is fine with me too!
Ahh, I just realized you were referring to both units and not just the `reconfigure-nm-wait-online` one. OK, let me look into that.
So right now the `add_requires` in the live generator makes them all a requirement of `initrd-root-fs.target`. Currently I had made them required by `initrd.target`. I could modify the `add_requires()` function like is done in other generators, or I could just make them required by `initrd-root-fs.target`. Thoughts?
Yeah, I think enhancing `add_requires` like in Ignition to allow specifying the target makes sense (and keeping the new units to `initrd.target`).
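The enhancement discussed here can be sketched roughly as follows. This is modeled on the pattern in the linked Ignition generator rather than copied from this PR; the `/tmp` fallback for `UNIT_DIR` is an assumption for running the sketch outside of systemd, which normally passes the generator output directory as the first argument.

```shell
# Generator output directory; systemd passes it as the first argument.
# The /tmp fallback is an assumption so the sketch can run by hand.
UNIT_DIR="${1:-/tmp}"

# Make unit $1 a requirement of target $2 by symlinking it into the
# target's .requires directory, instead of hard-coding a single target.
add_requires() {
    local name="$1"
    local target="$2"
    local requires_dir="${UNIT_DIR}/${target}.requires"
    mkdir -p "${requires_dir}"
    ln -sf "../${name}" "${requires_dir}/${name}"
}

# Example calls (commented out so the sketch has no side effects):
# add_requires coreos-liveiso-network-kargs.service initrd.target
# add_requires coreos-liveiso-reconfigure-nm-wait-online.service initrd.target
```

Enabling the units from the generator this way means systemd never loads them on platforms where the generator does not create the symlink, which avoids the "Skipped" log message mentioned above.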
# This is all done because we want to support a mode where
# the user can boot the live ISO and get to an interactive
# prompt without requiring networking on boot. The user can
# then configure the networking interactively.
Awesome documentation!
thanks :)
# then configure the networking interactively.
#
[Unit]
Description=conditionally add networking kargs for live ISO
Maybe "Request live ISO networking" ?
Does this message show up in the logs even if it doesn't run? If so, then I'd like to keep some words in there that indicate it is conditional.
Yeah, it'll print:
Condition check resulted in being skipped.
> If so then I'd like to keep some words in there that indicates it is conditional
Is it though? Once the unit is activated, we're definitely activating networking, no?
> Is it though? Once the unit is activated, we're definitely activating networking, no?
I think that is a "no" answer to my question: "Does this message show up in the logs even if it doesn't run?" If that is the case I'm 👍 to changing the wording.
Ahhh hehe, the GitHub markdown elided the critical part of my message. It prints:
Condition check resulted in $description being skipped.
What I mean is that all the conditionals are part of the systemd unit itself already, so whether the unit is skipped or not is already reflected in what systemd does (and prints). Since the script itself unconditionally enables `rd.neednet`, it'd be cleaner to say something like "Request live ISO networking".
# Note that because of the priority of /etc/cmdline.d/*.conf it doesn't
# matter if we do this check or if we unconditionally write ip=dhcp,dhcp6
# because it will never take precedence over an ip= arg on the kernel
# command line.
Hmm, not sure I follow this... doesn't that mean we don't need to check for `ip` at all then and can just unconditionally print `ip=dhcp,dhcp6`?
Yep, basically because of the way `/etc/cmdline.d/*.conf` gets merged with the kernel command line, we could unconditionally write `ip=dhcp,dhcp6` and the right thing would still happen, but I think it would be misleading. IMHO future developers who come in here and try to figure out what is going on are better off with the current conditional logic and comment.
Scratch all of that. I now think it does matter, because it's valid for a user to provide multiple `ip=` kargs, so if we unconditionally add it then we'd end up with something like
`ip=dhcp,dhcp6 ip=192.168.130.2::192.168.130.1:255.255.255.0:fcos:eth0:none:192.168.130.1`
that dracut/NM would then parse, and that's not what we want.
I'll delete all but the first line of the comment if you agree.
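The conditional logic being defended here can be sketched as below. This is an illustration only, not the script from this PR: the helper names `has_ip_karg` and `network_kargs_for` are hypothetical, and the sketch parses a command-line string directly rather than using dracut's own argument helpers.

```shell
# Hypothetical helper: succeed if the given kernel command line
# already contains at least one ip= argument.
has_ip_karg() {
    local cmdline="$1"
    local -a args=()
    local arg
    # Split on whitespace without triggering glob expansion.
    read -ra args <<< "${cmdline}"
    for arg in "${args[@]}"; do
        case "${arg}" in
            ip=*) return 0 ;;
        esac
    done
    return 1
}

# Hypothetical helper: print the kargs to place in a
# /etc/cmdline.d/*.conf drop-in for the given command line.
# rd.neednet=1 is always requested; the DHCP default is only added
# when the user supplied no ip= karg, so the two are never combined.
network_kargs_for() {
    local cmdline="$1"
    if has_ip_karg "${cmdline}"; then
        echo "rd.neednet=1"
    else
        echo "rd.neednet=1 ip=dhcp,dhcp6"
    fi
}
```

In an initrd script the input would typically come from `/proc/cmdline`, for example `network_kargs_for "$(cat /proc/cmdline)"`.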
OK, yup that makes sense!
See.. if I hadn't tried to "do the right thing" and also comment the hell out of this (even though the comment was wrong) I would have unconditionally added `ip=dhcp,dhcp6` 😜
This is on a system with a network without DHCP? See #326 (comment) for how I'm setting up a libvirt network to test this. On the Fedora Live ISO what does |
Ahh yup, good catch! I tested no network adapters, and with a network adapter isolated, but still with DHCP and those worked fine. An isolated network adapter without DHCP does indeed cause |
Force-pushed from 34e790c to c174ca4.
OK, pushed up a change to address code review comments. Once you say it looks good I'll give a final round of testing.
Nice, LGTM!
This matches what is done in our other generators already and will make it easier to enable units for different targets.
Force-pushed from c174ca4 to 0906851.
Rebased on top of latest testing-devel; now doing final testing.
There is a scenario where the user wants to configure networking after they get to the interactive bash prompt. Let's support this. Fixes coreos/fedora-coreos-tracker#349
This adds coreos-liveiso-reconfigure-nm-wait-online.service, which configures NetworkManager-wait-online.service in the real root to time out quicker and to not show a failure if there is no connection. Doing this for the Live ISO improves the user experience when booting the Live ISO without network.
Force-pushed from 0906851 to 6237683.
OK, pushed one final commit to fix a problem introduced in the last change. Merging.
We originally did this in coreos#326 because we wanted to support booting the live ISO without networking. This was solved on the initramfs side by the conditional networking work (coreos#426). But for the real root, this was still useful because if booting the ISO interactively on a system without any network, or a non-DHCP network, we didn't want the user to have to wait until the service timed out before getting a shell.

The core issue however is that we're requesting `network-online.target` at all. It's an "active unit" which means that it's only pulled in the transaction, possibly delaying boot, if another systemd unit needs it. And ideally, no service would need it as per: https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

In our case, this unit was fedora-coreos-pinger. We drop that requirement here: coreos/fedora-coreos-pinger#41

With that, we no longer pull in `network-online.target` and so no longer delay reaching the console even if NetworkManager isn't able to get an active connection for whatever reason. This matches how it works on traditional Fedora as well.

Having a short timeout actually also had a counterproductive effect in the automated install case. There, `coreos-installer.service` does pull in `network-online.target` (which with coreos/coreos-installer#565 we could consider dropping as advised by systemd, though we probably should bump the number of retries some more in that case), but because of the short timeout, we genuinely may not yet have the network fully up before we run (see https://bugzilla.redhat.com/show_bug.cgi?id=1967483).
(cherry picked from commit dd54e8c)