Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

35coreos-live: stop overriding NetworkManager-wait-online timeout to 5s #1088

Merged
merged 2 commits into from
Jul 8, 2021

Conversation

jlebon
Copy link
Member

@jlebon jlebon commented Jul 7, 2021

We originally did this in #326 because we wanted to support booting the
live ISO without networking. This was solved on the initramfs side by
the conditional networking work (#426). But for the real root, this was
still useful because if booting the ISO interactively on a system
without any network, or a non-DHCP network, we didn't want the user to
have to wait until the service timed out before getting a shell.

The core issue however is that we're requesting network-online.target
at all. It's an "active unit" which means that it's only pulled in the
transaction, possibly delaying boot, if another systemd unit needs it.
And ideally, no service would need it as per:

https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

In our case, this unit was fedora-coreos-pinger. We drop that
requirement here:

coreos/fedora-coreos-pinger#41

With that, we no longer pull in network-online.target and so no longer
delay reaching the console even if NetworkManager isn't able to get an
active connection for whatever reason. This matches how it works on
traditional Fedora as well.

Having a short timeout actually also had a counterproductive effect in
the automated install case. There, coreos-installer.service does pull
in network-online.target (which with
coreos/coreos-installer#565 we could consider
dropping as advised by systemd, though we probably should bump the
number of retries some more in that case), but because of the short
timeout, we genuinely may not yet have the network fully up before we
run (see https://bugzilla.redhat.com/show_bug.cgi?id=1967483).

@jlebon
Copy link
Member Author

jlebon commented Jul 7, 2021

Requires: coreos/fedora-coreos-pinger#41 (or alternatively, disable or nuke fedora-coreos-pinger as per coreos/fedora-coreos-tracker#770).

@dustymabe
Copy link
Member

Wow - really nice work and investigation here @jlebon @bengal

The core issue however is that we're requesting network-online.target
at all. It's an "active unit" which means that it's only pulled in the
transaction, possibly delaying boot, if another systemd unit needs it.
And ideally, no service would need it as per:

https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

In our case, this unit was fedora-coreos-pinger. We drop that
requirement here:

Might it be nice to have a test that tries to detect if any units (including ones we don't own) pull in network-online.target to warn us about the side effects in the future?

@dustymabe
Copy link
Member

I still support adding coreos/coreos-installer#565 as it just helps us make things more resilient in the future.

@jlebon
Copy link
Member Author

jlebon commented Jul 8, 2021

Started a backport for the pinger patch in https://src.fedoraproject.org/rpms/rust-fedora-coreos-pinger/pull-request/4 so we can unblock this.

Might it be nice to have a test that tries to detect if any units (including ones we don't own) pull in network-online.target to warn us about the side effects in the future?

Yeah, I think that makes sense.

We originally did this in coreos#326 because we wanted to support booting the
live ISO without networking. This was solved on the initramfs side by
the conditional networking work (coreos#426). But for the real root, this was
still useful because if booting the ISO interactively on a system
without any network, or a non-DHCP network, we didn't want the user to
have to wait until the service timed out before getting a shell.

The core issue however is that we're requesting `network-online.target`
at all. It's an "active unit" which means that it's only pulled in the
transaction, possibly delaying boot, if another systemd unit needs it.
And ideally, no service would need it as per:

https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

In our case, this unit was fedora-coreos-pinger. We drop that
requirement here:

coreos/fedora-coreos-pinger#41

With that, we no longer pull in `network-online.target` and so no longer
delay reaching the console even if NetworkManager isn't able to get an
active connection for whatever reason. This matches how it works on
traditional Fedora as well.

Having a short timeout actually also had a counterproductive effect in
the automated install case. There, `coreos-installer.service` does pull
in `network-online.target` (which with
coreos/coreos-installer#565 we could consider
dropping as advised by systemd, though we probably should bump the
number of retries some more in that case), but because of the short
timeout, we genuinely may not yet have the network fully up before we
run (see https://bugzilla.redhat.com/show_bug.cgi?id=1967483).
@jlebon jlebon force-pushed the pr/drop-liveiso-NM-timeout branch from a672d84 to a0a785f Compare July 8, 2021 17:38
@jlebon
Copy link
Member Author

jlebon commented Jul 8, 2021

Updated this now with a test and fast-tracking the fixed fedora-coreos-pinger!

Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jlebon jlebon merged commit 9942612 into coreos:testing-devel Jul 8, 2021
@jlebon jlebon deleted the pr/drop-liveiso-NM-timeout branch July 8, 2021 20:34
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this pull request Oct 10, 2023
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this pull request Oct 10, 2023
jlebon added a commit to jlebon/coreos-installer that referenced this pull request Nov 6, 2023
The `Wants=network-online.target` predates the addition of fetch retries.
Now that we retry HTTP requests indefinitely in the automated flow, let's
follow best practices and stop pulling in `network-online.target`.

Still keep it as an `After` though; *if* something pulls in
`network-online.target`, then we might as well be smarter and start our
retries after we know we're online. Also add `network.target` so that
if `network-online.target` *isn't* pulled in, we still have a reasonable
lower bound on when we start retrying.

Related: coreos/fedora-coreos-config#1088
Related: coreos#565 (comment)
Closes: https://github.com/coreos/coreos-installer/issues/1334
jlebon added a commit to jlebon/coreos-installer that referenced this pull request Nov 6, 2023
The `Wants=network-online.target` predates the osmet work which enabled
the now default fully offline install flow.

It also predates the addition of fetch retries. So even in an online
flow, now that we retry HTTP requests indefinitely, we don't really
need this.

Let's follow best practices and stop pulling in `network-online.target`.

Still keep it as an `After` though; *if* something pulls in
`network-online.target`, then we might as well be nicer and start our
retries after we know we're online. Also add `network.target` so that
if `network-online.target` *isn't* pulled in, we still have a reasonable
lower bound on when we start retrying.

Related: coreos/fedora-coreos-config#1088
Related: coreos#565 (comment)
Related: https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
Closes: https://github.com/coreos/coreos-installer/issues/1334
jlebon added a commit to jlebon/coreos-installer that referenced this pull request Nov 9, 2023
The `Wants=network-online.target` predates the osmet work which enabled
the now default fully offline install flow.

It also predates the addition of fetch retries. So even in an online
flow, now that we retry HTTP requests indefinitely, we don't really
need this.

Let's follow best practices and stop pulling in `network-online.target`.

We still need to keep pulling in `NetworkManager.service`
though. It's enabled by default in `multi-user.target` but not
`coreos-installer-post.target` (which is what we boot to in an automated
install).

NetworkManager is capable of handling offline environments just fine and
won't block the install just because a connection isn't available (yet
or ever).

Related: coreos/fedora-coreos-config#1088
Related: coreos#565 (comment)
Related: https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
Closes: https://github.com/coreos/coreos-installer/issues/1334
jlebon added a commit to jlebon/coreos-installer that referenced this pull request Nov 9, 2023
The `Wants=network-online.target` predates the osmet work which enabled
the now default fully offline install flow.

It also predates the addition of fetch retries. So even in an online
flow, now that we retry HTTP requests indefinitely, we don't really
need this.

Let's follow best practices and stop pulling in `network-online.target`.

We still need to keep pulling in `NetworkManager.service`
though. It's enabled by default in `multi-user.target` but not
`coreos-installer-post.target` (which is what we boot to in an automated
install).

NetworkManager is capable of handling offline environments just fine and
won't block the install just because a connection isn't available (yet
or ever).

Related: coreos/fedora-coreos-config#1088
Related: coreos#565 (comment)
Related: https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
Closes: https://github.com/coreos/coreos-installer/issues/1334
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants