Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apply: bring "lo" back up if it's managed by NM (LP# 2034595) #408

Merged
merged 2 commits into from
Dec 5, 2023

Conversation

daniloegea
Copy link
Collaborator

@daniloegea daniloegea commented Sep 6, 2023

NM is not bringing it back automatically after we flush its addresses. It's not clear if it's a bug or intended behavior.

It's bad because we have services listening on the loopback interface, such as resolved.

How to reproduce:

  1. Install network-manager in Mantic and let it manage at least one of your interfaces
  2. You'll see it's managing 'lo' automatically
  3. Try to change the 'lo' connection, something like nmcli con mod lo mtu 1500
  4. NM will emit a YAML (an nm-device) for 'lo'
  5. Run netplan apply
  6. You'll see that the 'lo' connection is down and the 'lo' interface has no IPs

As an alternative (was my original patch) we could just stop flushing 'lo', but that might be what the user want to do, because they can add extra addresses to it for example.

This might actually be a bug in NM and if it's fixed in the future we could drop this change.

Description

Checklist

  • Runs make check successfully.
  • Retains 100% code coverage (make check-coverage).
  • New/changed keys in YAML format are documented.
  • (Optional) Adds example YAML for new feature.
  • (Optional) Closes an open bug in Launchpad.

NM is not bringing it back automatically after we flush its addresses.
It's not clear if it's a bug or intended behavior.
@daniloegea daniloegea changed the title apply: bring "lo" back up if it's managed by NM apply: bring "lo" back up if it's managed by NM (LP# 2034595) Sep 6, 2023
@daniloegea daniloegea marked this pull request as ready for review September 6, 2023 15:09
Copy link
Collaborator

@slyon slyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the very clean reproducer!

I think this workaround is acceptable, but I'd like to give two additional ideas for consideration:

Instead of flushing the IPs, we could consider dropping just any additional IPs from the loopback interfaces (ip addr del), additional IPs being the ones which are not from the loopback range (127.0.0.0/8 and ::/128), or alternatively remember and add back IPs from that address range on the loopback interface manually (ip addr add ... dev lo) after flushing.

This is because when I install network-manager I cann see the loopback interface is managed externally (just tracked by NM) (nmcli d output):
lo loopback connected (externally) lo
After modifications through nmcli/netplan it's fully managed by NM:
lo loopback connected lo
When using nmcli con up lo it stays fully managed by NM:
lo loopback connected lo
When adding loopback IPs manually (ip addr add ...) it says externally managed:
lo loopback connected (externally) lo

The latter case resembles the initial scenario (without modification to that proflie) more closely, so maybe that's the cleaner alternative?

That being said, I think all solutions/workarounds will provide the desired outcome of lo being UP and having the relevant IPs assigned.

WDYT?

# connection with nmcli for example, NM will create a persistent nmconnection file and emit a YAML for it.
if 'lo' in nm_interfaces:
cmd = ['nmcli', 'con', 'up', 'lo']
subprocess.call(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
Copy link
Collaborator

@slyon slyon Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: We should consider using the utils.nmcli() helper here. Especially due to #336 (although, I think the implemented solution for PATH should work with the individual subprocess calls, too).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utils.nmcli() calls check_call, which will throw an exception if nmcli fails for some reason, maybe we should use call to avoid breaking apply (or to avoid a try-except here...).

@daniloegea
Copy link
Collaborator Author

After some more digging, there are few more issues to consider:

Consider this scenario:

# nmcli d
DEVICE  TYPE      STATE      CONNECTION
enp5s0  ethernet  connected  ethernet-enp5s0
lo      loopback  connected  lo

lo is managed after I had changed it.

netplan apply will disconnect the lo device, stop NM, remove /var/run/NetworkManager/devices/* (which includes the lo state) and start NM.

The call to nmcli device disconnect lo would make the interface unmanaged:

# nmcli d
DEVICE  TYPE      STATE      CONNECTION
enp5s0  ethernet  connected  ethernet-enp5s0
lo      loopback  connected  lo

root@mantic:~# nmcli device disconnect lo
Device 'lo' successfully disconnected.

root@mantic:~# nmcli con show
NAME             UUID                                  TYPE      DEVICE
ethernet-enp5s0  b5be7fd0-5c17-4d8e-8f9b-91c0f59569b7  ethernet  enp5s0
lo               7350e7d9-8802-4aff-8ebd-2110bf8166f5  loopback  lo
lo               4a6ef6a1-90c9-4322-b8a5-4e393bfbbbe3  loopback  --

root@mantic:~# nmcli device
DEVICE  TYPE      STATE                   CONNECTION
enp5s0  ethernet  connected               ethernet-enp5s0
lo      loopback  connected (externally)  lo

Skipping the nmcli device disconnect would help, but we also delete it's state after NM is stopped, which leads to the same problem:

# nmcli con show
NAME             UUID                                  TYPE      DEVICE
ethernet-enp5s0  b5be7fd0-5c17-4d8e-8f9b-91c0f59569b7  ethernet  enp5s0
lo               4a6ef6a1-90c9-4322-b8a5-4e393bfbbbe3  loopback  lo

root@mantic:~# nmcli d
DEVICE  TYPE      STATE      CONNECTION
enp5s0  ethernet  connected  ethernet-enp5s0
lo      loopback  connected  lo

root@mantic:~# systemctl stop NetworkManager.service

root@mantic:~# rm /var/run/NetworkManager/devices/1

root@mantic:~# systemctl start NetworkManager.service

root@mantic:~# nmcli d
DEVICE  TYPE      STATE                   CONNECTION
enp5s0  ethernet  connected               ethernet-enp5s0
lo      loopback  connected (externally)  lo

root@mantic:~# nmcli con show
NAME             UUID                                  TYPE      DEVICE
ethernet-enp5s0  b5be7fd0-5c17-4d8e-8f9b-91c0f59569b7  ethernet  enp5s0
lo               e314c527-4b3e-4001-9fd3-f470128aa401  loopback  lo
lo               4a6ef6a1-90c9-4322-b8a5-4e393bfbbbe3  loopback  --

So far the only two solutions I see are manually bringing lo back or don't disconnect it and don't remove the state file linked to lo (which I'm not completely sure is a safe thing to do...)

@daniloegea
Copy link
Collaborator Author

daniloegea commented Sep 7, 2023

While we don't find a better solution, I changed the patch to take into account cases where the loopback connection is not called "lo". In fact it can have any name.

It might have any name, so we need to get the connection name
dynamically.
@slyon
Copy link
Collaborator

slyon commented Sep 14, 2023

I think this is a good workaround for the issue we face, so let's cherry-pick it into the package: https://salsa.debian.org/debian/netplan.io/-/commit/5b2eeb4

Still keeping it open here, waiting if we can get some input from NM upstream to see if this behavior is intentional or should rather be fixed in NetworkManager itself.

@slyon
Copy link
Collaborator

slyon commented Dec 5, 2023

Quoting man nmcli(1), this behavior seems to be intentional:

       down ifname...
           Disconnect a device and prevent the device from automatically activating further connections without
           user/manual intervention. Note that disconnecting software devices may mean that the devices will
           disappear.

           If --wait option is not specified, the default timeout will be 10 seconds.

       disconnect ifname...
           Alias for command down. Before version 1.34.0 down was not supported.

We had the patch in Ubuntu Mantic & Noble and Debian testing & unstable since almost 3 months without any issues. Let's merge it.

@slyon slyon merged commit 8c7103a into canonical:main Dec 5, 2023
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants