
[machine-config-operator/baremetal] MCO declarative network configuration #399

Merged
merged 1 commit into from
Sep 11, 2020

Conversation

bcrochet
Member

Enhancement proposal to extend MCO for declarative network configuration.

@bcrochet
Member Author

/cc @cgwalters

@danwinship
Contributor

/retitle [machine-config-operator] MCO declarative network configuration

@openshift-ci-robot openshift-ci-robot changed the title [machine-config-operator] MCO declaritive network configuration [machine-config-operator] MCO declarative network configuration Jul 18, 2020
@russellb
Member

There was a past nmstate related enhancement. Why did this one get created vs evolving the existing one?

@crawford
Contributor

@russellb as I recall, the previous enhancement was for kubernetes-nmstate, whereas this one only mentions nmstate (the non-kubernetes variant) as an implementation detail.

My biggest concern with this approach is that this takes us from MachineConfigs (typically) being used to describe a pool of machines to them describing a single machine. It's going to be very cumbersome to use this mechanism to configure a cluster full of unique machines (e.g. each machine needs to be configured with a static IP address) and I imagine we are almost immediately going to be asked to build a templating mechanism so that customers don't need to create a MachineConfigPool for every machine.

@russellb
Member

@russellb as I recall, the previous enhancement was for kubernetes-nmstate, whereas this one only mentions nmstate (the non-kubernetes variant) as an implementation detail.

Right, but I remember feedback pushing that enhancement toward discussing the problem more generally and also evolving the proposal to be better integrated with MCO, so I expected something like this just to come in as another revision of the existing enhancement. At a minimum I'd expect them to link to each other to explain the relationship. Is the other one now deprecated and should be closed? Are they to be considered as alternatives to each other?

#161

@celebdor
Contributor

@russellb as I recall, the previous enhancement was for kubernetes-nmstate, whereas this one only mentions nmstate (the non-kubernetes variant) as an implementation detail.

Yes. The biggest objection to the original enhancement, which proposed adding the upstream kubernetes-nmstate project, was that it added a new way to configure Nodes separate from MachineConfig, and that it potentially deferred problems to the following reboot.

It had other issues, such as lacking rollout functionality (though it does have rollback).

My biggest concern with this approach is that this takes us from MachineConfigs (typically) being used to describe a pool of machines to them describing a single machine. It's going to be very cumbersome to use this mechanism to configure a cluster full of unique machines (e.g. each machine needs to be configured with a static IP address) and I imagine we are almost immediately going to be asked to build a templating mechanism so that customers don't need to create a MachineConfigPool for every machine.

This issue was not present in the kubernetes-nmstate enhancement proposal. The current proposal basically keeps the API that kubernetes-nmstate users are familiar with and addresses the other objections. The biggest change for kubernetes-nmstate users, and a significant one at that, is the granularity: doing anything with MachineConfig that does not target the entire MachineConfigPool is not really possible at the moment.

Of course, there are ways to work around it. Off the top of my head, we could keep having the user write a NodeNetworkConfigurationPolicy, and the controller (as in the current proposal) writes multiple files to the MachineConfig for the appropriate MachineConfigPool. That implies that each node in a MachinePool will have several of these files written to its network configuration directory, with each file targeted at a specific node. I believe that this is fine since:

  • The API is NNCP, and all MachineConfig writes are performed by the operator controller, so the MachineConfig carrying the files for the different nodes is just an implementation detail.
  • The enhancement states that the MCD can detect when a config change is network-related, so changing the config for other nodes will not mean a reboot; if the node's network config file did not change (and the NodeNetworkState matches), it is just a no-op that bumps the currentConfig.
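For readers unfamiliar with the API being discussed, a NodeNetworkConfigurationPolicy of the kind referenced above might look roughly like this sketch. The node name, interface, and address here are hypothetical, and the exact apiVersion has varied across kubernetes-nmstate releases:

```yaml
# Illustrative only: a per-node NNCP assigning a static IP.
# Node name, interface, and address are made up for this example.
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: worker-0-static-ip
spec:
  # nodeSelector is what gives NNCP its per-node (rather than
  # per-pool) granularity discussed in this thread.
  nodeSelector:
    kubernetes.io/hostname: worker-0
  desiredState:
    interfaces:
      - name: eth1
        type: ethernet
        state: up
        ipv4:
          enabled: true
          dhcp: false
          address:
            - ip: 192.0.2.10
              prefix-length: 24
```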

@celebdor
Contributor

Is the other one now deprecated and should be closed? Are they to be considered as alternatives to each other?

That is a good question. I could see either, but I thought that this one supersedes the old one. I certainly hoped for the approach proposed here to be approved quickly, since it addresses the objections to the other.

@ashcrow
Member

ashcrow commented Jul 23, 2020

Contributor

@kikisdeliveryservice left a comment

Trying to get up to speed; left some questions/comments. I'm not clear on: is this one new rendered-config per node, or one new rendered config with a bunch of different NM settings for each node? The former seems like a very confusing UI; the latter means that a MachineConfig is no longer an easy-to-grok state for a set of nodes but needs to be parsed closely to figure out what is on each node, with different nodes potentially succeeding and failing from the same config. (Note: if one node fails on a config we stop applying it to the pool; what do we do in this case?)

This is proposing two major things at once that the MCO doesn't actually do yet: per-node configuration and rebootless updates. Is there a way to tackle this in stages?

I don't doubt that this can be implemented in some way, but I do worry about maintainability and the issues this will raise when troubleshooting: the MCO team will be tasked with maintaining all of this when we have no expertise, along with the issues that will come from those two very major changes. I also have a concern that the MCO will start being tasked with per-node configurations across entire clusters, which is the opposite of what the MCO is for (and the problem that I thought we were moving away from).

I'd like to hear about the alternative Option A: "include kubernetes-nmstate as a standalone component within OpenShift". Would this make the network configuration more easily maintainable, and why is it a bad choice for this specific change? Especially if, as Alex mentioned, this might require some tooling to create configs, it seems like the nmstate standalone component might make sense from a maintainability perspective?

@cgwalters
Member

There's a whole lot going on here, but overall
/approve
Basically as long as we have a plan for how nmstate and the MCO interact and can coordinate, that addresses my main concern.

@openshift-ci-robot openshift-ci-robot changed the title [machine-config-operator] MCO declarative network configuration [machine-config-operator/baremetal] MCO declarative network configuration Aug 20, 2020
@bcrochet
Member Author

/cc @cgwalters @runcom @kikisdeliveryservice @ericavonb
Please take a fresh look at the proposal. We are putting forth an alternative that does not require immediate changes to the MCO that go against its current purpose.

Contributor

@kikisdeliveryservice left a comment

Ok, so I watched the video and re-read this several times, and Option C seems like a pretty stable approach, at least vis-à-vis the MCO.

On the MCO side, you'd be making changes to /templates/common/baremetal (or somewhere in templates) to lay down the files and create the .mount unit; then, post-kubelet, leverage kubernetes-nmstate to make changes to that /tmp/nm-system-connections?

If eventually the user lays down some other MC that causes a reboot, the MCO goes through the same thing above, the /tmp/nm-system-connection changes would be lost, and the kubernetes-nmstate post-kubelet step would do its thing and reconcile the /tmp/nm-system.. again?

But just to be clear: those initial template changes laid down by the MCO would happen during an upgrade to 4.x (which would obviously incur a reboot), but the idea is that when the user decides to leverage kubernetes-nmstate, they won't incur an additional reboot to set that up using a NodeNetworkConfigurationPolicy via nmstate.io? Or, if they want to make changes, they'd use the NNCP to make the change, also without incurring a reboot?

Do I have it correct?

If you need persistent changes, you do them with the MCO.
Kubernetes-nmstate will only affect post-kubelet configuration, so we comply with "MC owns every configuration until boot time".

[Link](https://asciinema.org/a/uMqwIpvfhuI67ShT12csYXm3h) to an asciicast showing this option.
Contributor

This video is really really helpful

Member Author

Yes, you have it correct. We skip the reboot, and the configuration for kubernetes-nmstate is ephemeral on disk.

Contributor

awesome thanks!

2. Create two temporary directories for overlay purposes.
3. Mount an overlay combining the standard NetworkManager /etc/NetworkManager/system-connections directory at the path pointed to by step 1, e.g., /etc/NetworkManager/system-connections-merged.

kubernetes-nmstate will then operate as-is, but the keyfiles ultimately written by nmstate would effectively be ephemeral. When a node is rebooted, the kubernetes-nmstate-handler will re-process any existing NodeNetworkConfigurationPolicy CRs and put the configuration back in place.
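To make the overlay concrete, the mount described in the quoted steps could be expressed as a systemd mount unit roughly like the following sketch. The upperdir/workdir locations under /run are assumptions, not taken from the proposal; placing them on tmpfs is what would make the merged keyfiles ephemeral across reboots:

```ini
# Illustrative sketch only; paths under /run are assumptions.
[Unit]
Description=Ephemeral overlay for NetworkManager keyfiles
Before=NetworkManager.service

[Mount]
What=overlay
Where=/etc/NetworkManager/system-connections-merged
Type=overlay
# lowerdir holds the persistent, MachineConfig-managed keyfiles;
# upperdir/workdir live on tmpfs, so writes vanish on reboot.
Options=lowerdir=/etc/NetworkManager/system-connections,upperdir=/run/nm-overlay/upper,workdir=/run/nm-overlay/work

[Install]
WantedBy=multi-user.target
```

Note that systemd requires the unit file name to match the escaped mount path, here etc-NetworkManager-system\x2dconnections\x2dmerged.mount.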
Member

An aside, but I recently wrote: https://blog.verbum.org/2020/08/22/immutable-%E2%86%92-reprovisionable-anti-hysteresis/

It's not that we want the configuration written by nmstate to be ephemeral per se; it's that the "source of truth" should live in one place, in this case etcd (and not /etc). By having the implementation be ephemeral, we avoid hysteresis during a node boot: there is no dependency during the boot on the previous configuration.


* Replace SRIOV operator
* Configure and control primary interface on Day 1, including bonds and VLANs.
* Part of the effort for that: [Add MCO Flattened Ignition proposal](https://github.com/openshift/enhancements/pull/467)
Contributor

@bcrochet thanks for adding the notes here. One minor follow-up: I think the proposal is not going to work on Day 2 for the primary node interface, agreed?

Member Author

I would assume not. But don't quote me on that.

Contributor

@zshi-redhat Sep 4, 2020

Would it make sense to update this statement to Configure and control primary interface on Day 1 and Day 2 as the non-goal?

Member

Is there any protection against trying to use this to adjust primary interface configuration if it's not intended to be used that way?

Contributor

I don't think there is a protection mechanism in the current proposal (this is also true for additional host interfaces if multiple components try to configure the same device). My understanding of the flow is that it will fail if the network configuration is wrong; deleting the NNCP and rebooting the node(s) would erase the ephemeral network configuration and recover.

Member Author

In the failure scenario, it wouldn't even require a reboot. But to answer @russellb: no, there is no protection other than that the admin would have to affirmatively try to do it.

Enhancement proposal to extend MCO for declarative network configuration.
@bcrochet
Member Author

/assign @kikisdeliveryservice

authors:
- "@bcrochet"
reviewers:
- "@cwalters"
Member

Let's get some MCO team members added here as well. @runcom, @kikisdeliveryservice, @yuqi-zhang, @ericavonb, @sinnykumari

@kikisdeliveryservice
Contributor

I'm happy with Option C and really excited you were able to come up with it. Thank you for your hard work on this @bcrochet!

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 11, 2020
@kikisdeliveryservice
Contributor

Somehow @cgwalters' approval didn't end up tagging this?

@kikisdeliveryservice
Contributor

Ooh, we need someone from the OWNERS file for that final approval.

/assign @dgoodwin

@dgoodwin
Contributor

/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 11, 2020
Member

@cgwalters left a comment

/approve

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bcrochet, cgwalters, dgoodwin, kikisdeliveryservice

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
