
[machine-config-operator/baremetal] MCO declarative network configuration #399

Merged
merged 1 commit into from
Sep 11, 2020

Conversation

bcrochet
Member

Enhancement proposal to extend MCO for declarative network configuration.

@bcrochet
Member Author

/cc @cgwalters

@danwinship
Contributor

/retitle [machine-config-operator] MCO declarative network configuration

@openshift-ci-robot openshift-ci-robot changed the title [machine-config-operator] MCO declaritive network configuration [machine-config-operator] MCO declarative network configuration Jul 18, 2020
@russellb
Member

There was a past nmstate related enhancement. Why did this one get created vs evolving the existing one?

@crawford
Contributor

@russellb as I recall, the previous enhancement was for kubernetes-nmstate, whereas this one only mentions nmstate (the non-kubernetes variant) as an implementation detail.

My biggest concern with this approach is that this takes us from MachineConfigs (typically) being used to describe a pool of machines to them describing a single machine. It's going to be very cumbersome to use this mechanism to configure a cluster full of unique machines (e.g. each machine needs to be configured with a static IP address) and I imagine we are almost immediately going to be asked to build a templating mechanism so that customers don't need to create a MachineConfigPool for every machine.

@russellb
Member

@russellb as I recall, the previous enhancement was for kubernetes-nmstate, whereas this one only mentions nmstate (the non-kubernetes variant) as an implementation detail.

Right, but I remember feedback pushing that enhancement toward discussing the problem more generally and also evolving the proposal to be better integrated with MCO, so I expected something like this just to come in as another revision of the existing enhancement. At a minimum I'd expect them to link to each other to explain the relationship. Is the other one now deprecated and should be closed? Are they to be considered as alternatives to each other?

#161

@celebdor
Contributor

@russellb as I recall, the previous enhancement was for kubernetes-nmstate, whereas this one only mentions nmstate (the non-kubernetes variant) as an implementation detail.

Yes. The biggest objection to the original enhancement, which proposed adding the upstream kubernetes-nmstate project, was that it added a new way to configure Nodes separate from MachineConfig, and that it potentially deferred problems to the following reboot.

It had other issues, such as lacking rollout functionality (though it does have rollback).

My biggest concern with this approach is that this takes us from MachineConfigs (typically) being used to describe a pool of machines to them describing a single machine. It's going to be very cumbersome to use this mechanism to configure a cluster full of unique machines (e.g. each machine needs to be configured with a static IP address) and I imagine we are almost immediately going to be asked to build a templating mechanism so that customers don't need to create a MachineConfigPool for every machine.

This issue was not present in the kubernetes-nmstate enhancement proposal. The current proposal basically keeps the API that kubernetes-nmstate users are familiar with and addresses the other objections. The biggest change for kubernetes-nmstate users, and a significant one at that, is the granularity: doing anything with MachineConfig that does not target the entire MachineConfigPool is not really possible at the moment.

Of course, there are ways to work around it. Off the top of my head, we could keep having the user write a NodeNetworkConfigurationPolicy, and the controller (as in the current proposal) writes multiple files to the MachineConfig for the appropriate MachineConfigPool. That implies that each node in a MachinePool will have several of these files written to its network configuration directory, with each file targeted at a specific node. I believe that this is fine since:

  • The API is NNCP, and all MachineConfig writes are performed by the operator controller, so the MachineConfig carrying the files for the different nodes is just an implementation detail.
  • The enhancement states that the MCD can detect when a config change is network-related, so changing the config for other nodes will not mean a reboot; if the node's network config file did not change (and the NodeNetworkState matches), it is just a no-op that bumps the currentConfig.
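For readers unfamiliar with the API being discussed, a NodeNetworkConfigurationPolicy of the kind referenced above might look roughly like this sketch. The node name, interface, and address here are hypothetical, and the exact apiVersion has varied across kubernetes-nmstate releases:

```yaml
# Illustrative only: a per-node NNCP assigning a static IP.
# Node name, interface, and address are made up for this example.
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: worker-0-static-ip
spec:
  # nodeSelector is what gives NNCP its per-node (rather than
  # per-pool) granularity discussed in this thread.
  nodeSelector:
    kubernetes.io/hostname: worker-0
  desiredState:
    interfaces:
      - name: eth1
        type: ethernet
        state: up
        ipv4:
          enabled: true
          dhcp: false
          address:
            - ip: 192.0.2.10
              prefix-length: 24
```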

@celebdor
Contributor

Is the other one now deprecated and should be closed? Are they to be considered as alternatives to each other?

That is a good question. I could see either, but I thought that this one supersedes the old one. I certainly hoped for the approach proposed here to be approved quickly, since it addresses the objections to the other.

@ashcrow
Member

ashcrow commented Jul 23, 2020

Contributor

@kikisdeliveryservice left a comment

Trying to get up to speed; left some questions/comments. I'm not clear on: is this one new rendered-config per node, or one new rendered config with a bunch of different NM settings for each node? The former seems like a very confusing UI; the latter means that a MachineConfig is no longer an easy-to-grok state for a set of nodes but needs to be parsed closely to figure out what is on each node, with different nodes potentially succeeding and failing from the same config. (Note: if one node fails on a config we stop applying it to the pool; what do we do in this case?)

This is proposing two major things at once that the MCO doesn't actually do yet: per-node configuration and rebootless updates. Is there a way to tackle this in stages?

I don't doubt that this can be implemented in some way, but I do worry about maintainability and the issues this will raise when troubleshooting: the MCO team will be tasked with maintaining all of this when we have no expertise, along with the issues that will come from those two very major changes. I also have a concern that the MCO will start being tasked with per-node configurations across entire clusters, which is the opposite of what the MCO is for (and the problem that I thought we were moving away from).

I'd like to hear about the alternative Option A: "include kubernetes-nmstate as a standalone component within OpenShift". Would this make the network configuration more easily maintainable, and why is it a bad choice for this specific change? Especially if, as Alex mentioned, this might require some tooling to create configs, it seems like the nmstate standalone component might make sense from a maintainability perspective?

@cgwalters
Member

There's a whole lot going on here, but overall
/approve
Basically as long as we have a plan for how nmstate and the MCO interact and can coordinate, that addresses my main concern.

@openshift-ci-robot openshift-ci-robot changed the title [machine-config-operator] MCO declarative network configuration [machine-config-operator/baremetal] MCO declarative network configuration Aug 20, 2020
@bcrochet
Member Author

/cc @cgwalters @runcom @kikisdeliveryservice @ericavonb
Please take a fresh look at the proposal. We are putting forth an alternative that does not require immediate changes to the MCO that go against its current purpose.

Contributor

@kikisdeliveryservice left a comment

Ok, so I watched the video and re-read this several times, and Option C seems like a pretty stable approach, at least vis-à-vis the MCO.

On the MCO side, you'd be making changes to /templates/common/baremetal (or somewhere in templates) to lay down the files and create the .mount unit; then, post-kubelet, leverage kubernetes-nmstate to make changes to that /tmp/nm-system-connections?

If eventually the user lays down some other MC that causes a reboot, the MCO goes through the same thing above, the /tmp/nm-system-connection changes would be lost, and the kubernetes-nmstate post-kubelet step would do its thing and reconcile the /tmp/nm-system.. again?

But just to be clear: those initial template changes laid down by the MCO would happen during an upgrade to 4.x (which would obviously incur a reboot), but the idea is that when the user decides to leverage kubernetes-nmstate, they won't incur an additional reboot to set that up using a NodeNetworkConfigurationPolicy via nmstate.io? Or, if they want to make changes, they'd use the NNCP to make the change, also without incurring a reboot?

Do I have it correct?

If you need persistent changes, you do them with the MCO.
Kubernetes-nmstate will only affect post-kubelet configuration, so we comply with "MC owns every configuration until boot time".

[Link](https://asciinema.org/a/uMqwIpvfhuI67ShT12csYXm3h) to an asciicast showing this option.
Contributor

This video is really really helpful

Member Author

Yes, you have it correct. We skip the reboot, and the configuration for kubernetes-nmstate is ephemeral on disk.

Contributor

awesome thanks!

2. Create two temporary directories for overlay purposes.
3. Mount an overlay combining the standard NetworkManager /etc/NetworkManager/system-connections directory at the path pointed to by step 1, e.g., /etc/NetworkManager/system-connections-merged.

kubernetes-nmstate will then operate as-is, but the keyfiles ultimately written by nmstate would effectively be ephemeral. When a node is rebooted, the kubernetes-nmstate-handler will re-process any existing NodeNetworkConfigurationPolicy CRs and put the configuration back in place.
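To make the overlay concrete, the mount described in the quoted steps could be expressed as a systemd mount unit roughly like the following sketch. The upperdir/workdir locations under /run are assumptions, not taken from the proposal; placing them on tmpfs is what would make the merged keyfiles ephemeral across reboots:

```ini
# Illustrative sketch only; paths under /run are assumptions.
[Unit]
Description=Ephemeral overlay for NetworkManager keyfiles
Before=NetworkManager.service

[Mount]
What=overlay
Where=/etc/NetworkManager/system-connections-merged
Type=overlay
# lowerdir holds the persistent, MachineConfig-managed keyfiles;
# upperdir/workdir live on tmpfs, so writes vanish on reboot.
Options=lowerdir=/etc/NetworkManager/system-connections,upperdir=/run/nm-overlay/upper,workdir=/run/nm-overlay/work

[Install]
WantedBy=multi-user.target
```

Note that systemd requires the unit file name to match the escaped mount path, here etc-NetworkManager-system\x2dconnections\x2dmerged.mount.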
Member

An aside, but I recently wrote: https://blog.verbum.org/2020/08/22/immutable-%E2%86%92-reprovisionable-anti-hysteresis/

It's not that we want the configuration written by nmstate to be ephemeral per se; it's that the "source of truth" should live in one place, in this case etcd (and not /etc). By having the implementation be ephemeral, we avoid hysteresis during a node boot: there is no dependency during the boot on the previous configuration.


* Replace SRIOV operator
* Configure and control primary interface on Day 1, including bonds and VLANs.
* Part of the effort for that: [Add MCO Flattened Ignition proposal](https://github.com/openshift/enhancements/pull/467)
Contributor

@bcrochet thanks for adding the notes here. One minor follow-up: I think the proposal is not going to work on Day 2 for the primary node interface, agreed?

Member Author

I would assume not. But don't quote me on that.

Contributor

@zshi-redhat Sep 4, 2020

Would it make sense to update this statement to Configure and control primary interface on Day 1 and Day 2 as the non-goal?

Member

Is there any protection against trying to use this to adjust primary interface configuration if it's not intended to be used that way?

Contributor

I don't think there is a protection mechanism in the current proposal (this is also true for additional host interfaces if multiple components try to configure the same device). My understanding of the flow is that it will fail if the network configuration is wrong; deleting the NNCP and rebooting the node(s) would erase the ephemeral network configuration and recover.

Member Author

In the failure scenario, it wouldn't even require a reboot. But to answer @russellb: no, there is no protection other than that the admin would have to affirmatively try to do it.

Enhancement proposal to extend MCO for declarative network configuration.
@bcrochet
Member Author

/assign @kikisdeliveryservice

authors:
- "@bcrochet"
reviewers:
- "@cwalters"
Member

Let's get some MCO team members added here as well. @runcom, @kikisdeliveryservice, @yuqi-zhang, @ericavonb, @sinnykumari

@kikisdeliveryservice
Contributor

I'm happy with Option C and really excited you were able to come up with it. Thank you for your hard work on this @bcrochet!

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 11, 2020
@kikisdeliveryservice
Contributor

Somehow @cgwalters' approval didn't end up tagging this?

@kikisdeliveryservice
Contributor

Ooh, we need someone from the OWNERS file for that final approval.

/assign @dgoodwin

@dgoodwin
Contributor

/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 11, 2020
Member

@cgwalters left a comment

/approve

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bcrochet, cgwalters, dgoodwin, kikisdeliveryservice

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
