Namespaces for On-Prem Networking Services #412
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: cybertron The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
When support for on-premise deployments on baremetal, OpenStack, OVirt, and VSphere was added, some additional networking services were needed to provide functionality that would come from cloud services on other platforms. These services were added in namespaces called openshift-kni-infra, openshift-openstack-infra, etc. There are two problems with this: 1) The namespaces are created in all deployments, including ones on platforms other than the one that uses the namespace and 2) The openshift-x-infra name scheme does not match the existing pattern for OpenShift namespaces. This enhancement proposes a solution to both of those problems.
I've updated the doc to address the comments so far. Thanks for the input!
I don't have a problem with the new name and fix (suggested doing this enhancement to settle it once and for all). I'm only concerned about upgrades/downgrades, but if you feel that you can take those into consideration and have adequate e2e, it's fine with me. I don't think it's a terrible idea to leverage sync.go since that's where we sort out some platform/cloudconfig stuff. @runcom WDYT?
/cc: @runcom
/cc @runcom
This seems like a lot of additional complexity for a cosmetic change. There are additional concerns for trying to change a namespace. A few I can think of:
For the naming, we propose `openshift-hosted-network-services` for all
platforms that use these components. There is no need for a platform-specific
name since only one platform can exist at a time anyway.
I don't know why, but by reading this my thought was: "should we have a dedicated operator that does just this, and have the MCO help with, perhaps, deploying the needed files when it has to?" This is more about having the MCO be the API for system configuration and letting other components leverage it w/o adding anything super specific to it. The counter-argument would be "but you already ship per-platform configurations". Right, we do, but we have specialized controllers (kube, crio) that handle the lifecycle of those configurations, and the lifecycle of the other configurations is simple enough that an upgrade and an overwrite is sufficient. Things like these network services may warrant a better lifecycle handler, since they are effectively namespaced services.
I'll commit to finding a very old conversation where, I'm pretty sure, shipping these services (and namespaces) was something we intended to iterate on to find a better home for them.
I'm familiar with this namespace issue/bz and what we're trying to solve, but I think a better design, if possible, would be to find a better, specialized home for all of these. It could be something like what happened with etcd-quorum-guard; here we're talking about network services for on-premise platforms, which sounds like a great specialization that a dedicated tool could handle. Should we explore a path like this?
I guess what I'm trying to say/ask is: should we re-focus on a broader goal instead of focusing only on the namespaces, which already sound like something "external" or pretty much a "side" of the MCO? I'm happy to help scope the issue further and provide any MCO help needed (even on our side, if this helps with having a better MCO API that others can use w/o plugging directly into the MCO).
I guess I've gotten used to those things living in the MCO repo, but as they get refined, a baremetal-and-friends (openshift-hosted-network-services) network configs operator works too and makes a lot of sense - great points @runcom
IIRC the namespaces were eventually going to be moved to a better place? It could make more sense to make that move now rather than shoehorning this into the MCO, as this seems to expand over time and would overload the MCO. It might be better suited to a dedicated operator that handles the namespace & networking for on-prem.
I think the idea of a separate operator has been in the pipeline since the inception, but we know there are pretty big hurdles to get there:
- Work out a bootstrapping workflow that does not depend on scheduling a workload on the control plane, because we need these services running for the node to join the cluster in the first place, and today the only mechanism for that is the MCO and machine configs.
- If the operator wants to manage the lifecycle of these components using machine configs, that means we need some way during upgrades to synchronize the update of the machine's other configuration with the changes required by the new version of the operator. This is tricky because machine config changes are applied via reboots.
- If the operator decides to manage the components itself by manipulating the files on disk, like the other static-pod operators (etcd, kube-apiserver, kube-scheduler, kube-controller-manager), that means there is one workflow to install using machine configs but the lifecycle is managed by a different process.

Figuring out and building this separate operator is not going to be easy or quick. So, knowing how difficult it would be to move to a standalone operator, we need to weigh that against how quickly we need to get the namespaces fixed.
And we could probably move the namespaces first and then work on migrating to an operator.
The question is how much effort this implementation will be vs. the operator, and whether, after this implementation, any will to do the operator will actually exist to make the bigger jump... things tend to expand and never leave the MCO.
> Figuring out and building this separate operator is not going to be easy or quick. So, knowing how difficult it would be to move to a standalone operator, we need to weigh that against how quickly we need to get the namespaces fixed.
> And we could probably move the namespaces first and then work on migrating to an operator.
As Kirsten already asked, I think it's crucial to understand a path to get there. I would love to hear more on the priority of moving those namespaces though; we've been living with them for quite a while now, and I'd be surprised by the urgent priority all of a sudden, which makes me think that we might better use our time to tackle the broader problem. I think it has been said that "moving the namespaces" isn't as easy as a rename, and as Erica pointed out above, this requires logic in the MCO to support a migration, which again tells me we might pursue the broader issue.
From our perspective, we're trying to respond to https://bugzilla.redhat.com/show_bug.cgi?id=1815561 appropriately based on the priority it was assigned and the timeframe presented in the description (which we've already missed). I tend to agree that it's a largely cosmetic issue, but it isn't up to me. :-)
I believe that as a team, we would still like to pursue the separate operator approach at some point. However, as Abhinav noted above, that's going to be a large piece of work, and it's not likely to happen in the near or even medium term, if at all, given the response to the last new operator we proposed. I think recent discussions have raised questions for me as to whether it even makes sense to create a new operator that is going to be that tightly coupled to the MCO.
I'm happy to have these discussions, but I'm pretty sure I don't have the authority to indefinitely postpone fixing this bug so if that's the way we choose to go I need someone else to make that call.
openshift-openstack-infra, etc. There are two problems with this: 1) The
namespaces are created in all deployments, including on platforms other
than the one that uses the namespace and 2) The openshift-x-infra name scheme
does not match the existing pattern for OpenShift namespaces.
> existing pattern for OpenShift namespaces

Do we have a link to this existing pattern?
Just what was written in https://bugzilla.redhat.com/show_bug.cgi?id=1815561 AFAIK:
"In general, namespaces must define a component (infra is not a component) and follow an existing pattern (-operator for operators, * for operand). Also, optional component namespaces should not be created for platforms that are not installed. If the component is a singleton per platform, then they should reuse the same namespace (since a cluster can have only one platform)."
[resources to be synced](https://github.com/openshift/machine-config-operator/blob/master/pkg/operator/sync.go)
in the machine-config-operator. This will allow us to add logic that considers
the platform before creating them.
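To make the quoted idea concrete, here is a minimal sketch of what platform-aware namespace creation could look like. It is illustrative only: the package, helper, and namespace-selection logic are assumptions, not the actual machine-config-operator sync.go code.

```go
// Illustrative sketch only: create the proposed namespace just for on-prem
// platforms. Names are hypothetical, not real MCO code.
package sync

import (
	"context"

	configv1 "github.com/openshift/api/config/v1"
	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// onPremPlatforms are the platforms that ship the extra networking services.
var onPremPlatforms = map[configv1.PlatformType]bool{
	configv1.BareMetalPlatformType: true,
	configv1.OpenStackPlatformType: true,
	configv1.OvirtPlatformType:     true,
	configv1.VSpherePlatformType:   true,
}

// syncHostedNetworkNamespace creates the shared namespace only when the
// cluster runs on one of the on-prem platforms; on cloud platforms it does
// nothing, which addresses problem (1) from the summary.
func syncHostedNetworkNamespace(ctx context.Context, client kubernetes.Interface, platform configv1.PlatformType) error {
	if !onPremPlatforms[platform] {
		return nil // no on-prem networking services needed on this platform
	}
	ns := &corev1.Namespace{
		ObjectMeta: metav1.ObjectMeta{Name: "openshift-hosted-network-services"},
	}
	_, err := client.CoreV1().Namespaces().Create(ctx, ns, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		return nil // already created on a previous sync
	}
	return err
}
```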
So static pods are used to provide these services, right? And these static pods require the namespace to exist before they are run, right? What about service accounts and RBAC? I'm hoping the same is true for those.
If we are depending on machine-config-operator sync.go, that means it is going to create and install the namespaces when it starts running, but to run the operator you need the control-plane nodes. So I think we need to document how this will be handled during bootstrapping too, as that is an important part of the flow.
Also, some of these static pods run on the bootstrap host as well, and the same requirements for namespaces, RBAC, etc. would apply there. How do we plan to handle that?
I'm not sure they do require the namespace before they run. I don't entirely understand how, but according to the commit message on openshift/machine-config-operator@b3ab7d8 we were previously running these pods without the namespace. Apparently the only issue was accounting for the resource usage on the nodes.
I also don't believe we have any service accounts or RBAC for these. Any interaction with the API is done via the host's kubeconfig.
### Implementation Details/Notes/Constraints [optional]

The services in question provide things like internal DNS resolution between
This paragraph seems out of place in the implementation section, as it explains what these internal services are.
What we should include here is:
- how we create the resources on the bootstrap host
  - who (which binary) is going to be responsible for this
  - what inputs we use and from where
  - where we write files as output
- how we create the necessary resources on the bootstrap host for the static pods that will run on the control plane
  - who (which binary) is going to be responsible for this
  - what inputs we use and from where
  - where we write files as output
- how we migrate existing clusters from the previous namespaces to this new namespace
  - who (which binary) is going to be responsible for this
  - what exactly we will be doing, and when (during the upgrade)
  - what our failure cases are and how we intend to fix them
  - how we are going to make sure the migration is zero-downtime
Yeah, this was just trying to provide context for what we're talking about here. I can move it somewhere else and try to address those points.
I don't understand why we would want the MCO to handle this. CVO creates the manifests today; it should be in charge of filtering what gets applied and what doesn't. We already do that today based on annotations: https://github.com/openshift/cluster-version-operator/blob/master/pkg/payload/payload.go#L183
`exclude.release.openshift.io/internal-openshift-hosted: "true"`
If there are certain operators we know will only be included on a particular infrastructure type, we could make an include-when type annotation.
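As an illustration of that include-when idea, a minimal sketch of an annotation filter follows. The annotation key is invented for the example (it is not an existing CVO annotation), and this is not the actual payload-filtering code.

```go
// Illustrative sketch of an "include-when" style filter. The annotation
// name below is hypothetical.
package payload

import (
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// includeOnPlatformAnnotation is a made-up annotation key for this example.
const includeOnPlatformAnnotation = "include.release.openshift.io/on-platform"

// includeForPlatform keeps a manifest when it has no include annotation
// (apply everywhere, as today) or when the annotation's comma-separated
// value lists the cluster's platform.
func includeForPlatform(obj metav1.Object, platform string) bool {
	val, ok := obj.GetAnnotations()[includeOnPlatformAnnotation]
	if !ok {
		return true
	}
	for _, p := range strings.Split(val, ",") {
		if strings.TrimSpace(p) == platform {
			return true
		}
	}
	return false
}
```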
These aren't operators though, they're static pods defined by a manifest in /etc/kubernetes/manifests. To my knowledge, MCO has been responsible for things like that in the past. If we want to move the manifests to CVO that may be possible, but it's worth noting that MCO also templates the configurations for some system services related to this so there is some value to having all of that happen in the same place.
### Risks and Mitigations

The only significant risk should be the migration from the existing names
to the new ones. We are already creating namespaces for these services.
> We are already creating namespaces for these services.
> This proposal only changes how we do that and what those namespaces are
> called.

I do not think it is that simple, right? We need to migrate all our existing clusters from the previous namespaces to this namespace as part of the upgrade.
Maybe I'm wrong, but since these are static pods I thought that on upgrade MCO would write the new manifests and kubelet would kill the old version and start the new one. AFAIK we don't depend on anything in the namespace, it's just an accounting requirement for resource tracking.
### Version Skew Strategy

Version skew should not be a major concern. The pods may be running in
Will there be 2 sets of components running at the same time, old version in previous namespace and new version in the new namespace?
Will that be problematic?
I don't think so. As I noted in my comment above, I believe kubelet will kill the old pod once we write the new manifest and then start the new pod in the new namespace. That should take care of any overlap within the node, and across nodes we don't really care. It doesn't matter what namespace our pods are running in, so if one node has them in the old namespace and another has moved to the new one it shouldn't matter to us.
### Upgrade / Downgrade Strategy

On upgrade, we will want to clean up the old namespace after moving the static
> we will want to clean up the old namespace after moving the static

The design doesn't say anything about removing the namespaces, so are we doing that or not? Because if we intend to, we should probably include that in the implementation, right?
Agree and I think they will need to be removed to make this a clean and clear transition.
Removing the old namespaces is a key part of this I think, thus ending up with 1 ns
Yeah, we need to remove the old namespaces. Their existence on all platforms was one of the objections raised in the bug. I'll clarify that in the design section.
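For illustration, here is a minimal sketch of what that cleanup step could look like, under the assumption that it runs from the MCO sync loop on upgraded clusters; the function and the exact list of legacy namespaces are assumptions, not code from the proposal.

```go
// Illustrative sketch of the upgrade cleanup discussed above: remove the
// old per-platform namespaces once the static pods run in the new one.
package sync

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// legacyNamespaces are the old namespaces named in the summary; the "etc."
// platforms would follow the same openshift-<platform>-infra pattern.
var legacyNamespaces = []string{
	"openshift-kni-infra",
	"openshift-openstack-infra",
}

// cleanupLegacyNamespaces deletes the old namespaces and ignores ones that
// are already gone, so it is safe to run on every sync after the upgrade.
func cleanupLegacyNamespaces(ctx context.Context, client kubernetes.Interface) error {
	for _, name := range legacyNamespaces {
		err := client.CoreV1().Namespaces().Delete(ctx, name, metav1.DeleteOptions{})
		if err != nil && !apierrors.IsNotFound(err) {
			return err
		}
	}
	return nil
}
```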
On upgrade, we will want to clean up the old namespace after moving the static
pods to the new one.

Downgrade may be tricky. The old version of MCO won't have logic to re-create
What is our intention then with regard to downgrades? Do we say they are not supported?
I don't see any way to support downgrades, but I don't know a lot about how that process works either. If there's some way for us to recreate the old namespaces before the downgrade then we could make it work. I guess I need to add this to the open question section.
## Alternatives

It's possible the installer could do something with these namespaces. It will
Is this a real alternative? I would love to see some details on how you think the installer could change the namespaces. And what about existing clusters?
When I was considering the installer for this I was only looking at initial installation. You're right that it doesn't make sense for the migration case. This was mostly to show that I had considered other options and chose to go with MCO instead.
### Test Plan

This would be covered by the same tests as the existing namespaces. Each
Is there adequate e2e upgrade coverage for these platforms, with tests that are consistently green? The existing tests against the MCO don't cover upgrade cases, just a deployment, no?
True. I know we have upgrade testing because we're getting upgrade bugs reported, but it isn't integrated into CI right now. I'll add a note about that.
I've touched on this in some of my other comments, but just to explicitly respond to these points: I don't see this namespace as a user-facing thing. It's for hosting OpenShift infrastructure pods (hence the old name ;-)). I admit to not knowing what our policy on namespaces is, and if they're all considered fair game for users to do with as they see fit then maybe we do have a problem. I would not expect users to be running things in these, however. The tooling point is interesting though. You're right that if a user were doing some sort of monitoring of these pods then that would break if the namespace changes. Is the namespace considered part of the API, and if so, is there any mechanism to ease this transition? I'm not immediately coming up with a good way to handle this that doesn't essentially require users to update their tooling when they update to the new version. This will be an issue regardless of when/how we move the namespace. Moving these components to a new operator is still going to break anything that looks for them in the old location.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen. /close
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.