RFE: Move user-data secret creation from the installer into the MCO #683
I had forgotten about this issue, but actually came back to this idea as part of #736
I thought about this more, and it makes sense to do this, but I don't see how we get out of the installer understanding Ignition, because it needs to generate it for the bootstrap machine and, currently, the masters. Improving this also introduces a new cross-dependency between the MCO and the machineAPI operator, which I think is a good thing, but currently the installer pre-generating the user data implicitly avoids that dependency.
This is prep for fixing openshift#683. Today the installer generates this "pointer Ignition" config, which ends up as e.g. user-data in AWS and is just a pair of (MCS URL, root CA). We need to do this because of size limitations on AWS user data. It makes a lot of sense for the MCO to be in control of generating the pointer Ignition config too, as it helps centralize knowledge of Ignition. Ultimately, the goal here is tighter integration between machineAPI and the MCO; e.g. the machineAPI asks to generate the pointer Ignition config, rather than having it stored as a static secret. This could then unlock the ability for the MCO to inject e.g. one-time auth tokens.
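For reference, a minimal sketch of what such a pointer config contains, assuming Ignition spec v2.2 and made-up URL/CA values (the real installer/MCO output differs in detail):

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// The pointer config is essentially (MCS URL, root CA): it tells Ignition
	// to fetch the real config from the Machine Config Server and which CA to
	// trust while doing so. Values below are placeholders.
	mcsURL := "https://api-int.example.com:22623/config/worker"
	rootCA := "data:text/plain;charset=utf-8;base64,<root-ca>"

	pointer := map[string]interface{}{
		"ignition": map[string]interface{}{
			"version": "2.2.0",
			"config": map[string]interface{}{
				"append": []map[string]string{{"source": mcsURL}},
			},
			"security": map[string]interface{}{
				"tls": map[string]interface{}{
					"certificateAuthorities": []map[string]string{{"source": rootCA}},
				},
			},
		},
	}
	out, _ := json.MarshalIndent(pointer, "", "  ")
	fmt.Println(string(out)) // this JSON is what lands in the user-data secret
}
```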
PR in #759
Discussed this with some machineAPI folks today. One thing that came out of that is that a likely good design is for the MCO to watch for the creation of new Machines and generate the user-data secret for each one. And we'd clean up the generated secrets when a machine was deleted.
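To make that shape concrete, here is a rough sketch under assumed names and APIs (the controller name, the secret key, and `renderPointerIgnition` are placeholders, not actual MCO code): a controller creates an owned per-machine secret on Machine creation, so deleting the Machine garbage-collects the secret.

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// UserDataReconciler is a hypothetical MCO controller for the flow above.
type UserDataReconciler struct {
	client.Client
}

func (r *UserDataReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Fetch the Machine that triggered this reconcile.
	machine := &unstructured.Unstructured{}
	machine.SetAPIVersion("machine.openshift.io/v1beta1")
	machine.SetKind("Machine")
	if err := r.Get(ctx, req.NamespacedName, machine); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			// Per-machine secret name, e.g. "worker-1-user-data".
			Name:      machine.GetName() + "-user-data",
			Namespace: machine.GetNamespace(),
			// The owner reference makes Kubernetes garbage-collect the
			// secret when the Machine is deleted, covering the cleanup
			// half of the proposal.
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion: machine.GetAPIVersion(),
				Kind:       machine.GetKind(),
				Name:       machine.GetName(),
				UID:        machine.GetUID(),
			}},
		},
		Data: map[string][]byte{
			// renderPointerIgnition stands in for the MCO's own
			// pointer-config generation for this machine's role/pool.
			"userData": renderPointerIgnition(machine),
		},
	}
	if err := r.Create(ctx, secret); err != nil && !apierrors.IsAlreadyExists(err) {
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}

func renderPointerIgnition(machine *unstructured.Unstructured) []byte {
	return []byte("{}") // placeholder
}
```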
@cgwalters I like that option. Seems promising.
By reading the code in the machine-api-operator: it's not the MAO itself that would spin; all the providers must be tweaked to spin (https://github.com/openshift/installer/blob/master/pkg/asset/machines/aws/machines.go#L106), which makes it trickier AFAICT. I'll sync with Alberto on this.
The change of name here to be per-machine will also imply that MachineSets aren't holding that anymore (meaning more changes in MachineSets as well, I guess).
Yeah, you are right, but that's our burden; it's a limitation in our API and we'll have to deal with it. E.g. we could let providers return a well-known error type and let the core machine controller react accordingly, or consider deprecating the providers' userData field, or have a new backward-compatible machine.enableAutomaticBootstrapping=true...
sgtm. Do you think we could try to extend this identity awareness to completely remove the need for the machineApprover? Would the MCO have the ability to track whether that owned secret resulted in a successful node or a failure? If so, could we let this new controller signal this back to the machine object, i.e. as a condition, to leave a breadcrumb for users on why that machine didn't become a node?
As far as I understand this, as Machine API, we would need to drop the user-data secret reference in each of our providers and instead have some sort of role field so that the MCO knows whether the machine is a worker or a master. This would probably be better on the machine rather than the providerSpec, so that the MCO can read it easily and MAPI can spin waiting for it before calling the actuators. We wouldn't need to update the actuator interface; they would just need to know how to find the secret from the MachineSpec rather than the ProviderSpec. If we didn't deprecate the user-data-secret field, it would somehow need to be modified as machines are created, wouldn't it? We would still want it to point to a valid secret name, but the MachineSet can't have it set to the right name. A webhook could do this, but it feels a little clunky?
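Purely as an illustration of the API shape being discussed (none of these field names exist today; `Role`, the placement of `UserDataSecret`, and the stand-in `ProviderSpec` are all assumptions), the change could look roughly like:

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
)

// ProviderSpec stands in for today's provider-specific embedded spec.
type ProviderSpec struct {
	// provider-specific fields elided
}

// MachineSpec sketches the proposal: the role and the user-data reference
// live on the Machine itself, where the MCO can read them and the machine
// controller can wait on them, instead of inside each ProviderSpec.
type MachineSpec struct {
	// Role tells the MCO which pointer Ignition config this machine needs,
	// e.g. "master" or "worker".
	Role string `json:"role,omitempty"`

	// UserDataSecret points at the generated secret; kept for backward
	// compatibility if the per-provider field is not deprecated outright.
	UserDataSecret *corev1.LocalObjectReference `json:"userDataSecret,omitempty"`

	// ProviderSpec stays provider-specific, as today.
	ProviderSpec ProviderSpec `json:"providerSpec"`
}
```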
Would your suggestion be to move to unique bootstrap tokens per machine and move back to the built-in K8s CSR approving mechanisms?
If it didn't track that a node joined the cluster, how would it clean up resources that are no longer needed? Time based? Owner Ref to Machine?
This issue predates openshift/enhancements, but if there is renewed interest it might make sense to port it to an enhancement so you can freeze out the places that have consensus. I'm fine either way, but thought I'd mention it in case folks were interested.
I'm going to create an enhancement, but I've been thinking a lot about this the past few days. Given the current schedule, I think coupling MCO/MAO is going to be non-trivial to accomplish in the timeframe we have to ship Ignition v3; or rather, it will require more than this idea I came up with without tightly coupling Machines/MachineSets into the MCO:
I think the above is a good approach: we leverage the controllers we already have in place in MCO/MAO, we use the Statuses of the CRDs to synchronize between controllers, and it requires fewer changes to both projects (MAO: 2 major? changes). Let me know what you all think in the meantime (/cc @yuqi-zhang)
Overall 👍 from me! This sounds like a simple integration, which is good because it'll be easy to implement and understand, and we can do something more sophisticated later. But, let's at least briefly look at
Basically, if we can create a flow where the MAO uses an annotation on its MachineSet that selects the secret (which we need for control plane vs worker anyway), then we can extend it to generic pools and ideally, eventually, machine-specific configs.
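A tiny sketch of that flow (the annotation key and the default are made up, not an existing API):

```go
package sketch

// poolAnnotation is a hypothetical annotation a MachineSet could carry so
// the MCO knows which pool's pointer config backs that set's user-data secret.
const poolAnnotation = "machineconfiguration.openshift.io/pool"

// poolForMachineSet maps a MachineSet's annotations to a pool name, falling
// back to "worker" when nothing is set (an assumed default).
func poolForMachineSet(annotations map[string]string) string {
	if pool, ok := annotations[poolAnnotation]; ok && pool != "" {
		return pool // e.g. "worker", "infra", or a custom pool
	}
	return "worker"
}
```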
Don't get me wrong, I 100% support that per-machine secret, and it indeed would allow for more sophisticated scenarios like the one highlighted there! I think that design would be best once we move forward, but I feel the migration to Ignition v3 brings along a lot of complexity and pitfalls that I'd love to remedy by starting with something simpler (hope that makes sense). I also think the per-machine secret would still be valid once we go the simpler path; the simpler path itself isn't a stopgap but a simpler implementation, so it makes sense in the future to allow per-machine secrets to implement things like #1619 #1720
I just realized we can only do this in the code, but the installer should still create the master|worker.ign for UPI to work correctly, and the MCO isn't running until bootkube.
The abstraction that I'd like to see implemented can be summarized as such:

```
ObjRef:
  Name: string
  Namespace: string
  Kind: string (or some other concrete kind type)
  KeyName: string
  PrependSelfName: bool

Machine:
  ...
  ProviderSpec: {}
  UserObjects: [] ObjRef
```

The machine-controller could fetch one or more objects per machine with the dynamic client during create. These objects would be passed to each actuator (cloud provider) as a `map[string]interface{}`. Each actuator would verify the type and presence of the items fetched.

Consider the fields in ObjRef. Name and Namespace are obvious. KeyName is what goes in the key portion of the `map[string]interface{}`; the actuator already knows what keys it wants. PrependSelfName prepends the machine's own name to the object's Name it's looking for. For example, for a machine named "worker-1", this would cause the machine-controller to request the secret named "worker-1-user-data" from namespace openshift-machine-api. If that object is found, it is placed into the map we pass to the actuator, e.g.:

```
c.Get(mything)
userObjects["instance-user-data"] = mything
actuator.Create(userObjects...)
```

This kind of abstraction could be widely supported; I'm sure there are other k8s components that need to do similar things. In this manner, we could support all the clouds similarly, and since it supports arbitrary objects, it is quite extensible.
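A hedged Go sketch of that lookup (ObjRef and its fields are the proposal above, not an existing API; only Secrets are handled here, and the name-joining rule for PrependSelfName is an assumption):

```go
package machinecontroller

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// ObjRef mirrors the proposed abstraction above.
type ObjRef struct {
	Name            string
	Namespace       string
	Kind            string
	KeyName         string
	PrependSelfName bool
}

// fetchUserObjects resolves each ObjRef with the dynamic client and returns
// the results keyed by KeyName, ready to hand to the actuator.
func fetchUserObjects(ctx context.Context, dyn dynamic.Interface, machineName string, refs []ObjRef) (map[string]interface{}, error) {
	userObjects := map[string]interface{}{}
	for _, ref := range refs {
		name := ref.Name
		if ref.PrependSelfName {
			// e.g. machine "worker-1" + Name "user-data" => "worker-1-user-data"
			// (the joining rule is an assumption).
			name = machineName + "-" + ref.Name
		}
		// Only Secrets are sketched; a real implementation would map
		// ref.Kind to the right GroupVersionResource.
		gvr := schema.GroupVersionResource{Version: "v1", Resource: "secrets"}
		obj, err := dyn.Resource(gvr).Namespace(ref.Namespace).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return nil, fmt.Errorf("fetching %s/%s: %w", ref.Namespace, name, err)
		}
		userObjects[ref.KeyName] = obj
	}
	return userObjects, nil
}

// The machine-controller would then call, roughly:
//   userObjects, err := fetchUserObjects(ctx, dyn, machine.Name, machine.Spec.UserObjects)
//   actuator.Create(ctx, machine, userObjects)
```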
Currently the installer creates pointer Ignition configs which end up as user data secrets referenced by the installer-generated Machine(Sets). That makes things like the Ignition v2 (spec 3) transition tricky, because you need to touch both the installer (openshift/installer#1468) and MCO (#583) with some kind of ratcheting.
I think we should drop the installer's pointer Ignition-config secrets and instead have the MCO create pointer Ignition-config secrets for any roles it supports (currently master and worker). This consolidates knowledge of supported roles and their configuration in the MCO and simplifies the installer->MCO interface. The MCO can put the MCS cert into the pointer configs without having to worry about cross-repo communication (and it is better positioned to update the secrets if/when the MCS cert rotates). The MCO can switch Ignition versions without involving the installer. The MCO can move the MCS to a new URI scheme without involving the installer. The interface is reduced to just the installer relying on "the cluster will have `master-user-data` and `worker-user-data` secrets".

Thoughts?