Support a provisioning token for the Machine Config Server #443

Closed

Conversation

cgwalters (Member)

See openshift/machine-config-operator#784

The Ignition configuration can contain secrets, and we want to avoid it being accessible both inside and outside the cluster.


For installer-provisioned infrastructure, this will be much easier to implement for installations where the `user-data` provided to nodes is owned by the MCO: https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/user-data-secret-managed.md

In order to avoid race conditions, the MCS might support the "previous" token for a limited period of time (e.g. one hour/day).
Contributor

Rotating the token will have implications for the requirement to stage bare metal nodes before delivering them to remote sites for telco customers. If a worker node is installed on day 1 and then takes 3 days to be delivered to the site of a tower, where it is powered back on and connected to the cluster, the token it has will have expired.

Member Author

Thanks for looking at this!

The short answer would be "don't rotate the token".

A better fix needs a lot more information on exactly what "stage" means. I think a generally good workflow we want to support in CoreOS is using coreos-installer in this "staging" site and configuring networking etc. and have the "pointer ignition config" ready to fetch the remote target. In this scenario, the staging site would definitely need to use the correct token in the pointer config.

I think the answer is probably just "don't rotate the token while any nodes are being staged".

Particularly for bare metal as I mention at the end, I think the clearly best solution is using TPM2 (e.g. in this case the "staging site" can communicate the hardware identity of the machine to the target in advance even), but that's much more involved.

Contributor

> Thanks for looking at this!
>
> The short answer would be "don't rotate the token".

I think that means there would be no IPI for bare metal with remote workers, which would defeat the goal.

> A better fix needs a lot more information on exactly what "stage" means. I think a generally good workflow we want to support in CoreOS is using coreos-installer in this "staging" site and configuring networking etc. and have the "pointer ignition config" ready to fetch the remote target. In this scenario, the staging site would definitely need to use the correct token in the pointer config.

I'm not sure what the policy about linking to product requirement tickets in GitHub reviews is, so I'll try to summarize. We need to install the host at one location so that it is configured to join its cluster properly (that may mean the pointer config, or something else, it's unclear). Then some of the hosts will be physically moved to different remote sites, plugged into the network, and they need to rejoin the cluster without any user intervention (we can't assume the person installing the hardware is qualified to do more than rack it and plug it in).

IPI for bare metal does not use coreos-installer; it writes a disk image to a block device, including the pointer Ignition configuration. See https://github.com/openshift/enhancements/blob/master/enhancements/baremetal/baremetal.md

It sounds like you're suggesting that during staging the image and ignition pointer should be written to the host, but the host shouldn't be booted until it is at the remote location to preserve the pointer configuration. Is that right? I don't expect that to be acceptable given the usual burn-in and validation requirements for these sorts of use cases.

> I think the answer is probably just "don't rotate the token while any nodes are being staged".

Is the lifetime for the automated rotation configurable?

What happens if the token does rotate after a host leaves the staging site and before it is connected back to the cluster? Does the user ship the host back to the staging site to be re-imaged?

> Particularly for bare metal as I mention at the end, I think the clearly best solution is using TPM2 (e.g. in this case the "staging site" can communicate the hardware identity of the machine to the target in advance even), but that's much more involved.

Member Author

OK, I had missed that you're talking about IPI metal, not UPI. I think we probably need a whole section on this - the core problem is that the "automatic IPI" case hinges on the MCO managing the user data, and it's not clear to me whether that applies in IPI/machineAPI metal (does it?).

In other words, forget about staging for a second - would this enhancement work as described in IPI metal or not?

Then once we have that part figured out...we can think about how we handle rotation.

It might be that for IPI metal, we don't rotate by default - or we add an install-config.yaml option to disable automatic rotation.

Member Author

> In other words, forget about staging for a second - would this enhancement work as described in IPI metal or not?

To rephrase: does https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/user-data-secret-managed.md apply in IPI metal today?

Contributor

Metal IPI uses the user-data secret to pass the ignition pointer configuration to the host via cloudinit. So, yes, I think the user-data-secret-managed enhancement applies to metal IPI as well.

Contributor

For metal-ipi we also need to consider the issue of how to access the MCO-managed config from the cluster-api-provider (ref openshift/machine-config-operator#1690) - today we only provide the pointer config via the config-drive, but that creates a chicken/egg issue with advanced network configurations.

With this proposal, would we expect a privileged in-cluster controller to be able to access the MCS, or would we instead have to retrieve the underlying MachineConfig directly?

Member Author

I don't think this proposal makes working around the absence of openshift/machine-config-operator#1690 significantly more difficult - a privileged pod could get the token to talk to the MCS too.

(It doesn't help either of course)


## Summary

This enhancement proposes that for user-provisioned installations we optionally support generating a "provisioning token":
Contributor

Unless we have customers with the use case "As an administrator, I want unprivileged pods to be able to bootstrap themselves to cluster-administrator, so that untrusted users and outside attackers can totally pwn me more easily", this should not be optional.

If there are problems with this scheme such that we can't force everyone to use it then I don't think it solves the underlying problem.

Member Author

You're totally right, I have adjusted things to make clear this is on by default.

cgwalters force-pushed the mco-auth-token branch 2 times, most recently from 125a5b1 to 9b733e7 on August 19, 2020
@runcom (Member)

runcom commented Aug 24, 2020

cc @ericavonb

@derekwaynecarr (Member)

@csrwng please review.

as part of rhcos enablement in ibmcloud managed service, we have discussed putting an auth-proxy in front of ignition server.

@csrwng (Contributor)

csrwng commented Aug 24, 2020

Thanks @derekwaynecarr, this looks to do exactly what we want to do for ibmcloud.
@cgwalters What's the mechanism that would be used to pass the token to the MCS? Is it a header? url param? something else?

@cgwalters (Member Author)

> @cgwalters What's the mechanism that would be used to pass the token to the MCS? Is it a header? url param? something else?

This proposal is suggesting a URL parameter; it doesn't require any changes to Ignition. I'd be fine with a header too if we could think of a good reason for that.

(Long term again I think we clearly want a better "node identity" system, so for this particular enhancement let's do the simple thing that will solve the immediate problem across the board)
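To make the shape of that concrete, here is a minimal Go sketch (an illustration, not the actual MCS implementation) of a handler that only serves a config when the expected `?token=` query parameter is presented; the handler names, port, and token value are assumptions for the sketch:

```go
package main

import (
	"crypto/subtle"
	"net/http"
)

// requireToken wraps a config-serving handler and rejects requests that do
// not carry the expected ?token=<value> query parameter.
func requireToken(expectedToken string, next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		token := r.URL.Query().Get("token")
		// Constant-time comparison so the check doesn't leak timing information.
		if subtle.ConstantTimeCompare([]byte(token), []byte(expectedToken)) != 1 {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next(w, r)
	}
}

func main() {
	serveWorkerConfig := func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write([]byte(`{"ignition":{"version":"3.1.0"}}`)) // stand-in for the rendered config
	}
	http.HandleFunc("/config/worker", requireToken("example-token", serveWorkerConfig))
	// Plain HTTP here only to keep the sketch short; TLS is discussed later in this thread.
	http.ListenAndServe(":22623", nil)
}
```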

@derekwaynecarr (Member)

for the managed service context, i feel like we may want to manage the secret external to the cluster. we need to work through the details of that before proceeding too much further.

@cgwalters (Member Author)

> for the managed service context, i feel like we may want to manage the secret external to the cluster. we need to work through the details of that before proceeding too much further.

Isn't this the same as the UPI case? I suspect externally managed clusters are also probably fine with not rotating the token to start - but if they choose to do so, it'd just be a matter of the external management doing `oc -n openshift-config edit secret/machineconfig-auth` (a kube resource) and also updating the user-data provided to nodes (however that happens; usually in UPI scenarios it's not a kube resource).

@csrwng (Contributor)

csrwng commented Aug 24, 2020

> for the managed service context, i feel like we may want to manage the secret external to the cluster. we need to work through the details of that before proceeding too much further.

@derekwaynecarr I would assume that if we're thinking of running the MCS on the control plane side, the secret would live there as well.

@danwinship (Contributor)

> This proposal is suggesting a URL parameter

Hm... no, the original version said

> When configured, requests to the Machine Config Server will require this token to be provided via e.g. GET /config/worker&token=<value>.

but that text got removed and now it doesn't discuss how the token is provided.

Anyway, does the MCS still support plain http? And if so, should it stop now to avoid exposing the token (and the secret data in the response)? (Although if an attacker is able to snoop traffic on the node network, then you're probably already well on your way to losing anyway...)

@derekwaynecarr (Member)

@csrwng

keeping the secret on the management plane (external to the user cluster) makes sense, i just wanted to make sure that we can keep that as an option.


@ericavonb (Contributor)

I'm coming around to this proposal. The main advantage is that it doesn't require changes to Ignition, which most of the more complete solutions would. It moves us a bit closer to a more secure design by at least making it a bit harder to grab the config just by having network access.

With that in mind, I want to note some weaknesses so we're clear on what level of security we get with this:

  • Sending a token via a URL param in a GET request isn't usually recommended.
  • We don't have strong guarantees that the machines we supplied the token to are the ones using it at any given time. Some sort of challenge/response authentication method could solve this, but of course that means changes to Ignition.
  • It seems like rotating the token automatically may cause more trouble than it's worth in many cases. I think not rotating is OK; making the token per-cluster is better than what we have. We can work on TPM2 and other machine attestation for part 2.

Some small suggestions to improve this design further:

  • We should add rate limiting to prevent brute force attacks.
  • HTTPS is a must or you lose most of the benefits.
  • Use a constant-time compare when validating the token (see the sketch after this list).
  • Even if the token needs to be stored in the cluster itself to be provided to new machines, could we use some RBAC to restrict it from even MCO components? Having the MCS use a salted and hashed token instead?
  • Let's be clear in docs etc that this does not bring it to the level necessary to securely supply user data or secrets via ignition. No secrets in ignition should still be the default stance.
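A minimal sketch of two of the bullets above (a salted, hashed token at rest plus constant-time validation); the names and values here are illustrative, not part of the proposal:

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// hashToken derives the stored form of the token: SHA-256 over salt||token,
// so the plaintext token never needs to be kept by the MCS itself.
func hashToken(salt []byte, token string) []byte {
	h := sha256.New()
	h.Write(salt)
	h.Write([]byte(token))
	return h.Sum(nil)
}

// validToken compares in constant time to avoid leaking information via timing.
func validToken(salt, storedHash []byte, presented string) bool {
	return subtle.ConstantTimeCompare(hashToken(salt, presented), storedHash) == 1
}

func main() {
	salt := []byte("per-cluster-random-salt")             // illustrative; would be random in practice
	stored := hashToken(salt, "1pY12C5VWKwqeyK4itt8iaFB")  // illustrative token value
	fmt.Println(validToken(salt, stored, "1pY12C5VWKwqeyK4itt8iaFB")) // true
	fmt.Println(validToken(salt, stored, "wrong-token"))              // false
}
```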

@cgwalters (Member Author)

cgwalters commented Sep 1, 2020

> Let's be clear in docs etc that this does not bring it to the level necessary to securely supply user data or secrets via ignition. No secrets in ignition should still be the default stance.

That's kind of a root issue because we are injecting secrets in Ignition today (pull secret etc.). I don't have a clear vision for how to get to the point where we're not doing that. Maybe...hmm, I could imagine that we switch to doing "pull through" for images until kubelet is up (and hence the node has joined the cluster and can sync secrets that way) - basically we:

  • inject pull secret into bootstrap node
  • ignition served to control plane doesn't include pull secret
  • instead we run a "pull through" registry on the bootstrap node that is whitelisted to only allow pulling images referenced in the release payload
  • control plane ignition includes an ICSP pointing to bootstrap's pull through
  • OS update happens, reboot
  • ICSP pointing to bootstrap is dropped
  • real pull secret is written
  • Similar flow happens for workers except maybe we just patch the main openshift-registry to be the "pull through" point for workers

So we can design something for the pull secret, but I'm uncertain whether we can require system administrators not to put any other secrets in Ignition.

@ericavonb (Contributor)

> That's kind of a root issue because we are injecting secrets in Ignition today (pull secret etc.). I don't have a clear vision for how to get to the point where we're not doing that.

I think we should distinguish the pull secret from other types of secrets; it's not the same as, say, PII or business intelligence. Leaking it has real economic consequences for Red Hat, but we can limit and track any damage, starting with rotating it for that customer. It has a different threat model; the ability to launch pods, even unprivileged ones, gives you the same level of access as the pull secret. When we're talking about in-cluster threats such as here, protecting the pull secret isn't as important.

@cgwalters (Member Author)

> It has a different threat model; the ability to launch pods, even unprivileged ones, gives you the same level of access as the pull secret.

I think unfortunately that's not quite true; the pull secret somehow acts as an identifier/login that's connected with cloud.redhat.com. Launching containers from images that the pull secret gives access to doesn't let you see the secret itself.

@romfreiman

romfreiman commented Sep 5, 2020

I'm trying to verify it will not affect the assisted installer flow (seems to me that it should not, assuming no rotation, but want to be sure):

  1. Installer generates the token
  2. It injects it into a secret within the control plane from bootstrap.sh and adds it to the master/worker pointer Ignitions (see the sketch below)? {"source":"https://192.168.126.100:22623/config/worker?token=1pY12C5VWKwqeyK4itt8iaFBnqSqwD/sW2lNDezTHPU="} for example.
  3. MCS will verify this token once it gets the pointer request (same as now; for example, it refuses to serve Ignition to workers while in bootstrap mode)

Am I missing anything?
BTW, rotation will break late binding (aka a node will try to join a previously established cluster) or will require the token to be updated in the AI service that serves the pointer ignition.
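For step 2, a small sketch (not assisted-installer code) of appending the token as a query parameter to the MCS URL that goes into the pointer Ignition; the URL and token are the example values from above:

```go
package main

import (
	"fmt"
	"net/url"
)

// withToken appends token=<value> to an MCS config URL, e.g. the "source"
// field of a pointer Ignition config.
func withToken(mcsURL, token string) (string, error) {
	u, err := url.Parse(mcsURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("token", token)
	u.RawQuery = q.Encode() // query-escapes characters like '/' and '=' in the token
	return u.String(), nil
}

func main() {
	src, err := withToken("https://192.168.126.100:22623/config/worker", "1pY12C5VWKwqeyK4itt8iaFBnqSqwD/sW2lNDezTHPU=")
	if err != nil {
		panic(err)
	}
	fmt.Println(src)
}
```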

@cgwalters (Member Author)

> I'm trying to verify it will not affect the assisted installer flow (seems to me that it should not, assuming no rotation, but want to be sure):

Yep, sounds right to me.

@cgwalters (Member Author)

I discovered GCP has a really nice system for proving instance identity: https://cloud.google.com/compute/docs/instances/verifying-instance-identity
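For context on that mechanism, a rough sketch (not part of this proposal) of how a node could request a signed identity token from the GCP metadata server; the audience value is illustrative, and verification of the returned JWT by the MCS is not shown:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// fetchIdentityToken asks the GCP metadata server for a signed JWT that
// attests to the instance's identity for the given audience.
func fetchIdentityToken(audience string) (string, error) {
	q := url.Values{}
	q.Set("audience", audience)
	q.Set("format", "full")
	endpoint := "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?" + q.Encode()

	req, err := http.NewRequest("GET", endpoint, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Metadata-Flavor", "Google") // required by the metadata server

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	token, err := fetchIdentityToken("https://mcs.example.internal") // illustrative audience
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(token) // a signed JWT a verifier could check against Google's public keys
}
```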

@cgwalters (Member Author)

That said, I am increasingly thinking that we should try to avoid storing the pull secret in Ignition at all, via a design like the one above. It may not be that hard, and it would obviate a lot of the need for this.

cgwalters added a commit to cgwalters/installer that referenced this pull request Nov 12, 2020
Part of implementing: openshift/enhancements#443

The installer generates a random token (~password) and injects
it into the Ignition pointer configuration and as a secret into the
cluster.

The MCO will check it.
@cgwalters (Member Author)

openshift/installer#4372

cgwalters added a commit to cgwalters/installer that referenced this pull request Nov 13, 2020, with the same commit message as above.
@LorbusChris (Member)

This will also have implications for other parts of OpenShift that request and consume Ignition config from the MCS, such as the WMCO.

@cgwalters (Member Author)

> This will also have implications for other parts of OpenShift that request and consume Ignition config from the MCS, such as the WMCO.

I think this is the same as openshift/machine-config-operator#1690 AKA #467

@LorbusChris (Member)

LorbusChris commented Nov 16, 2020

I'm not sure it's exactly that, although it certainly looks related.
Today the Windows machines download the entire Ign config from the MCS (like Ignition does on Linux machines), but they really only consume a small subset of it, like e.g. the kubelet config.

Copying from a comment I made on the internal chat:

Currently the WMCO itself is not aware of the contents of the Ignition config at all, as it is downloaded by the provisioned Windows machine itself. This proposed change might be a strong reason to funnel the Ignition config through the WMCO, making it the component that requests the Ignition config from the MCS and then passes it on to the Windows machines, so it can react to changes both in the Ignition config contents and in the token that, with the above change, will be required to request the Ignition config in the first place.
Alternatively, the token could be written to the Windows machines by the WMCO, so they can request the Ignition config themselves as they are doing now.

@cgwalters (Member Author)

> Today the Windows machines download the entire Ign config from the MCS (like Ignition does on Linux machines), but they really only consume a small subset of it, like e.g. the kubelet config.

Right. I think the common thread across all of these, though, is that in-cluster components shouldn't need to hit the MCS - e.g. openshift-ansible switched to getting the rendered MC object. See https://github.com/openshift/openshift-ansible/blob/2cf2144dff1241702a0f2506f6dd66334341a919/roles/openshift_node/tasks/apply_machine_config.yml#L7 and below.
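As an illustration of "fetch the rendered MachineConfig object from the cluster", a sketch assuming client-go and a kubeconfig; the kubeconfig path and object name are placeholders (in practice the rendered config name comes from the MachineConfigPool status):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // placeholder path
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	gvr := schema.GroupVersionResource{
		Group:    "machineconfiguration.openshift.io",
		Version:  "v1",
		Resource: "machineconfigs",
	}
	// MachineConfig is cluster-scoped, so no namespace is used.
	mc, err := client.Resource(gvr).Get(context.TODO(), "rendered-worker-example", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("fetched", mc.GetName())
}
```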


For IPI/machineAPI scenarios, this will be handled fully automatically.

For user-provisioned installations, openshift-install will generate a token internally via the equivalent of
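(The snippet that followed "the equivalent of" isn't shown in this excerpt; a minimal sketch of generating such a token - assuming a random value encoded as base64, like the example tokens elsewhere in this thread - could look like the following.)

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

func main() {
	buf := make([]byte, 32) // 256 bits of randomness from the OS CSPRNG
	if _, err := rand.Read(buf); err != nil {
		panic(err)
	}
	token := base64.StdEncoding.EncodeToString(buf)
	fmt.Println(token) // e.g. injected into the pointer config and a cluster secret
}
```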
Contributor

As much as possible, OpenShift generates secrets within the cluster so that it doesn't need to worry about securing that information at rest or over the wire. Adding this new functionality would impose a new constraint on administrators (if they want to enjoy the benefit of this mechanism): they would need to run the installation from a secure machine and they would need to securely store the pointer Ignition configs. Without a robust rotation mechanism, that's going to be a pretty tough sell.

Member Author

> they would need to run the installation from a secure machine

What does this mean? Surely if the installation machine is compromised in some way then that undermines everything?

> and they would need to securely store the pointer Ignition configs.

That is taken care of already in clouds - accessing the user data is a privileged operation and we block it from pods. The in-cluster pointer config is already a secret type.

In bare metal UPI, yes; we'd need to document best practices here. In most bare metal UPI setups, the simple fix would be to just turn off the webserver hosting the pointer config until you need to do provisioning.

Contributor

Anywhere the token is stored, it needs to be secured - the chain is only as strong as its weakest link. If we are generating the token on someone's laptop, then they need to ensure that there is nothing snooping; otherwise, they cannot be sure that the token is safe. If there was a rotation mechanism, we could significantly reduce the risk for this particular vector, but it still doesn't address the other locations where this token is stored. Turning a web server on and off based on when the cluster needs to be scaled is not an option. Not only is that cumbersome and error-prone, but most customers' IT is not structured in a way that would even make this possible.


See https://github.com/openshift/machine-config-operator/pull/784

The Ignition configuration can contain secrets, and we want to avoid it being accessible both inside and outside the cluster.
Contributor

Can you elaborate on this a bit? It's long been the intention and recommendation that Ignition configs do not contain secrets. This proposes moving in the opposite direction (or at least making it very easy to do so), so I want to understand why you think we should change course.

Member Author

Our Ignition config has contained the pull secret and the kubeconfig from the start in 4.1. There's no course change here.
I've reworked the section to clarify that.

Contributor

There is a change in course though. As I said, the intention from the start has been to remove secrets from the configs. This is a reversal from that goal and I would like to see it justified since it is pretty foundational to the design.


#### Non-Ignition components fetching the Ignition config

As part of the Ignition spec 3 transition, we needed to deal with non-Ignition consumers of the Ignition config, such as [openshift-ansible](https://github.com/openshift/openshift-ansible/) and Windows nodes. These should generally instead fetch the rendered `MachineConfig` object from the cluster, and not access the MCS port via TCP.
Contributor

Is that how these components work today and if not, is it possible for them to do so? There's a relatively large amount of setup that needs to happen before a new client is able to fetch objects from the API server and I don't want to gloss over that without being sure it won't be a problem.

Member Author

Yes, that's how it works today, see the code around here:

> There's a relatively large amount of setup that needs to happen before a new client is able to fetch objects from the API server

Looks like the ansible code accepts a kubeconfig. Things like rhel7 and Windows MCO should also just be able to run as a pod; what's the problem with that?

Contributor

Okay, it's good to see that the existing implementations are compatible. Can you update the wording to reflect that? Specifically, reading "should generally" doesn't give me confidence.


See: [Disaster recovery](https://docs.openshift.com/container-platform/4.5/backup_and_restore/disaster_recovery/about-disaster-recovery.html)

This proposes that the secret for provisioning a node is stored in the cluster itself. But so is the configuration for the cluster, so this is not a new problem.
Contributor

This is not an adequate analysis of the potential failure modes for this design. I would expect this section to talk about the user experience in various failure scenarios (e.g. How would an admin troubleshoot auto-scaling failures due to a token mismatch? How would the auth token be recovered from a control plane which has lost quorum?).

Member Author

> How would the auth token be recovered from a control plane which has lost quorum?

I added a section on this:

> Note that the token is only necessary when Ignition is run - which is the first boot of a node before it joins the cluster. If a node is just shut down and restarted, access to the token isn't required. Hence there are no concerns if e.g. the whole control plane (or full cluster) is shut down and restarted.
>
> In the case of e.g. trying to reprovision a control plane node, doing that today already requires the Machine Config Server to run, which requires a cluster functioning enough to run it. The auth token doesn't impose any additional burden to that.

Member Author

> How would an admin troubleshoot auto-scaling failures due to a token mismatch?

This one was kind of covered below, I reworked it and moved it up.

> Troubleshooting
>
> Debugging nodes failing in Ignition today is painful; usually doing so requires looking at the console. There's no tooling or automation around this. We will need to discuss this in a "node provisioning" section in the documentation.
>
> This scenario would be easier to debug if Ignition could report failures to the MCS.


That said, we will need to discuss this in a "node provisioning" section in the documentation, and also link to it from the disaster recovery documentation to ensure that an administrator trying to re-provision a failed control plane node has some documentation if they forgot to update the pointer configuration for the control plane.

This scenario would be easier to debug if [Ignition could report failures to the MCS](https://github.com/coreos/ignition/issues/585).
Contributor

I'm not sure that this is sufficient. In order to safely ingest the reports from Ignition, MCS would need to do authentication. Otherwise, an attacker could attempt to phish with a misleading report or flood the endpoint with bogus reports.

Member Author

Right, part of the idea here is that if we have this auth token we can use it to authenticate failure reports. That clearly doesn't help for token rotation problems, but it would for all the other cases at least (failed os update pivot, broken Ignition partitioning, etc) that we've hit over time.


### Upgrades

This will not apply by default on upgrades, even in IPI/machineAPI managed clusters (to start). However, an administrator absolutely could enable this "day 2". Particularly for "static" user provisioned infrastructure where the new nodes are mostly manually provisioned, the burden of adding the token into the process of provisioning would likely be quite low.
Contributor

Unless there is a solid reason, we should avoid fracturing the fleet of clusters by their initial version. This especially applies to security improvements. We don't need to commit to upgrading everyone into this new scheme in the first iteration, but I would like to see a plan for getting out of the hole before we jump into it.

Member Author

In UPI cases we don't control the pointer config, so there's no way we can automate this right?

We could add some sort of alerting and ask admins to do the switch I guess?

Contributor

> so there's no way we can automate this right?

Right, not with this proposal. And that's my concern. I'd prefer a solution that is less hands-on and possible for us to roll out as an upgrade.


#### Move all secret data out of Ignition

This would then gate access on CSR approval, but this introduces bootstrapping problems for the kubelet such as the pull secret.
Contributor

Given that this is the current best practice, can you expand on this a bit more? What sort of bootstrapping problems do you envision? Some folks on the installer team had looked into this in the past and had a working PoC, so I don't buy that this is an insurmountable challenge.

Member Author

Well for one thing we apply OS updates before starting kubelet - https://github.com/openshift/machine-config-operator/blob/master/docs/OSUpgrades.md
And the MCO has systemd units that pull other containers (e.g. coredns) on certain platforms etc.

Trying to move entirely away from pulling container images before kubelet would seem to require a model where we run all containers via kubelet - seems possible but needs design. The kubelet itself currently wants to pull a "pause" image although maybe only in non-hostNetwork: true cases?

A lot of unknowns down this path - I'd have no problem if someone else wanted to try driving this direction though.


#### Encrypt secrets in Ignition

Similar to above, we have a bootstrapping problem around fetching the key.
Contributor

Do we? As long as the secrets aren't required to gain access to the API server, we could protect the decryption key behind the CSR process. Can you expand on this alternative?

Member Author

It seems equivalent to the above - the pull secret is a secret so we'd need to solve all the problems there.



#### Improved node identity
Contributor

I don't follow how this is an alternative to the proposal. I agree we should do more to attest, but I don't see how this protects the Ignition configs served by the MCS.

Member Author

Added:

> In this model we'd change the Ignition process to require the node to prove its identity, and the MCS would verify that.

See https://cloud.google.com/compute/docs/instances/verifying-instance-identity#verifying

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cgwalters
To complete the pull request process, please assign frobware after the PR has been reviewed.
You can assign the PR to them by writing /assign @frobware in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@crawford (Contributor)

This enhancement is too light on detail for how much trouble it could potentially cause. We need to think about this holistically, as it never ends well when security is bolted on after the fact.

/hold

@openshift-ci-robot added the do-not-merge/hold label on Nov 24, 2020
@cgwalters (Member Author)

> This enhancement is too light on detail for how much trouble it could potentially cause. We need to think about this holistically, as it never ends well when security is bolted on after the fact.

OK, but we need some path forward here, right? Of these, which is closest to what you're saying?

  1. This isn't a big problem and we can continue to ignore it for a while
  2. I don't like this and want to do something else but I'm not sure what
  3. I don't like this and would prefer someone investigate the "kubelet only" path
  4. This enhancement mostly sounds like it will work but needs tweaks/clarifications

@cgwalters (Member Author)

cgwalters commented Nov 25, 2020

OK I thought about the "kubelet only" path a bit more and had an idea: What if we provided a "synthetic" pull secret via Ignition that only allowed pulling the default container images from the in-cluster registry? Then once kubelet joins, the "real" pull secret is pulled from the cluster and installed via say the MCD and kubelet live-updated to use it.

That addresses the main secret (pull secret); though we'd still eventually need to disallow anything that can access Ignition from being able to spam CSRs - though solving that problem I think would be best done with something like cloud node identity systems/TPMs.

We're already blazing the trail of live updates in the MCO, and another benefit here is that this would force us to implement "pull through" in our registry, which I suspect would be a large win in its own right.

@crawford (Contributor)

> What if we provided a "synthetic" pull secret via Ignition that only allowed pulling the default container images from the in-cluster registry? Then once kubelet joins, the "real" pull secret is pulled from the cluster and installed via say the MCD and kubelet live-updated to use it.

That was roughly the direction we were moving. The plan was to teach the kubelet how to do the CSR flow and fetch the pull secret before it attempted to pull any containers. If possible, I'd like to leave the Machine Config components out of this flow (fewer moving parts, less complexity).

> That addresses the main secret (pull secret); though we'd still eventually need to disallow anything that can access Ignition from being able to spam CSRs - though solving that problem I think would be best done with something like cloud node identity systems/TPMs.

What's the concern around spamming the CSR endpoint? It's a given that we need a smarter approver (i.e. identity and attestation systems, like you mentioned), but is that not sufficient?

@cgwalters (Member Author)

Pre-enhancement: Move secrets from MCS Ignition to pointer config

This is an alternative to #443

Background

In order for CoreOS nodes to join the cluster, we need to provide Ignition that contains at least:

  • Stock OpenShift infrastructure (kubelet systemd unit file)
  • Per-cluster configuration (config files with domain names, etc.)
  • Pull secret to download container images
  • Bootstrap kubeconfig

Problem statement

This Ignition is served on port 22623, which is currently blocked by the OpenShift SDN - but that blocking isn't implemented by 3rd party SDN layers. It may be possible to access the MCS endpoint from outside the cluster as well; while our default cloud IPI deployments use firewalling to block ingress to it, not all UPI deployments do so.

We particularly want to avoid leaking the pull secret outside the cluster because it is tied to the cluster provisioner's Red Hat account.

It'd also be nice to avoid leaking the bootstrap kubeconfig; while the CSR approval mechanism is reasonably secure, it's clearly better to avoid scenarios where a malicious actor could even get to the point of creating CSRs.

Proposed solution

Move the pull secret and bootstrap kubeconfig into the "pointer Ignition". In all cloud environments that we care about, access to the "user data" in which the pointer Ignition is stored is secured. It can't be accessed outside the cluster because it's a link local IP address, and the OpenShift SDN blocks it from the pod network (see references).
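A rough sketch of what such a pointer Ignition config could look like, generated here with plain JSON marshalling; the MCS URL, file paths, modes, and contents are illustrative only:

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// dataURL wraps inline file contents as an Ignition "data:" URL.
func dataURL(contents string) string {
	return "data:text/plain;base64," + base64.StdEncoding.EncodeToString([]byte(contents))
}

func main() {
	pointer := map[string]interface{}{
		"ignition": map[string]interface{}{
			"version": "3.1.0",
			"config": map[string]interface{}{
				// Merge in the full config served by the MCS.
				"merge": []map[string]string{{"source": "https://192.168.126.100:22623/config/worker"}},
			},
		},
		"storage": map[string]interface{}{
			"files": []map[string]interface{}{
				{
					// Illustrative path for the pull secret on the node.
					"path":     "/var/lib/kubelet/config.json",
					"mode":     420, // 0644
					"contents": map[string]string{"source": dataURL(`{"auths":{}}`)},
				},
				{
					// Illustrative path for the bootstrap kubeconfig.
					"path":     "/etc/kubernetes/kubeconfig",
					"mode":     384, // 0600
					"contents": map[string]string{"source": dataURL("apiVersion: v1\nkind: Config\n")},
				},
			},
		},
	}
	out, err := json.MarshalIndent(pointer, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```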

Risks and mitigations

In UPI scenarios, particularly e.g. bare metal and vSphere type UPI that often involves a manually set up webserver, we do not currently require confidentiality for the pointer configuration. This would need to change - it'd require documentation.

Notes and references

  1. OpenShift SDN blocking link local addresses: openshift/origin@9a9f30f
    The latest version of which currently lives here: https://github.com/openshift/sdn/blob/bba15e2d344a6729d5aa7ac7d1ec14d2022219ab/cmd/sdn-cni-plugin/openshift-sdn_linux.go#L129
  2. RHCOS IMDSv2: RHBZ#1899220

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot added the lifecycle/stale label on Mar 1, 2021
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Mar 31, 2021
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closed this PR.

In response to this:

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting /reopen.
> Mark the issue as fresh by commenting /remove-lifecycle rotten.
> Exclude this issue from closing again by commenting /lifecycle frozen.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels: do-not-merge/hold, lifecycle/rotten