single-node production deployment approach #560

dhellmann · 2020-12-10T22:24:46Z

This enhancement describes the approach to deploying single-node
production OpenShift instances without using a cluster profile.

cgwalters

This is extremely well written, makes total sense to me.

The only thing I found myself wanting here is the specific example of the capabilities API; it seems like that's going to be openshift/api#816 ? Let's either link to that or explicitly demo what the "user interface" is for this in the install config?

dhellmann · 2020-12-10T22:52:21Z

enhancements/single-node-production-deployment-approach.md

+1. Telco workloads typically require special network setups
+   for a host to boot, including bonded interfaces, access to multiple
+   VLANs, and static IPs. How do we anticipate configuring those?
+2. How do we hand off ownership of the `etcd-quorum-guard` Deployment


@mrunalp @staebler I've added an open question about the CVO-CEO hand-off that we discussed earlier. Did we have an answer for that, or is it still an open question?

@romfreiman mentioned that he and @hexfusion have discussed it and have a plan.

is there a link?

@romfreiman @hexfusion can you summarize the plan in a comment or provide a link to a design doc so I can put either/both into this enhancement?

dhellmann · 2020-12-10T23:24:50Z

> This is extremely well written, makes total sense to me.
>
> The only thing I found myself wanting here is the specific example of the capabilities API; it seems like that's going to be openshift/api#816 ? Let's either link to that or explicitly demo what the "user interface" is for this in the install config?

D'oh! There's a link to that enhancement in the metadata, but probably not in the body of the text. It's #555 and I'll make sure that is called out more clearly.

dhellmann · 2020-12-10T23:29:31Z

/cc @markmc

markmc · 2020-12-14T12:46:41Z

> This is extremely well written, makes total sense to me.

+1 the effort from many people on this is very well captured here

For me, the tl;dr is that this is the minimal list of changes we believe operators would need to make in order to respond to a "this is a non-HA cluster" API. Unless there are major objections to that approach, I think we should be able to merge this enhancement quite quickly.
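To make the "respond to a non-HA cluster API" idea concrete, here is a minimal sketch of the decision an operator might make. The `HighAvailabilityMode` enum, its values, and `desired_replicas` are hypothetical names chosen for illustration; the actual capabilities API is defined in the separate enhancement and may look different.

```python
from enum import Enum


class HighAvailabilityMode(Enum):
    # Hypothetical values for illustration only; not the real API.
    FULL = "Full"
    NONE = "None"


def desired_replicas(mode: HighAvailabilityMode, ha_replicas: int = 2) -> int:
    """Return the replica count an operator would request for its operand.

    ha_replicas is the count used on a regular multi-node cluster.
    """
    if ha_replicas < 1:
        raise ValueError("ha_replicas must be at least 1")
    # On a single-node cluster extra replicas add no availability,
    # so the operator scales its operand down to one.
    if mode is HighAvailabilityMode.NONE:
        return 1
    return ha_replicas
```

An operator following this pattern would read the mode once per reconcile and pass the result into the Deployment it manages, rather than hard-coding a replica count.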

markmc · 2020-12-14T12:49:48Z

@dhellmann there was a bunch of thoughtful discussion in #504 about configuration changes. I'm not sure I follow the conclusions 100%, so I'm curious whether, in your mind, the result of that discussion is captured in this enhancement. Thanks.

(See #504 (comment))

markmc · 2020-12-14T12:55:07Z

enhancements/single-node-production-deployment-approach.md

+would be expected to run without issue during this interval. Workloads
+that do depend on apiserver availability would need to be resilient to
+these events. OpenShift core components are already resilient in this
+way.


(Maybe this is closely related to my question on the configuration changes discussion.)

This section raises more questions than it answers for me. For example, what do we mean by "rollouts"? I think I understand reconfiguration for key rotations, but are there other examples of "periodically reconfigured"? "Inaccessible for up to 2 minutes" seems very specific: is this 2-minute timeframe somehow fundamental, or is it something that can be improved? Can we be more specific about how OpenShift core components are resilient to this, and how other workloads will need to be adapted?

Thanks,

@deads2k @cgwalters I think these details came from one of you, can you help answer the question?

Pinging @deads2k and @cgwalters for help here.

As I understand it, the new encryption keys cause a restart, or at least a temporary pause while the new keys are loaded. I guess restarting takes around 2 minutes? @deads2k is that right?
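As one illustration of what "resilient to these events" could mean for a workload that calls the apiserver, here is a minimal retry-with-backoff sketch. It is not taken from the enhancement: `call_with_retry`, its parameters, and the 150-second default deadline (chosen only to sit a little above the two minutes mentioned in this thread) are assumptions, and `ConnectionError` stands in for whatever error a real client library raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    request: Callable[[], T],
    deadline_seconds: float = 150.0,  # assumption: slightly above the ~2 minute outage
    initial_delay: float = 1.0,
    max_delay: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call request() until it succeeds or the deadline would be exceeded."""
    start = clock()
    delay = initial_delay
    while True:
        try:
            return request()
        except ConnectionError:
            # Give up if waiting again would take us past the deadline.
            if clock() - start + delay > deadline_seconds:
                raise
            sleep(delay)
            # Exponential backoff, capped so we keep probing regularly.
            delay = min(delay * 2, max_delay)
```

A workload would wrap each apiserver call in `call_with_retry(lambda: ...)`. The `sleep` and `clock` parameters exist only so the behavior can be exercised without real waiting.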

dhellmann · 2020-12-14T16:56:10Z

> @dhellmann there was a bunch of thoughtful discussion in #504 about configuration changes. I'm not sure I follow the conclusions 100%, so I'm curious whether, in your mind, the result of that discussion is captured in this enhancement. Thanks.
>
> (See #504 (comment))

In those earlier discussions we were still assuming we might end up with a version of this that cut some operators out of the cluster completely. The proposal has evolved significantly since then, and I've tried to capture that in an update to the goals/non-goals section.

@deads2k, the comment @markmc linked to was yours. Could you take a look at the goals/non-goals list and confirm that I've captured the details to alleviate your earlier concerns?

dhellmann · 2020-12-14T16:57:07Z

> > This is extremely well written, makes total sense to me.
>
> +1 the effort from many people on this is very well captured here

Yes, this was definitely a team effort!

hexfusion

few notes, thanks for the details.

derekwaynecarr

@dhellmann this is very well done, had a few questions/clarifications.

derekwaynecarr · 2021-01-07T20:00:50Z

enhancements/single-node-production-deployment-approach.md

+new capabilities API to change the replica count to 1 when the
+high-availability mode is none.
+
+#### cluster-machine-approver


During bootstrapping, we approve everything; post-bootstrapping, we need a corresponding machine record in order to auto-approve.

https://github.com/openshift/cluster-machine-approver/blob/master/README.md#openshift-and-csrs
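The rule stated in this comment can be sketched as below. This models only the two cases mentioned here, it is not the cluster-machine-approver's implementation, and `should_auto_approve` and its parameters are hypothetical names; the README linked above describes the actual checks.

```python
from typing import Set


def should_auto_approve(
    node_name: str,
    bootstrapping: bool,
    machine_node_names: Set[str],
) -> bool:
    """Decide whether a node's CSR would be auto-approved.

    machine_node_names is the (hypothetical) set of node names that
    have a corresponding machine record.
    """
    # During bootstrapping every request is approved.
    if bootstrapping:
        return True
    # Afterwards, a matching machine record is required.
    return node_name in machine_node_names
```

On a single-node cluster installed without machine records, the second branch is the one that matters: after bootstrapping, nothing would match, so requests would not be auto-approved under this rule.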

derekwaynecarr · 2021-01-07T20:06:16Z

enhancements/single-node-production-deployment-approach.md

+(https://github.com/openshift/release/pull/14552) tests using the
+bootstrap-in-place approach described in
+https://github.com/openshift/enhancements/pull/565 on Packet and
+e2e-aws-single-node (https://github.com/openshift/release/pull/14556)


Is it accurate to assume that the AWS cloud provider will be enabled when running on AWS infra?

@eranco74 does the installer use the right platform setting or does it use an empty platform like UPI?

The installer will use the platform specified in the install-config.yaml. For bootstrap-in-place the platform should be None, same as UPI.
You can still install single node with the regular openshift-installer flow (with a bootstrap node) on AWS; in that case the AWS cloud provider will be enabled.

dhellmann · 2021-01-08T17:02:29Z

Thanks everyone for your reviews! I have updated the text based on the actionable feedback, so please take another look if you have reviewed an earlier draft. There are still several threads with open questions or requests for help, but I think we're a lot closer to being able to merge this.

markmc · 2021-01-08T18:15:48Z

> Thanks everyone for your reviews! I have updated the text based on the actionable feedback, so please take another look if you have reviewed an earlier draft. There are still several threads with open questions or requests for help, but I think we're a lot closer to being able to merge this.

Agree, thanks Doug.

/approve

@derekwaynecarr please lgtm if/when you're happy with Doug's responses to your feedback

darkmuggle · 2021-01-08T19:34:28Z

enhancements/single-node-production-deployment-approach.md

+to stop all workloads safely as part of the reboot. That feature is
+alpha in kubernetes 1.20 and disabled by default, so we will need to
+add a feature gate to enable it.
+


Suggested change

Any operator that generates a MachineConfig and templates in the machine-config-operator must be high-availability mode agnostic.

I missed this comment earlier. Maybe we want to fold it into #587?

derekwaynecarr · 2021-01-13T14:17:37Z

Thanks @dhellmann, this looks good to merge and iterate. If we find more is necessary, we can update.

/lgtm

pweil-

Console updates LGTM

openshift-ci-robot · 2021-01-13T14:24:18Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, markmc, pweil-

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [markmc]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

spadgett · 2021-01-13T14:25:59Z

> Console updates LGTM

+1, thanks @dhellmann

cgwalters · 2021-01-15T14:43:31Z

And the best part is now that this is merged, once we stand up CI and there are occasional failures...we can call those SNOflakes.

romfreiman · 2021-01-15T19:19:51Z

@cgwalters and we have a logo

openshift-ci-robot requested review from deads2k and jwmatthews December 10, 2020 22:25

dhellmann mentioned this pull request Dec 10, 2020: single-node production deployments #504 (Closed)

cgwalters approved these changes Dec 10, 2020

dhellmann force-pushed the single-node-production-deployment-approach branch from 017205a to 52c5ea8 December 10, 2020 22:51

dhellmann force-pushed the single-node-production-deployment-approach branch 2 times, most recently from 0d0c1e9 to 0003860 December 10, 2020 23:28

openshift-ci-robot requested a review from markmc December 10, 2020 23:29

mrunalp reviewed Dec 11, 2020

dhellmann force-pushed the single-node-production-deployment-approach branch 2 times, most recently from 88fbe8d to dbb028d December 11, 2020 21:27

markmc reviewed Dec 14, 2020

dhellmann force-pushed the single-node-production-deployment-approach branch 3 times, most recently from 003367c to 35df4e1 December 14, 2020 17:04

dhellmann force-pushed the single-node-production-deployment-approach branch from ea99322 to ce35544 January 6, 2021 21:34

pweil- reviewed Jan 6, 2021

hexfusion reviewed Jan 6, 2021

crawford reviewed Jan 7, 2021

crawford reviewed Jan 7, 2021

derekwaynecarr reviewed Jan 7, 2021

kikisdeliveryservice reviewed Jan 7, 2021

dhellmann force-pushed the single-node-production-deployment-approach branch from ce35544 to e13ed3e January 8, 2021 16:58

openshift-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Jan 8, 2021

Commit d5e748f: single-node production deployment approach

This enhancement describes the approach to deploying single-node production OpenShift instances without using a cluster profile. Signed-off-by: Doug Hellmann <[email protected]>

dhellmann force-pushed the single-node-production-deployment-approach branch from e13ed3e to d5e748f January 8, 2021 19:04

darkmuggle reviewed Jan 8, 2021

openshift-ci-robot assigned derekwaynecarr Jan 13, 2021

openshift-ci-robot added the lgtm label (Indicates that a PR is ready to be merged.) Jan 13, 2021

pweil- approved these changes Jan 13, 2021

openshift-merge-robot merged commit 48ae0cf into openshift:master Jan 13, 2021

dhellmann added a commit to dhellmann/openshift-enhancements that referenced this pull request Jan 21, 2021: clarify wording about cluster machine approver for single-node (82df8d2)

Incorporate feedback from openshift#560 (comment) Signed-off-by: Doug Hellmann <[email protected]>

dhellmann mentioned this pull request Jan 21, 2021: clarify wording about cluster machine approver for single-node #596 (Merged)

VaishnaviHire pushed a commit to VaishnaviHire/enhancements that referenced this pull request Feb 11, 2021: clarify wording about cluster machine approver for single-node (27cddd2)

sinnykumari mentioned this pull request Mar 9, 2021: Skip drain on Single Node deployment openshift/machine-config-operator#2457 (Merged)

cgwalters mentioned this pull request Mar 18, 2021: Add container-images to the compose / treefile coreos/rpm-ostree#2675 (Open)

mansikulkarni96 pushed a commit to mansikulkarni96/enhancements that referenced this pull request Mar 26, 2021: clarify wording about cluster machine approver for single-node (561ba50)

dmi3mis mentioned this pull request May 13, 2021: machine-config-operator doesn't work on single node cluster cgruver/okd4-single-node-cluster#10 (Closed)
