Allow installer to include/exclude capabilities based on user selections #922
Conversation
+1 from me
I'm happy with this draft. I have a couple of comments about non-goals, but nothing that would block merging this.
> * Providing a way to install OLM operators as part of the initial cluster install. This EP is about making the install experience around the existing CVO-based components more flexible, not adding new components to the install experience.
> * Allowing components to be disabled post-install.
Suggested change:

> * Allowing components to be disabled post-install.
> * Defining disabled components via any means other than the ExcludeAddons list (i.e., no defaulting to disable a second-level operator based on platform).
Maybe this is implicit in the next item, but it seemed worth calling out.
@staebler similar to comments you made below. Do you feel like this is worth calling out?
I would say there's no reason the installer couldn't pre-configure the install-config to exclude certain components based on the target platform. So it's not a "goal" of the EP, but something the EP, if implemented, could enable to be built, and I wouldn't be against it being built.
Not sure if you're for or against such a capability being built on top of this functionality in the future, @sdodson.
> @staebler similar to comments you made below. Do you feel like this is worth calling out?
I think there is value in calling that out explicitly. I don't see it being implied by the next item, beyond one particular way of disabling components.
> i would say there's no reason the installer couldn't pre-configure the install-config to exclude certain components based on the target platform
The installer cannot easily support defaults for the exclusions, because the installer cannot distinguish between an empty field and an omitted field.
The fields in the ClusterVersion stay the same as outlined elsewhere. The additional fields are strictly for the install-config.yaml. Specifically, the installer will coalesce the values in the three install-config fields into a single "exclude" field for the CVO.
That would require the installer to know about all of the capabilities, right? And it wouldn't solve the upgrade case.
The more I think about this, the more I'm convinced the installer should just pass values through to the CVO so we only have complex logic and knowledge related to the set of capabilities in one place.
The problem of "what if a new component is introduced in an upgrade" exists regardless with the current "exclude-only" API at the CVO level. The admin can only tell us what they don't want, and if they didn't know to add "new-component" to that list before they upgraded, they're going to get "new-component" after the upgrade.
I agree that I'm increasingly inclined to say the installer should have no logic for choosing specific components at all. If the user wants to exclude components, they provide an appropriate install-config. If not, they get everything. And if someone wants to write a wrapper/generator for install-configs that intelligently excludes certain components, great.
@staebler @dhellmann I've added a commit in an attempt to define an API that satisfies these use cases:
- let the admin choose the default behavior for "unknown" capabilities (such as those introduced during a future upgrade)
- allow the admin to start with an empty set and add what they want (default: exclude, then list includes)
- allow the admin to start with a full set and then exclude stuff (default: include, then list excludes)

For the case where the installer itself needs to know whether the admin provided any values, I think we should just make the include and exclude lists pointer/optional fields. Nil means "the user didn't specify anything"; an empty array means "the user explicitly wants nothing listed here". And I'm still hopeful that we keep the installer out of that business (populating default lists of things to exclude/include) entirely.
I hope that lets us resolve this thread; if not, we can discuss some more when I'm back next week.
I think we can resolve this thread and continue any discussion of the API closer to where it is described in this doc.
/cc @LalatenduMohanty
@derekwaynecarr @sdodson @staebler @LalatenduMohanty OK, I think this is in a finalized form that takes into account the feedback I've gotten. Please give it a look when you have a chance. I will separately be reaching out to the installer/CVO team and their PM to discuss getting this on the roadmap.
> 5) After installing a cluster, disable an addon. The configuration change should be rejected by the CVO. Disabling a component post-install is not supported.
@bparees per the 4.11 kickoff: if the samples operator ends up being the first candidate for this endeavor, do you want to capture in this EP the fact that there will be e2e adjustments needed in openshift/origin?
Or do you feel that is "too detailed" for this EP (or this current iteration of it)? Or do you anticipate that being captured either in another EP, or just in the Jira items that ultimately accompany the implementation of this EP?
Thanks
Maybe your item 1) is a very high-level summary capturing my point above... but I know the openshift/origin e2es are a detail you and I have discussed in the past, so I was half expecting some reference to them :-)
Yeah, item (1) was intended to cover that point, but I'll add a sentence:
> This will necessitate disabling or modifying tests that depend on the disabled component(s).
/lgtm
@bparees: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Fleet capability choices should make their way up to Red Hat via Insights uploads. But some clusters have Telemetry enabled but Insights disabled [1]. For these clusters, we'll want this data in Telemetry, and getting it into metrics (this commit) is the first step. After this lands, we'll need to update the monitoring operator like [2] to get these shipped up off of cluster. [1]: openshift/enhancements#922 (comment) [2]: openshift/cluster-monitoring-operator#1477
The cluster-version operator is growing this new metric [1], and Ben Parees wants it included in Telemetry so we collect it for clusters that have Telemetry enabled, even if Insights is disabled [2]. Cardinality is expected to be very small, with a single label ('name' [1]) and a restricted value set (only three registered capabilities expected for 4.11). [1]: openshift/cluster-version-operator#755 [2]: openshift/enhancements#922 (comment)
The console may become optional [1], so teach the installer to handle its absence gracefully. We've waited on the console since way back in ff53523 (add logs at end of install for kubeadmin, consoleURL, 2018-12-06, openshift#806). Back then, install-complete timing was much less organized, and since e17ba3c (cmd: wait for the cluster to be initialized, 2019-01-25, openshift#1132) we've blocked on ClusterVersion going Available=True. So the current dependency chain is: 1. Console route admission blocks console operator from going Available=True in its ClusterOperator. 2. Console ClusterOperator blocks cluster-version operator from going Available=True in ClusterVersion. 3. ClusterVersion blocks installer's waitForInitializedCluster. So we no longer need to wait for the route to show up, and can fail fast if we get a clear IsNotFound. I'm keeping a bit of polling so we don't fail an install on a temporary network hiccup. We don't want to drop the console check entirely, because when it is found, we want: * To continue to log that access pathway on install-complete. * To continue to append the router CA to the kubeconfig. That latter point has been done since 4033577 (append router CA to cluster CA in kubeconfig, 2019-02-12, openshift#1242). The motivation in that commit message is not explicit, but the idea is to support folks who naively run 'oc login' with the kubeadmin kubeconfig [2] (despite that kubeconfig already having cluster-root access) when the console route's cert's CA happens to be something that the user's local trust store doesn't include by default. [1]: openshift/enhancements#922 [2]: openshift#1541 (comment)