
Allow installer to include/exclude capabilities based on user selections #922

Merged: 1 commit merged into openshift:master from installdomains on Jan 13, 2022

Conversation

@bparees (Contributor) commented Oct 5, 2021

No description provided.

@bparees bparees force-pushed the installdomains branch 3 times, most recently from a83a738 to 67fca73 Compare October 7, 2021 21:56
@staebler (Contributor) left a comment

+1 from me

enhancements/installer/component-selection.md (outdated; resolved)
enhancements/installer/component-selection.md (resolved)
@dhellmann (Contributor) left a comment

I'm happy with this draft. I have a couple of comments about non-goals, but nothing that would block merging this.

enhancements/installer/component-selection.md (outdated; resolved)
enhancements/installer/component-selection.md (resolved)
@bparees bparees force-pushed the installdomains branch 5 times, most recently from beee88e to c55c326 Compare October 11, 2021 21:13
* Providing a way to install OLM operators as part of the initial cluster install. This EP is about making
the install experience around the existing CVO-based components more flexible, not adding new components to the
install experience.
* Allowing components to be disabled post-install.
(Member) left a comment

Suggested change:

* Allowing components to be disabled post-install.
* Defining disabled components via any means other than the ExcludeAddons list (i.e. no defaulting to disable a second-level operator based on platform).

Maybe this is implicit in the next item, but it seemed worth calling out.

(Member) left a comment

@staebler similar to comments you made below. Do you feel like this is worth calling out?

@bparees (Contributor, Author) left a comment

I would say there's no reason the installer couldn't pre-configure the install-config to exclude certain components based on the target platform. So it's not a "goal" of the EP, but it is something the EP, if implemented, could enable, and I wouldn't be against it being built.

Not sure if you're for or against such a capability being built on top of this functionality in the future, @sdodson.

(Contributor) left a comment

> @staebler similar to comments you made below. Do you feel like this is worth calling out?

I think there is value in calling that out explicitly. I don't see it being implied by the next item, beyond one particular way of disabling components.

(Contributor) left a comment

> I would say there's no reason the installer couldn't pre-configure the install-config to exclude certain components based on the target platform

The installer cannot easily support defaults for the exclusions, because the installer cannot distinguish between an empty field and an omitted field.

(Contributor) left a comment

The fields in the ClusterVersion stay the same as outlined elsewhere. The additional fields are strictly for the install-config.yaml. Specifically, the installer will coalesce the values in the three install-config fields into a single "exclude" field for the CVO.
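As an illustration, here is a minimal Go sketch of that coalescing step, assuming hypothetical field names (`defaultPolicy`, `include`, `exclude`) rather than the actual install-config schema. It also illustrates the concern raised below: coalescing only works if the installer knows the full set of capabilities.

```go
package main

import "fmt"

// Hypothetical install-config knobs; the real field names differ.
type capabilitySelection struct {
	defaultPolicy string   // "include" or "exclude" for capabilities not listed below
	include       []string // capabilities the user explicitly wants
	exclude       []string // capabilities the user explicitly does not want
}

// coalesceExclusions reduces the three knobs to the single exclusion
// list the CVO consumes, given the set of capabilities the installer
// knows about.
func coalesceExclusions(sel capabilitySelection, known []string) []string {
	included := map[string]bool{}
	for _, c := range sel.include {
		included[c] = true
	}
	excluded := map[string]bool{}
	for _, c := range sel.exclude {
		excluded[c] = true
	}

	var result []string
	for _, c := range known {
		switch {
		case excluded[c]:
			result = append(result, c)
		case included[c]:
			// Explicitly wanted, never excluded.
		case sel.defaultPolicy == "exclude":
			result = append(result, c)
		}
	}
	return result
}

func main() {
	sel := capabilitySelection{defaultPolicy: "exclude", include: []string{"baremetal"}}
	fmt.Println(coalesceExclusions(sel, []string{"baremetal", "marketplace", "openshift-samples"}))
	// Output: [marketplace openshift-samples]
}
```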

(Contributor) left a comment

That would require the installer to know about all of the capabilities, right? And it wouldn't solve the upgrade case.

The more I think about this, the more I'm convinced the installer should just pass values through to the CVO so we only have complex logic and knowledge related to the set of capabilities in one place.

@bparees (Contributor, Author) left a comment

The problem of "what if a new component is introduced in an upgrade" exists regardless with the current "exclude-only" API at the CVO level. The admin can only tell us what they don't want, and if they didn't know to add "new-component" to that list before they upgraded, they're going to get "new-component" after the upgrade.

I agree that I'm increasingly inclined to say the installer should have no logic for choosing specific components at all. If the user wants to exclude components, they provide an appropriate install-config. If not, they get everything. And if someone wants to write a wrapper/generator for install-configs that intelligently excludes certain components, great.

@bparees (Contributor, Author) left a comment

@staebler @dhellmann I've added a commit in an attempt to define an API that satisfies these use cases:

  1. let the admin choose the default behavior for "unknown" capabilities (such as those introduced during a future upgrade)
  2. allow the admin to start with an empty set and add what they want (default: exclude, then list includes)
  3. allow the admin to start with a full set, and then exclude stuff (default: include, then list excludes)

For the case where the installer itself needs to know whether the admin provided any values, I think we should just make the include and exclude lists pointer/optional fields: nil means "user didn't specify anything", and an empty array means "user explicitly wants nothing listed here". And I'm still hopeful that we keep the installer out of the business of populating default lists of things to exclude/include entirely.
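A minimal Go sketch of the nil-versus-empty distinction proposed here, using `sigs.k8s.io/yaml` (which follows `encoding/json` semantics); the `include`/`exclude` field names are hypothetical, not the final install-config schema.

```go
package main

import (
	"fmt"

	"sigs.k8s.io/yaml"
)

// Hypothetical install-config fragment; the real field names differ.
type capabilities struct {
	Include *[]string `json:"include,omitempty"`
	Exclude *[]string `json:"exclude,omitempty"`
}

func describe(field string, v *[]string) string {
	switch {
	case v == nil:
		return field + ": user didn't specify anything"
	case len(*v) == 0:
		return field + ": user explicitly wants nothing listed here"
	default:
		return fmt.Sprintf("%s: user listed %v", field, *v)
	}
}

func main() {
	var c capabilities
	// "include" is omitted entirely; "exclude" is explicitly empty.
	if err := yaml.Unmarshal([]byte("exclude: []\n"), &c); err != nil {
		panic(err)
	}
	fmt.Println(describe("include", c.Include)) // include: user didn't specify anything
	fmt.Println(describe("exclude", c.Exclude)) // exclude: user explicitly wants nothing listed here
}
```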

Commit: 1a9c0b7

I hope that lets us resolve this thread; if not, we can discuss some more when I'm back next week.

(Contributor) left a comment

I think we can resolve this thread and continue any discussion of the API closer to where it is described in this doc.

@sdodson (Member) commented Oct 15, 2021

/cc @LalatenduMohanty
PTAL

@bparees (Contributor, Author) commented Oct 15, 2021

@derekwaynecarr @sdodson @staebler @LalatenduMohanty OK, I think this is in a finalized form that takes the feedback I've gotten into account. Please give it a look when you have a chance.

I will separately be reaching out to the installer/cvo team and their PM to discuss getting this on the roadmap.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 12, 2022
@bparees bparees added the lgtm Indicates that a PR is ready to be merged. label Jan 12, 2022
@openshift-ci openshift-ci bot removed lgtm Indicates that a PR is ready to be merged. labels Jan 13, 2022
@bparees bparees force-pushed the installdomains branch 2 times, most recently from 9e0a1c1 to a8dce46 Compare January 13, 2022 02:16

5) After installing a cluster, disable an addon. The configuration change should be
rejected by the CVO. Disabling a component post-install is not supported.
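As a hypothetical illustration of that rule (not the CVO's actual validation code), the check amounts to comparing the previously enabled capability set with the newly requested one and rejecting any removal:

```go
package main

import (
	"fmt"
	"sort"
)

// validateCapabilityChange sketches the rule "disabling a component
// post-install is not supported": a spec update that drops a
// previously enabled capability is rejected.
func validateCapabilityChange(oldEnabled, newEnabled []string) error {
	still := make(map[string]bool, len(newEnabled))
	for _, c := range newEnabled {
		still[c] = true
	}
	var removed []string
	for _, c := range oldEnabled {
		if !still[c] {
			removed = append(removed, c)
		}
	}
	if len(removed) > 0 {
		sort.Strings(removed)
		return fmt.Errorf("capabilities cannot be disabled once enabled: %v", removed)
	}
	return nil
}

func main() {
	err := validateCapabilityChange(
		[]string{"baremetal", "openshift-samples"},
		[]string{"baremetal"},
	)
	fmt.Println(err) // capabilities cannot be disabled once enabled: [openshift-samples]
}
```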

@gabemontero (Contributor) commented Jan 13, 2022

@bparees per the 4.11 kick-off, if the samples operator ends up being the first candidate for this endeavor, do you want to capture here in this EP the fact that there will be e2e adjustments needed in openshift/origin?

Or do you feel that is "too detailed" for this EP (or for this iteration of it)? Or do you anticipate that being captured in another EP, or just in the Jira items that ultimately accompany the implementation of this EP?

thanks

(Contributor) left a comment

Maybe your item 1) is a very high-level summary capturing my point ^^ ... but I know the openshift/origin e2e tests are a detail you and I have discussed in the past, so I was half expecting some reference to them :-)

@bparees (Contributor, Author) left a comment

Yeah, item (1) was intended to cover that point, but I'll add a sentence:

This will necessitate disabling or modifying tests that depend on the disabled
component(s).

@sdodson (Member) commented Jan 13, 2022

/lgtm
/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 13, 2022
@openshift-ci bot (Contributor) commented Jan 13, 2022

@bparees: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 13, 2022
@openshift-merge-robot openshift-merge-robot merged commit fdac2b4 into openshift:master Jan 13, 2022
wking added a commit to wking/cluster-version-operator that referenced this pull request Mar 23, 2022
Fleet capability choices should make their way up to Red Hat via
Insights uploads.  But some clusters have Telemetry enabled but
Insights disabled [1].  For these clusters, we'll want this data in
Telemetry, and getting it into metrics (this commit) is the first
step.  After this lands, we'll need to update the monitoring operator
like [2] to get these shipped up off of cluster.

[1]: openshift/enhancements#922 (comment)
[2]: openshift/cluster-monitoring-operator#1477
wking added a commit to wking/cluster-monitoring-operator that referenced this pull request Mar 23, 2022
The cluster-version operator is growing this new metric [1], and Ben
Parees wants it included in Telemetry so we collect it for clusters
that have Telemetry enabled, even if Insights is disabled [2].
Cardinality is expected to be very small, with a single label ('name'
[1]) and a restricted value set (only three registered capabilities
expected for 4.11).

[1]: openshift/cluster-version-operator#755
[2]: openshift/enhancements#922 (comment)
wking added a commit to wking/openshift-installer that referenced this pull request Aug 1, 2022
The console may become optional [1], so teach the installer to handle
its absence gracefully.

We've waited on the console since way back in ff53523 (add logs at
end of install for kubeadmin, consoleURL, 2018-12-06, openshift#806).  Back
then, install-complete timing was much less organized, and since
e17ba3c (cmd: wait for the cluster to be initialized, 2019-01-25, openshift#1132)
we've blocked on ClusterVersion going Available=True. So the current
dependency chain is:

1. Console route admission blocks console operator from going
   Available=True in its ClusterOperator.
2. Console ClusterOperator blocks cluster-version operator from
   going Available=True in ClusterVersion.
3. ClusterVersion blocks installer's waitForInitializedCluster.

So we no longer need to wait for the route to show up, and can fail
fast if we get a clear IsNotFound.  I'm keeping a bit of polling so we
don't fail an install on a temporary network hiccup.

We don't want to drop the console check entirely, because when it is
found, we want:

* To continue to log that access pathway on install-complete.
* To continue to append the router CA to the kubeconfig.

That latter point has been done since 4033577 (append router CA to
cluster CA in kubeconfig, 2019-02-12, openshift#1242).  The motivation in that
commit message is not explicit, but the idea is to support folks who
naively run 'oc login' with the kubeadmin kubeconfig [2] (despite that
kubeconfig already having cluster-root access) when the console
route's cert's CA happens to be something that the user's local trust
store doesn't include by default.

[1]: openshift/enhancements#922
[2]: openshift#1541 (comment)
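A rough Go sketch of the fail-fast-but-tolerate-hiccups pattern that commit message describes, using standard apimachinery helpers; the function name, interval, and timeout are assumptions for illustration, not the installer's exact code.

```go
package main

import (
	"fmt"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForConsoleRoute keeps a bit of polling so a transient network
// error does not fail the install, but stops immediately with
// found=false when the API clearly reports the route does not exist,
// which is the expected outcome when the console is disabled.
func waitForConsoleRoute(get func() error) (bool, error) {
	found := false
	err := wait.PollImmediate(2*time.Second, 30*time.Second, func() (bool, error) {
		switch err := get(); {
		case err == nil:
			found = true
			return true, nil
		case apierrors.IsNotFound(err):
			// Clear answer: the route is absent; stop waiting.
			return true, nil
		default:
			// Transient error (e.g. network hiccup): keep polling.
			return false, nil
		}
	})
	return found, err
}

func main() {
	notFound := apierrors.NewNotFound(
		schema.GroupResource{Group: "route.openshift.io", Resource: "routes"}, "console")
	found, err := waitForConsoleRoute(func() error { return notFound })
	fmt.Println(found, err) // false <nil>
}
```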
TrilokGeer pushed a commit to TrilokGeer/installer that referenced this pull request Sep 14, 2022
@bparees bparees deleted the installdomains branch October 24, 2022 13:37
Labels: approved (Indicates a PR has been approved by an approver from all required OWNERS files), lgtm (Indicates that a PR is ready to be merged)