Allow installer to include/exclude components based on user select in…
…stall solution
Showing 1 changed file with 358 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,358 @@ | ||
--- | ||
title: component-selection-during-install | ||
authors: | ||
- "@bparees" | ||
reviewers: | ||
- install team - need agreement on install-config api updates and CVO config rendering | ||
- ota/cvo team - need agreement on CVO resource filtering api and new behavior | ||
approvers: | ||
- @decarr - support for configurable CVO-managed content set feature | ||
- @sdodson - as the staff engineer most closely tied to install experience | ||
creation-date: 2021-05-04 | ||
last-updated: 2021-05-04 | ||
status: provisional | ||
--- | ||
|
||
# User Selectable Install Solutions | ||
|
||
## Release Signoff Checklist | ||
|
||
- [ ] Enhancement is `implementable` | ||
- [ ] Design details are appropriately documented from clear requirements | ||
- [ ] Test plan is defined | ||
- [ ] Operational readiness criteria is defined | ||
- [ ] Graduation criteria for dev preview, tech preview, GA | ||
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/) | ||
|
||
## Summary | ||
|
||
This enhancement proposes a mechanism for cluster installers to exclude one or more optional components from | ||
their installation, which determines which payload components are or are not installed in their cluster. | ||
Core components are defined as the set of Second Level Operators managed by the Cluster Version Operator, | ||
which today cannot be disabled until after completing the install and editing a CVO override, or by editing | ||
the CVO overrides as part of rendering and editing manifests. In addition, using CVO overrides puts the cluster | ||
into an unsupported and un-upgradeable state, making them insufficient as a solution. | ||
|
||
The proposed UX is to make this a first class part of the install config api with the implementation | ||
being arguments supplied to the CVO to filter out the user-selected manifests. | ||
|
||
## Motivation | ||
|
||
There is an increasing desire to move away from "one size fits all" cluster installations, and | ||
towards flexibility about what should/should not exist in a new cluster out of the box. This can | ||
be seen in efforts such as hypershift, single node, and code-ready-containers. Each of these | ||
efforts has done some amount of one-off work to enable their requirements. This EP proposes a | ||
mechanism that allows components to be disabled in a first class way that the installer exposes. | ||
|
||
### Goals | ||
|
||
* Admins can easily explicitly exclude specific "optional" components from their cluster, at install time. | ||
* Admins can enable a previously excluded optional component, at runtime. | ||
* Install wrappers like assisted-installer can define an install-config that excludes specific components | ||
* Define an api that could be used in the future to exclude cluster capabilities based on things other than | ||
CVO filtering, such as turning off a particular api for the cluster. | ||
|
||
### Non-Goals | ||
|
||
* Making control-plane critical components optional (k8s apiserver, openshift apiserver, openshift controller, | ||
networking, etc) | ||
* Defining which components should be disable-able (this will be up to component teams to classify themselves | ||
as `capabilities` or not) | ||
* Providing a way to install OLM operators as part of the initial cluster install. This EP is about making | ||
the install experience around the existing CVO-based components more flexible, not adding new components to the | ||
install experience. | ||
* Allowing components to be disabled post-install. | ||
* Eliminating or replacing cluster profiles | ||
* Encoding logic in the installer itself about which components should be disabled under specific circumstances | ||
|
||
|
||
## Proposal | ||
|
||
### User Stories | ||
|
||
* As a user creating a new cluster that will be managed programmatically, I do not want the additional | ||
security exposure and resource overhead of running the web console. I would like a way to install | ||
a cluster that has no console out of the box, rather than having to disable it post-install or | ||
modify rendered manifests in a way that requires deep understanding of the OCP components/resources. | ||
|
||
* As a team scaffolding a managed service based on openshift, I want to minimize the footprint of my | ||
clusters to the components I need for the service. | ||
|
||
* As a user creating a cluster that will never run an image registry, I do not want the additional overhead | ||
of running the image registry operator, or have to remove the default registry that is created. | ||
|
||
* As a team packaging openshift for a specific use case such as edge deployments, I want to provide | ||
an install experience that disables components that aren't needed for my use case. | ||
|
||
### Implementation Details/Notes/Constraints [optional] | ||
|
||
The CVO already has the ability to respect annotations on resources, as can be seen | ||
[here](https://github.com/openshift/cluster-kube-apiserver-operator/blob/c03c9edf5fddf4e3fb1bc6d7afcd2a2284ca03d8/manifests/0000_20_kube-apiserver-operator_06_deployment.yaml#L10) and leveraged [here](https://github.com/openshift/hypershift/blob/main/control-plane-operator/controllers/hostedcontrolplane/assets/cluster-version-operator/cluster-version-operator-deployment.yaml#L47-L48). | ||
This proposal consists of four parts: | ||
|
||
1) Formalizing a concept of a "capability" annotation which allows a given resource to be excluded based | ||
on installer input. For example the console related resources could be annotated as | ||
|
||
```yaml | ||
annotations: | ||
capability.openshift.io/console: "true" | ||
``` | ||
2) Defining an [install config api](https://github.com/openshift/installer/blob/790048166067273d34f76bea4220fa395b1cce1b/pkg/types/installconfig.go#L70) field whereby the user can opt out of specific capabilities. | ||
|
||
InstallConfig.ExcludeCapabilities | ||
- console | ||
- samples | ||
|
||
Which resources ultimately get installed for a given cluster would be the set of resources encompassed | ||
by the CLUSTER_PROFILE (if any), minus any resources explicitly excluded by the excluded capabilities configuration. | ||
|
||
Examples of candidate components to be treated as capabilities: | ||
|
||
* console | ||
* imageregistry | ||
* samples | ||
* cluster baremetal operator | ||
* olm/marketplace | ||
* kube-storage-version-migrator | ||
* csi-* | ||
* insights | ||
* monitoring | ||
* ??? | ||
|
||
To distinguish between the user intent of "Give me the default components that are suitable for my install" and | ||
"do not exclude any components", users can specify `none` as the capability to exclude. In the event that we | ||
want the installer to apply some default exclusions in specific scenarios when the user has not expressed any | ||
specific desire to exclude particular components, this `none` specification will be a way for the user to opt out | ||
of having those defaults applied. | ||
|
||
Alternative: | ||
If we think it's better to invert it, such that "if you tell us nothing, you get all the components", we can instead | ||
define a `default` component keyword, such that if you don't specify any exclusions, the installer leaves it alone | ||
and you get everything, but if you specify `default`, then the installer knows to replace the exclusion list with | ||
whatever the installer thinks is best for your particular platform/whatever. | ||
|
||
|
||
3) Pass the list of filtered annotations to the CVO. This is distinct from overrides because overrides | ||
put the cluster in an unsupported state. Filtered annotations are supported for upgrades. The filtered | ||
components will be listed in the [ClusterVersion](https://github.com/openshift/api/blob/4436dc8be01e8dcd8b250e1b32bb0fbd64ba78ac/config/v1/types_cluster_version.go#L35) object: | ||
|
||
```yaml | ||
spec: | ||
excludedCapabilities: | ||
- console | ||
- samples | ||
``` | ||
|
||
The CVO will filter out (not apply/reconcile) resources that are annotated with `capability.openshift.io/$exclusions`. | ||
|
||
If a resource has multiple `capability.openshift.io` annotations, then the resource will only be filtered if all | ||
the annotations on the resource are matched by a configured filter. | ||
|
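The matching rule above can be sketched as follows. This is a minimal illustration in Python, not the CVO's actual implementation (the CVO is written in Go). The helper names `capabilities_of` and `is_filtered` are hypothetical, and the sketch assumes that only the annotation key matters, not its value.

```python
CAPABILITY_PREFIX = "capability.openshift.io/"


def capabilities_of(annotations):
    """Return the capability names declared by a resource's annotations."""
    return {
        key[len(CAPABILITY_PREFIX):]
        for key in annotations
        if key.startswith(CAPABILITY_PREFIX)
    }


def is_filtered(annotations, excluded):
    """Return True if the resource should be skipped (not applied/reconciled).

    A resource is skipped only when it declares at least one capability and
    every capability it declares appears in the excluded list.
    """
    caps = capabilities_of(annotations)
    return bool(caps) and caps <= set(excluded)


console_only = {"capability.openshift.io/console": "true"}
console_and_monitoring = {
    "capability.openshift.io/console": "true",
    "capability.openshift.io/monitoring": "true",
}

print(is_filtered(console_only, ["console", "samples"]))    # True
print(is_filtered(console_and_monitoring, ["console"]))     # False
print(is_filtered({}, ["console"]))                         # False
```

Under this reading, a resource shared by two capabilities stays on the cluster as long as at least one of those capabilities is still enabled, and resources with no capability annotation are never filtered.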
||
If the install-config specified `none` as the excluded component, the CVO list will be empty. If the | ||
install-config specified `default`, the CVO list will be whatever the installer chooses to disable for | ||
the install. The CVO does not need to be aware of those special keywords. | ||
|
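One way the installer could translate the install-config value into the list handed to the CVO is sketched below. The function name `resolve_exclusions`, the `installer_defaults` argument, and the `unset_means_default` flag are hypothetical; the flag exists only because this proposal leaves open whether an unset field means "apply installer defaults" or "exclude nothing". Rejecting a keyword mixed with other entries is also an assumption.

```python
def resolve_exclusions(requested, installer_defaults, unset_means_default=True):
    """Turn the install-config exclusion list into the list passed to the CVO.

    The special keywords never reach the CVO; they are resolved here.
    """
    if not requested:
        # Unset field: behavior depends on which alternative is chosen.
        return sorted(set(installer_defaults)) if unset_means_default else []
    if "none" in requested or "default" in requested:
        if len(requested) != 1:
            # Assumption: a keyword must be the only entry in the list.
            raise ValueError("'none' or 'default' must be the only entry")
        if requested[0] == "none":
            return []
        return sorted(set(installer_defaults))
    # Explicit list: de-duplicate and order deterministically.
    return sorted(set(requested))


print(resolve_exclusions(["none"], ["baremetal"]))                # []
print(resolve_exclusions(["default"], ["baremetal"]))             # ['baremetal']
print(resolve_exclusions(["samples", "console"], ["baremetal"]))  # ['console', 'samples']
```

Keeping the keyword handling in the installer means the CVO only ever sees a plain list of capability names.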
||
4) Admins can remove an item from the filtered annotations list, but they cannot add an item to it. If an | ||
item is removed, the CVO will apply the previously filtered (skipped) resources to the cluster on the next reconciliation. | ||
Adding an item to the filtered list is not supported because it requires that the component be removed from the | ||
running cluster, which has more significant implications for how all traces of the component are removed. | ||
|
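The remove-only rule could be enforced with a check along these lines. This is a sketch under stated assumptions: the function name `validate_exclusion_update` is hypothetical, and where the check would actually live (CVO, admission, or API validation) is not decided by this proposal.

```python
def validate_exclusion_update(current, proposed):
    """Validate a change to the excluded-capabilities list.

    Removing entries (enabling capabilities) is allowed; adding entries
    (disabling capabilities post-install) is rejected.
    Returns the capabilities that the change would newly enable.
    """
    added = sorted(set(proposed) - set(current))
    if added:
        raise ValueError(
            "capabilities cannot be excluded post-install: " + ", ".join(added)
        )
    return sorted(set(current) - set(proposed))


# Enabling the console after install is accepted.
print(validate_exclusion_update(["console", "samples"], ["samples"]))  # ['console']

# Attempting to disable monitoring after install is rejected.
try:
    validate_exclusion_update(["samples"], ["samples", "monitoring"])
except ValueError as err:
    print(err)
```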
||
The currently configured filter list for the CVO should be recorded in telemeter so we can understand | ||
the configuration of a given cluster. | ||
|
||
If we want components to be disabled by default (either existing components or new ones added in the future), | ||
we can add their component names to a list of default-disabled components that the installer populates | ||
the install-config with. Users can then edit that disabled list in the install-config to remove those | ||
components if they want them enabled, and add other components they'd like to disable. | ||
|
||
In the future, we might allow specific APIs to be disabled, such as the build api. This could be done by | ||
defining additional capability keywords that can be put into the install-config field being defined here, | ||
which would drive the creation of cluster config that the apiserver+controllers used to disable that particular | ||
api. | ||
|
||
|
||
### Risks and Mitigations | ||
|
||
The primary risk is that teams must understand how to use these new annotations and apply them | ||
correctly to the full set of resources that make up their component. Inconsistent or | ||
partial annotation will result in inconsistent or partially deployed resources for a component. | ||
|
||
Another risk is that this introduces more deployment configurations which might | ||
have unforeseen consequences (e.g. not installing the imageregistry causes some | ||
other component that assumes there is always an imageregistry or assumes the | ||
presence of some CRD api that is installed with the imageregistry to break). | ||
|
||
There was some discussion about the pros/cons of allowing each component to be enabled/disabled independent | ||
of that component explicitly opting into a particular (presumably well tested) configuration/topology | ||
[here](https://github.com/openshift/enhancements/pull/200#discussion_r375837903). The position of this EP is that | ||
we should only recommend the exclusion of fully independent "capability" components that are not depended on by | ||
other components. Further the assumption is that it will be reasonable to tell a customer who disabled | ||
something and ended up with a non-functional cluster that their chosen exclusions are simply not supported | ||
currently, or that they must accept the degraded functionality caused by the missing dependency, if they | ||
intend to keep it disabled. | ||
|
||
Since the only components/resources that can be filtered out of the installation are ones that are explicitly | ||
annotated with `capability.openshift.io/$component`, end-users will not be able to use this mechanism to filter | ||
components/resources that we did not intend for them to be able to filter out. | ||
|
||
## Design Details | ||
|
||
### Open Questions | ||
|
||
|
||
1. Do we want to constrain this functionality to turning off individual components? We could | ||
also use it to | ||
a) turn on/off groups of components as defined by "solutions" (e.g. a "headless" solution | ||
which might turn off the console but also some other components). This is what CLUSTER_PROFILES | ||
sort of enable, but there seems to be reluctance to expand the cluster profile use case to include | ||
these sorts of things. | ||
b) enable/disable specific configurations such as "HA", where components could contribute multiple | ||
deployment definitions for different configurations and then the installer/CVO would select the correct | ||
one based on the chosen install configuration (HA vs single node) instead of having components read/reconcile | ||
the infrastructure resource. | ||
|
||
Current plan is to constrain this functionality to component level controls. | ||
|
||
2. How does the admin enable a component post-install if they change their mind about what components | ||
they want enabled? Do we need/want to allow this? | ||
|
||
Turning on a component later is relatively easy (we expose a config resource for the CVO that defines | ||
the filter, we allow the user to remove items from the filter, the CVO will apply the previously | ||
filtered resources during the next reconciliation). | ||
|
||
Turning off a component later is more problematic because | ||
a) The CVO doesn't delete resources today, so that would be a new thing to teach it to do. | ||
b) Just deleting the resources for the component isn't sufficient, as the component also needs to clean | ||
itself up in case it created any additional resources on the cluster or contributed any configuration. | ||
|
||
Therefore we plan to support turning a component on, but not turning it off. | ||
|
||
3. What are the implications for upgrades if a future upgrade would add a component or resource which would | ||
have been filtered out during install time? | ||
|
||
There should be no implication here. The CVO has the list of annotations it will filter on, so if the | ||
new resources match those annotations, the new resources will also be filtered (never applied to the cluster). | ||
|
||
4. How prescriptive do we want to be about what can/can't be turned off? Components need to opt into | ||
this by annotating their resources, so it's not completely arbitrary. | ||
|
||
This will need to be evaluated on a case by case basis as a component considers adding the annotation | ||
to its resources that will allow it to be filtered out/disabled. | ||
|
||
|
||
5. What should we do for components where disabling them has implications for other components or the way certain | ||
apis behave? Example: disabling the internal registry changes the behavior of imagestreams | ||
(can't push to the imagestream anymore to push content to the internal registry) as well as the assumptions | ||
made by tools like new-app/new-build (create imagestreams that push to the internal registry). | ||
|
||
6. What should we do (if anything) for components with interdependencies, to ensure a user doesn't break | ||
enabled components by disabling a dependency? Options include: | ||
|
||
* Do nothing other than document dependencies so users know what not to turn off | ||
* Don't even annotate dependency resources for filtering, so if something is a dependency it cannot be turned off | ||
* Logic in the installer or CVO that intelligently analyzes the filters the user has supplied and checks | ||
for dependency issues (least desirable solution imho). | ||
|
||
7. Is Capabilities an ok name for these things/fields? We'll want to get the naming right on the api to | ||
convey the right expectations for users. We may revisit this name during implementation. | ||
|
||
### Test Plan | ||
|
||
1) Install clusters w/ the various add-on components included/excluded and confirm | ||
that the cluster is functional but only running the expected add-ons. | ||
|
||
2) Upgrade a cluster to a new version that includes new resources that belong to | ||
an addon that was included in the original install. The new resources should be | ||
created. | ||
|
||
3) Upgrade a cluster to a new version that includes new resources that belong to | ||
an addon that was excluded in the original install. The new resources should *not* be | ||
created. | ||
|
||
4) After installing a cluster, enable additional addons. The newly enabled addons should | ||
be installed/reconciled by the CVO. | ||
|
||
5) After installing a cluster, disable an addon. The configuration change should be | ||
rejected by the CVO. Disabling a component post-install is not supported. | ||
|
||
|
||
|
||
### Graduation Criteria | ||
|
||
Would expect this to go directly to GA once a design is agreed upon/approved. | ||
|
||
#### Dev Preview -> Tech Preview | ||
N/A | ||
|
||
#### Tech Preview -> GA | ||
N/A | ||
|
||
#### Removing a deprecated feature | ||
|
||
N/A | ||
|
||
### Upgrade / Downgrade Strategy | ||
|
||
For upgrades, any new resources should have the same exclusion filters applied by the CVO. | ||
For downgrades, if downgrading below the version of the CVO that supports this logic, | ||
previously excluded components will get created on the cluster. This is likely | ||
not a concern since you can't downgrade below the version you started at, and if | ||
you're using this feature that means you started at a version of the CVO that supports it. | ||
|
||
If we allow enabling filters post-install, then we need to revisit the implications of | ||
downgrades. | ||
|
||
There is also some risk that a particular resource has different annotations in different | ||
versions, in which case upgrading/downgrading could change whether that resource is excluded by | ||
the CVO or not. Once a resource is created, the CVO never deletes it, so some manual cleanup | ||
might be needed to achieve the desired state. For downgrades this is probably acceptable, | ||
for upgrades this could be a concern (resource A wasn't excluded in v1, but is excluded | ||
in v2. Clusters that upgrade from v1 to v2 will still have resource A, but clusters | ||
installed at v2 will not have it). Technically this situation can already arise today | ||
if a resource is deleted from the payload between versions. | ||
|
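The annotation-drift scenario above can be illustrated with a small simulation. This is only a sketch: `reconcile` is a hypothetical stand-in for a CVO sync pass that applies unfiltered resources and never deletes anything, and the payloads are made-up examples.

```python
CAPABILITY_PREFIX = "capability.openshift.io/"


def reconcile(cluster, payload, excluded):
    """Simulate one sync pass: apply unfiltered resources, delete nothing.

    cluster:  set of resource names already present on the cluster
    payload:  dict mapping resource name -> annotations for this version
    excluded: list of excluded capability names
    """
    result = set(cluster)
    for name, annotations in payload.items():
        caps = {
            key[len(CAPABILITY_PREFIX):]
            for key in annotations
            if key.startswith(CAPABILITY_PREFIX)
        }
        if caps and caps <= set(excluded):
            continue  # filtered: never applied, but also never removed
        result.add(name)
    return result


excluded = ["console"]
payload_v1 = {"resource-a": {}}  # not annotated in v1
payload_v2 = {"resource-a": {"capability.openshift.io/console": "true"}}

upgraded = reconcile(reconcile(set(), payload_v1, excluded), payload_v2, excluded)
fresh = reconcile(set(), payload_v2, excluded)

print(upgraded)  # {'resource-a'}
print(fresh)     # set()
```

The two clusters run the same version with the same exclusion list, yet only the upgraded one still has `resource-a`, which is the manual-cleanup case described above.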
||
|
||
### Version Skew Strategy | ||
|
||
N/A | ||
|
||
## Implementation History | ||
|
||
N/A | ||
|
||
## Drawbacks | ||
|
||
The primary drawback is that this increases the matrix of cluster configurations/topologies and | ||
the behavior that is expected from each permutation. | ||
|
||
## Alternatives | ||
|
||
* CVO already supports a CLUSTER_PROFILE env variable. We could define specific profiles like "headless" | ||
that disables the console. CLUSTER_PROFILE isn't a great fit because the idea there is to define a relatively | ||
small set of profiles to define specific sets of components to be included, not to allow a user to fully pick | ||
and choose individual components. We would have to define a large set of profiles to encompass all the possible | ||
combinations of components to be enabled/disabled. | ||
|
||
* CVO already supports an EXCLUDE_MANIFESTS env variable which is used to implement the ROKS deployment topology. | ||
Unfortunately it only allows a single annotation to be specified, so even if we want to use it for this purpose | ||
it needs to be extended to support multiple annotations, so that individual components can be excluded | ||
independently rather than requiring every excluded component to share a single common annotation. | ||
|
||
Regardless we need a way to expose this configuration as a first class part of the install config provided by the | ||
user creating the cluster, so at a minimum we need to add a mechanism to wire an install config value into | ||
the CVO arguments and allow the CVO to consume more than a single annotation to exclude. | ||
|
||
* Allow the installer to specify additional resources to `include` in addition to ones to `exclude`. This has the challenge | ||
of potentially conflicting with the specific set of resources that a cluster_profile defines. There are some | ||
components that should never be deployed in a particular cluster_profile and so we do not want to allow the user | ||
to add them. Examples would be resources that should only be created in standalone installs, not hypershift | ||
managed ones, because hypershift has its own versions of those resources. | ||
|
||
* Use clusteroverrides to exclude content. The problem w/ this approach is it puts the cluster into an unsupported | ||
and non-upgradeable state. | ||
|
||
|
||
## Infrastructure Needed | ||
|
||
N/A |