Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

design-proposal: instancetype.kubevirt.io - Expand during submission #326

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
255 changes: 255 additions & 0 deletions design-proposals/instancetype.kubevirt.io/expand-during-submission.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,255 @@
# Overview

The initial design of instance types and preferences captured a point in time
revision of the original resource for future use by the `VirtualMachine` to
ensure we always generate the same `VirtualMachineInstance` at runtime.

This additionally allows users to easily switch between classes, sizes and
generations of these resources at some point in the future. However this
flexibility comes at the cost of complexity within KubeVirt itself when handling
more advanced `VirtualMachine` lifecycle flows.

This design proposal aims to set out new alternative and configurable behavior
where these revisions are no longer captured and are instead expanded into the
`VirtualMachine`.

## Motivation

The complexity of managing these point in time revisions of instance types and
preferences has grown over time as more features and versions of the CRDs have
landed.

Some of this complexity is exposed to users and third party integrations
such as back-up tooling or user-interfaces built on top of KubeVirt. This can
take the form of differing entry points for certain functionality such as hot
plug to requiring knowledge of the stored revisions to allow for a valid back up
and eventual restore of a given VirtualMachine to happen.

## Goals

* Provide a simple cluster configurable to control how
instance types and preferences are referenced from a VirtualMachine

## Non Goals

* The default behavior will not change as part of this work

## User Stories

* As a cluster owner I want to control how instance types and preferences are
referenced from VirtualMachines within my environment
Comment on lines +39 to +40
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this text too vague. It is not clear here what the cluster admin would like to control and how. Can we return to something more similar to

  • As a cluster owner I want to configure the expansion of all
    VirtualMachines referencing instance types and preferences

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggested configurable still allows a VirtualMachine to reference an instance type and preference so to me it doesn't make sense to focus on expansion here.

Copy link
Member

@dankenigsberg dankenigsberg Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that the goal should be clear regardless of the implementation. Let me try to suggest another text for the goal:

  • As a cluster admin I want to use a third party backup vendor which is not familiar with instance types to backup my VMs.

Expanding all VMs in the cluster is our approach to meet this goal. An alternative, which should probably mentioned in the proposal, is to speak with all backup vendors and teach them about instance types. Unfortunately this can take a very long time and depends on others, so we prefer to add expandAll.


## Repos

* kubevirt/kubevirt

## Design

A new KubeVirt configurable will be introduced to control how instance types and
preferences are referenced from VirtualMachines.

This configurable will provide an `InstancetypeReferencePolicy` that
encapsulates this behaviour. The following policies will be initially provided:

* `reference` (default) - This is the original reference behaviour of instance
types and preferences where a ControllerRevision is captured and referenced
from the VirtualMachine.
* `expand` - This is a new behaviour where any instance type and preferences are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why we need the new expand option. The only use case I am familiar with is cluster-admins saying "instance types are too complex for me and the third parties I'd like to interact with my cluster; please expand all VMs in my cluster".

Adding more options typically means more development time, more tests, more docs, more bugs, more complexity. It would be quite possible to add this option any time in the future when it is needed. It is practically impossible to remove the option if we add it now. Hence I'd prefer not to have this functionality now.

Now I may well be wrong; Is there a class of cluster admins that need the differentiation between expand and expandAll? Can you elaborate on the goal they would like to achieve by the expand semantics?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for not coming back to this ahead of the feature landing in main but we were up against FF and have ended up placing this behind a FG thus allowing us to add and remove policies in the future.

I am not sure why we need the new expand option. The only use case I am familiar with is cluster-admins saying "instance types are too complex for me and the third parties I'd like to interact with my cluster; please expand all VMs in my cluster".

That use case being put forward by stakeholders rather than actual users FWIW. I also believe end users will want the ability to leave existing VMs alone but expand new VMs thus the additional policy here.

Adding more options typically means more development time, more tests, more docs, more bugs, more complexity. It would be quite possible to add this option any time in the future when it is needed. It is practically impossible to remove the option if we add it now. Hence I'd prefer not to have this functionality now.

Now that this entire feature is behind a FG we have some room to deprecate and remove policies moving forward. Given this policy also shares a common implementation with expandAll with only the trigger being different I wanted to include it in the initial PR.

Now I may well be wrong; Is there a class of cluster admins that need the differentiation between expand and expandAll? Can you elaborate on the goal they would like to achieve by the expand semantics?

I believe so but having not heard from any end users about any of this functionality thus far I'm obviously guessing. As I've said above the goal of the policy is to leave existing VMs alone.

expanded into the VirtualMachine if a ControllerRevision hasn't already been
captured and referenced.
* `expandAll` - The same behaviour as expand but regardless of a
ControllerRevision being captured already.

## Concerns

### Exposing users to API complexity within VirtualMachines

One of the original design goals with instance types and preferences was to
simplify creation by reducing a users exposure to the core
`VirtualMachineInstanceSpec` API and thus their overall decision matrix.

While this proposal doesn't change the ability to simplify creation of
`VirtualMachines` it can result in a fully flattened `VirtualMachine` exposing
all of this complexity at that level once again.

### Breaking declarative management of VirtualMachines using instance types

This expansion behavior will break any declarative management of these
`VirtualMachines` as they will substantially change after initial submission.
VM owners will need to explicitly request to not expand their `VirtualMachines`
referencing instance types or preferences to avoid this.

## Alternatives

### Immutable instance types and preferences

The need to retain point in time revisions of instance types and preferences is
due to the simple fact that in the current implementation these resources are
mutable and can change over time. Thus to ensure we always get the same
`VirtualMachineInstance` at runtime revisions need to be taken and referenced
from the `VirtualMachine`.

We could possibly remove this requirement by making these object immutable and
thus dropping the need capture and reference `ControllerRevisions` from the
`VirtualMachines` at all.

This however still retains the need for additional logic in more complex
`VirtualMachine` lifecycle operations where we need to expand these now
immutable resources in the `VirtualMachine`.

### Expand by default and deprecate revision references

We could alter the default `policy` to `expandAll` and in doing so deprecate the
revision `reference` behaviour for eventual removal from the project ahead of
`instancetype.kubevirt.io` finally making it to `v1`.

### Keep existing behaviour

Ultimately we can also decide not to implement the core proposal or any of the
above alternatives and continue to support the original revision based flows.
Supporting users and third-party integrators with better documentation and
tooling for VirtualMachines referencing instance types or preferences.

## API Examples

The default `policy` will be `reference` and as such there should be no change
in behavior when this configurable is not provided.

A cluster admin can default the policy of all new `VirtualMachines` by setting
`expand` within the `KubeVirt` `CR`:

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
name: kv
spec:
configuration:
instancetype:
referencePolicy: expand
```

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
name: example
spec:
instancetype:
name: foo
preference:
name: bar
```

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
name: example
spec:
template:
spec:
domain:
cpu:
sockets: 1
cores: 1
threads: 1
```

A cluster admin can also use the `expandAll` policy to have all VirtualMachines
expanded regardless of `revisionNames` already being captured.

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
name: kv
spec:
configuration:
instancetype:
referencePolicy: expandAll
```

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
name: example
spec:
instancetype:
name: foo
revisionName: revision-foo
preference:
name: bar
revisionName: revision-bar
```

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
name: example
spec:
template:
spec:
domain:
cpu:
sockets: 1
cores: 1
threads: 1
```

```go
// KubeVirtConfiguration holds all kubevirt configurations
type KubeVirtConfiguration struct {
[..]
// Instancetype configuration
Instancetype *InstancetypeConfiguration `json:"instancetype,omitempty"`
}

type InstancetypeConfiguration struct {
// ReferencePolicy defines how an instance type or preference should be referenced by the VM after submission, supported values are:
// reference (default) - Where a copy of the original object is stashed in a ControllerRevision and referenced by the VM.
// expand - Where the instance type or preference are expanded into the VM during submission with references removed.
// +nullable
// +kubebuilder:validation:Enum=reference;expand;expandAll
ReferencePolicy *InstancetypeReferencePolicy `json:"referencePolicy,omitempty"`
}

type InstancetypeReferencePolicy string

const (
// Copy any instance type or preference and reference from the VirtualMachine
Reference InstancetypeReferencePolicy = "reference"
// Expand any instance type or preference into VirtualMachines without a revisionName already captured
Expand InstancetypeReferencePolicy = "expand"
// Expand any instance type or preferences into all VirtualMachines
ExpandAll InstancetypeReferencePolicy = "expandAll"
)

```

## Scalability

The resulting mutation of the VirtualMachine with this proposal will
cause an additional reconciliation loop to trigger for the VM.

Work should be carried out to ensure that the substantial mutation of the
VirtualMachine during submission doesn't negatively impact the control plane.

## Update/Rollback Compatibility

There will be no ability to automatically rollback new VirtualMachines once
they have their instance type or preference expanded by this new functionality.

Users will also be unable to resize their VirtualMachines by making a singular
choice of instance type in the future without making further modifications to
their VirtualMachine.

## Functional Testing Approach

TBD

## Implementation Phases

TBD