Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

design-proposal: instancetype.kubevirt.io - Expand during submission #326

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

lyarwood
Copy link
Member

@lyarwood lyarwood commented Sep 24, 2024

What this PR does / why we need it:

The initial design of instance types and preferences captured a point in time copy of the original resource for future use by the VirtualMachine to ensure we always generate the same VirtualMachineInstance at runtime.

This additionally allows users to easily switch between classes, sizes and generations of these resources at some point in the future. However this flexibility comes at the cost of complexity within KubeVirt itself especially handling more advanced VirtualMachine lifecycle flows.

This design proposal aims to set out a new cluster configurable to control how instance types and preferences are referenced from a VirtualMachine.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note:

NONE

@kubevirt-bot
Copy link

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@kubevirt-bot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign davidvossel for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Sep 24, 2024
@lyarwood lyarwood force-pushed the instancetype-expand-by-default branch from 8b0c355 to 6e77451 Compare September 24, 2024 11:38

## Update/Rollback Compatibility

There will be no ability to rollback VirtualMachines that have already had their instance type and/or preference expanded by this new functionality.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also no longer possible to change the resource sizing of the VM easily by selecting a different instance type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's basically suggested here but I'll explicitly call that out.

@vladikr
Copy link
Member

vladikr commented Sep 24, 2024

/cc

@lyarwood lyarwood force-pushed the instancetype-expand-by-default branch 4 times, most recently from cb53268 to d083ff6 Compare September 25, 2024 13:24
@lyarwood lyarwood force-pushed the instancetype-expand-by-default branch 2 times, most recently from 338a2b6 to a07a9d1 Compare October 9, 2024 18:51
Comment on lines 100 to 101
This could however result in drastic changes to existing VirtualMachine objects
without user interaction.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow: a user interaction took place if the configurable was set to expand.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll reword to explicitly state that this can happen without the VM owner interacting with the VM thanks.

Comment on lines 103 to 105
We would also still need to retain webhook validation logic to ensure that a
given `VirtualMachine` using an instance type or preference is valid during
submission.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it recommended for KubeVirt to move the validation logic from the webhook into the controller. Validating VM definition against cluster-level objects such is instance types or preferences is raceful by design and breaks eventual consistency.

But I do not see how this is related to the design proposal. Can you elaborate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand but while there are still validations taking place in the webhooks we will need to expand any provided instance type and preference to retain the same behaviour as a flat VirtualMachine. Once these validations are moved into the control loop then yes the need for this is also removed.

I wanted to mention this in passing as it's logic we would still need to carry in the project that isn't removed by eventually expanding later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the following represents what you are saying?

Suggested change
We would also still need to retain webhook validation logic to ensure that a
given `VirtualMachine` using an instance type or preference is valid during
submission.
Note that this proposal does not affect the validation logic to ensure that a
given `VirtualMachine` using an instance type or preference is valid during
submission.


A user can explicitly ask for the associated instance type to be expanded into
the `VirtualMachine` by removing `revisionName` and setting `policy` to
`expand`:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't adding policy:expand explicit enough? Why removing revisionName is required?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current proposal stops short of expanding all VirtualMachines and would use revisionName to determine if the VirtualMachine had already been seen by the controller and revisions stored. If we went with the alternative of expanding all regardless of a revision already being captured then policy being set to expand would be enough to trigger the behaviour.

they have their instance type or preference expanded by this new functionality.

Users will also be unable to resize their `VirtualMachines` by making a singular
choice of instance type in the future.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? It is still OK to define instance-typed VM specs, why would it not be ok to modify an existing VM to be instance typed? In properly eventual-consistent systems, the intention of the user is what matters most, not the history of the resource.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At present instance types conflict if the VirtualMachine already provides an amount of vCPU or memory.

We have talked about changing this behaviour in #318 and in doing so could allow a user to resize flat VirtualMachines by selecting an instance type but that might require a follow up design proposal and discussion to work out the details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not follow... Assume that I created a flat VM with 7 CPUs. I stopped my VM and then changed my mind. I would like it to be much larger and more standard and edit the VM to use cx1.8xlarge. Would KubeVirt attempt to block this? Why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At present yes it would unless you also removed the 7 vCPUs you originally requested within the VirtualMachine.

We just don't allow you to request vCPUs and memory through the VirtualMachineInstanceSpec and instance type at the same time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At present yes it would unless you also removed the 7 vCPUs you originally requested within the VirtualMachine.

We just don't allow you to request vCPUs and memory through the VirtualMachineInstanceSpec and instance type at the same time.

this is clear and expected. I just don't understand it from "Users will also be unable to resize their VirtualMachines by making a singular choice of instance type in the future". Would you agree to rephrase the text?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah apologies I see your point, I'll rephrase now.

Comment on lines 53 to 55
Additional `VirtualMachine` configurables will also be introduced to allow a
specific newly submitted VirtualMachine referencing instance types or
preferences to be expanded.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we consider alternatives where this is not available? My question stems form me not being aware of the second user story (a VM owner wanting to control this). I heard only of cluster admins.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can capture that in alternatives but personally I think that VM Owners should have the ability to expand or not regardless of the admins choice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. But if I understand correctly, the biggest issue today is the complexity for third parties. If each VM owner can choose whatever they want, the complexity remains.

I would prefer have an opinionated solution, with less knobs and options; if we see a use case for the VM owners having this freedom to work against their cluster admin, we can add per-VM knob later. Removing knobs is practically impossible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That complexity will remain still even if this configurable is limited to cluster admins surely, especially given that we are not changing the default here.

Making this a cluster admin only configurable will mean that the end user experience of using a VirtualMachine initially defined with instance types and preferences can change drastically between environments and there's nothing a user can do to avoid this.

By providing an additional VM owner configurable they can be explicit and know that the resulting VirtualMachine will behave as expected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users will also be unable to resize their VirtualMachines by making a singular
choice of instance type in the future

Why? A cluster admin that would like to interact with a third party can configure expand on her cluster, and that's that. If we complicate the solution and let the VM owner choose his own semantic, the cluster admin is left with uncertainty.

Making this a cluster admin only configurable will mean that the end user experience of using a VirtualMachine initially defined with instance types and preferences can change drastically between environments and there's nothing a user can do to avoid this.

Correct. And that's exactly what I, as a cluster admin, want. I need to interact with a third party that is unaware of instance types, so I want my cluster to comply.

By providing an additional VM owner configurable they can be explicit and know that the resulting VirtualMachine will behave as expected.

Right, but this adds complexity to the code, which I am not aware was requested by any user, increases development time, expands test matrix and add possibility of bugs, and cannot be dropped out of the API in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay these are fair points, I'll remove the user configurables for now and we can always follow up in the future if requested.

Comment on lines 53 to 55
Additional `VirtualMachine` configurables will also be introduced to allow a
specific newly submitted VirtualMachine referencing instance types or
preferences to be expanded.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. But if I understand correctly, the biggest issue today is the complexity for third parties. If each VM owner can choose whatever they want, the complexity remains.

I would prefer have an opinionated solution, with less knobs and options; if we see a use case for the VM owners having this freedom to work against their cluster admin, we can add per-VM knob later. Removing knobs is practically impossible.

Comment on lines 103 to 105
We would also still need to retain webhook validation logic to ensure that a
given `VirtualMachine` using an instance type or preference is valid during
submission.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the following represents what you are saying?

Suggested change
We would also still need to retain webhook validation logic to ensure that a
given `VirtualMachine` using an instance type or preference is valid during
submission.
Note that this proposal does not affect the validation logic to ensure that a
given `VirtualMachine` using an instance type or preference is valid during
submission.

they have their instance type or preference expanded by this new functionality.

Users will also be unable to resize their `VirtualMachines` by making a singular
choice of instance type in the future.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not follow... Assume that I created a flat VM with 7 CPUs. I stopped my VM and then changed my mind. I would like it to be much larger and more standard and edit the VM to use cx1.8xlarge. Would KubeVirt attempt to block this? Why?

A new KubeVirt configurable will be introduced to control how instance
types and preferences are referenced from VirtualMachines.

This configurable will provide an `InstancetypeReferencePolicy` that
encapsulates this behaviour. The following policies will be initially
provided:

* `reference` (default) - This is the original reference behaviour of
  instance types and preferences where a ControllerRevision is captured
  and referenced from the VirtualMachine.
* `expand` - This is a new behaviour where any instance type and
  preferences are expanded into the VirtualMachine if a
  ControllerRevision hasn't already been captured and referenced.
* `expandAll` - The same behaviour as expand but regardless of a
  ControllerRevision being captured already.

Signed-off-by: Lee Yarwood <[email protected]>
@lyarwood lyarwood marked this pull request as ready for review October 21, 2024 10:01
@lyarwood lyarwood changed the title WIP - design-proposal: instancetype.kubevirt.io - Expand during submission design-proposal: instancetype.kubevirt.io - Expand during submission Oct 21, 2024
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 21, 2024
@lyarwood lyarwood force-pushed the instancetype-expand-by-default branch from a07a9d1 to a99a0fc Compare October 21, 2024 17:19
@lyarwood
Copy link
Member Author

It does help if I push to the correct branch when attempting to update a PR, apologies!

Comment on lines +39 to +40
* As a cluster owner I want to control how instance types and preferences are
referenced from VirtualMachines within my environment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this text too vague. It is not clear here what the cluster admin would like to control and how. Can we return to something more similar to

  • As a cluster owner I want to configure the expansion of all
    VirtualMachines referencing instance types and preferences

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggested configurable still allows a VirtualMachine to reference an instance type and preference so to me it doesn't make sense to focus on expansion here.

Copy link
Member

@dankenigsberg dankenigsberg Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that the goal should be clear regardless of the implementation. Let me try to suggest another text for the goal:

  • As a cluster admin I want to use a third party backup vendor which is not familiar with instance types to backup my VMs.

Expanding all VMs in the cluster is our approach to meet this goal. An alternative, which should probably mentioned in the proposal, is to speak with all backup vendors and teach them about instance types. Unfortunately this can take a very long time and depends on others, so we prefer to add expandAll.

* `reference` (default) - This is the original reference behaviour of instance
types and preferences where a ControllerRevision is captured and referenced
from the VirtualMachine.
* `expand` - This is a new behaviour where any instance type and preferences are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why we need the new expand option. The only use case I am familiar with is cluster-admins saying "instance types are too complex for me and the third parties I'd like to interact with my cluster; please expand all VMs in my cluster".

Adding more options typically means more development time, more tests, more docs, more bugs, more complexity. It would be quite possible to add this option any time in the future when it is needed. It is practically impossible to remove the option if we add it now. Hence I'd prefer not to have this functionality now.

Now I may well be wrong; Is there a class of cluster admins that need the differentiation between expand and expandAll? Can you elaborate on the goal they would like to achieve by the expand semantics?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for not coming back to this ahead of the feature landing in main but we were up against FF and have ended up placing this behind a FG thus allowing us to add and remove policies in the future.

I am not sure why we need the new expand option. The only use case I am familiar with is cluster-admins saying "instance types are too complex for me and the third parties I'd like to interact with my cluster; please expand all VMs in my cluster".

That use case being put forward by stakeholders rather than actual users FWIW. I also believe end users will want the ability to leave existing VMs alone but expand new VMs thus the additional policy here.

Adding more options typically means more development time, more tests, more docs, more bugs, more complexity. It would be quite possible to add this option any time in the future when it is needed. It is practically impossible to remove the option if we add it now. Hence I'd prefer not to have this functionality now.

Now that this entire feature is behind a FG we have some room to deprecate and remove policies moving forward. Given this policy also shares a common implementation with expandAll with only the trigger being different I wanted to include it in the initial PR.

Now I may well be wrong; Is there a class of cluster admins that need the differentiation between expand and expandAll? Can you elaborate on the goal they would like to achieve by the expand semantics?

I believe so but having not heard from any end users about any of this functionality thus far I'm obviously guessing. As I've said above the goal of the policy is to leave existing VMs alone.

@lyarwood lyarwood force-pushed the instancetype-expand-by-default branch from 756bbe2 to a99a0fc Compare November 4, 2024 21:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants