incusd/instance/common: Fix CanMigrate mutating devices #649

Merged (1 commit) on Mar 23, 2024

Conversation

stgraber
Member

This was leading to very odd validation errors in some cases, namely when an instance considered as part of a cluster restore had a network device that internally sets an otherwise restricted key.

The root of the issue was that CanMigrate didn't operate on a copy of the devices but instead mutated the whole devices config, which then triggered the validation error when that config went through the devices logic a second time as part of startup.
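The difference between mutating the shared devices config and working on a copy can be sketched as follows. This is a minimal illustration, not the actual patch: the `Devices` type, `cloneDevices`, `expandDevice`, both `canMigrate*` functions and the `internal.expanded` key are hypothetical stand-ins for the real Incus code.

```go
package main

import "fmt"

// Devices maps a device name to its key/value config.
// Hypothetical stand-in for the real devices config type.
type Devices map[string]map[string]string

// cloneDevices returns a deep copy, so writes to the copy never
// reach the original maps.
func cloneDevices(d Devices) Devices {
	out := make(Devices, len(d))
	for name, cfg := range d {
		c := make(map[string]string, len(cfg))
		for k, v := range cfg {
			c[k] = v
		}
		out[name] = c
	}
	return out
}

// expandDevice mimics a device that sets an internal key on its own
// config while being loaded (the key name is an assumption).
func expandDevice(cfg map[string]string) {
	cfg["internal.expanded"] = "true"
}

// canMigrateMutating shows the buggy pattern: the check expands
// devices directly on the instance's own config.
func canMigrateMutating(d Devices) bool {
	for _, cfg := range d {
		expandDevice(cfg)
	}
	return true
}

// canMigrateCopying shows the fixed pattern: the check works on a copy.
func canMigrateCopying(d Devices) bool {
	for _, cfg := range cloneDevices(d) {
		expandDevice(cfg)
	}
	return true
}

func main() {
	devices := Devices{"eth0": {"type": "nic", "network": "ovn0"}}

	canMigrateCopying(devices)
	fmt.Println(len(devices["eth0"])) // 2: original config untouched

	canMigrateMutating(devices)
	fmt.Println(len(devices["eth0"])) // 3: internal key leaked into the config
}
```

Because Go maps are reference types, ranging over `d` and writing to each `cfg` changes the caller's data; only the deep copy isolates the check from the stored config.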

This issue would only hit if:

  • The setup is clustered
  • The cluster is evacuated with action=stop
  • The user asks for the cluster member to be restored
  • The cluster member has one or more instances using a device that internally expands its configuration (like an OVN network card).
  • Those instances process such a device prior to any other check failing CanMigrate.
  • One or more of those instances are local to the server performing the check.

Sponsored-by: Buddy (https://buddy.works)

Signed-off-by: Stéphane Graber <[email protected]>
Sponsored-by: Buddy (https://buddy.works)
@hallyn hallyn merged commit e827c47 into lxc:main Mar 23, 2024
25 checks passed