clusterctl can fail idempotency when installing cert-manager during upgrade #10389
Labels
area/clusterctl
kind/bug
needs-triage
priority/important-soon
What steps did you take and what happened?
not necessarily a blocking bug, but IIUC the idempotency of clusterctl upgrades of cert-manager can be improved. feedback on this is appreciated, as i was not able to reproduce it personally (downstream report) and i'm not too familiar with cert-manager objects.
reading the logic of clusterctl upgrade:
ApplyUpgrade is called:
cluster-api/cmd/clusterctl/client/upgrade.go
Line 128 in a934904
EnsureLatestVersion for certmanager is called:
cluster-api/cmd/clusterctl/client/upgrade.go
Line 160 in a934904
then shouldUpgrade() is called:
cluster-api/cmd/clusterctl/client/cluster/cert_manager.go
Line 247 in a934904
which determines if an upgrade is required, by looking at objects that have the version annotation
cluster-api/cmd/clusterctl/client/cluster/cert_manager.go
Line 348 in a934904
if no upgrade is required it returns; otherwise it calls install():
cluster-api/cmd/clusterctl/client/cluster/cert_manager.go
Lines 183 to 195 in a934904
if i'm understanding this right, if a createObj call fails partway through the install() loop, EnsureLatestVersion() exits with an error, which is OK.
however, this can leave the cert-manager install in a partial state. if the user then calls ApplyUpgrade() again, shouldUpgrade() can return false, because the objects that were created already carry an up-to-date version annotation, even though not all objects were installed.
the deletion of the existing objects that happens before install() works correctly:
https://github.com/kubernetes-sigs/cluster-api/blob/main/cmd/clusterctl/client/cluster/cert_manager.go#L263-L269
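to make the suspected failure mode concrete, here is a minimal in-memory sketch. it is not the clusterctl code: the `object` and `cluster` types, the `failOn` field and the lower-case `shouldUpgrade` / `ensureLatestVersion` functions are hypothetical stand-ins that only mirror the flow described above (version-only check, delete, then create objects one by one).

```go
package main

import (
	"errors"
	"fmt"
)

// object is a simplified stand-in for one cert-manager manifest object.
type object struct {
	name    string
	version string // stands in for the version annotation
}

// cluster is an in-memory stand-in for the management cluster (hypothetical).
type cluster struct {
	objs   map[string]object
	failOn string // creating this object fails once, simulating a transient error
}

func (c *cluster) create(o object) error {
	if o.name == c.failOn {
		c.failOn = "" // fail only on the first attempt
		return errors.New("transient error creating " + o.name)
	}
	c.objs[o.name] = o
	return nil
}

// shouldUpgrade models a version-only check: it inspects only the objects
// that exist and never notices objects that are missing.
func shouldUpgrade(c *cluster, desired string) bool {
	for _, o := range c.objs {
		if o.version != desired {
			return true
		}
	}
	return false
}

// ensureLatestVersion models the flow from the issue: check, delete, install.
func ensureLatestVersion(c *cluster, manifests []object, desired string) error {
	if !shouldUpgrade(c, desired) {
		return nil
	}
	c.objs = map[string]object{} // delete the existing objects
	for _, o := range manifests {
		if err := c.create(o); err != nil {
			return err // leaves a partial install behind
		}
	}
	return nil
}

// newDemo returns a cluster with three v1 objects and the v2 manifests.
func newDemo() (*cluster, []object) {
	c := &cluster{
		objs: map[string]object{
			"crd":        {"crd", "v1"},
			"deployment": {"deployment", "v1"},
			"webhook":    {"webhook", "v1"},
		},
		failOn: "deployment",
	}
	manifests := []object{{"crd", "v2"}, {"deployment", "v2"}, {"webhook", "v2"}}
	return c, manifests
}

func main() {
	c, manifests := newDemo()
	fmt.Println("first attempt:", ensureLatestVersion(c, manifests, "v2"))
	fmt.Println("second attempt:", ensureLatestVersion(c, manifests, "v2"))
	fmt.Println("objects installed:", len(c.objs), "of", len(manifests))
}
```

in this model the first attempt returns an error after creating only the first object, and the retry returns nil without installing the rest, because the only object left in the cluster already has the desired version.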
proposed solution: check that all objects are installed before checking the version.
not sure how feasible that is, as clusterctl would have to "keep inventory" somehow.
other ideas are welcome.
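a rough sketch of what such an inventory check could look like, assuming the expected object names can be derived from the cert-manager manifest being installed. `missingObjects` and `needsInstall` are hypothetical names for illustration, not existing clusterctl functions, and the installed objects are modeled as a plain name-to-version map.

```go
package main

import "fmt"

// missingObjects returns the expected object names that are not installed,
// in the order they appear in expected.
func missingObjects(expected []string, installed map[string]string) []string {
	var missing []string
	for _, name := range expected {
		if _, ok := installed[name]; !ok {
			missing = append(missing, name)
		}
	}
	return missing
}

// needsInstall reports whether a (re)install is required: either an expected
// object is missing, or an installed object is not at the desired version.
func needsInstall(expected []string, installed map[string]string, desired string) bool {
	if len(missingObjects(expected, installed)) > 0 {
		return true
	}
	for _, version := range installed {
		if version != desired {
			return true
		}
	}
	return false
}

func main() {
	expected := []string{"crd", "deployment", "webhook"}

	// State left behind by a failed install: only one object, already at v2.
	partial := map[string]string{"crd": "v2"}
	fmt.Println("partial install needs install:", needsInstall(expected, partial, "v2"))
	fmt.Println("missing:", missingObjects(expected, partial))

	complete := map[string]string{"crd": "v2", "deployment": "v2", "webhook": "v2"}
	fmt.Println("complete install needs install:", needsInstall(expected, complete, "v2"))
}
```

with a check like this running before (or as part of) the version comparison, a retry after a partial install would trigger the delete-and-install path again instead of returning early.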
What did you expect to happen?
for clusterctl to be able to recover from a partially installed cert-manager with idempotent retries.
Cluster API version
latest master
Kubernetes version
latest master
Anything else you would like to add?
No response
Label(s) to be applied
/kind bug
/area clusterctl