CAPI capi-e2e-main-1-24-latest failing consistently since June 1st #6596
Comments
It seems there is a similar problem in capi-e2e-main-1-22-1-23.
I think it's something with the new registry, which is used for >= v1.25. Reposting from Slack: I don't think this has been caused by our recent changes; this job succeeded the first time after those changes, afaik. Testgrid CAPO/CAPI:
Note: I think the root cause is the same for the CAPI/CAPO failures, but the fixes are probably different.
I tested it locally with the following results. tl;dr: the cause is that since 1st June the new (kubeadm) default registry is registry.k8s.io. Usually this is not a problem as:
In our case, CAPD CI fails because we are relying on the correct images being baked into the kind image. When the v1.25-x machine comes up, we get errors like the following:
In theory we can work around this by choosing the KCP imageRepository based on the Kubernetes version in the ClusterClass. I manually tried using registry.k8s.io for the v1.25-x machines (by updating the field in KCP) and it worked. But I think we should consider a more general solution. As @neolit123 mentioned in Slack:
When the imageRepository is not set in KCP, the clusterConfiguration.imageRepository in the
I would suggest the following change to the upgrade behavior of KCP:
If I got it right, this would align our KCP upgrade implementation with kubeadm upgrade (kubernetes/kubernetes#110343). WDYT @fabriziopandini?
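For reference, a minimal sketch of what the manual workaround looks like on the KubeadmControlPlane object. The object name and version are placeholders and unrelated fields (machineTemplate, replicas, etc.) are omitted; the point is only the imageRepository field, assuming the standard kubeadmConfigSpec field layout:

```yaml
# Sketch of the manual workaround described above: point KCP at the new registry
# for >= v1.25 machines. Name/version are placeholders; other required fields omitted.
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: my-cluster-control-plane
spec:
  version: v1.25.0
  kubeadmConfigSpec:
    clusterConfiguration:
      # Without this, an upgrade of a cluster initialized with <= v1.24 keeps
      # propagating the old default registry (k8s.gcr.io) to the new machines.
      imageRepository: registry.k8s.io
```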
I think the CAPD node images should bake in registry.k8s.io component images if the kubeadm version is at least 1.25.0-0. The issue here is that kubeadm 1.25.0-0 would need registry.k8s.io images by default, and if another repo is passed via imageRepository it will be considered custom (e.g. the coredns path is different). We had a similar problem in kinder / kubeadm e2e, so what we do is check the kubeadm version and bake the component images from tars differently.
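For context on the "considered custom" behavior mentioned here, a small kubeadm ClusterConfiguration sketch; the exact CoreDNS image paths in the comments are an assumption based on kubeadm's behavior at the time, not something spelled out in this thread:

```yaml
# Illustration of default vs. custom imageRepository handling in kubeadm >= v1.25.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.25.0
# With no imageRepository set, kubeadm defaults to registry.k8s.io and pulls
# CoreDNS from the nested path registry.k8s.io/coredns/coredns.
# Setting the field explicitly (even to the same registry) marks it as a custom
# repository, and kubeadm then expects e.g. registry.k8s.io/coredns instead,
# so images pre-baked under the default paths may no longer match.
imageRepository: registry.k8s.io
```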
@neolit123 We already have the registry.k8s.io images baked in. Our problem is that we don't have the kubeadm upgrade logic. So an upgrade test which inits v1.24 and then upgrades to v1.25 is still trying to use the old registry. I think we have to implement the "kubeadm upgrade" change in KCP anyway, and this will then also fix our test.
Ok, understood.
Yes, KCP has to do it.
For completeness, the upstream kubeadm issues/PRs:
I added it as a release-specific task to our v1.25 tracking issue #6661.
I talked with Fabrizio about this issue; we would do the following:
I'd like to tackle this.
/assign
To note: this won't actually fix the issue our tests have yet, because we always set an imageRepository via ClusterClass.
Yup. We should drop that patch.
I think we should instead just not apply the patch if the value is empty :-) keeping it customizable?! (And to do so, change the default value to an empty string.)
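A minimal sketch of how "skip the patch when the value is empty" could be expressed in the ClusterClass, using a patch-level enabledIf condition. The variable, patch, and template names are illustrative, not the actual ones from the e2e ClusterClass:

```yaml
# Hypothetical excerpt of a ClusterClass; names and paths are illustrative only.
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: quick-start
spec:
  variables:
  - name: imageRepository
    required: false
    schema:
      openAPIV3Schema:
        type: string
        default: ""              # empty by default, so the patch below is skipped
  patches:
  - name: imageRepository
    # Only apply the patch when the variable is set to a non-empty value.
    enabledIf: '{{ ne .imageRepository "" }}'
    definitions:
    - selector:
        apiVersion: controlplane.cluster.x-k8s.io/v1beta1
        kind: KubeadmControlPlaneTemplate
        matchResources:
          controlPlane: true
      jsonPatches:
      - op: add
        path: /spec/template/spec/kubeadmConfigSpec/clusterConfiguration/imageRepository
        valueFrom:
          variable: imageRepository
```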
I don't think there is any need for that patch, to be honest. It was just a random patch we added for testing. To be clear, I'm only referring to the ClusterClass we are using in e2e tests. I don't think we have to keep it customizable (and keep a patch that is not covered by our tests). But probably the same applies to our CAPD ClusterClass as well. It's just a test/dev provider, explicitly not recommended for production. So I don't think we have to provide a variable to allow customization of the imageRepository.
@sbueringer: but doesn't the xref: cluster-api/test/e2e/quick_start_test.go line 27 in 37a4b2b
Given that they are not in sync today, it's not a hard requirement :). But I think we can drop the variable+patch from both. I think it would be good practice to keep them in sync, and it would be nice to (roughly) use the patches we have in our e2e tests so we have some confidence that they work / didn't just break at some point.
It would be great if someone can keep an eye on testgrid over the next few days.
I will do so 😀 I'd like to see green test grids.
For the record: it has been green twice in a row since the fix was merged.
See https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-e2e-main-1-24-latest
This should be related to one of the recent changes in the test framework.