🌱 Remove CI image pull and tagging for kubeadm injection script #6590

Conversation

jsturtevant
Contributor

What this PR does / why we need it:
After the switch from k8s.gcr.io to registry.k8s.io (see kubernetes/kubernetes#109938 for details), the CI jobs are outputting the following when trying to extract the ci images:

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/capz-conformance-master/1532465763934277632

[2022-06-02 21:09:45] 
[2022-06-02 21:09:49] unpacking registry.k8s.io/kube-apiserver-amd64:v1.25.0-alpha.0.734_ba502ee555924a (sha256:4600d5b413457521bd7c01d24c0b21288c63f05137aa7547b6e0cb5e9a654bef)...done
[2022-06-02 21:09:49] ctr: image "k8s.gcr.io/kube-apiserver-amd64:v1.25.0-alpha.0.734_ba502ee555924a": not found
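
The mismatch in this log can be sketched as follows: the import step yields an image named under registry.k8s.io, while the next step looks it up under k8s.gcr.io. This is only an illustration built from the two log lines above; `missing_images` is a hypothetical helper and is not part of the injection script.

```python
def missing_images(imported, expected):
    """Return the expected image references that are absent from the imported ones."""
    imported_set = set(imported)
    return [ref for ref in expected if ref not in imported_set]


TAG = "v1.25.0-alpha.0.734_ba502ee555924a"

# Name reported by the "unpacking ... done" log line.
imported = [f"registry.k8s.io/kube-apiserver-amd64:{TAG}"]
# Name the follow-up ctr call asked for, per the "not found" log line.
expected = [f"k8s.gcr.io/kube-apiserver-amd64:{TAG}"]

print(missing_images(imported, expected))
```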

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Partially Fixes kubernetes-sigs/cluster-api-provider-azure#2351

other info

I found that our jobs are not failing: because we give kubeadm the kubernetesVersion in the job config, it is smart enough to pull the correct images even without this step, so our jobs are still passing.

See https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/9e793b42f79b65cbc830e711dce63a2c1e7c82c0/templates/test/ci/cluster-template-prow-ci-version-windows-containerd-2022.yaml#L90

Is there a reason to pull these images?

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 3, 2022
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jun 3, 2022
@jsturtevant jsturtevant changed the title Use registry.k8s.io in kubeadm injection script 🌱 Use registry.k8s.io in kubeadm injection script Jun 3, 2022
@stmcginnis
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 3, 2022
@jackfrancis
Contributor

lgtm

@sbueringer
Member

sbueringer commented Jun 6, 2022

Is there a reason to pull these images?

@jsturtevant We needed the wget + import + tag of those images as they were not available in a registry (at least for CAPO and CAPA too afaik).

It's interesting that for CAPZ it works with image pull, e.g. here is a successful image pull:

Jun 02 21:09:50.735341 capz-conf-x78nve-control-plane-cxrb9 containerd[1319]: time="2022-06-02T21:09:50.735271164Z" level=info msg="PullImage "gcr.io/k8s-staging-ci-images/kube-apiserver:v1.25.0-alpha.0.734_ba502ee555924a""
Jun 02 21:09:54.684557 capz-conf-x78nve-control-plane-cxrb9 containerd[1319]: time="2022-06-02T21:09:54.684487533Z" level=info msg="ImageCreate event &ImageCreate{Name:gcr.io/k8s-staging-ci-images/kube-apiserver:v1.25.0-alpha.0.734_ba502ee555924a,Labels:map[string]string{io.cri-containerd.image: managed,},XXX_unrecognized:[],}"

Note: The download script fails in CAPZ but this doesn't fail the entire bootstrap process.

Looks like the CAPO tests are currently failing since the change in image names:

ctr: image "k8s.gcr.io/kube-apiserver-amd64:v1.25.0-alpha.0.729_62d9f8ba80f4cc": not found
...
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.25.0-alpha.0.729_62d9f8ba80f4cc

In general the change looks fine, but I think we have to use the old/new registry depending on the Kubernetes version, otherwise jobs using <v1.25 will fail (if there are any).
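
A minimal sketch of the version-dependent choice described above, assuming the cutover sits at v1.25 as stated in this thread. `registry_for_version` is a hypothetical helper for illustration; the actual script is shell and contains no such function.

```python
import re

OLD_REGISTRY = "k8s.gcr.io"
NEW_REGISTRY = "registry.k8s.io"


def registry_for_version(version: str) -> str:
    """Pick the registry name from the Kubernetes major.minor version.

    Assumption: versions >= v1.25 use the new registry, older ones the old one.
    """
    match = re.match(r"^v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return NEW_REGISTRY if (major, minor) >= (1, 25) else OLD_REGISTRY


print(registry_for_version("v1.25.0-alpha.0.734+ba502ee555924a"))
print(registry_for_version("v1.24.1"))
```

This is exactly the if/else that the thread later tries to avoid by dropping the download and tagging step altogether.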

@dims Do you know why those images might be available for CAPZ but not for CAPO? (I read about the cloud-local redirects, but I don't know if that explains it)

Note: The logs above show different registries, but I'm not sure if that is the actual issue (maybe just +/- redirect), given that we have kubeadm image pulls in both cases and it works in one case but not the other.

@sbueringer
Member

cc @mdbooth (for CAPO)

@dims
Member

dims commented Jun 6, 2022

@ameukam @BenTheElder can you please peek?

Member

@vincepri vincepri left a comment

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 6, 2022
@vincepri
Member

vincepri commented Jun 6, 2022

/hold

for the above

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 6, 2022
@jsturtevant
Contributor Author

@jsturtevant We needed the wget + import + tag of those images as they were not available in a registry (at least for CAPO and CAPA too afaik).

Something might be wrong in the configuration on OpenStack. Kubeadm in Azure is pulling gcr.io/k8s-staging-ci-images/kube-proxy:v1.25.0-alpha.0.859_198dd7668a19c8, whereas kubeadm in OpenStack is pulling k8s.gcr.io/kube-controller-manager:v1.25.0-alpha.0.729_62d9f8ba80f4cc.

Notice the difference between gcr.io/k8s-staging-ci-images and k8s.gcr.io. I don't think the CI images are pushed to k8s.gcr.io; I believe they are pushed to the gcr.io/k8s-staging-ci-images staging repo. I am not sure why kubeadm is configured to pull from k8s.gcr.io in OpenStack and not in Azure.
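
To make the registry comparison concrete, here is a small sketch that splits the two image references quoted above into repository, image name, and tag. `split_image_ref` is a hypothetical helper that only handles the simple `repo/name:tag` form seen in this thread (no digests).

```python
def split_image_ref(ref: str):
    """Split 'repository/name:tag' into (repository, name, tag)."""
    repository, _, name_and_tag = ref.rpartition("/")
    name, _, tag = name_and_tag.partition(":")
    return repository, name, tag


azure_ref = "gcr.io/k8s-staging-ci-images/kube-proxy:v1.25.0-alpha.0.859_198dd7668a19c8"
openstack_ref = "k8s.gcr.io/kube-controller-manager:v1.25.0-alpha.0.729_62d9f8ba80f4cc"

for ref in (azure_ref, openstack_ref):
    repository, name, tag = split_image_ref(ref)
    print(repository, name, tag)
```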

@BenTheElder
Member

Yeah,

ctr: image "k8s.gcr.io/kube-apiserver-amd64:v1.25.0-alpha.0.729_62d9f8ba80f4cc": not found

An image with a tag like this won't be in k8s.gcr.io. Tags there have to be promoted in the k8s.io repo, so only release tags are present.
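
A rough way to tell the two kinds of tag apart, as a sketch only. The pattern below is an assumption about what a promoted release tag looks like (vX.Y.Z with an optional -alpha.N, -beta.N, or -rc.N suffix); it is not taken from the promotion tooling, and `is_release_tag` is a hypothetical helper.

```python
import re

# Assumed shape of a release tag; CI build tags carry an extra
# ".<commit count>_<sha>" suffix and therefore do not match.
RELEASE_TAG = re.compile(r"^v\d+\.\d+\.\d+(-(alpha|beta|rc)\.\d+)?$")


def is_release_tag(tag: str) -> bool:
    """Return True if the tag looks like a release tag under the assumption above."""
    return RELEASE_TAG.match(tag) is not None


print(is_release_tag("v1.25.0-alpha.0.729_62d9f8ba80f4cc"))
print(is_release_tag("v1.24.1"))
```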

@sbueringer
Member

sbueringer commented Jun 8, 2022

Thx! Sorry should have seen that.

Okay looks like the following:

I would suggest the following:

  • let's drop the entire image download + tagging from the script because:
    • we don't need it
    • we can avoid adding a Kubernetes version dependent if/else to tag the old/new image name based on Kubernetes version
  • CAPO should not pin to the production registry

@CecileRobertMichon
Contributor

let's drop the entire image download + tagging from the script because:
  • we don't need it
  • we can avoid adding a Kubernetes version dependent if/else to tag the old/new image name based on Kubernetes version

+1

Kubeadm has the ability to automatically pull the CI based images if it
is a CI binary itself so we don't need to pull and tag these images
but we do need to get and replace the correct binaries.
@sbueringer
Member

sbueringer commented Jun 8, 2022

Looping in CAPA in case they are (still/also) using this script cc @pydctw @richardcase

@jsturtevant jsturtevant force-pushed the update-kubeadm-script-registry.k8s.io branch from 75aaea2 to 30ba31a Compare June 8, 2022 17:57
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 8, 2022
@sbueringer
Member

sbueringer commented Jun 8, 2022

Thank you!

/lgtm

It would be good to wait until next week with merge if possible. Just to get feedback from CAPO/CAPA (one of the CAPO maintainers at least is out this week). Otherwise I'm fine with just iterating, CAPO tests are currently broken anyway so this won't make it worse :).

@jsturtevant
Contributor Author

/retitle Remove CI image pull and tagging for kubeadm injection script

@k8s-ci-robot k8s-ci-robot changed the title 🌱 Use registry.k8s.io in kubeadm injection script Remove CI image pull and tagging for kubeadm injection script Jun 9, 2022
@jsturtevant jsturtevant changed the title Remove CI image pull and tagging for kubeadm injection script 🌱 Remove CI image pull and tagging for kubeadm injection script Jun 9, 2022
Member

@fabriziopandini fabriziopandini left a comment

As per comments in the thread above
/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini, vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [fabriziopandini,vincepri]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 13, 2022
@fabriziopandini
Member

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 13, 2022
@fabriziopandini
Member

It would be good to wait until next week with merge if possible. Just to get feedback from CAPO/CAPA
@mdbooth @pydctw @richardcase

@sedefsavas

The CAPA k8s-main conformance test has been broken since kubernetes/kubernetes#109938 was merged. I'm not sure how it works for CAPZ, but we were using the pulled images here.

I will need to further test if just changing the registry here works instead of removing it.

@sedefsavas

For unblocking CAPZ, how about just changing k8s.gcr.io to registry.k8s.io in this PR so that CAPZ script does not fail? @jsturtevant @CecileRobertMichon

@jsturtevant
Contributor Author

CAPZ isn't blocked on this. We haven't taken a dependency on this e2e helper script yet, but were considering it after this merged.

For unblocking CAPZ, how about just changing k8s.gcr.io to registry.k8s.io in this PR so that CAPZ script does not fail?

This is what I did in the initial version of this PR, but it was discovered that the logic for pulling/tagging images is not required, so I switched to removing it: #6590 (comment)

@sedefsavas

This is what I did in the initial version of this PR, but it was discovered that the logic for pulling/tagging images is not required, so I switched to removing it: #6590 (comment)

CAPA also doesn't pin the registry to k8s.gcr.io, but it still stopped working for us. I need to collect the logs.

@sedefsavas

@sbueringer follow up on #6590 (comment)

let's drop the entire image download + tagging from the script

Without the importing and tagging part, the CI kubeadm version fails to pull images. Errors I see:

root@ip-10-0-149-177:/var/snap/amazon-ssm-agent/5656# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"25+", GitVersion:"v1.25.0-alpha.0.956+0fe0dbf3fb8cf5", GitCommit:"0fe0dbf3fb8cf501c24c87f4113a3819cb86a550", GitTreeState:"clean", BuildDate:"2022-06-13T17:30:11Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}

root@ip-10-0-149-177:/var/snap/amazon-ssm-agent/5656# kubeadm init --config /run/kubeadm/kubeadm.yaml 
[init] Using Kubernetes version: v1.25.0-alpha.0.956+0fe0dbf3fb8cf5
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR ImagePull]: failed to pull image registry.k8s.io/kube-apiserver:v1.25.0-alpha.0.956_0fe0dbf3fb8cf5: output: time="2022-06-13T21:58:02Z" level=fatal msg="pulling image: rpc error: code = NotFound desc = failed to pull and unpack image \"registry.k8s.io/kube-apiserver:v1.25.0-alpha.0.956_0fe0dbf3fb8cf5\": failed to resolve reference \"registry.k8s.io/kube-apiserver:v1.25.0-alpha.0.956_0fe0dbf3fb8cf5\": registry.k8s.io/kube-apiserver:v1.25.0-alpha.0.956_0fe0dbf3fb8cf5: not found"
, error: exit status 1
        [ERROR ImagePull]: failed to pull image registry.k8s.io/kube-controller-manager:v1.25.0-alpha.0.956_0fe0dbf3fb8cf5: output: time="2022-06-13T21:58:04Z" level=fatal msg="pulling image: rpc error: code = NotFound desc = failed to pull and unpack image \"registry.k8s.io/kube-controller-manager:v1.25.0-alpha.0.956_0fe0dbf3fb8cf5\": failed to resolve reference \"registry.k8s.io/kube-controller-manager:v1.25.0-alpha.0.956_0fe0dbf3fb8cf5\": registry.k8s.io/kube-controller-manager:v1.25.0-alpha.0.956_0fe0dbf3fb8cf5: not found"
, error: exit status 1
        [ERROR ImagePull]: failed to pull image registry.k8s.io/kube-scheduler:v1.25.0-alpha.0.956_0fe0dbf3fb8cf5: output: time="2022-06-13T21:58:05Z" level=fatal msg="pulling image: rpc error: code = NotFound desc = failed to pull and unpack image \"registry.k8s.io/kube-scheduler:v1.25.0-alpha.0.956_0fe0dbf3fb8cf5\": failed to resolve reference \"registry.k8s.io/kube-scheduler:v1.25.0-alpha.0.956_0fe0dbf3fb8cf5\": registry.k8s.io/kube-scheduler:v1.25.0-alpha.0.956_0fe0dbf3fb8cf5: not found"
, error: exit status 1
        [ERROR ImagePull]: failed to pull image registry.k8s.io/kube-proxy:v1.25.0-alpha.0.956_0fe0dbf3fb8cf5: output: time="2022-06-13T21:58:06Z" level=fatal msg="pulling image: rpc error: code = NotFound desc = failed to pull and unpack image \"registry.k8s.io/kube-proxy:v1.25.0-alpha.0.956_0fe0dbf3fb8cf5\": failed to resolve reference \"registry.k8s.io/kube-proxy:v1.25.0-alpha.0.956_0fe0dbf3fb8cf5\": registry.k8s.io/kube-proxy:v1.25.0-alpha.0.956_0fe0dbf3fb8cf5: not found"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

After changing the registry to the new one in this PR, it works. So looks like we still need to import and tag images.

I don't know why the CAPZ test does not fail with this kubeadm error.
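
For readers comparing the output above: the image tag in the errors differs from the reported GitVersion only in the `+`, which is not a valid character in an image tag. A minimal sketch of that mapping, assuming every character outside letters, digits, `_`, `-`, and `.` is replaced with `_`. `version_to_image_tag` is a hypothetical name, and kubeadm's own helper may differ in detail.

```python
import re


def version_to_image_tag(version: str) -> str:
    """Replace characters that are not valid in an image tag with '_'."""
    return re.sub(r"[^-\w.]", "_", version, flags=re.ASCII)


# GitVersion from the `kubeadm version` output above.
print(version_to_image_tag("v1.25.0-alpha.0.956+0fe0dbf3fb8cf5"))
```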

@sbueringer
Member

sbueringer commented Jun 14, 2022

Looks like there is something wrong with kubeadm somehow.

It works in CAPZ because kubeadm detects the CI version and it pulls images from the CI registry automatically, e.g. gcr.io/k8s-staging-ci-images/kube-apiserver:v1.25.0-alpha.0.965_c0cc9116677732 (because of https://github.com/kubernetes/kubernetes/blob/703f2a7b86a2340fce92b3f9dae12a30254620a3/cmd/kubeadm/app/util/config/common.go#L94-L96)

Looks like in CAPA kubeadm somehow tries to pull images from the (new) production registry (registry.k8s.io).

For unblocking CAPZ, how about just changing k8s.gcr.io to registry.k8s.io in this PR so that CAPZ script does not fail?

The problem is that the script then only works with >= v1.25. The older versions are using the old registry.

@sbueringer
Member

sbueringer commented Jun 14, 2022

I think I found it. CAPZ sets a ci/ version in clusterConfiguration.kubernetesVersion

kubeadm is looking for the ci prefix there, then it automatically uses the test registry/imageRepository (gcr.io/k8s-staging-ci-images)
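
The behavior described here can be sketched as below. This is a simplified illustration, not kubeadm's code: `image_repository` is a hypothetical function, and it only checks the `ci/` prefix mentioned in this comment.

```python
CI_IMAGE_REPOSITORY = "gcr.io/k8s-staging-ci-images"


def image_repository(kubernetes_version: str, configured_repository: str) -> str:
    """Return the repository used for control plane images.

    Assumption: a 'ci/' prefix on kubernetesVersion switches the repository
    to the CI staging one; otherwise the configured repository is used.
    """
    if kubernetes_version.startswith("ci/"):
        return CI_IMAGE_REPOSITORY
    return configured_repository


# With the ci/ prefix (as CAPZ sets it): CI staging repository.
print(image_repository("ci/v1.25.0-alpha.0.965+c0cc9116677732", "registry.k8s.io"))
# Without the prefix (as in the CAPA run above): production registry.
print(image_repository("v1.25.0-alpha.0.956+0fe0dbf3fb8cf5", "registry.k8s.io"))
```

Under this reading, the CAPA failure follows directly: the CI build tag is looked up in the production registry, where only promoted release tags exist.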

@CecileRobertMichon
Contributor

Good find @sbueringer. @sedefsavas can CAPA add the ci/ prefix too? Then we don't need additional script lines to pull the images and it works for all versions.

CAPO should also do the same.

@sedefsavas

Good find!! We will change it in CAPA.

/lgtm

@sbueringer
Member

Okay I think we can go ahead and merge it.

Thx everyone!

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 14, 2022
@k8s-ci-robot k8s-ci-robot merged commit f87f1f3 into kubernetes-sigs:main Jun 14, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.2 milestone Jun 14, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Conformance CI job image replacement is failing after switching from k8s.gcr.io to registry.k8s.io