kubeadm: Print all control plane images on init error #66503

Closed

Conversation

rosti
Contributor

@rosti rosti commented Jul 23, 2018

What this PR does / why we need it:

kubeadm init dumps a comprehensive error message on failure. This message
contains some of the control plane images, the fetching of which could have
caused the init failure. Unfortunately, this is not the full list of images.
Most notably, pause, kube-proxy and DNS images are missing.

This fixes the issue by using GetAllImages. It also simplifies the code by
removing checks that duplicate ones already made in GetAllImages.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Inspired by kubernetes/kubeadm#1016, but not fixing it.

Special notes for your reviewer:

/cc @kubernetes/sig-cluster-lifecycle-pr-reviews
/area kubeadm
/assign @luxas
/assign @timothysc

Release note:

kubeadm: Print all control plane images on init error

Signed-off-by: Rostislav M. Georgiev <[email protected]>
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Jul 23, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rosti
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: fabriziopandini

If they are not already assigned, you can assign the PR to them by writing /assign @fabriziopandini in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 23, 2018
@k8s-ci-robot
Contributor

@rosti: Reiterating the mentions to trigger a notification:
@kubernetes/sig-cluster-lifecycle-pr-reviews

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 23, 2018
@neolit123
Member

/ok-to-test
@kubernetes/sig-cluster-lifecycle-pr-reviews

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 23, 2018
@k8s-ci-robot
Contributor

k8s-ci-robot commented Jul 23, 2018

@rosti: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-integration | 47935db | link | /test pull-kubernetes-integration |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Member

@luxas luxas left a comment

This is kinda ok I guess, but the DNS and proxy images aren't actually used in this phase if kubeadm fails there...

Images []string
}{
Error: fmt.Sprintf("%v", err),
Images: images.GetAllImages(i.cfg),
Member

If this includes images other than the control plane ones, I think this will be more confusing.

@rosti
Contributor Author

rosti commented Jul 24, 2018

@luxas @timothysc you are both right. GetAllImages will return the kube-proxy and DNS images too, and an init failure would not be caused by them. What is missing in the current code base, though, is the pause image (I've seen init failures on machines without internet access because of a forgotten pause image).

I can extract portions of GetAllImages into a GetAllControlPlaneImages function, so that the full knowledge of the images needed for init still lives in one place, images.go.
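
A rough sketch of that split, under stated assumptions: the `Config` struct, its fields, the `imageRef` helper, and the image names are hypothetical stand-ins, not kubeadm's API. It also assumes a local etcd and uses `dns` as a placeholder name for whichever DNS add-on image is deployed.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for kubeadm's configuration object.
type Config struct {
	ImageRepository   string
	KubernetesVersion string
	EtcdVersion       string
	PauseVersion      string
	DNSVersion        string
}

// imageRef builds a "repository/name:tag" reference (illustrative helper).
func imageRef(repo, name, tag string) string {
	return fmt.Sprintf("%s/%s:%s", repo, name, tag)
}

// GetAllControlPlaneImages returns only the images init needs to bring up
// the control plane, including the pause image.
func GetAllControlPlaneImages(cfg *Config) []string {
	imgs := []string{}
	for _, component := range []string{"kube-apiserver", "kube-controller-manager", "kube-scheduler"} {
		imgs = append(imgs, imageRef(cfg.ImageRepository, component, cfg.KubernetesVersion))
	}
	// Assumption: etcd runs locally, so its image is needed as well.
	imgs = append(imgs, imageRef(cfg.ImageRepository, "etcd", cfg.EtcdVersion))
	imgs = append(imgs, imageRef(cfg.ImageRepository, "pause", cfg.PauseVersion))
	return imgs
}

// GetAllImages returns the control plane images plus the add-on images
// (kube-proxy and DNS), which are not involved in an early init failure.
func GetAllImages(cfg *Config) []string {
	imgs := GetAllControlPlaneImages(cfg)
	imgs = append(imgs, imageRef(cfg.ImageRepository, "kube-proxy", cfg.KubernetesVersion))
	imgs = append(imgs, imageRef(cfg.ImageRepository, "dns", cfg.DNSVersion))
	return imgs
}

func main() {
	cfg := &Config{
		ImageRepository:   "example.registry",
		KubernetesVersion: "v0",
		EtcdVersion:       "e0",
		PauseVersion:      "p0",
		DNSVersion:        "d0",
	}
	for _, img := range GetAllControlPlaneImages(cfg) {
		fmt.Println(img)
	}
}
```

With this shape the init error path could report `GetAllControlPlaneImages`, while anything that needs the complete set keeps calling `GetAllImages`.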

I'll be on a PTO for the rest of the week, so I'll probably finish it off next week.

WDYT?

@luxas
Member

luxas commented Jul 24, 2018

Adding the pause image is totally ok, dns and proxy not IMO.

I'll be on a PTO for the rest of the week

Take your time and enjoy!

@kad
Member

kad commented Jul 24, 2018

@luxas in theory, every image that is referenced by manifests maintained inside kubeadm, should be reported. This really helps for air-gap k8s installations.

@dixudx
Member

dixudx commented Jul 25, 2018

Roughly lgtm.

@luxas
Member

luxas commented Jul 25, 2018

@luxas in theory, every image that is referenced by manifests maintained inside kubeadm, should be reported. This really helps for air-gap k8s installations.

@kad we added kubeadm config images list exactly for that purpose.

@k8s-ci-robot
Contributor

@rosti: PR needs rebase.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 28, 2018
@rosti
Contributor Author

rosti commented Jul 31, 2018

This is superseded by #66658, so I'm closing this PR.

@rosti rosti closed this Jul 31, 2018
@rosti rosti deleted the kubeadm-print-all-images-on-err branch November 22, 2018 15:28
Labels
area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
7 participants