No preflight check of config files #646

ManuStoessel · 2019-09-05T09:32:59Z

What happened:
When provisioning a new k8s cluster on AWS with KubeOne version 0.9.2, following the 'Quickstart on AWS' tutorial on the master branch, I encountered the following error message:

> kubeone install config.yaml --tfjson tf.json 
INFO[10:28:33 CEST] Installing prerequisites…                    
INFO[10:28:33 CEST] Determine operating system…                   node=18.184.18.152
INFO[10:28:34 CEST] Determine operating system…                   node=35.156.169.171
INFO[10:28:34 CEST] Determine operating system…                   node=18.185.13.164
INFO[10:28:35 CEST] Determine hostname…                           node=18.184.18.152
INFO[10:28:35 CEST] Determine hostname…                           node=18.185.13.164
INFO[10:28:35 CEST] Creating environment file…                    node=18.185.13.164
INFO[10:28:35 CEST] Creating environment file…                    node=18.184.18.152
INFO[10:28:35 CEST] Installing kubeadm…                           node=18.185.13.164 os=ubuntu
INFO[10:28:35 CEST] Installing kubeadm…                           node=18.184.18.152 os=ubuntu
INFO[10:28:35 CEST] Determine hostname…                           node=35.156.169.171
INFO[10:28:35 CEST] Creating environment file…                    node=35.156.169.171
INFO[10:28:35 CEST] Installing kubeadm…                           node=35.156.169.171 os=ubuntu
INFO[10:29:19 CEST] Deploying configuration files…                node=18.185.13.164 os=ubuntu
INFO[10:29:20 CEST] Deploying configuration files…                node=18.184.18.152 os=ubuntu
INFO[10:29:24 CEST] Deploying configuration files…                node=35.156.169.171 os=ubuntu
INFO[10:29:24 CEST] Generating kubeadm config file…              
INFO[10:29:25 CEST] Configuring certs and etcd on first controller… 
INFO[10:29:25 CEST] Ensuring Certificates…                        node=35.156.169.171
INFO[10:29:27 CEST] Downloading PKI files…                        node=35.156.169.171
INFO[10:29:28 CEST] Creating local backup…                        node=35.156.169.171
INFO[10:29:28 CEST] Deploying PKI…                               
INFO[10:29:28 CEST] Uploading files…                              node=18.184.18.152
INFO[10:29:28 CEST] Uploading files…                              node=18.185.13.164
INFO[10:29:30 CEST] Configuring certs and etcd on consecutive controller… 
INFO[10:29:30 CEST] Ensuring Certificates…                        node=18.184.18.152
INFO[10:29:30 CEST] Ensuring Certificates…                        node=18.185.13.164
INFO[10:29:32 CEST] Initializing Kubernetes on leader…           
INFO[10:29:32 CEST] Running kubeadm…                              node=35.156.169.171
WARN[10:32:13 CEST] Task failed, retrying…                       
Error: failed to init kubernetes on leader: + [[ -f /etc/kubernetes/admin.conf ]]
+ sudo kubeadm init --config=./kubeone/cfg/master_0.yaml
error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition: failed to exec command: export "PATH=$PATH:/sbin:/usr/local/bin:/opt/bin"

set -xeu pipefail


if [[ -f /etc/kubernetes/admin.conf ]]; then exit 0; fi
sudo kubeadm init --config=./kubeone/cfg/master_0.yaml
: Process exited with status 1
Usage:
  kubeone install <manifest> [flags]

Examples:
kubeone install mycluster.yaml -t terraformoutput.json

Flags:
  -b, --backup string   path to where the PKI backup .tar.gz file should be placed (default: location of cluster config file)
  -h, --help            help for install

Global Flags:
  -d, --debug                           debug
  -t, --tfjson terraform output -json   Source for terrafor output JSON. - to read from stdin. If path is file, contents will be used. If path is dictionary, terraform output -json is executed in this path
  -v, --verbose                         verbose

failed to init kubernetes on leader: + [[ -f /etc/kubernetes/admin.conf ]]
+ sudo kubeadm init --config=./kubeone/cfg/master_0.yaml
error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition: failed to exec command: export "PATH=$PATH:/sbin:/usr/local/bin:/opt/bin"

set -xeu pipefail


if [[ -f /etc/kubernetes/admin.conf ]]; then exit 0; fi
sudo kubeadm init --config=./kubeone/cfg/master_0.yaml
: Process exited with status 1

This is fixed when using a current kubeone version (e.g. v0.10.0-alpha.3), since then the config format used in the 'Quickstart on AWS' tutorial apparently fits the kubeone version. I however would have expected a graceful and idempotent termination of kubeone before running into a problem while provisioning.

What is the expected behavior:

Successfully create a cluster or give meaningful output about wrong config format and exit before attempting to create the cluster.

How to reproduce the issue:

Use the config from the 'Quickstart on AWS' tutorial from the master branch with an older version of kubeone (e.g. v0.9.2)

Anything else we need to know?

Information about the environment:
KubeOne version (kubeone version):

{
  "kubeone": {
    "major": "0",
    "minor": "9",
    "gitVersion": "0.9.2",
    "gitCommit": "dab11436a9acb7813816d5c389360ab537bed758",
    "gitTreeState": "",
    "buildDate": "2019-07-04T16:57:12Z",
    "goVersion": "go1.12.6",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "3",
    "gitVersion": "v1.3.0",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}

Operating system: Linux 4.19.60-1-MANJARO x86_64 GNU/Linux
Provider you're deploying cluster on: AWS
Operating system you're deploying on: Ubuntu 18.04 amd64

thz · 2019-09-05T09:48:41Z

I agree with that, and I actually ran into the same issue. There should be something like either schema validation of the terraform output, or at least when an empty kubelet config is generated (which is the case here), there should be an error about that.
That would make things easier than monitoring how the remote instance acts on the obviously invalid kubelet conf.
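
As a rough illustration of the kind of check meant here, a minimal Python sketch that inspects the `terraform output -json` document before anything is provisioned. The field names come from the AWS example `output.tf`; the surrounding JSON layout (`kubeone_hosts.value.control_plane` as a single object), the choice of which fields to require, and the function names are assumptions for illustration, not KubeOne's actual validation.

```python
import json

# Field names as they appear in the AWS example output.tf. Treating all of
# them as required is an assumption; it may differ between KubeOne versions.
REQUIRED_LIST_FIELDS = ("public_address", "private_address", "hostnames")


def check_control_plane(control_plane):
    """Return a list of human-readable problems; empty means none were found."""
    problems = []
    lengths = {}
    for field in REQUIRED_LIST_FIELDS:
        value = control_plane.get(field)
        if not isinstance(value, list) or not value:
            problems.append(f"{field}: missing or empty")
            continue
        lengths[field] = len(value)
    # Every node needs one entry in each list, so the lengths must agree.
    if len(set(lengths.values())) > 1:
        problems.append(f"address lists differ in length: {lengths}")
    return problems


def check_terraform_output(raw_json):
    """Parse `terraform output -json` text and check the control plane block."""
    try:
        doc = json.loads(raw_json)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    try:
        # Assumed layout: {"kubeone_hosts": {"value": {"control_plane": {...}}}}
        control_plane = doc["kubeone_hosts"]["value"]["control_plane"]
    except (KeyError, TypeError):
        return ["kubeone_hosts.value.control_plane: not found"]
    if not isinstance(control_plane, dict):
        return ["kubeone_hosts.value.control_plane: expected an object"]
    return check_control_plane(control_plane)
```

Running something like this before the first SSH connection and exiting on a non-empty result would surface a mismatched terraform output without touching the instances.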

xmudrii · 2019-09-05T09:58:59Z

> When provisioning a new k8s cluster on AWS with KubeOne version 0.9.2, following the 'Quickstart on AWS' tutorial on the master branch, I encountered the following error message

We version the documentation and generally recommend using the documentation for your desired version (by switching the branch or tag). However, in this case, the documentation is absolutely the same between v0.9.x and v0.10.x, so it's okay. 🙂

> This is fixed when using a current kubeone version (e.g. v0.10.0-alpha.3), since then the config format used in the 'Quickstart on AWS' tutorial apparently fits the kubeone version.

We haven't changed the config format in a long time, so everything should work as expected, even with v0.9.2. That said, v0.9.2 is quite old at this point and I would recommend going with v0.10.0-alpha.3. The new stable version is expected after 1.16 goes out.

> error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition: failed to exec command: export "PATH=$PATH:/sbin:/usr/local/bin:/opt/bin"

This error doesn't really say anything usable, but from experience, it makes me think there is something wrong with the hostname (e.g. see #645).

Regarding hostnames, we added some additional validations and workarounds, so this may explain why it works with v0.10.0-alpha.3.

> I however would have expected a graceful and idempotent termination of kubeone before running into a problem while provisioning.

I'm not sure what graceful and idempotent termination means. Can you please explain a little bit more what steps you would like to see?

> Successfully create a cluster or give meaningful output about wrong config format and exit before attempting to create the cluster.

This is a little bit tricky. We are running commands over SSH and depend on external components, e.g. kubeadm. One kubeadm error (like this one) can have multiple meanings, so even if we parse it, we don't know what to tell the user to do. But I agree that our error format is bad, and I'd like to be able to show the error better and point users to some documentation or something similar.
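
One modest improvement to how the error is shown could be to filter the combined output down to the kubeadm failure line. A minimal Python sketch, assuming the marker strings seen in the log in this issue (`error execution phase`, `failed to exec command:`); kubeadm's output is not a stable interface, and `summarize_kubeadm_failure` is a hypothetical helper, not part of KubeOne:

```python
def summarize_kubeadm_failure(output):
    """Pick out the lines most likely to explain a failed kubeadm run."""
    summary = []
    for line in output.splitlines():
        stripped = line.strip()
        # Skip blank lines and shell trace lines produced by `set -x`
        # (for example "+ sudo kubeadm init ...").
        if not stripped or stripped.startswith("+ "):
            continue
        if stripped.startswith("error execution phase"):
            # Keep the kubeadm message and drop the echoed script that the
            # SSH wrapper appends after "failed to exec command:".
            kubeadm_part = stripped.split(": failed to exec command:")[0]
            summary.append(kubeadm_part)
    return summary
```

This does not interpret the error, it only removes the noise around it, so the ambiguity described above remains.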

@thz

> or at least when an empty kubelet config is generated (which is the case here), there should be an error about that.

How do you know the kubelet config is empty? I'd like to trace this but I'd need some more details on how to reproduce it.

ManuStoessel · 2019-09-05T10:09:58Z

> How do you know the kubelet config is empty? I'd like to trace this but I'd need some more details on how to reproduce it.

In this case the kubelet config file wasn't just empty; it wasn't created at all:

ubuntu@ip-10-101-44-172:~$ less /var/lib/kubelet/config
/var/lib/kubelet/config: No such file or directory
ubuntu@ip-10-101-44-172:~$ ls /var/lib/kubelet
cpu_manager_state  device-plugins  pki

> I'm not sure what graceful and idempotent termination means. Can you please explain a little bit more what steps you would like to see?

I just meant to exit kubeone with a meaningful error message and without actually changing the environment. But I see that this can be tricky when running stuff over ssh and relying on external tools.

xmudrii · 2019-09-05T10:15:32Z

> I just meant to exit kubeone with a meaningful error message and without actually changing the environment. But I see that this can be tricky when running stuff over ssh and relying on external tools.

I agree with the meaningful error message part but slightly disagree on the environment part. You can always use the kubeone reset command with appropriate flags to revert what has been done by kubeone install (destroy the cluster, destroy the worker nodes, remove binaries, etc.). I think that since we have such a command, it's useful to leave stuff in place, so someone can debug what happened there.

thz · 2019-09-05T10:23:50Z

> We version the documentation and generally recommend using the documentation for your desired version (by switching the branch or tag). However, in this case, the documentation is absolutely the same between v0.9.x and v0.10.x, so it's okay. 🙂

how about

% git diff v0.9.2..v0.10.0-alpha.3 output.tf

diff --git a/examples/terraform/aws/output.tf b/examples/terraform/aws/output.tf
index ce1c65a..7a22621 100644
--- a/examples/terraform/aws/output.tf
+++ b/examples/terraform/aws/output.tf
@@ -31,6 +31,7 @@ output "kubeone_hosts" {
       cloud_provider       = "aws"
       private_address      = aws_instance.control_plane.*.private_ip
       public_address       = aws_instance.control_plane.*.public_ip
+      hostnames            = aws_instance.control_plane.*.private_dns
       ssh_agent_socket     = var.ssh_agent_socket
       ssh_port             = var.ssh_port
       ssh_private_key_file = var.ssh_private_key_file

I ran into this problem, but cannot remember if it was exactly the hostnames.

ManuStoessel · 2019-09-06T15:03:55Z

So I used version v0.10.0-alpha.3 successfully without specifying the hostnames in the terraform output. So I guess this is a non-issue at least for versions equal to or higher than v0.10.0-alpha.3.

Maybe a comment in the docs about that is enough? I could open a PR for that.

xmudrii · 2019-09-06T15:26:03Z

> So I used version v0.10.0-alpha.3 successfully without specifying the hostnames in the terraform output.

It's specified automatically (see the output @thz posted) unless you are using your own Terraform scripts.

> So I guess this is a non-issue at least for versions equal to or higher than v0.10.0-alpha.3.

It should be a non-issue since v0.10.0-alpha.0 when we introduced the ability to set the hostnames (#567).

> Maybe a comment in the docs about that is enough? I could open a PR for that.

That would be very nice, but the problem might be what workaround to recommend.

Note that you can't set the hostname for versions before v0.10.0-alpha.0. For such versions, the hostname was determined automatically. Cherry-picking that change is not possible, for multiple reasons.

I guess that this error only occurs when you have a longer cluster name, i.e. longer instance names. If we can confirm that, we can add a note to the docs recommending either shorter cluster/instance names or v0.10.0 instead.

xmudrii · 2019-10-15T13:42:43Z

Currently, there is no way to validate the kubeadm configuration file without running kubeadm init or kubeadm join, which validates it and runs preflight checks. KubeOne configuration is validated before running kubeone install and kubeone upgrade. If you think we can extend the KubeOne validation, please let us know.

Because of that, I'm going to close this issue.
/close

kubermatic-bot · 2019-10-15T13:42:45Z

@xmudrii: Closing this issue.

ManuStoessel added the kind/bug label Sep 5, 2019

kubermatic-bot closed this as completed Oct 15, 2019