
Duplicate kubelet extra args produced by installFlags change detection #779

Closed
NeonSludge opened this issue Oct 31, 2024 · 4 comments · Fixed by #784

Comments

@NeonSludge

k0sctl >=0.19.0 appears to duplicate the generated extra kubelet arguments (specifically --node-ip and --hostname-override) and to reinstall k0s because of perceived changes in installFlags.

We have a cluster that was stood up with k0sctl 0.18.1 some time ago. Since then, k0sctl has been updated to 0.19.0.
Recently we added several workers to this cluster's manifest and noticed that the existing k0s controllers and workers were being reinstalled and restarted.

After some digging, we discovered that this apparently happened because of an unexpected change in the --kubelet-extra-args flag value.

This is a spec.hosts entry for a controller node. It has not changed since the cluster was created.

- hostname: <node hostname>
  privateAddress: <node address>
  ssh:
  # ...
  role: controller+worker
  dataDir: "/var/lib/k0s"
  installFlags:
    - --disable-components=konnectivity-server,metrics-server
    - --profile=custom
    - --kubelet-extra-args="--pod-infra-container-image=<internal registry>/pause:3.9"
    - --cri-socket=remote:/var/run/containerd/containerd.sock
    - --iptables-mode=nft

(Just in case: we know the --pod-infra-container-image kubelet argument has long been deprecated; we just haven't gotten around to removing it.)

The /etc/systemd/system/k0scontroller.service file generated for this host by k0sctl 0.18.1:

[Unit]
Description=k0s - Zero Friction Kubernetes
Documentation=https://docs.k0sproject.io
ConditionFileIsExecutable=/usr/local/bin/k0s

After=network-online.target 
Wants=network-online.target 

[Service]
StartLimitInterval=5
StartLimitBurst=10
ExecStart=/usr/local/bin/k0s controller --config=/etc/k0s/k0s.yaml --cri-socket=remote:/var/run/containerd/containerd.sock --data-dir=/var/lib/k0s --disable-components=konnectivity-server,metrics-server --enable-worker=true --iptables-mode=nft --kubelet-extra-args=--pod-infra-container-image=<internal registry>/pause:3.9\x20--node-ip=<node address>\x20--hostname-override=<node hostname> --profile=custom --token-file=/etc/k0s/k0stoken

RestartSec=10
Delegate=yes
KillMode=process
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
LimitNOFILE=999999
Restart=always

[Install]
WantedBy=multi-user.target

The /etc/systemd/system/k0scontroller.service file generated and reinstalled for this host by k0sctl 0.19.0:

[Unit]
Description=k0s - Zero Friction Kubernetes
Documentation=https://docs.k0sproject.io
ConditionFileIsExecutable=/usr/local/bin/k0s

After=network-online.target 
Wants=network-online.target 

[Service]
StartLimitInterval=5
StartLimitBurst=10
ExecStart=/usr/local/bin/k0s controller --config=/etc/k0s/k0s.yaml --cri-socket=remote:/var/run/containerd/containerd.sock --data-dir=/var/lib/k0s --disable-components=konnectivity-server,metrics-server --enable-worker=true --iptables-mode=nft --kubelet-extra-args=--pod-infra-container-image=<internal registry>/pause:3.9\x20--node-ip=<node address>\x20--hostname-override=<node hostname>\x20--node-ip=<node address>\x20--hostname-override=<node hostname> --profile=custom --token-file=/etc/k0s/k0stoken

RestartSec=10
Delegate=yes
KillMode=process
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
LimitNOFILE=999999
Restart=always

[Install]
WantedBy=multi-user.target

Notice the duplicated --node-ip and --hostname-override arguments nested inside the --kubelet-extra-args value.
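One way to avoid this kind of duplication is to make the merge idempotent: add each generated kubelet argument only if the existing value does not already contain a flag with the same name. The sketch below assumes the generated flags are appended to the --kubelet-extra-args value as space-separated tokens; `mergeKubeletArgs` is a hypothetical name, not k0sctl's actual code, and the registry and addresses are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// flagName returns the name part of a "--name=value" or "--name" token.
func flagName(tok string) string {
	name, _, _ := strings.Cut(strings.TrimLeft(tok, "-"), "=")
	return name
}

// mergeKubeletArgs (hypothetical) appends each generated flag to the
// existing space-separated kubelet args unless a flag with the same name
// is already present, so applying it twice yields the same string.
func mergeKubeletArgs(existing string, generated ...string) string {
	tokens := strings.Fields(existing)
	seen := make(map[string]bool, len(tokens))
	for _, t := range tokens {
		seen[flagName(t)] = true
	}
	for _, g := range generated {
		if name := flagName(g); !seen[name] {
			tokens = append(tokens, g)
			seen[name] = true
		}
	}
	return strings.Join(tokens, " ")
}

func main() {
	user := "--pod-infra-container-image=registry.example/pause:3.9" // placeholder registry
	gen := []string{"--node-ip=10.0.0.1", "--hostname-override=node1"}

	once := mergeKubeletArgs(user, gen...)
	twice := mergeKubeletArgs(once, gen...)
	fmt.Println(once)
	fmt.Println(once == twice) // true: the second merge adds nothing
}
```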

We then ran a separate k0sctl apply dry run with the same cluster manifest and saw that it would reinstall everything again, so there may be an issue with the installFlags change detection mechanism.
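For illustration, change detection of this kind can be reduced to normalizing both flag lists into name-to-value maps and comparing those. This is a hypothetical sketch, not k0sctl's implementation: it assumes each list element is one whole flag (so values may contain spaces), splits on the first `=`, strips one layer of surrounding double quotes as in the YAML installFlags entry above, and ignores flag order. The names `normalize` and `flagsEqual` are made up for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// normalize (hypothetical) maps "--name=value" flags to name -> value.
// A value wrapped in double quotes is unquoted before comparison.
func normalize(flags []string) map[string]string {
	m := make(map[string]string, len(flags))
	for _, f := range flags {
		name, value, _ := strings.Cut(strings.TrimLeft(f, "-"), "=")
		if len(value) >= 2 && value[0] == '"' {
			if unq, err := strconv.Unquote(value); err == nil {
				value = unq
			}
		}
		m[name] = value
	}
	return m
}

// flagsEqual reports whether two flag lists carry the same names and values.
func flagsEqual(a, b []string) bool {
	ma, mb := normalize(a), normalize(b)
	if len(ma) != len(mb) {
		return false
	}
	for k, v := range ma {
		if w, ok := mb[k]; !ok || w != v {
			return false
		}
	}
	return true
}

func main() {
	configured := []string{
		"--profile=custom",
		`--kubelet-extra-args="--pod-infra-container-image=reg/pause:3.9 --node-ip=10.0.0.1"`,
	}
	running := []string{
		"--kubelet-extra-args=--pod-infra-container-image=reg/pause:3.9 --node-ip=10.0.0.1",
		"--profile=custom",
	}
	fmt.Println(flagsEqual(configured, running)) // true
}
```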

Aside from the argument duplication and some unexpected restarts, everything works fine. Rolling back to k0sctl 0.18.1 solves this issue for us for now.

@kke
Contributor

kke commented Nov 1, 2024

I see this:

--kubelet-extra-args=--pod-infra-container-image=<internal registry>/pause:3.9\x20--node-ip=<node address>\x20--hostname-override=<node hostname>\x20--node-ip=<node address>\x20--hostname-override=<node hostname>

I wonder if the \x20 comes from k0s status -o json and isn't unescaped properly 🤔

This would explain the detected change, but it doesn't explain where the duplication happens.

@kke
Contributor

kke commented Nov 5, 2024

No, the \x20 appears only in the systemd unit file. k0s reports the arguments like this:

   "Args": [
      "/usr/local/bin/k0s",
      "controller",
      "--config=/etc/k0s/k0s.yaml",
      "--data-dir=/var/lib/k0s",
      "--kubelet-extra-args=--fail-swap-on=false --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9"
   ],
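Because k0s reports each argument as a separate array element, the space inside the --kubelet-extra-args value is still intact at this point. A minimal sketch, assuming only that the JSON contains an "Args" string array like the fragment above; the full structure of the real `k0s status -o json` output is not shown here, so the `statusArgs` wrapper and the `flagValues` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// statusArgs models only the "Args" field quoted above (assumption:
// the rest of the status document is ignored).
type statusArgs struct {
	Args []string `json:"Args"`
}

// flagValues (hypothetical) decodes the Args array and maps each flag
// element to its value, splitting on the first '=' so spaces survive.
func flagValues(raw []byte) (map[string]string, error) {
	var s statusArgs
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, err
	}
	flags := make(map[string]string)
	for _, a := range s.Args {
		if !strings.HasPrefix(a, "-") {
			continue // binary path and subcommand
		}
		name, value, _ := strings.Cut(strings.TrimLeft(a, "-"), "=")
		flags[name] = value
	}
	return flags, nil
}

func main() {
	raw := []byte(`{"Args": [
		"/usr/local/bin/k0s",
		"controller",
		"--config=/etc/k0s/k0s.yaml",
		"--data-dir=/var/lib/k0s",
		"--kubelet-extra-args=--fail-swap-on=false --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9"
	]}`)
	flags, err := flagValues(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", flags["kubelet-extra-args"])
}
```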

@kke
Contributor

kke commented Nov 5, 2024

Re-applying without any changes gives me:

DEBU[0003] [ssh] 127.0.0.1:9022: installFlags seem to have changed because of different flags: 
[--config=/etc/k0s/k0s.yaml --data-dir=/var/lib/k0s --kubelet-extra-args=--fail-swap-on=false] vs 
[--config=/etc/k0s/k0s.yaml --data-dir=/var/lib/k0s --kubelet-extra-args=--fail-swap-on=false --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9]

@kke
Contributor

kke commented Nov 5, 2024

I think the problem is

var flagParseRe = regexp.MustCompile(`--?([\w\-]+)(?:[=\s](\S+))?`)

which fails to correctly parse flags whose values contain spaces.
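The truncation is easy to reproduce in isolation. The sketch below runs the quoted flagParseRe against a single --kubelet-extra-args element: since the value group is `(\S+)`, the capture stops at the first space, which is consistent with the shortened --kubelet-extra-args value in the debug log above. The `splitFlag` alternative, which splits a single flag element on its first `=`, is a hypothetical illustration and not necessarily the change made in #784.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// The pattern quoted above.
var flagParseRe = regexp.MustCompile(`--?([\w\-]+)(?:[=\s](\S+))?`)

// regexValue returns the value that flagParseRe captures for flag.
func regexValue(flag string) string {
	m := flagParseRe.FindStringSubmatch(flag)
	if m == nil {
		return ""
	}
	return m[2]
}

// splitFlag (hypothetical) splits one flag element on its first '=',
// keeping any spaces in the value.
func splitFlag(flag string) (name, value string) {
	name, value, _ = strings.Cut(strings.TrimLeft(flag, "-"), "=")
	return name, value
}

func main() {
	flag := "--kubelet-extra-args=--fail-swap-on=false --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9"

	fmt.Println(regexValue(flag)) // --fail-swap-on=false
	_, v := splitFlag(flag)
	fmt.Println(v) // the full value, space included
}
```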
