k3s not working on PopOs 21.10 (kernel 5.15.5) #4783

Closed
vrince opened this issue Dec 17, 2021 · 11 comments

@vrince

vrince commented Dec 17, 2021

Environmental Info:
K3s Version:

k3s version v1.21.7+k3s1 (ac705709)
go version go1.16.10

Node(s) CPU architecture, OS, and Version:

Linux x1-pop-os 5.15.5-76051505-generic #202111250933~1638201579~21.10~09f1aa7 SMP Mon Nov 29 16:23:13 U x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

1 server
3 worker

Describe the bug:

Installing or updating to Pop!_OS 21.10 breaks k3s.

Steps To Reproduce:

Install k3s on Pop!_OS 21.10.

Expected behavior:

Actual behavior:

Additional context / logs:

Note: found this while running Rancher on Pop!_OS, ref: rancher/rancher#29223 (comment)

Workaround: add systemd.unified_cgroup_hierarchy=0 to the kernel command line:

  • kernelstub
sudo kernelstub -a "systemd.unified_cgroup_hierarchy=0"  
sudo update-initramfs -c -k all
sudo reboot
  • grub
sudo nano /etc/default/grub
# add this line : GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"
sudo update-grub
sudo reboot
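
To double-check the workaround after rebooting, something like the following should work (a minimal sketch, assuming a systemd-based host): cgroup2fs means the unified (v2) hierarchy is still active, tmpfs means the legacy/hybrid (v1) layout is back in use.

grep -o 'systemd.unified_cgroup_hierarchy=0' /proc/cmdline   # was the parameter actually applied?
stat -fc %T /sys/fs/cgroup/                                  # cgroup2fs = v2, tmpfs = v1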
@brandond
Member

What version of K3s are you using? All current releases should support cgroup v2 (unified_cgroup_hierarchy).

@vrince
Author

vrince commented Dec 17, 2021

Using 1.21.7+k3s1. It was installed on Pop!_OS 21.04 for a while and everything was good. After updating to 21.10, k3s was refusing to start.

You're saying it should work without systemd.unified_cgroup_hierarchy=0, right? If so, I can confirm it does not.

@brandond
Member

brandond commented Dec 17, 2021

Can you perhaps include the K3s logs so that we can see what error you're getting? "Not working" is somewhat hard to troubleshoot without any additional information on what the actual error is.

@vrince
Author

vrince commented Dec 17, 2021

On it.

@vrince
Author

vrince commented Dec 17, 2021

I was able to reproduce it by removing the unified_cgroup_hierarchy workaround.

The error log is massive... here are the first few errors filtered from journalctl -u k3s (am I looking in the right place?)

Dec 17 16:29:19 x1-pop-os k3s[1847]: E1217 16:29:19.368579    1847 node.go:161] Failed to retrieve node info: nodes "x1-pop-os" is forbidden: User "system:kube-proxy" cannot get resource "nodes" in API group "" at the cluster scope
Dec 17 16:29:19 x1-pop-os k3s[1847]: E1217 16:29:19.497216    1847 cri_stats_provider.go:369] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs"
Dec 17 16:29:19 x1-pop-os k3s[1847]: E1217 16:29:19.497265    1847 kubelet.go:1306] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem"
Dec 17 16:29:19 x1-pop-os k3s[1847]: E1217 16:29:19.561414    1847 kubelet.go:1870] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
Dec 17 16:29:19 x1-pop-os k3s[1847]: E1217 16:29:19.661972    1847 kubelet.go:1870] "Skipping pod synchronization" err="container runtime status check may not have completed yet"
Dec 17 16:29:19 x1-pop-os k3s[1847]: E1217 16:29:19.730210    1847 controller.go:156] Unable to remove old endpoints from kubernetes service: no master IPs were listed in storage, refusing to erase all endpoints for the kubernetes service
Dec 17 16:29:20 x1-pop-os k3s[1847]: E1217 16:29:20.848117    1847 remote_runtime.go:144] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"83f7d501f375cb76831607d046f0b92afe24eefabe0a1689aa8a3147b9465a0f\": not found" podSandboxID="83f7d501f375cb76831607d046f0b92afe24eefabe0a1689aa8a3147b9465a0f"
Dec 17 16:29:20 x1-pop-os k3s[1847]: E1217 16:29:20.848144    1847 kuberuntime_manager.go:958] "Failed to stop sandbox" podSandboxID={Type:containerd ID:83f7d501f375cb76831607d046f0b92afe24eefabe0a1689aa8a3147b9465a0f}
Dec 17 16:29:20 x1-pop-os k3s[1847]: E1217 16:29:20.848174    1847 kuberuntime_manager.go:729] "killPodWithSyncResult failed" err="failed to \"KillPodSandbox\" for \"c13946c2-558b-4207-8e2c-b95cca891d79\" with KillPodSandboxError: \"rpc error: code = NotFound desc = an error occurred when try to find sandbox \\\"83f7d501f375cb76831607d046f0b92afe24eefabe0a1689aa8a3147b9465a0f\\\": not found\""
Dec 17 16:29:20 x1-pop-os k3s[1847]: E1217 16:29:20.848199    1847 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"KillPodSandbox\" for \"c13946c2-558b-4207-8e2c-b95cca891d79\" with KillPodSandboxError: \"rpc error: code = NotFound desc = an error occurred when try to find sandbox \\\"83f7d501f375cb76831607d046f0b92afe24eefabe0a1689aa8a3147b9465a0f\\\": not found\"" pod="elk/elk-kb-578b67bd49-6gxl7" podUID=c13946c2-558b-4207-8e2c-b95cca891d79
Dec 17 16:29:21 x1-pop-os k3s[1847]: E1217 16:29:21.730872    1847 configmap.go:200] Couldn't get configMap kube-system/coredns: failed to sync configmap cache: timed out waiting for the condition
Dec 17 16:29:21 x1-pop-os k3s[1847]: E1217 16:29:21.730975    1847 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/configmap/dba761e1-a2af-4502-ac73-ad350cf51dbf-config-volume podName:dba761e1-a2af-4502-ac73-ad350cf51dbf nodeName:}" failed. No retries permitted until 2021-12-17 16:29:22.230939166 -0500 EST m=+5.057111441 (durationBeforeRetry 500ms). Error: "MountVolume.SetUp failed for volume \"config-volume\" (UniqueName: \"kubernetes.io/configmap/dba761e1-a2af-4502-ac73-ad350cf51dbf-config-volume\") pod \"coredns-7448499f4d-t4q84\" (UID: \"dba761e1-a2af-4502-ac73-ad350cf51dbf\") : failed to sync configmap cache: timed out waiting for the condition"
Dec 17 16:29:21 x1-pop-os k3s[1847]: E1217 16:29:21.731383    1847 secret.go:195] Couldn't get secret cattle-system/cattle-credentials-0b48f2b: failed to sync secret cache: timed out waiting for the condition

Not sure if you want/need the entire thing?

@brandond
Member

brandond commented Dec 17, 2021

Nothing useful in that short chunk. Can you attach (not copy/paste) the complete log from journalctl -u k3s --no-pager &> k3s.log ?

@vrince
Author

vrince commented Dec 17, 2021

k3s.log

@brandond
Member

Looks to me like it's running fine. The end of the log is just normal steady-state operation. Is this the log from a system with systemd.unified_cgroup_hierarchy=0 set? Can you get the log from a broken system? The logs here don't go back far enough to tell what was previously failing.

@vrince
Author

vrince commented Dec 17, 2021

Maybe you are right and it's more on the Rancher side that things go south. I get:

Failed to communicate with API server during namespace check: Get "https://10.43.0.1:443/api/v1/namespaces/kube-system?timeout=45s": context canceled

I'll try to start again from scratch; maybe k3s was OK on its own from the beginning, and it was just some of the Rancher payload that was not happy.

@brandond
Member

Are you perhaps missing the vxlan module on newer kernels? Not sure if this is just limited to raspberry pi or not, but other users have run into the change described at #4188 (comment)
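
If that turns out to be the issue, a quick check would look something like this (a sketch; whether the module ships in the base kernel package or an extra-modules package depends on the kernel flavour, as in the linked comment):

modinfo vxlan          # is the module present on disk for the running kernel?
sudo modprobe vxlan    # try loading it
lsmod | grep vxlan     # confirm it is loaded

If modinfo reports the module as missing, installing the extra-modules package for the running kernel (linux-modules-extra-raspi in the linked Raspberry Pi case) should provide it.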

@vrince
Author

vrince commented Dec 20, 2021

Re-installed everything from scratch, works as expected.

Something fishy happened after the transition from 21.04 to 21.10, but it might just be me not understanding what was happening.
Sorry for the noise and wasting your time.
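
For reference, the from-scratch reinstall was essentially the standard k3s flow (a sketch; paths are the defaults created by the install script, adjust if installed differently):

sudo /usr/local/bin/k3s-uninstall.sh    # removes the old install (script is created by the installer)
curl -sfL https://get.k3s.io | sh -     # reinstalls the server via the official script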
