Playbook hangs on [Enable and check K3s service] #57

Closed
nwber opened this issue Jul 13, 2020 · 16 comments

Comments

@nwber

nwber commented Jul 13, 2020

I'm running the playbook with ansible-playbook site.yml -i inventory/sample/hosts.ini -k -K -vv.

It runs successfully up to [Enable and check K3s service] in /node/tasks/main.yml, then it hangs indefinitely. I've run this with varying levels of verbosity and debugging on.

Running this on Raspberry Pi 4Bs, all of which have Raspberry Pi OS Lite.

(Attachments: ansible-checks-k3s-hangs screenshot, hosts.ini, hosts)

@b-m-f
Contributor

b-m-f commented Jul 20, 2020

I had the same problem.

For me the problem was the firewall: port 6443 was blocked on the master node.

EDIT:

A way to debug is to SSH into your Raspberry Pi and execute the ExecStart command from /etc/systemd/system/k3s-node.service manually.
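A rough sketch of that (assuming the unit file path from this playbook's node role; the exact agent arguments will differ per setup):

# show the exact command systemd runs for the agent
grep ExecStart /etc/systemd/system/k3s-node.service

# stop the unit so a manual run does not conflict with it,
# then paste the ExecStart command and run it with sudo to see the errors directly
sudo systemctl stop k3s-node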

@JohnTheNerd
Contributor

I had the same problem. For me, the master IP wasn't set properly. Anything that stops the k3s-node service from starting will cause that.

I was able to see the error by running systemctl status k3s-node on the node itself.
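If the status output only shows the last few lines, the full unit log is also available via the systemd journal (standard journalctl usage, nothing specific to this playbook):

sudo journalctl -u k3s-node --no-pager -n 100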

@dougbertwashere

dougbertwashere commented Aug 27, 2020

Thanks for the info, as I had the same problem.

I assume JohnTheNerd is referring to the fact that, when a hostname is used for the control node, the installation fails to append that hostname and its IP address to the nodes' /etc/hosts file.

I was using a hostname in the [master] section as well and it failed. I switched to an explicit IP address and it installed fine and came up.

An alternate solution is to use Ansible to push an update to the nodes' /etc/hosts files, as sketched below.
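A minimal sketch of that approach (hypothetical playbook; the IP and hostname are placeholders for your own master):

- name: Add the control node to /etc/hosts on every host
  hosts: all
  become: yes
  tasks:
    - name: Map the master hostname to its IP
      lineinfile:
        path: /etc/hosts
        line: "192.168.1.10 kubernetes-master"   # adjust to your master's IP and hostname
        state: present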

Update: I put k3s on a Pi cluster using explicit IP addresses and it works as above.
I then attempted an install on an AWS Lightsail cluster, but it failed because my work machine reaches the primary via its public IP while the cluster nodes use internal IPs.
I finally went back to using hostnames for the install, appending the cluster-internal IP assignments to all nodes' and the master's /etc/hosts, and then mapping the hostname to the public IP in my work machine's /etc/hosts.

Thus the cluster machines had the internal IP and my machine had the public IP. Sweet.

@brobare

brobare commented Apr 8, 2021

Thanks for the pointers from people who figured this out for their distros. I was able to resolve this on CentOS 7 by running:
firewall-cmd --zone=public --add-port=6443/tcp
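Note that without --permanent the rule only lasts until the next reload or reboot; standard firewalld usage to make it stick is:

firewall-cmd --permanent --zone=public --add-port=6443/tcp
firewall-cmd --reload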

@bVdCreations

I had the same problem.
I changed the inventory to use IP addresses instead of the names I had set in my .ssh/config file.
That solved it!
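For reference, that means the inventory ends up with plain IPs, roughly like the sample layout (illustrative addresses):

[master]
192.168.1.10

[node]
192.168.1.11
192.168.1.12

[k3s_cluster:children]
master
node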

@aireilly

I changed inventory/hosts.ini to point to IP addresses instead of configured hostnames and the install worked. The hostnames are still listed when I run kubectl get nodes:

NAME              STATUS   ROLES    AGE     VERSION
control-plane-0   Ready    master   3h18m   v1.17.5+k3s1
compute-node-1    Ready    <none>   4m28s   v1.17.5+k3s1
compute-node-2    Ready    <none>   4m28s   v1.17.5+k3s1
compute-node-0    Ready    <none>   4m28s   v1.17.5+k3s1
compute-node-3    Ready    <none>   4m28s   v1.17.5+k3s1

@FilBot3

FilBot3 commented Oct 29, 2021

Mine is also failing at that same spot. I'm using Ubuntu Server 21.10 arm64 with k3s v1.22.5-k3s2.

Logs from my nodes: I have 4 total, 1 master and 3 workers, each freshly imaged with /etc/hosts set to map FQDNs to their IPs. I then bootstrapped with the latest k3s-ansible master. I also reset and disabled the firewall with sudo ufw disable, rebooted, and tried again. It just hangs.

Obviously the k3s.service on the master node is failing to start, which is preventing the k3s-node.service workers from connecting.
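In that case the place to look is the k3s service on the master rather than the k3s-node agents, e.g. (standard systemd commands):

# on the master node
sudo systemctl status k3s
sudo journalctl -u k3s --no-pager -n 100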

@jbeere

jbeere commented Dec 5, 2021

I had this same problem; changing the hostnames from "raspberrypi" to something unique seemed to make it work.
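A quick way to rename a node (sketch; pick your own unique names, and note that Raspberry Pi OS typically also lists the hostname against 127.0.1.1 in /etc/hosts):

sudo hostnamectl set-hostname node-1
sudo sed -i 's/raspberrypi/node-1/g' /etc/hosts
sudo reboot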

@orzen

orzen commented Dec 10, 2021

I had the same issue. I'm trying to deploy to a series of VPSs and it seems like the playbook is trying to use the private IP of the master node when launching the worker nodes.

@nmstoker

Many thanks @b-m-f
I also had the firewall / port problem identified above and the following fixed it for me on Pi4s running Raspberry Pi OS 64-bit Bullseye Lite with ufw:

sudo ufw allow proto tcp from 192.168.1.0/24 to any port 6443

(obviously adjust the from details to whatever you need for your network)

@JohanNicander

I had the same issue, also using Raspberry Pi OS 64-bit Bullseye Lite. Not until I both used IPs instead of hostnames and followed @nmstoker's suggestion did it work.
Many thanks!

@Lechindianer

Just in case someone comes across this issue running Ubuntu version > 20.04: There's an issue on the k3s project regarding kernel modules: k3s-io/k3s#4234 (comment)

Installing the necessary kernel modules helped me; the playbook then ran successfully without hanging.
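For Ubuntu on a Raspberry Pi the missing modules (vxlan among them) ship in a separate package, so the fix is roughly the following (check the linked comment for your exact case):

sudo apt install linux-modules-extra-raspi
sudo reboot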

@Zedifuu

Zedifuu commented Jan 22, 2023

@Lechindianer You absolute champion, that was my issue! TY so much!

@ozbillwang

ozbillwang commented Mar 18, 2023

The Enable and check K3s service task restarts and enables the k3s-node service on the nodes:

- name: Enable and check K3s service
  systemd:
    name: k3s-node
    daemon_reload: yes
    state: restarted
    enabled: yes

The problem is related to this setting in roles/k3s/node/templates/k3s.service.j2:

ExecStart=/usr/local/bin/k3s agent --server https://{{ master_ip }}:6443 --token {{ hostvars[groups['master'][0]]['token'] }} {{ extra_agent_args | default("") }}

But the variable master_ip is not always the master's IP; it comes from inventory/my-cluster/hosts.ini. If you put an IP in hosts.ini, you get an IP address; if you put a hostname in it, you get a hostname:

inventory/my-cluster/group_vars/all.yml:master_ip: "{{ hostvars[groups['master'][0]]['ansible_host'] | default(groups['master'][0]) }}"

In my case, I set a hostname rather than an IP in hosts.ini, like the original poster did:

[master]
kubernetes-master

So when I checked on the node, I got:

ExecStart=/usr/local/bin/k3s agent --server https://kubernetes-master:6443 --token xxx

But there is no /etc/hosts entry to resolve that hostname to an IP address, so the agent keeps waiting to join the master and the playbook never finishes (Active: activating (start)):

$ sudo systemctl status k3s-node
● k3s-node.service - Lightweight Kubernetes
   Loaded: loaded (/etc/systemd/system/k3s-node.service; enabled; vendor preset: enabled)
   Active: activating (start) since Sat 2023-03-18 08:11:51 UTC; 29s ago
     Docs: https://k3s.io

The way to fix it: if you use a hostname in hosts.ini, you need to add ansible_host as well, for example:

[master]
kubernetes-master ansible_host=192.168.xxx.xxx

@OladapoAjala

Thanks @ozbillwang for the detailed info; however, I fixed this by slightly modifying your suggestion.

In the hosts.ini file I added a new variable

[master]
HOST_NAME ansible_host_ip=192.168.xxx.xxx

Then, in the group_vars/all.yml file, I changed how the variable master_ip is resolved by replacing ansible_host with ansible_host_ip, which is already set above.
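Presumably the modified line in group_vars/all.yml then looks something like this (sketch derived from the original definition quoted above):

master_ip: "{{ hostvars[groups['master'][0]]['ansible_host_ip'] | default(groups['master'][0]) }}"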

@dereknola
Member

Closing this as discussion seems to have ended. It is recommended that

  1. All nodes have static IPs OR an external load balancer is configured with a fixed registration address. See https://docs.k3s.io/datastore/ha#4-optional-configure-a-fixed-registration-address and https://docs.k3s.io/datastore/cluster-loadbalancer
  2. Firewalls be disabled. I have another issue open to track attempting to open the firewall ports. Attempt to Work Around Firewalls #234
