openvswitch and routes not working after restart on Rocky-Linux 8.5 #599

Tokix · 2022-02-17T03:39:05Z

Description
With a few modifications to the kubeinit files, I am able to deploy OKD on Rocky Linux 8.5. You can see the modifications here: https://github.com/Tokix/kubeinit. I could open a pull request, but one thing is not working as expected: restarting the server. After the restart, the routes have vanished and I can no longer reach the frontend.

To Reproduce
Steps to reproduce the behavior:

Install a Red Hat 8.5 machine and set up the SSH connection to it as nyctea, as described in the manual.
In my case I additionally had to install Python on the hypervisor_host machine before the playbook would run successfully:

yum install python3

Clone the fork with the changes for Rocky Linux 8.5:

git clone https://github.com/Tokix/kubeinit.git

Run the playbook:

ansible-playbook \
    -v --user root \
    -e kubeinit_spec=okd-libvirt-3-1-1 \
    -i ./kubeinit/inventory \
    ./kubeinit/playbook.yml

Enable the frontend:

ssh root@nyctea
chmod +x create-external-ingress.sh
./create-external-ingress.sh

Set up the DNS entries for your system.
Check that the URL is working (it does at this point):

https://console-openshift-console.apps.okdcluster.kubeinit.local/

Reboot the server:

init 6

The URL is not working any longer:

https://console-openshift-console.apps.okdcluster.kubeinit.local/
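To tell a DNS problem apart from a routing problem after the reboot, one could check name resolution and the HTTP status separately. This is a minimal sketch, assuming bash, getent and curl on the machine running the check; the function names are hypothetical helpers, not part of kubeinit.

```shell
#!/usr/bin/env bash
# Extract the host name from a URL such as https://host/path (no port handling)
url_host() {
  local rest="${1#*://}"   # strip the scheme
  echo "${rest%%/*}"       # strip everything from the first slash on
}

# Report whether the host resolves and which HTTP status the URL returns
check_console_url() {
  local url="$1" host addr code
  host=$(url_host "$url")
  # getent consults /etc/hosts and DNS, like most applications do
  addr=$(getent hosts "$host" | awk '{print $1; exit}')
  if [[ -z "$addr" ]]; then
    echo "dns: $host does not resolve"
    return 1
  fi
  echo "dns: $host -> $addr"
  # -k accepts a self-signed certificate; 000 means no HTTP response at all
  code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
  echo "http: $code"
}
```

Running `check_console_url https://console-openshift-console.apps.okdcluster.kubeinit.local/` before and after the reboot would show whether the name still resolves; if it does but the status is `000`, the failure is more likely in connectivity than in DNS.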

Expected behavior
The external URL of the cluster should be reachable after a restart, and the routes should be set.

Screenshots
Working route configuration before the restart:

Route configuration after restart:
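Screenshots are hard to diff, so a text snapshot of the routing table may be easier to compare and share. A minimal sketch, assuming bash and iproute2; `missing_routes` and the snapshot file paths are hypothetical.

```shell
#!/usr/bin/env bash
# Print routes present in the first snapshot but missing from the second.
# Both snapshots are plain `ip route show` output saved to files.
missing_routes() {
  local before="$1" after="$2"
  # comm -23 keeps lines unique to the first input; it needs sorted input,
  # so both sides are sorted with the same (C) collation
  LC_ALL=C comm -23 <(LC_ALL=C sort -u "$before") <(LC_ALL=C sort -u "$after")
}
```

Usage: run `ip route show > /root/routes-before.txt` while the cluster works, then after the reboot run `ip route show > /root/routes-after.txt` and `missing_routes /root/routes-before.txt /root/routes-after.txt`. The same comparison could be applied to `ovs-vsctl show` output, although sorting discards its nesting.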

Infrastructure

Hypervisor OS: Rocky Linux
Version 8.5

Deployment command

ansible-playbook \
    -v --user root \
    -e kubeinit_spec=okd-libvirt-3-1-1 \
    -i ./kubeinit/inventory \
    ./kubeinit/playbook.yml

Inventory file diff

I made no changes to the inventory file.

Additional context

Since SELinux is active on Rocky Linux 8.5, my first thought was that some changes could not be persisted, so I disabled SELinux for testing. However, it still does not work after a restart.
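One detail worth checking when testing across a reboot: `setenforce 0` only changes the runtime mode, and the mode in `/etc/selinux/config` is applied again at boot. A minimal sketch, assuming bash and the standard config file location, that prints both; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Show the configured (boot-time) SELinux mode from an SELinux config file.
# This is the value applied again after a reboot, unlike `setenforce 0`.
configured_selinux_mode() {
  local config="${1:-/etc/selinux/config}"
  # Take the value of the last SELINUX= line; comments and SELINUXTYPE= do not match
  sed -n 's/^SELINUX=//p' "$config" | tail -n 1
}

# Runtime mode, if the getenforce tool is installed
runtime_selinux_mode() {
  if command -v getenforce >/dev/null 2>&1; then
    getenforce
  else
    echo "unknown"
  fi
}
```

If `configured_selinux_mode` still prints `enforcing`, SELinux would have been active again during the post-reboot test.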

I checked this old issue https://forums.opensuse.org/showthread.php/530879-openvswitch-loses-configuration-on-reboot, but the boot order of openvswitch and network.service seems to be fine.
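The boot order can be checked from the unit properties systemd reports. A minimal sketch, assuming bash and the unit names `network.service` and `openvswitch.service`; it only looks for a direct `After=` entry, so a missing entry does not prove a wrong order (ordering can also come indirectly through targets, and many RHEL 8 systems use NetworkManager instead of network.service).

```shell
#!/usr/bin/env bash
# Succeed if the unit name $2 appears in the space-separated list $1
is_ordered_after() {
  local after_list=" $1 " dep="$2"
  [[ "$after_list" == *" $dep "* ]]
}

# Check whether network.service is directly ordered after openvswitch.service
check_ovs_order() {
  local after
  after=$(systemctl show -p After --value network.service)
  if is_ordered_after "$after" openvswitch.service; then
    echo "network.service starts after openvswitch.service"
  else
    echo "no direct After=openvswitch.service on network.service"
  fi
}
```

For the full chain, `systemd-analyze critical-chain network.service` and `journalctl -b -u openvswitch` show what actually happened during the last boot.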

Furthermore, I ran the steps under "Attach our cluster network to the logical router" in the file kubeinit/roles/kubeinit_libvirt/tasks/create_network.yml. This brought back the correct routing table, but I'm still not able to reach the guest systems via 10.0.0.1-x.
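To see where reachability breaks once the routing table looks right again, one could check, per guest, which route the kernel selects and whether the guest answers. A minimal sketch, assuming bash, iproute2 and ping; the number of guests passed in is an assumption about the deployment.

```shell
#!/usr/bin/env bash
# Print 10.0.0.1 .. 10.0.0.N, where N (the number of guests) is an assumption
guest_addresses() {
  local count="$1" i
  for ((i = 1; i <= count; i++)); do
    echo "10.0.0.$i"
  done
}

# For each guest, show the route the kernel would use and whether it answers one ping
check_guests() {
  local ip
  for ip in $(guest_addresses "$1"); do
    # `ip route get` prints the selected device and source address
    ip route get "$ip" | head -n 1
    if ping -c 1 -W 1 "$ip" >/dev/null 2>&1; then
      echo "  $ip reachable"
    else
      echo "  $ip unreachable"
    fi
  done
}
```

If `ip route get` picks the expected device but the pings fail, the problem may lie further along, for example in the Open vSwitch bridges, rather than in the host routing table.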

Is there any script or service that needs to (or can) be re-run to enable networking after a reboot?
In any case, I'm thankful for any hints; let me know if you need more information.

Thank you in any case for the great project :)


jeffabailey · 2022-04-24T02:12:00Z

I'm also running into a problem with Rocky Linux.

Any help is welcome; this is a cool project, and I hope we can get it working on Rocky.

TASK [kubeinit.kubeinit.kubeinit_prepare : Create ssh config file from template] *******************************************************************************
task path: /home/jeff/.ansible/collections/ansible_collections/kubeinit/kubeinit/roles/kubeinit_prepare/tasks/create_host_ssh_config.yml:52
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: jeff
<127.0.0.1> EXEC /bin/sh -c 'echo ~jeff && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /home/jeff/.ansible/tmp `"&& mkdir "` echo /home/jeff/.ansible/tmp/ansible-tmp-1650765476.136161-198434-213715344205742 `" && echo ansible-tmp-1650765476.136161-198434-213715344205742="` echo /home/jeff/.ansible/tmp/ansible-tmp-1650765476.136161-198434-213715344205742 `" ) && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /home/jeff/.ansible/tmp/ansible-tmp-1650765476.136161-198434-213715344205742/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
Traceback (most recent call last):
  File "/home/jeff/kubeinit/kubeinit/lib64/python3.6/site-packages/ansible/template/__init__.py", line 1117, in do_template
    res = j2_concat(rf)
  File "<template>", line 47, in root
  File "/home/jeff/kubeinit/kubeinit/lib64/python3.6/site-packages/jinja2/runtime.py", line 903, in _fail_with_undefined_error
    raise self._undefined_exception(self._undefined_message)
jinja2.exceptions.UndefinedError: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_host'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jeff/kubeinit/kubeinit/lib64/python3.6/site-packages/ansible/plugins/action/template.py", line 146, in run
    resultant = templar.do_template(template_data, preserve_trailing_newlines=True, escape_backslashes=False)
  File "/home/jeff/kubeinit/kubeinit/lib64/python3.6/site-packages/ansible/template/__init__.py", line 1154, in do_template
    raise AnsibleUndefinedVariable(e)
ansible.errors.AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_host'
fatal: [localhost]: FAILED! => {
    "changed": false,
    "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_host'"
}

PLAY RECAP *****************************************************************************************************************************************************
localhost                  : ok=48   changed=7    unreachable=0    failed=1    skipped=25   rescued=0    ignored=0

jeffabailey · 2022-05-06T13:02:10Z

My issue isn't specific to Rocky, so I'll add a new issue.

I ran into the same error using Debian.

Edit (Issue added): #647

ccamacho · 2022-11-27T20:57:53Z

Maybe some iptables rules are not persisted across reboots, and I don't have a way to test this on Rocky.
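One way to test this hypothesis is to compare `iptables-save` dumps taken before and after a reboot. A minimal sketch, assuming bash and coreutils; the helper names and file paths are hypothetical, and table boundaries are ignored in the comparison.

```shell
#!/usr/bin/env bash
# Reduce an iptables-save dump to comparable lines: drop comments and
# the [packets:bytes] counters on chain policy lines, then sort.
normalize_rules() {
  grep -v '^#' "$1" | sed -E 's/ \[[0-9]+:[0-9]+\]$//' | LC_ALL=C sort -u
}

# Print rules from the first dump that are missing from the second
missing_rules() {
  LC_ALL=C comm -23 <(normalize_rules "$1") <(normalize_rules "$2")
}
```

Usage: `iptables-save > /root/iptables-before.txt` while the cluster works, then after the reboot `iptables-save > /root/iptables-after.txt` and `missing_rules /root/iptables-before.txt /root/iptables-after.txt`. Any lines printed would be rules that did not survive the reboot.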

logeshwaris · 2023-02-22T11:14:55Z

Hi @ccamacho,

Thanks for the awesome project. 👍

I am also running into the same issue. After a reboot, I am not able to reach 10.0.0.x.
Is there a way to re-enable the networking after a reboot?
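If re-running the network setup steps by hand restores connectivity, one option is to have systemd run them once at every boot. A minimal sketch, assuming bash and systemd; the unit name and the restore script path are hypothetical, and the script itself (containing whatever commands restore the state) would still have to be written.

```shell
#!/usr/bin/env bash
# Write a oneshot systemd unit that runs a (hypothetical) restore script
# after Open vSwitch and the network are up.
write_restore_unit() {
  local unit_dir="${1:-/etc/systemd/system}"
  local script="${2:-/usr/local/sbin/kubeinit-restore-network.sh}"
  cat > "$unit_dir/kubeinit-restore-network.service" <<EOF
# Hypothetical unit: the restore script is not part of kubeinit
[Unit]
Description=Re-apply cluster network configuration after boot
After=network-online.target openvswitch.service
Wants=network-online.target
Requires=openvswitch.service

[Service]
Type=oneshot
ExecStart=$script
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
}
```

Usage: `write_restore_unit && systemctl daemon-reload && systemctl enable kubeinit-restore-network.service`. This only automates a workaround; it does not explain why the state is lost.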

tschuyebuhl · 2023-02-23T15:33:18Z

I've got two servers, one with Alma 8.x (which also seems to lose connectivity after a reboot) and one with CentOS Stream. I could help by providing some debug data; I can sacrifice my currently running clusters if need be.

tschuyebuhl · 2023-03-02T12:06:26Z

Okay, so the one with CentOS 8 and vanilla k8s didn't persist after a restart. The VMs launched fine, but there was no networking. Also, the service pod only had one IP address, from the 10.89.x.x subnet.

ccamacho added the keep label Mar 5, 2022
