OVS module loading failure for RHEL and CentOS Nodes #51

Closed
jianjuns opened this issue Nov 13, 2019 · 7 comments · Fixed by #61
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@jianjuns
Contributor

jianjuns commented Nov 13, 2019

Describe the bug
modprobe in the initContainer fails on RHEL and CentOS Nodes.

To Reproduce
Deploy Antrea on RHEL or CentOS Nodes, and initContainer will fail.

Expected
The initContainer loads the OVS kernel module successfully on RHEL and CentOS Nodes.

Actual behavior
modprobe in the initContainer returns an error about binary execution.

Versions:
Please provide the following information:

  • Antrea version v0.0.1
@jianjuns jianjuns added the bug label Nov 13, 2019
@jianjuns jianjuns self-assigned this Nov 13, 2019
@antoninbas
Contributor

Could you provide the error logs in "Actual behavior" if you have access to them?

@jianjuns
Contributor Author

Updated the description, but I do not remember the exact error. I am still setting up my CentOS env and will try again once it is ready.

@McCodeman McCodeman added the kind/bug Categorizes issue or PR as related to a bug. label Jan 29, 2020
@antoninbas
Contributor

antoninbas commented Jan 20, 2021

@jianjuns sorry for reviving this old issue, but I have seen some issues caused by mounting /sbin/depmod to the initContainer. Typically this happens when the /sbin/depmod (or rather kmod since /sbin/depmod is just a symlink to kmod) on the host / Node depends on some libraries which are not available inside the container. This can happen when the OS distribution on the host uses a different version of kmod than the container OS distribution.

Is there any chance you could share the error message that led to the decision of mounting /sbin/depmod? I tried to reproduce on EC2 using a CentOS instance (see version below) but without success:

[centos@ip-172-30-0-73 ~]$ cat /etc/centos-release
CentOS Linux release 7.7.1908 (Core)
[centos@ip-172-30-0-73 ~]$ docker run --cap-add SYS_MODULE -v /lib/modules:/lib/modules -ti antrea/antrea-ubuntu:v0.12.0 modprobe openvswitch
[centos@ip-172-30-0-73 ~]$

@jianjuns
Contributor Author

Unfortunately I lost my env for testing this. Could you try with an old Antrea image (like 0.1.0), in case it is no longer an issue with Ubuntu 18.04?

@antoninbas
Contributor

So I confirmed that there is an issue with v0.1.0 (or more precisely with Ubuntu 18.04, before we switched our base image to Ubuntu 20.04):

[centos@ip-172-30-0-114 ~]$ docker run -ti --cap-add SYS_MODULE -v /lib/modules:/lib/modules -ti antrea/antrea-ubuntu:v0.1.0 modprobe -v openvswitch
insmod /lib/modules/3.10.0-1062.12.1.el7.x86_64/kernel/net/ipv6/netfilter/nf_defrag_ipv6.ko.xz
modprobe: ERROR: could not insert 'openvswitch': Exec format error

Presenting the /sbin/depmod mount as the solution is a bit misleading: the real issue is with kmod (/sbin/depmod is just a symbolic link to kmod).

The issue was that the kmod included in Ubuntu 18.04 did not support compressed kernel modules (*.xz), while the kmod included in Ubuntu 20.04 does support them:

$ docker run antrea/antrea-ubuntu:v0.1.0 kmod -V
kmod version 24
-XZ -ZLIB -EXPERIMENTAL
$ docker run antrea/antrea-ubuntu:v0.12.0 kmod -V
kmod version 27
+XZ -ZLIB +LIBCRYPTO -EXPERIMENTAL
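
The feature flags printed by kmod -V, together with the file extension of the module on the host, should be enough to predict this failure before calling modprobe. The following is a minimal sketch under that assumption; the function names kmod_supports_xz and module_is_xz_compressed are hypothetical and are not part of Antrea or kmod.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Antrea or kmod), for illustration only.

# kmod_supports_xz KMOD_VERSION_OUTPUT
# Succeeds if the text printed by `kmod -V` advertises XZ support.
# Assumes the "+XZ" / "-XZ" flag format shown in the output above.
kmod_supports_xz() {
    case $1 in
        *"+XZ"*) return 0 ;;
        *)       return 1 ;;
    esac
}

# module_is_xz_compressed MODULE_TREE MODULE_NAME
# Succeeds if MODULE_NAME exists as a .ko.xz file somewhere under MODULE_TREE
# (for example /lib/modules/<kernel release> mounted from the host).
module_is_xz_compressed() {
    [ -n "$(find "$1" -name "$2.ko.xz" 2>/dev/null)" ]
}

# Example use inside the container (commented out so the helpers can be sourced):
# if module_is_xz_compressed "/lib/modules/$(uname -r)" openvswitch &&
#    ! kmod_supports_xz "$(kmod -V)"; then
#     echo "host module is xz-compressed but this kmod lacks XZ support" >&2
# fi
```

With the two kmod -V outputs above, the first helper would fail for the v0.1.0 image (-XZ) and succeed for the v0.12.0 image (+XZ).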

There is no magic solution here:

  1. there is always a possibility that the kmod included in the container does not support kernel modules on the host
  2. if we always copy the kmod binary from the host to the container, there is a risk that some shared libraries are missing in the container

I can think of the following solutions:

  1. mount the host's root in the initContainer (read-only) and run the host's kmod with no dependency issue
[centos@ip-172-30-0-114 ~]$ docker run -ti --cap-add SYS_MODULE -v /:/host/root:ro -ti antrea/antrea-ubuntu:v0.1.0 bash
root@94cb91445427:/# chroot /host/root
sh-4.2# modprobe openvswitch
sh-4.2# exit
root@94cb91445427:/#
  2. stop mounting /sbin/depmod / kmod altogether, with the following rationale: kmod included in Ubuntu 20.04 now supports compressed kernel modules (but there is no guarantee that in the future we will not run into a similar issue)
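
For illustration, the two approaches listed above could also be combined: try the container's own kmod first, and only fall back to the host's modprobe through the read-only root mount if that fails. This is a sketch, not Antrea's actual initContainer script; the load_module function and the MODPROBE and CHROOT override variables are hypothetical, and the /host/root mount point is taken from the session above.

```shell
#!/bin/sh
# Illustrative sketch only, not Antrea's actual initContainer script.
# MODPROBE and CHROOT are hypothetical override hooks, useful for dry runs.
MODPROBE=${MODPROBE:-modprobe}
CHROOT=${CHROOT:-chroot}

# load_module HOST_ROOT MODULE
# Try the container's own kmod first. If that fails (for example because it
# cannot read compressed modules), run the host's modprobe from the host root,
# which is assumed to be mounted read-only at HOST_ROOT.
load_module() {
    host_root=$1
    module=$2
    if $MODPROBE "$module" 2>/dev/null; then
        return 0
    fi
    $CHROOT "$host_root" modprobe "$module"
}

# Example (requires SYS_MODULE and the host root mounted at /host/root):
# load_module /host/root openvswitch
```

The fallback keeps the common case free of any dependency on the host's binaries, but it still needs the host root mount, so the security policy concern applies to it as well.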

My only concern with 1) - my preferred solution - is the possible conflict with some security policies. It works great in my environment, but could it create issues in K8s distributions? What do you think @jianjuns and @tnqn?

@jianjuns
Contributor Author

Thanks for figuring this out!

Yes, I am also concerned about possible security violations from mounting the whole FS. How about we go with your option 2 for now, until we see new issues?

@tnqn
Member

tnqn commented Jan 21, 2021

I also feel option 2 is less risky if it can solve all current issues.
I don't know whether mounting the whole host FS into a container is common, and I am not sure whether it could introduce circular paths or other issues.
