
IP address and routes are configured twice on RHEL 8.4 on azure cloud after ExternalNode is created #5192

Closed
wenyingd opened this issue Jul 3, 2023 · 2 comments
Labels: kind/bug, lifecycle/stale

Comments

wenyingd commented Jul 3, 2023

Describe the bug

On a RHEL 8.4 VM running on Azure cloud, two copies of the IP address and routes are configured after an ExternalNode is created.

This is the configuration before creating the ExternalNode:

[root@rhel84 nsxadmin]# ip route
default via 10.110.0.1 dev eth0 proto dhcp metric 100 
10.110.0.0/24 dev eth0 proto kernel scope link src 10.110.0.5 metric 100 
168.63.129.16 via 10.110.0.1 dev eth0 proto dhcp metric 100 
169.254.169.254 via 10.110.0.1 dev eth0 proto dhcp metric 100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
[root@rhel84 nsxadmin]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0d:3a:36:cd:de brd ff:ff:ff:ff:ff:ff
    inet 10.110.0.5/24 brd 10.110.0.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:93:a5:5e:22 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
10: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether fe:b8:04:c0:b2:da brd ff:ff:ff:ff:ff:ff

After an ExternalNode is created for the VM, there are two copies of the IP address and routes: one is configured on eth0, which is an OVS internal port created by antrea-agent, and the other is configured on eth0~, which is the original interface renamed by antrea-agent and expected to work as the uplink.

[root@rhel84 nsxadmin]# ip route
default via 10.110.0.1 dev eth0 proto dhcp metric 100 
default via 10.110.0.1 dev eth0~ proto dhcp metric 100 
10.110.0.0/24 dev eth0 proto kernel scope link src 10.110.0.5 
10.110.0.0/24 dev eth0 proto kernel scope link src 10.110.0.5 metric 100 
10.110.0.0/24 dev eth0~ proto kernel scope link src 10.110.0.5 metric 100 
168.63.129.16 via 10.110.0.1 dev eth0 proto dhcp metric 100 
168.63.129.16 via 10.110.0.1 dev eth0~ proto dhcp metric 100 
169.254.169.254 via 10.110.0.1 dev eth0 proto dhcp metric 100 
169.254.169.254 via 10.110.0.1 dev eth0~ proto dhcp metric 100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
[root@rhel84 nsxadmin]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0~: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether 00:0d:3a:36:cd:de brd ff:ff:ff:ff:ff:ff
    inet 10.110.0.5/24 brd 10.110.0.255 scope global noprefixroute eth0~
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:93:a5:5e:22 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
10: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether fe:b8:04:c0:b2:da brd ff:ff:ff:ff:ff:ff
16: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 00:0d:3a:36:cd:de brd ff:ff:ff:ff:ff:ff
    inet 10.110.0.5/24 brd 10.110.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::20d:3aff:fe36:cdde/64 scope link 
       valid_lft forever preferred_lft forever
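
The layout described above can be cross-checked from the OVS side with ovs-vsctl. This is a minimal sketch; the bridge name ("br-int" below) is an assumption and depends on the antrea-agent configuration:

# List the OVS bridges, then the ports attached to the assumed bridge.
ovs-vsctl list-br
ovs-vsctl list-ports br-int
# "internal" confirms that eth0 is an OVS internal port, while eth0~
# (the renamed NIC) is attached as a regular port.
ovs-vsctl get Interface eth0 type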

Listing the processes shows that dhclient is running on the uplink (eth0~); it is dhclient that configures the duplicate IP address and routes.

[root@rhel84 nsxadmin]# ps -ef | grep dhclient
root      458577    1044  0 09:16 ?        00:00:00 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /run/NetworkManager/dhclient-eth0~.pid -lf /var/lib/NetworkManager/dhclient-5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03-eth0~.lease -cf /var/lib/NetworkManager/dhclient-eth0~.conf eth0~
root      458697  453827  0 09:18 pts/0    00:00:00 grep --color=auto dhclient

This is observed only on RHEL 8.4 on Azure cloud. After some comparison, we found that on Azure, RHEL configures NetworkManager to use dhclient for DHCP by default, and NetworkManager starts a dhclient process on the uplink even though the interface has been renamed.

[root@rhel84 nsxadmin]# nmcli d
DEVICE      TYPE         STATE                   CONNECTION  
eth0~       ethernet     connected               System eth0 
docker0     bridge       connected (externally)  docker0     
lo          loopback     unmanaged               --          
eth0        openvswitch  unmanaged               --          
ovs-system  openvswitch  unmanaged               --          
[root@rhel84 nsxadmin]# nmcli con
NAME         UUID                                  TYPE      DEVICE  
System eth0  5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03  ethernet  eth0~   
docker0      e4450544-8933-4d05-a134-595f76780f6f  bridge    docker0 
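
NetworkManager's effective DHCP backend can be checked directly, and as a manual workaround the renamed uplink can be marked unmanaged so NetworkManager stops running dhclient on it. This is only a sketch of a mitigation, not the fix in antrea-agent, and the unmanaged setting does not survive a reboot:

# Print the effective configuration; on this image it is expected to
# report "dhcp=dhclient" in the [main] section.
NetworkManager --print-config | grep -i dhcp
# Possible manual workaround: stop NetworkManager from managing the
# renamed uplink, so no dhclient process is started on it.
nmcli device set eth0~ managed no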

The duplicate routes may introduce unpredictable behavior on the VM: for example, outbound traffic may leave the VM directly from the uplink, in which case ANP rules do not take effect because the packets bypass the OpenFlow entries.
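
For illustration, the duplicates can also be removed by hand, though this is only a temporary mitigation: dhclient will re-add the address and routes on the next DHCP renewal unless it is stopped first.

# Temporary mitigation only: flushing the address on the uplink also
# removes the routes that reference it.
ip addr flush dev eth0~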

To Reproduce

  1. Deploy a K8s cluster and run antrea-controller.
  2. Create a VM on Azure with OS type RHEL 8.4.
  3. Install vm-agent on the VM.
  4. Create an ExternalNode for the VM.
  5. List the IP addresses and routes after the uplink and host internal interfaces are created (a quick check is sketched below).
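
For step 5, a quick way to spot the bug is to check what is still attached to the renamed uplink; in the expected state the uplink has no inet address and no routes, while in the buggy state both commands show the duplicates (eth0~ is the uplink name from the example above):

# In the expected state there is no "inet" line here and the route
# list is empty, since the IP configuration has moved to eth0.
ip addr show dev eth0~
ip route show dev eth0~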

Expected

After the ExternalNode is created, the IP address and routes are expected to move to the host internal interface only; they should not exist on the uplink until the ExternalNode resource is deleted.

Actual behavior

The IP address and routes are configured on both the host internal interface and the uplink.

Versions:

The issue is expected to exist in all Antrea releases, including the main branch.

Additional context

github-actions bot commented Oct 2, 2023

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

github-actions bot added the lifecycle/stale label on Oct 2, 2023
luolanzone removed the lifecycle/stale label on Nov 13, 2023
Anandkumar26 added a commit to antrea-io/nephe that referenced this issue Nov 15, 2023
On Azure RHEL VM, when ExternalNode is added and deleted
repeatedly, antrea-agent goes into a weird state where the
ExternalNode add event is ignored, as the agent thinks there
is no change in the ExternalNode interface.

Tracking issue on antrea:

antrea-io/antrea#5192
antrea-io/antrea#5111
Signed-off-by: Anand Kumar <[email protected]>
reachjainrahul pushed a commit to antrea-io/nephe that referenced this issue Nov 15, 2023
* Expose ANP priority in test templates for Agented VMs

Signed-off-by: Anand Kumar <[email protected]>

* Use Ubuntu VMs for Azure agented tests

On Azure RHEL VM, when ExternalNode is added and deleted
repeatedly, antrea-agent goes into a weird state where the
ExternalNode add event is ignored, as the agent thinks there
is no change in the ExternalNode interface.

Tracking issue on antrea:

antrea-io/antrea#5192
antrea-io/antrea#5111
Signed-off-by: Anand Kumar <[email protected]>

---------

Signed-off-by: Anand Kumar <[email protected]>
github-actions bot commented Feb 12, 2024

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

github-actions bot added the lifecycle/stale label on Feb 12, 2024
github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on May 12, 2024