
IP address and routes are configured twice on RHEL 8.4 on azure cloud after ExternalNode is created #5192

Closed
wenyingd opened this issue Jul 3, 2023 · 2 comments
Labels: kind/bug, lifecycle/stale

Comments

wenyingd commented Jul 3, 2023

Describe the bug

On a RHEL 8.4 VM running on Azure cloud, two copies of the IP address and routes are configured after an ExternalNode is created.

This is the configuration before creating the ExternalNode:

[root@rhel84 nsxadmin]# ip route
default via 10.110.0.1 dev eth0 proto dhcp metric 100 
10.110.0.0/24 dev eth0 proto kernel scope link src 10.110.0.5 metric 100 
168.63.129.16 via 10.110.0.1 dev eth0 proto dhcp metric 100 
169.254.169.254 via 10.110.0.1 dev eth0 proto dhcp metric 100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
[root@rhel84 nsxadmin]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0d:3a:36:cd:de brd ff:ff:ff:ff:ff:ff
    inet 10.110.0.5/24 brd 10.110.0.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:93:a5:5e:22 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
10: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether fe:b8:04:c0:b2:da brd ff:ff:ff:ff:ff:ff

After an ExternalNode is created for the VM, there are two copies of the IP address and routes: one is configured on eth0, which is an OVS internal port created by antrea-agent, and the other is configured on eth0~, which is the original interface renamed by antrea-agent and expected to work as the uplink.

[root@rhel84 nsxadmin]# ip route
default via 10.110.0.1 dev eth0 proto dhcp metric 100 
default via 10.110.0.1 dev eth0~ proto dhcp metric 100 
10.110.0.0/24 dev eth0 proto kernel scope link src 10.110.0.5 
10.110.0.0/24 dev eth0 proto kernel scope link src 10.110.0.5 metric 100 
10.110.0.0/24 dev eth0~ proto kernel scope link src 10.110.0.5 metric 100 
168.63.129.16 via 10.110.0.1 dev eth0 proto dhcp metric 100 
168.63.129.16 via 10.110.0.1 dev eth0~ proto dhcp metric 100 
169.254.169.254 via 10.110.0.1 dev eth0 proto dhcp metric 100 
169.254.169.254 via 10.110.0.1 dev eth0~ proto dhcp metric 100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
[root@rhel84 nsxadmin]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0~: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether 00:0d:3a:36:cd:de brd ff:ff:ff:ff:ff:ff
    inet 10.110.0.5/24 brd 10.110.0.255 scope global noprefixroute eth0~
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:93:a5:5e:22 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
10: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether fe:b8:04:c0:b2:da brd ff:ff:ff:ff:ff:ff
16: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 00:0d:3a:36:cd:de brd ff:ff:ff:ff:ff:ff
    inet 10.110.0.5/24 brd 10.110.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::20d:3aff:fe36:cdde/64 scope link 
       valid_lft forever preferred_lft forever
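
The layout described above can be cross-checked from the OVS side with ovs-vsctl. This is a minimal sketch; the bridge name ("br-int" below) is an assumption and depends on the antrea-agent configuration:

# List the OVS bridges, then the ports attached to the assumed bridge.
ovs-vsctl list-br
ovs-vsctl list-ports br-int
# "internal" confirms that eth0 is an OVS internal port, while eth0~
# (the renamed NIC) is attached as a regular port.
ovs-vsctl get Interface eth0 type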

Listing the processes shows that dhclient is running on the uplink (eth0~); it is dhclient that configures the duplicate IP address and routes.

[root@rhel84 nsxadmin]# ps -ef | grep dhclient
root      458577    1044  0 09:16 ?        00:00:00 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /run/NetworkManager/dhclient-eth0~.pid -lf /var/lib/NetworkManager/dhclient-5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03-eth0~.lease -cf /var/lib/NetworkManager/dhclient-eth0~.conf eth0~
root      458697  453827  0 09:18 pts/0    00:00:00 grep --color=auto dhclient

This is observed only on RHEL 8.4 on Azure cloud. After some comparison, we found that on Azure, RHEL configures NetworkManager to use dhclient for DHCP by default, and NetworkManager starts a dhclient process on the uplink even though the interface has been renamed.

[root@rhel84 nsxadmin]# nmcli d
DEVICE      TYPE         STATE                   CONNECTION  
eth0~       ethernet     connected               System eth0 
docker0     bridge       connected (externally)  docker0     
lo          loopback     unmanaged               --          
eth0        openvswitch  unmanaged               --          
ovs-system  openvswitch  unmanaged               --          
[root@rhel84 nsxadmin]# nmcli con
NAME         UUID                                  TYPE      DEVICE  
System eth0  5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03  ethernet  eth0~   
docker0      e4450544-8933-4d05-a134-595f76780f6f  bridge    docker0 
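
NetworkManager's effective DHCP backend can be checked directly, and as a manual workaround the renamed uplink can be marked unmanaged so NetworkManager stops running dhclient on it. This is only a sketch of a mitigation, not the fix in antrea-agent, and the unmanaged setting does not survive a reboot:

# Print the effective configuration; on this image it is expected to
# report "dhcp=dhclient" in the [main] section.
NetworkManager --print-config | grep -i dhcp
# Possible manual workaround: stop NetworkManager from managing the
# renamed uplink, so no dhclient process is started on it.
nmcli device set eth0~ managed no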

The duplicate routes may introduce unpredictable behavior on the VM: for example, outbound traffic may leave the VM directly from the uplink, in which case ANP rules do not take effect because the packets bypass the OpenFlow entries.
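
For illustration, the duplicates can also be removed by hand, though this is only a temporary mitigation: dhclient will re-add the address and routes on the next DHCP renewal unless it is stopped first.

# Temporary mitigation only: flushing the address on the uplink also
# removes the routes that reference it.
ip addr flush dev eth0~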

To Reproduce

  1. Deploy a K8s cluster and run antrea-controller.
  2. Create a VM on Azure with OS type RHEL 8.4.
  3. Install vm-agent on the VM.
  4. Create an ExternalNode for the VM.
  5. List the IP addresses and routes after the uplink and host internal interfaces are created (a quick check is sketched below).
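
For step 5, a quick way to spot the bug is to check what is still attached to the renamed uplink; in the expected state the uplink has no inet address and no routes, while in the buggy state both commands show the duplicates (eth0~ is the uplink name from the example above):

# In the expected state there is no "inet" line here and the route
# list is empty, since the IP configuration has moved to eth0.
ip addr show dev eth0~
ip route show dev eth0~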

Expected

After the ExternalNode is created, the IP address and routes are expected to move to the host internal interface only; they should not exist on the uplink until the ExternalNode resource is deleted.

Actual behavior

The IP address and routes are configured on both the host internal interface and the uplink.

Versions:

The issue is expected to exist in all Antrea releases, including the main branch.

Additional context

github-actions bot commented Oct 2, 2023

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

github-actions bot added the lifecycle/stale label on Oct 2, 2023
luolanzone removed the lifecycle/stale label on Nov 13, 2023
Anandkumar26 added a commit to antrea-io/nephe that referenced this issue Nov 15, 2023
On Azure RHEL VM, when ExternalNode is added and deleted
repeatedly, antrea-agent goes into a weird state where the
ExternalNode add event is ignored, as the agent thinks there
is no change in the ExternalNode interface.

Tracking issue on antrea:

antrea-io/antrea#5192
antrea-io/antrea#5111
Signed-off-by: Anand Kumar <[email protected]>
reachjainrahul pushed a commit to antrea-io/nephe that referenced this issue Nov 15, 2023
* Expose ANP priority in test templates for Agented VMs

Signed-off-by: Anand Kumar <[email protected]>

* Use Ubuntu VMs for Azure agented tests

On Azure RHEL VM, when ExternalNode is added and deleted
repeatedly, antrea-agent goes into a weird state where the
ExternalNode add event is ignored, as the agent thinks there
is no change in the ExternalNode interface.

Tracking issue on antrea:

antrea-io/antrea#5192
antrea-io/antrea#5111
Signed-off-by: Anand Kumar <[email protected]>

---------

Signed-off-by: Anand Kumar <[email protected]>
github-actions bot commented Feb 12, 2024

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

github-actions bot added the lifecycle/stale label on Feb 12, 2024
github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on May 12, 2024