On Azure rhel8.4 VM antrea-agent goes into a state where it cannot manage ExternalNode. #5111

Closed
Anandkumar26 opened this issue Jun 12, 2023 · 1 comment · Fixed by #5191
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@Anandkumar26
Contributor

Describe the bug
When an ExternalNode for a RHEL VM is deleted, antrea-agent hits an error while restoring the interface configuration back to the uplink. Because of this, nephe-ci breaks, since each test adds an ExternalNode, performs NetworkPolicy operations, and then deletes the ExternalNode.

To Reproduce
Spin up a RHEL 8.4 VM on Azure.
Create an ExternalNode for the RHEL VM and install antrea-agent on the VM.
Delete the ExternalNode and check the antrea-agent logs (there should not be any error while restoring the interface config).
Repeat the create and delete steps to reproduce the issue (it usually appears on the 2nd try).
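
The log check in the steps above can be scripted. The following is a minimal sketch, not part of Antrea or nephe: it assumes the agent log uses the standard klog text header (severity letter, MMDD, time, thread id, file:line) and the names `KLOG_LINE` and `find_errors` are hypothetical.

```python
import re

# klog text header: severity letter, MMDD, time, thread id, file:line, then "] message".
KLOG_LINE = re.compile(
    r'^(?P<severity>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+'
    r'(?P<tid>\d+) (?P<source>[^\s:]+:\d+)\] (?P<message>.*)$'
)


def find_errors(log_text):
    """Return (time, source, message) for every error-level (E) klog line."""
    errors = []
    for line in log_text.splitlines():
        match = KLOG_LINE.match(line.strip())
        if match and match.group("severity") == "E":
            errors.append(
                (match.group("time"), match.group("source"), match.group("message"))
            )
    return errors


if __name__ == "__main__":
    # Two lines taken from the agent log in this issue.
    sample = '''
I0605 08:34:22.476443 1 net_linux.go:206] Adding address fe80::20d:3aff:fe36:cdde/64 to interface eth0
E0605 08:34:22.493773 1 external_node_controller.go:237] "Error syncing ExternalNode" err="network is unreachable" ExternalNode="vm-ns/rhel84"
'''
    for when, source, message in find_errors(sample):
        print(when, source, message)
```

An empty result after deleting the ExternalNode corresponds to the expected behavior; any entry from `net_linux.go` or `external_node_controller.go` indicates the restore failed.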

Expected
There should not be any error while restoring interface configuration on the uplink.

Actual behavior
Observed an error while restoring the configuration on the uplink. As a consequence, when we re-add the ExternalNode, antrea-agent is not able to process the ExternalNode ADD event.

Versions:
Antrea version: v1.12.0

Additional context
antrea-agent logs:

I0605 08:34:22.346209 1 reconciler.go:990] Releasing stale priority 14800
I0605 08:34:22.348248 1 reconciler.go:990] Releasing stale priority 14800
I0605 08:34:22.348278 1 reconciler.go:983] Uninstalling ofRule 2
I0605 08:34:22.348284 1 network_policy.go:1594] ofPriority 14900 is now stale
I0605 08:34:22.352550 1 reconciler.go:990] Releasing stale priority 14900
I0605 08:34:22.363920 1 net_linux.go:283] "Renaming interface" oldName="eth0~" newName="eth0"
I0605 08:34:22.476222 1 external_node_controller.go:601] "Recovered uplink name to the host interface name" uplinkIfName="eth0~" hostInterface="eth0"
I0605 08:34:22.476356 1 net_linux.go:175] "Expected addresses" count=2 addresses=[10.110.0.5/24 fe80::20d:3aff:fe36:cdde/64]
I0605 08:34:22.476434 1 net_linux.go:188] "Existing addresses" count=1 addresses=[10.110.0.5/24 eth0]
I0605 08:34:22.476443 1 net_linux.go:206] Adding address fe80::20d:3aff:fe36:cdde/64 to interface eth0
E0605 08:34:22.476550 1 net_linux.go:330] "Failed to replace route entry" err="network is unreachable" route="{Ifindex: 2 Dst: Src: Gw: 10.110.0.1 Flags: [] Table: 254 Realm: 0}"
E0605 08:34:22.493773 1 external_node_controller.go:237] "Error syncing ExternalNode" err="network is unreachable" ExternalNode="vm-ns/rhel84"
I0605 08:34:27.494580 1 external_node_controller.go:386] "Deleting interface" ifName="eth0"
I0605 08:34:27.494595 1 client.go:490] "Cached flow with provided key was not found" key="eth0"
I0605 08:34:27.494603 1 external_node_controller.go:558] "Removed the flows installed to forward packet between uplinkPort and hostPort" hostInterface="eth0"
I0605 08:34:27.495272 1 external_node_controller.go:577] "Deleted host port in OVS" hostInterface="eth0"
I0605 08:34:27.495433 1 external_node_controller.go:581] "Deleted uplink port in OVS" uplinkIfName="eth0~"
E0605 08:34:29.501705 1 external_node_controller.go:237] "Error syncing ExternalNode" err="failed to wait for host interface eth0 deletion in 2s, err timed out waiting for the condition" ExternalNode="vm-ns/rhel84"
I0605 08:34:39.502354 1 external_node_controller.go:386] "Deleting interface" ifName="eth0"
I0605 08:34:39.502369 1 client.go:490] "Cached flow with provided key was not found" key="eth0"
I0605 08:34:39.502376 1 external_node_controller.go:558] "Removed the flows installed to forward packet between uplinkPort and hostPort" hostInterface="eth0"
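
The "Expected addresses" and "Existing addresses" entries in the log above suggest that the agent compares the two sets and adds whatever is missing before it touches the routes. The sketch below only mirrors what those log lines show; it is not Antrea's implementation (which is written in Go), and the function name `addresses_to_add` is hypothetical.

```python
import ipaddress


def addresses_to_add(expected, existing):
    """Return the expected addresses that are not yet configured, in order.

    Addresses are compared as parsed interface objects so that different
    textual spellings of the same IPv6 address are treated as equal.
    """
    configured = {ipaddress.ip_interface(addr) for addr in existing}
    return [
        addr for addr in expected
        if ipaddress.ip_interface(addr) not in configured
    ]


if __name__ == "__main__":
    # Values copied from the log entries at 08:34:22.
    expected = ["10.110.0.5/24", "fe80::20d:3aff:fe36:cdde/64"]
    existing = ["10.110.0.5/24"]
    print(addresses_to_add(expected, existing))
```

With the logged values, the only missing address is the link-local IPv6 one, which matches the "Adding address" entry that precedes the route failure.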

Anandkumar26 added the kind/bug label Jun 12, 2023
@wenyingd
Contributor

wenyingd commented Jul 3, 2023

I created another issue (#5192) to track the unexpected dhclient behavior.

Anandkumar26 added a commit to antrea-io/nephe that referenced this issue Nov 15, 2023
On an Azure RHEL VM, when an ExternalNode is added and deleted
repeatedly, antrea-agent goes into a weird state where the
ExternalNode add event is ignored, as the agent thinks there
is no change in the ExternalNode interface.

Tracking issue on antrea:

antrea-io/antrea#5192
antrea-io/antrea#5111
Signed-off-by: Anand Kumar <[email protected]>
reachjainrahul pushed a commit to antrea-io/nephe that referenced this issue Nov 15, 2023
* Expose ANP priority in test templates for Agented VMs

Signed-off-by: Anand Kumar <[email protected]>

* Use Ubuntu VMs for Azure agented tests

On an Azure RHEL VM, when an ExternalNode is added and deleted
repeatedly, antrea-agent goes into a weird state where the
ExternalNode add event is ignored, as the agent thinks there
is no change in the ExternalNode interface.

Tracking issue on antrea:

antrea-io/antrea#5192
antrea-io/antrea#5111
Signed-off-by: Anand Kumar <[email protected]>

---------

Signed-off-by: Anand Kumar <[email protected]>

2 participants