antrea-agent DaemonSet not ready consistently when running e2e with all features enabled #4673

Closed
tnqn opened this issue Mar 2, 2023 · 1 comment · Fixed by #4674
Labels
kind/bug Categorizes issue or PR as related to a bug.

tnqn commented Mar 2, 2023

Describe the bug

TestWireGuard fails consistently because the antrea-agent DaemonSet does not become ready within 1m30s.
https://github.com/antrea-io/antrea/actions/runs/4312092832/jobs/7522929072
https://github.com/antrea-io/antrea/actions/runs/4312423243/jobs/7524153226

--- PASS: TestTrafficControl (16.72s)
    --- PASS: TestTrafficControl/TestMirrorToRemote (2.33s)
    --- PASS: TestTrafficControl/TestMirrorToLocal (2.42s)
    --- PASS: TestTrafficControl/TestRedirectToLocal (4.01s)
=== RUN   TestUpgrade
    upgrade_test.go:30: Skipping test as we are not testing for upgrade
--- SKIP: TestUpgrade (0.00s)
=== RUN   TestVMAgent
    fixtures.go:147: Skipping test as there no Linux or Windows VMs
--- SKIP: TestVMAgent (0.00s)
=== RUN   TestWireGuard
    fixtures.go:228: Creating 'testwireguard-lxbil1rz' K8s Namespace
2023/03/02 10:27:31 Applying Antrea YAML
2023/03/02 10:27:32 Waiting for all Antrea DaemonSet Pods
2023/03/02 10:27:33 Checking CoreDNS deployment
    fixtures.go:120: The following modules have been found on Node 'kind-control-plane': [wireguard]
    fixtures.go:120: The following modules have been found on Node 'kind-worker': [wireguard]
    fixtures.go:120: The following modules have been found on Node 'kind-worker2': [wireguard]
I0302 10:27:33.837933   18943 framework.go:2379] Sending SIGINT to 'antrea-agent-coverage'
I0302 10:27:33.916128   18943 framework.go:2385] Copying coverage files from Pod 'antrea-agent-9542l'
I0302 10:27:34.223265   18943 framework.go:2379] Sending SIGINT to 'antrea-agent-coverage'
I0302 10:27:34.307809   18943 framework.go:2385] Copying coverage files from Pod 'antrea-agent-jg4tq'
I0302 10:27:34.664204   18943 framework.go:2379] Sending SIGINT to 'antrea-agent-coverage'
I0302 10:27:34.750361   18943 framework.go:2385] Copying coverage files from Pod 'antrea-agent-tsthd'
    wireguard_test.go:53: Failed to enable WireGuard tunnel: error when restarting antrea-agent Pod: antrea-agent DaemonSet not ready within 1m30s;

However, the problem is not in WireGuard. The agent crashes on restart because it fails to set the NO_FLOOD config for the traffic control ports created in the previous test.

I0302 10:27:40.551956      15 log_file.go:93] Set log file max size to 104857600
I0302 10:27:40.553839      15 feature_gate.go:245] feature gates: &{map[AllAlpha:true AllBeta:true AntreaIPAM:true AntreaPolicy:true AntreaProxy:true Egress:true EndpointSlice:true ExternalNode:true FlowExporter:true IPsecCertAuth:true L7NetworkPolicy:true Multicast:false Multicluster:true NetworkPolicyStats:true NodeIPAM:true NodePortLocal:true SecondaryNetwork:true ServiceExternalIP:true SupportBundleCollection:true TopologyAwareHints:true Traceflow:true TrafficControl:true]}
I0302 10:27:40.554066      15 agent.go:99] Starting Antrea agent (version v1.11.0-dev-6441929)
I0302 10:27:40.554168      15 client.go:87] No kubeconfig file was specified. Falling back to in-cluster config
I0302 10:27:40.555155      15 prometheus.go:171] Initializing prometheus metrics
I0302 10:27:40.555510      15 ovs_client.go:71] Connecting to OVSDB at address /var/run/openvswitch/db.sock
I0302 10:27:40.558001      15 agent.go:400] Setting up node network
I0302 10:27:40.585293      15 agent.go:1017] "Setting Node MTU" MTU=1450
I0302 10:27:40.585489      15 agent.go:1036] "Configured IPv4 Subnet CIDR on this Node" subnet="10.244.2.0/24"
I0302 10:27:40.588088      15 ovs_client.go:114] Bridge exists: c00c270d-9fad-462a-b348-ac31c41e0502
I0302 10:27:40.600836      15 agent.go:372] "Adding interface to cache" interfaceName="antrea-l7-tap1"
I0302 10:27:40.601020      15 agent.go:372] "Adding interface to cache" interfaceName="antrea-tun0"
I0302 10:27:40.601203      15 agent.go:372] "Adding interface to cache" interfaceName="agnhost-5e0db3"
I0302 10:27:40.601317      15 agent.go:372] "Adding interface to cache" interfaceName="antrea-l7-tap0"
I0302 10:27:40.601463      15 agent.go:372] "Adding interface to cache" interfaceName="return1"
I0302 10:27:40.601576      15 agent.go:372] "Adding interface to cache" interfaceName="local-pa-5c132c"
I0302 10:27:40.601684      15 agent.go:372] "Adding interface to cache" interfaceName="antrea-gw0"
I0302 10:27:40.601826      15 agent.go:826] Tunnel port antrea-tun0 already exists on OVS bridge
I0302 10:27:40.601960      15 agent.go:710] Gateway port antrea-gw0 already exists on OVS bridge
I0302 10:27:40.602061      15 agent.go:716] Setting gateway interface antrea-gw0 MTU to 1450
I0302 10:27:40.603077      15 net_linux.go:176] IP configuration for interface antrea-gw0 does not need to change
I0302 10:27:40.603414      15 net_linux.go:176] IP configuration for interface antrea-gw0 does not need to change
I0302 10:27:40.611939      15 agent.go:392] "Set port no-flood successfully" PortName="antrea-l7-tap1"
I0302 10:27:40.617339      15 agent.go:392] "Set port no-flood successfully" PortName="antrea-l7-tap0"
F0302 10:27:40.624021      15 main.go:53] Error running agent: error initializing agent: failed to set port return1 with no-flood: fail to set no-food config for port -1 on bridge br-int: exit status 1, stderr: ovs-ofctl: invalid option -- '1'
goroutine 110 [running]:
k8s.io/klog/v2/internal/dbg.Stacks(0x0)
	/go/pkg/mod/k8s.io/klog/[email protected]/internal/dbg/dbg.go:35 +0x89
k8s.io/klog/v2.(*loggingT).output(0x3e3c1e0, 0x3, 0x0, 0xc0000f3b90, 0x1, {0x30d057a?, 0x1?}, 0xc0002db800?, 0x0)
	/go/pkg/mod/k8s.io/klog/[email protected]/klog.go:935 +0x686
k8s.io/klog/v2.(*loggingT).printfDepth(0x3e3c1e0, 0x0?, 0x0, {0x0, 0x0}, 0x0?, {0x266e80d, 0x17}, {0xc00046d410, 0x1, ...})
	/go/pkg/mod/k8s.io/klog/[email protected]/klog.go:736 +0x1f3
k8s.io/klog/v2.(*loggingT).printf(...)
	/go/pkg/mod/k8s.io/klog/[email protected]/klog.go:718
k8s.io/klog/v2.Fatalf(...)
	/go/pkg/mod/k8s.io/klog/[email protected]/klog.go:1621
antrea.io/antrea/cmd/antrea-agent.newAgentCommand.func1(0xc0001ee300?, {0xc0006fef00, 0x0, 0x8})
	/antrea/cmd/antrea-agent/main.go:53 +0x2f4
github.com/spf13/cobra.(*Command).execute(0xc0001ee300, {0xc0000f90f0, 0x8, 0x8})
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:920 +0x847
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001ee300)
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:1044 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
	/go/pkg/mod/github.com/spf13/[email protected]/command.go:968
antrea.io/antrea/cmd/antrea-agent.main()
	/antrea/cmd/antrea-agent/main.go:32 +0x1e
github.com/confluentinc/bincover.RunTest(0x27898f0)
	/go/pkg/mod/github.com/confluentinc/[email protected]/instrument_bin.go:93 +0x210
antrea.io/antrea/cmd/antrea-agent.TestBincoverRunMain(0x11?)
	/antrea/cmd/antrea-agent/bincover_run_main_test.go:27 +0x25
testing.tRunner(0xc00018f040, 0x27898b8)
	/usr/local/go/src/testing/testing.go:1446 +0x10b
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:1493 +0x35f

It is likely related to #4654. There are a few problems that need to be resolved:

  1. If a TrafficControl is deleted while antrea-agent is down, we need a way to ensure the corresponding ports are still deleted.
  2. Setting NO_FLOOD should first check that the OVS port is still valid (here the port number was -1).
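
For the first problem, one possible approach is a reconciliation pass at agent startup: compare the TrafficControl ports found on the bridge with the ports that the TrafficControl resources still existing in the cluster refer to, and delete the rest. The sketch below shows only the comparison step. `stalePorts` is a hypothetical helper, the port name example-target0 is made up, and it assumes the agent can already tell which cached interfaces are TrafficControl ports.

```go
package main

import (
	"fmt"
	"sort"
)

// stalePorts returns, in sorted order, the names in onBridge that are not
// marked as referenced. Hypothetical helper, not part of Antrea.
func stalePorts(onBridge []string, referenced map[string]bool) []string {
	var stale []string
	for _, name := range onBridge {
		if !referenced[name] {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// TrafficControl ports found on the bridge after a restart.
	// "example-target0" is a made-up name for illustration.
	onBridge := []string{"return1", "example-target0"}

	// Ports still referenced by existing TrafficControl resources.
	referenced := map[string]bool{"example-target0": true}

	for _, name := range stalePorts(onBridge, referenced) {
		fmt.Println("would delete stale TrafficControl port:", name)
	}
}
```

The pass would have to run only after the agent has seen the full list of TrafficControl resources, otherwise ports that are still in use could be treated as stale.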

tnqn added the kind/bug label Mar 2, 2023

tnqn commented Mar 2, 2023

cc @hongliangl @xliuxu
