Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automated cherry pick of #3516: Use uplink MAC as source if packet is output to #3528

Conversation

xliuxu
Copy link
Contributor

@xliuxu xliuxu commented Mar 28, 2022

Cherry pick of #3516 on release-1.2.

#3516: Use uplink MAC as source if packet is output to

For details on the cherry pick process, see the cherry pick requests page.

@xliuxu xliuxu added the kind/cherry-pick Categorizes issue or PR as related to the cherry-pick of a bug fix from the main branch to a release label Mar 28, 2022
@xliuxu
Copy link
Contributor Author

xliuxu commented Mar 28, 2022

/test-windows-all

@codecov-commenter
Copy link

codecov-commenter commented Mar 28, 2022

Codecov Report

Merging #3528 (1a8c213) into release-1.2 (359e0bf) will increase coverage by 13.63%.
The diff coverage is 0.00%.

Impacted file tree graph

@@               Coverage Diff                @@
##           release-1.2    #3528       +/-   ##
================================================
+ Coverage        46.82%   60.46%   +13.63%     
================================================
  Files              279      285        +6     
  Lines            21623    22455      +832     
================================================
+ Hits             10125    13577     +3452     
+ Misses           10168     7436     -2732     
- Partials          1330     1442      +112     
Flag Coverage Δ
kind-e2e-tests 46.81% <0.00%> (-0.02%) ⬇️
unit-tests 42.84% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/openflow/pipeline.go 72.91% <0.00%> (+2.30%) ⬆️
...agent/controller/traceflow/traceflow_controller.go 69.61% <0.00%> (-3.54%) ⬇️
pkg/antctl/command_definition.go 53.86% <0.00%> (ø)
pkg/antctl/command_message.go 42.85% <0.00%> (ø)
pkg/antctl/command_list.go 47.61% <0.00%> (ø)
pkg/antctl/raw/traceflow/command.go 24.40% <0.00%> (ø)
pkg/antctl/client.go 1.38% <0.00%> (ø)
pkg/antctl/antctl.go 100.00% <0.00%> (ø)
pkg/agent/route/route_linux.go 44.68% <0.00%> (+0.53%) ⬆️
pkg/ovs/ovsconfig/ovs_client.go 48.74% <0.00%> (+0.83%) ⬆️
... and 85 more

@xliuxu
Copy link
Contributor Author

xliuxu commented Mar 28, 2022

/test-windows-networkpolicy

@xliuxu
Copy link
Contributor Author

xliuxu commented Mar 28, 2022

/test-networkpolicy
/test-e2e
/test-conformance

@xliuxu
Copy link
Contributor Author

xliuxu commented Mar 30, 2022

/test-e2e

@xliuxu
Copy link
Contributor Author

xliuxu commented Mar 30, 2022

@tnqn The e2e failure is caused by TestFlowAggregator cases.

pkg/agent/openflow/pipeline.go Show resolved Hide resolved
@@ -308,6 +308,7 @@ func (c *client) l3FwdFlowToRemoteViaRouting(localGatewayMAC net.HardwareAddr, r
Action().LoadRegRange(int(PortCacheReg), config.UplinkOFPort, ofPortRegRange).
Action().LoadRegRange(int(marksReg), macRewriteMark, ofPortMarkRange).
Action().GotoTable(conntrackCommitTable).
Action().SetSrcMAC(c.nodeConfig.UplinkNetConfig.MAC).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the MAC already set in L3Forwarding table?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

With noEncap mode the Pod packet to remote Pod/Node is output to the
uplink interface directly. This change modifies the source MAC with the
uplink interface's MAC, so that it doesn't require hybrid configurations
on the host interface.

Signed-off-by: wenyingd <[email protected]>
@xliuxu xliuxu force-pushed the automated-cherry-pick-of-#3516-upstream-release-1.2 branch from b5350e4 to 1a8c213 Compare April 6, 2022 12:19
@xliuxu xliuxu requested a review from tnqn April 13, 2022 02:22
@xliuxu xliuxu mentioned this pull request Apr 13, 2022
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented Apr 13, 2022

/test-all
/test-windows-all

@xliuxu
Copy link
Contributor Author

xliuxu commented Apr 13, 2022

/test-conformance
/test-e2e

@tnqn
Copy link
Member

tnqn commented Apr 14, 2022

/skip-e2e failed on unrelated test: TestFlowAggregator

@tnqn
Copy link
Member

tnqn commented Apr 14, 2022

/test-windows-conformance

1 similar comment
@xliuxu
Copy link
Contributor Author

xliuxu commented Apr 14, 2022

/test-windows-conformance

@xliuxu
Copy link
Contributor Author

xliuxu commented Apr 14, 2022

the Windows conformance test is failing due to

STEP: creating linux and windows pods
STEP: checking connectivity to 8.8.8.8 853 (google.com) from Linux
STEP: checking connectivity of linux-container in pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba
Apr 13 04:15:23.678: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-7430 PodName:pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Apr 13 04:15:23.678: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba
Apr 13 04:15:24.771: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-7430 PodName:pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Apr 13 04:15:24.771: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba
Apr 13 04:15:25.850: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-7430 PodName:pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Apr 13 04:15:25.850: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba
Apr 13 04:15:26.929: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-7430 PodName:pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Apr 13 04:15:26.929: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba
Apr 13 04:15:28.016: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-7430 PodName:pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Apr 13 04:15:28.016: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba
Apr 13 04:15:29.102: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-7430 PodName:pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Apr 13 04:15:29.102: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba
Apr 13 04:15:30.182: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-7430 PodName:pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Apr 13 04:15:30.182: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba
Apr 13 04:15:31.261: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-7430 PodName:pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Apr 13 04:15:31.261: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba
Apr 13 04:15:32.340: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-7430 PodName:pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Apr 13 04:15:32.340: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba
Apr 13 04:15:33.415: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-7430 PodName:pod-6bb4406d-bd34-4632-be9d-3c59c5dc21ba ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Apr 13 04:15:33.415: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity to https://www.google.com from Windows
STEP: checking connectivity of windows-container in pod-d7c56adc-3d54-4c6b-a5ed-20078086ff77
Apr 13 04:15:33.679: INFO: ExecWithOptions {Command:[cmd /c curl.exe https://www.google.com --connect-timeout 10 --fail] Namespace:hybrid-network-7430 PodName:pod-d7c56adc-3d54-4c6b-a5ed-20078086ff77 ContainerName:windows-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Apr 13 04:15:33.679: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
Apr 13 04:15:43.774: FAIL: Failed after 10.095s.
Unexpected error:
    <exec.CodeExitError>: {
        Err: {
            s: "command terminated with exit code 28",
        },
        Code: 28,
    }
    command terminated with exit code 28
occurred

------------------------------
{"msg":"FAILED [sig-windows] Hybrid cluster network for all supported CNIs should have stable networking for Linux and Windows pods","total":16,"completed":0,"skipped":212,"failed":1,"failures":["[sig-windows] Hybrid cluster network for all supported CNIs should have stable networking for Linux and Windows pods"]}
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
------------------------------

It seems to be the same as #2981, but the issue should have been fixed.
@tnqn Do you think it is a regression introduced by this PR?

@tnqn
Copy link
Member

tnqn commented Apr 14, 2022

@xliuxu It failed twice on same test, likely related to this PR. But I haven't figured out the relationship with #2981. May need to live debug this issue.

@xliuxu
Copy link
Contributor Author

xliuxu commented Apr 15, 2022

@tnqn I saw the same error on the 1.2 branch, which does not include this change. Is this caused by testbed changes?

@xliuxu
Copy link
Contributor Author

xliuxu commented Apr 15, 2022

/test-windows-conformance

@xliuxu
Copy link
Contributor Author

xliuxu commented Apr 15, 2022

@tnqn The windows-conformance test passed. It should be related to the wrong configurations on the testbed.
Thank @wenyingd and @XinShuYang for help.

@tnqn
Copy link
Member

tnqn commented Apr 15, 2022

@xliuxu could you share more details about the misconfiguration? is it related to mac spoofguard?

@XinShuYang
Copy link
Contributor

@tnqn Wenying found existing NetNat rules could cause error in conntrack stage on Antrea 1.2. Then I checked windows snapshot on testbed and found there are remaining NetNat rules. After removing them and update snapshot, the failed test can pass. @wenyingd Do you have more details to add?

@wenyingd
Copy link
Contributor

@tnqn Wenying found existing NetNat rules could cause error in conntrack stage on Antrea 1.2. Then I checked windows snapshot on testbed and found there are remaining NetNat rules. After removing them and update snapshot, the failed test can pass. @wenyingd Do you have more details to add?

NetNat is introduced since Antrea v1.3 which is expected to perform SNAT for Pod-to-external traffic on the Windows host. But for Antrea v1.2-, SNAT is performed in OVS. Since the CI testbed is shared for all versions, and some misconfiguration leads to the NetNAT is included into the Windows snapshot, Antrea v1.2 traffic is enforced to perform SNAT on the host again after OVS operations. So the reply packet is mis-forwarded to the host instead of Pods. I am also a bit confused for I originally thought the host SNAT should not impact on the OVS (I thought the reply packet is first entering OVS pipeline from uplink and completed de-SNAT, and then output to the sender), but the fact is not as I expected. The reply was output to host directly. I would sync with OVS team for the root cause on the order of the OVS SNAT and host NetNat later.

But after we remove the misconfiguration, the traffic is working well.

@tnqn
Copy link
Member

tnqn commented Apr 20, 2022

Thanks @XinShuYang @wenyingd for the explanation.

@tnqn tnqn merged commit e9d7d23 into antrea-io:release-1.2 Apr 20, 2022
@XinShuYang
Copy link
Contributor

@tnqn I have a PR for deleting all netnat rules in CI script. Updating all snapshot is not a good way for our CI testbeds. #3674

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/cherry-pick Categorizes issue or PR as related to the cherry-pick of a bug fix from the main branch to a release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants