
Windows conformance test failed consistently #2981

Closed
tnqn opened this issue Nov 4, 2021 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@tnqn
Member

tnqn commented Nov 4, 2021

Describe the bug

[sig-windows] Hybrid cluster network for all supported CNIs 
  should have stable networking for Linux and Windows pods
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/windows/hybrid_network.go:52
[BeforeEach] [sig-windows] Hybrid cluster network
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/windows/framework.go:28
[BeforeEach] [sig-windows] Hybrid cluster network
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:174
STEP: Creating a kubernetes client
Nov  4 11:07:48.615: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: Building a namespace api object, basename hybrid-network
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-windows] Hybrid cluster network
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/windows/hybrid_network.go:46
[It] should have stable networking for Linux and Windows pods
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/windows/hybrid_network.go:52
STEP: creating linux and windows pods
STEP: checking connectivity to 8.8.8.8 853 (google.com) from Linux
STEP: checking connectivity of linux-container in pod-32f16129-c68c-425f-b3bf-5c39e7050976
Nov  4 11:07:58.673: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-3919 PodName:pod-32f16129-c68c-425f-b3bf-5c39e7050976 ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Nov  4 11:07:58.673: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-32f16129-c68c-425f-b3bf-5c39e7050976
Nov  4 11:07:59.761: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-3919 PodName:pod-32f16129-c68c-425f-b3bf-5c39e7050976 ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Nov  4 11:07:59.761: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-32f16129-c68c-425f-b3bf-5c39e7050976
Nov  4 11:08:00.835: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-3919 PodName:pod-32f16129-c68c-425f-b3bf-5c39e7050976 ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Nov  4 11:08:00.835: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-32f16129-c68c-425f-b3bf-5c39e7050976
Nov  4 11:08:01.931: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-3919 PodName:pod-32f16129-c68c-425f-b3bf-5c39e7050976 ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Nov  4 11:08:01.931: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-32f16129-c68c-425f-b3bf-5c39e7050976
Nov  4 11:08:03.015: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-3919 PodName:pod-32f16129-c68c-425f-b3bf-5c39e7050976 ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Nov  4 11:08:03.015: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-32f16129-c68c-425f-b3bf-5c39e7050976
Nov  4 11:08:04.092: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-3919 PodName:pod-32f16129-c68c-425f-b3bf-5c39e7050976 ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Nov  4 11:08:04.092: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-32f16129-c68c-425f-b3bf-5c39e7050976
Nov  4 11:08:05.167: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-3919 PodName:pod-32f16129-c68c-425f-b3bf-5c39e7050976 ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Nov  4 11:08:05.167: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-32f16129-c68c-425f-b3bf-5c39e7050976
Nov  4 11:08:06.245: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-3919 PodName:pod-32f16129-c68c-425f-b3bf-5c39e7050976 ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Nov  4 11:08:06.245: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-32f16129-c68c-425f-b3bf-5c39e7050976
Nov  4 11:08:07.319: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-3919 PodName:pod-32f16129-c68c-425f-b3bf-5c39e7050976 ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Nov  4 11:08:07.319: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity of linux-container in pod-32f16129-c68c-425f-b3bf-5c39e7050976
Nov  4 11:08:08.415: INFO: ExecWithOptions {Command:[/bin/sh -c nc -vz 8.8.8.8 853 -w 10] Namespace:hybrid-network-3919 PodName:pod-32f16129-c68c-425f-b3bf-5c39e7050976 ContainerName:linux-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Nov  4 11:08:08.415: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
STEP: checking connectivity to https://www.google.com from Windows
STEP: checking connectivity of windows-container in pod-4b861df3-3d6d-44c5-ba45-c034eba5fe86
Nov  4 11:08:08.673: INFO: ExecWithOptions {Command:[cmd /c curl.exe https://www.google.com --connect-timeout 10 --fail] Namespace:hybrid-network-3919 PodName:pod-4b861df3-3d6d-44c5-ba45-c034eba5fe86 ContainerName:windows-container Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Nov  4 11:08:08.674: INFO: >>> kubeConfig: /var/lib/jenkins/kube.conf
Nov  4 11:08:18.747: FAIL: Failed after 10.073s.
Unexpected error:
    <exec.CodeExitError>: {
        Err: {
            s: "command terminated with exit code 28",
        },
        Code: 28,
    }
    command terminated with exit code 28
occurred

It failed when the Windows Pod tried to access the external network (curl exited with code 28, which is curl's connection timeout error), while the Node itself could access the external network. This has happened for the latest 20+ builds.

@tnqn tnqn added the kind/bug Categorizes issue or PR as related to a bug. label Nov 4, 2021
@tnqn
Member Author

tnqn commented Nov 4, 2021

@wenyingd @lzhecheng Let's track the issue here.

@tnqn tnqn added this to the Antrea v1.4 release milestone Nov 4, 2021
@lzhecheng
Contributor

So far the conclusion is that it results from a bug in OVS. Opened an issue there: openvswitch/ovs-issues#232

@tnqn
Member Author

tnqn commented Nov 5, 2021

@wenyingd and I dug deeper into the problem and found it's caused by incorrect handling of IP fragments in the OVS Windows datapath:
When IP fragments are received on one of the OVS ports and there is a ct action in the pipeline, the fragments are reassembled before being sent to the connection tracker, according to the documentation (https://man7.org/linux/man-pages/man7/ovs-actions.7.html):

       If ct is executed on IPv4 (or IPv6) fragments, then the message
       is implicitly reassembled before sending to the connection
       tracker and refragmented upon output, to the original maximum
       received fragment size. Reassembly occurs within the context of
       the zone, meaning that IP fragments in different zones are not
       assembled together. Pipeline processing for the initial fragments
       is halted. When the final fragment is received, the message is
       assembled and pipeline processing continues for that flow. Packet
       ordering is not guaranteed by IP protocols, so it is not possible
       to determine which IP fragment will cause message reassembly (and
       therefore continue pipeline processing). As such, it is strongly
       recommended that multiple flows should not execute ct to
       reassemble fragments from the same IP message.

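To make the quoted semantics concrete, here is a schematic C sketch of the implicit reassembly that ct performs. It is illustrative only: the struct and function names (frag_cache_put and friends) are hypothetical and do not correspond to real OVS datapath symbols.

    #include <stdbool.h>
    #include <stdint.h>

    struct packet;
    struct frag_cache;

    /* Reassembly is scoped to the ct zone: fragments in different zones
     * are never assembled together (per the quoted documentation). */
    struct frag_key {
        uint16_t zone;
        uint32_t src_ip, dst_ip;
        uint16_t ip_id;    /* the IPv4 Identification field groups fragments */
        uint8_t  proto;
    };

    enum frag_result { FRAG_CONSUMED, FRAG_COMPLETE };

    /* Hypothetical helpers, declared only to make the sketch read. */
    void frag_cache_put(struct frag_cache *, const struct frag_key *,
                        struct packet *);
    bool frag_cache_is_complete(const struct frag_cache *,
                                const struct frag_key *);
    struct packet *frag_cache_reassemble(struct frag_cache *,
                                         const struct frag_key *);

    enum frag_result ct_handle_fragment(struct frag_cache *cache,
                                        const struct frag_key *key,
                                        struct packet *frag,
                                        struct packet **reassembled)
    {
        frag_cache_put(cache, key, frag);
        if (!frag_cache_is_complete(cache, key)) {
            /* Pipeline processing halts for the initial fragments. */
            return FRAG_CONSUMED;
        }

        /* Final fragment received: assemble the full datagram and let
         * pipeline processing continue with it. On output it is
         * refragmented to the largest fragment size originally received. */
        *reassembled = frag_cache_reassemble(cache, key);
        return FRAG_COMPLETE;
    }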
However, the flow key of the reassembled packet was not updated with the correct 5-tuple; it still held the values extracted from the last fragment that triggered the reassembly, which are empty. The lookup for the existing conntrack entry therefore failed, and the packet was considered "new". I confirmed the Linux datapath doesn't have this issue, as it regenerates the L4 header of the flow key from the reassembled packet:
https://github.com/openvswitch/ovs/blob/1bdda7b6d53c92e877b457157676aff326414c53/datapath/conntrack.c#L596-L599
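The gist of that Linux handling, continuing the hedged sketch above (all names remain hypothetical, not the actual OVS datapath API): after reassembly, re-extract the 5-tuple from the reassembled datagram into the flow key before the conntrack lookup.

    /* Continues the sketch above; every name here is hypothetical. */
    struct datapath;
    struct flow_key;

    bool key_is_ip_fragment(const struct flow_key *);
    void frag_key_from_packet(const struct packet *, uint16_t zone,
                              struct frag_key *);
    void flow_key_extract_l3l4(const struct packet *, struct flow_key *);
    int conntrack_lookup_and_commit(struct datapath *, struct packet *,
                                    struct flow_key *);
    uint16_t datapath_ct_zone(const struct datapath *);
    struct frag_cache *datapath_frag_cache(struct datapath *);

    int ct_execute(struct datapath *dp, struct packet *pkt,
                   struct flow_key *key)
    {
        if (key_is_ip_fragment(key)) {
            struct packet *whole = NULL;
            struct frag_key fkey;

            frag_key_from_packet(pkt, datapath_ct_zone(dp), &fkey);
            if (ct_handle_fragment(datapath_frag_cache(dp), &fkey, pkt,
                                   &whole) != FRAG_COMPLETE) {
                return 0;    /* datagram not complete yet; halt this fragment */
            }
            pkt = whole;

            /* The step the Windows datapath misses: without it, the flow
             * key keeps the empty L4 fields extracted from the last
             * fragment, the conntrack lookup below misses, and the
             * connection is wrongly treated as new. */
            flow_key_extract_l3l4(pkt, key);
        }

        return conntrack_lookup_and_commit(dp, pkt, key);
    }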

The right fix is probably to add similar handling to the Windows datapath driver. However, I'm wondering why IP fragments were generated at all in this scenario. We captured the packets on both the uplink and antrea-gw0 interfaces and found that the packets received on the uplink were not fragmented, with a maximum size of 1414 bytes, while the packets received on antrea-gw0 varied in size and were IP-fragmented. It seems RSC took effect and coalesced the original packets when they were received on the uplink, but when forwarding a coalesced packet to antrea-gw0 the host found it was larger than the MTU configured on that interface and split it into IP fragments. This IP refragmentation doesn't make sense, as the packet hasn't left the host yet, and it makes RSC pointless. (This may explain why bypassing the Windows host network improves network performance so much in #2157: it avoids the IP fragmentation.) I can confirm Linux doesn't do IP fragmentation in this case. I'm not sure whether this IP fragmentation is decided by Windows kernel code or OVS code; we may need OVS experts to confirm.
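For contrast, here is a schematic sketch of the two forwarding behaviors, again with purely hypothetical names (this is not real Windows or Linux code): Linux keeps a coalesced packet tied to its segmentation state and re-segments it into the original TCP segments before it leaves the stack, while the behavior observed here falls back to IP fragmentation.

    struct iface { uint32_t mtu; };

    /* Hypothetical helpers. */
    void iface_xmit(struct iface *, struct packet *);
    void resegment_and_xmit(struct iface *, struct packet *);
    void ip_fragment_and_xmit(struct iface *, struct packet *);
    uint32_t packet_len(const struct packet *);
    bool packet_is_coalesced(const struct packet *);    /* RSC/GRO state */

    void forward_packet(struct iface *out, struct packet *pkt)
    {
        if (packet_len(pkt) <= out->mtu) {
            iface_xmit(out, pkt);
            return;
        }

    #ifdef LINUX_LIKE
        /* Linux-like: re-segment the coalesced packet into the original
         * MSS-sized TCP segments, so no IP fragments ever appear on the
         * gateway interface. */
        if (packet_is_coalesced(pkt)) {
            resegment_and_xmit(out, pkt);
            return;
        }
    #endif

        /* Behavior observed on the Windows host: the RSC-coalesced packet
         * is IP-fragmented to fit the antrea-gw0 MTU even though it never
         * leaves the host, producing the fragments that hit the ct bug
         * described above. */
        ip_fragment_and_xmit(out, pkt);
    }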

As a conclusion, there are two issues here:

  1. The Windows host shouldn't fragment a packet coalesced by RSC when it forwards the packet to the OVS internal port.
  2. The OVS Windows datapath should use the correct 5-tuple to look up conntrack after it reassembles IP fragments (as sketched above).

Neither of these is a new issue introduced in this release. The reason it was only discovered recently: even though the conntrack state was marked wrongly, the reassembled packet may be processed after some subsequent packets have already been forwarded to the container port, and the reassembled packet itself can always be forwarded to the container port correctly. Only the packets processed after it are dropped, so the application may receive the HTTP response while the FIN packets are dropped.

To work around it, @wenyingd has PR #2985, which avoids mistakenly rewriting the destination MAC of reply packets to antrea-gw0 by matching whether the in_port is antrea-gw0. However, the connection is still wrongly marked as "originated from gateway interface".

@tnqn tnqn added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jan 6, 2022
@github-actions
Contributor

github-actions bot commented Apr 7, 2022

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 7, 2022
@github-actions github-actions bot closed this as completed Jul 6, 2022