Windows conformance test failed consistently #2981
Comments
@wenyingd @lzhecheng Let's track the issue here.
So far the conclusion is that it results from a bug in OVS. Opened an issue there: openvswitch/ovs-issues#232
@wenyingd and I dug deeper into the problem and found that it is caused by incorrect handling of IP fragments in the OVS Windows datapath:
However, the flow key of the reassembled packet was not updated with the correct 5-tuple; it still held the values set by the last fragment that triggered the reassembly, which are empty. The conntrack lookup then failed to find the existing entry, and the packet was considered "new".
I confirmed that the Linux datapath doesn't have this issue, as it regenerates the L4 header for the flow key from the reassembled packet:
The right fix might be to add similar handling to the Windows datapath driver.
However, I'm wondering why IP fragments were even generated in this scenario. We captured the packets on both the uplink and antrea-gw0 interfaces and found that the packets received on uplink were not fragmented and the largest size was 1414, but the packets received on antrea-gw0 had different sizes and were IP-fragmented. It seems
As a conclusion, there are two issues here:
Neither of them is a new issue introduced in this release. They were discovered only recently because, even when the conntrack state is marked wrongly, the reassembled packet may be processed after some following packets have already been forwarded to the container port, and the reassembled packet itself is always forwarded to the container port correctly; only the packets processed after it are dropped, so the application may receive the HTTP response but the
To work around it, @wenyingd opened PR #2985, which avoids mistakenly rewriting the dMAC of reply packets to antrea-gw0 by matching whether the in_port is antrea-gw0, but the connection is still wrongly marked as "originated from gateway interface".
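The stale flow key problem described in this comment can be illustrated with a small toy model. This is only a sketch under stated assumptions, not OVS code: `FlowKey`, `extract_key`, `reassemble`, `ct_state`, the addresses and the ports are all hypothetical names and values chosen for illustration. It shows that a key taken from the last fragment has empty ports and misses the conntrack entry, while a key regenerated from the reassembled packet hits it.

```python
from typing import Dict, List, NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple used as the conntrack lookup key."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def extract_key(pkt: Dict) -> FlowKey:
    # Only a packet starting at fragment offset 0 carries the L4 header,
    # so later fragments yield empty (zero) ports.
    has_l4 = pkt["frag_offset"] == 0
    return FlowKey(
        pkt["src_ip"],
        pkt["dst_ip"],
        pkt["proto"],
        pkt["src_port"] if has_l4 else 0,
        pkt["dst_port"] if has_l4 else 0,
    )


def reassemble(fragments: List[Dict]) -> Dict:
    # Join payloads in offset order; the result starts at offset 0 and
    # therefore carries the L4 header of the first fragment.
    ordered = sorted(fragments, key=lambda f: f["frag_offset"])
    pkt = dict(ordered[0])
    pkt["payload"] = b"".join(f["payload"] for f in ordered)
    pkt["frag_offset"] = 0
    return pkt


def ct_state(table: Dict[FlowKey, str], key: FlowKey) -> str:
    # A miss in the table means the packet is treated as a new connection.
    return "est" if key in table else "new"


# Assumed example: reply traffic from an external server to a Pod.
base = {"src_ip": "203.0.113.10", "dst_ip": "10.10.0.5", "proto": 6}
first = {**base, "frag_offset": 0, "src_port": 80, "dst_port": 40000,
         "payload": b"a" * 8}
last = {**base, "frag_offset": 1, "src_port": 0, "dst_port": 0,
        "payload": b"b" * 4}

# The connection is already tracked under its full 5-tuple.
table = {FlowKey("203.0.113.10", "10.10.0.5", 6, 80, 40000): "est"}

whole = reassemble([last, first])
stale_key = extract_key(last)    # key left over from the last fragment
fresh_key = extract_key(whole)   # key regenerated after reassembly

print(ct_state(table, stale_key))
print(ct_state(table, fresh_key))
```

In this model the only difference between the two lookups is where the key comes from, which mirrors the contrast drawn above between the Windows and Linux datapaths.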
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days
Describe the bug
It failed when the Windows Pod tried to access the external network. However, the Node itself can access the external network. This has happened for the latest 20+ builds.