Cannot access ClusterIP service if the endpoint is on another Node when AntreaProxy is disabled #2319
Labels: kind/bug, priority/critical-urgent
Thanks @hxietkg for finding the issue.
Describe the bug
When AntreaProxy is disabled, a Pod cannot access the ClusterIP of a Service if the selected endpoint is on another Node.
For example, a DNS query against the kube-dns Service fails because the reply comes from an unexpected source.
Accessing an HTTP Service gets no reply.
The root cause is that reply traffic for a connection already processed by kube-proxy's iptables/ipvs rules, when received from the tunnel interface, has both gatewayCTMark and macRewriteMark set, so its destination MAC is rewritten twice. The second rewrite overwrites the first, and the packet is delivered directly to the destination Pod without reverse NAT being performed in the host netns.
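The effect of the double rewrite can be illustrated with a toy model. This is only a sketch, not Antrea's actual OpenFlow pipeline: the IP and MAC values, the `Packet` class, the function names, and the assumption that gatewayCTMark steers the packet to the gateway while macRewriteMark steers it to the Pod are all hypothetical. The `skip_second_rewrite` flag exists purely for contrast and is not meant to represent the project's fix.

```python
from dataclasses import dataclass, field

# Hypothetical addresses, chosen only for illustration.
CLUSTER_IP = "10.96.0.10"       # Service ClusterIP the client talked to
ENDPOINT_IP = "10.10.2.5"       # endpoint Pod on the remote Node
CLIENT_IP = "10.10.1.7"         # client Pod on the local Node
GATEWAY_MAC = "aa:aa:aa:aa:aa:01"
CLIENT_POD_MAC = "aa:aa:aa:aa:aa:02"


@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    dst_mac: str = ""
    marks: set = field(default_factory=set)


def rewrite_dst_mac(pkt, skip_second_rewrite):
    """Toy model of the two destination-MAC rewrites (assumed order)."""
    if "gatewayCTMark" in pkt.marks:
        # First rewrite: send the reply to the gateway (host netns).
        pkt.dst_mac = GATEWAY_MAC
    if "macRewriteMark" in pkt.marks:
        already_steered = "gatewayCTMark" in pkt.marks
        if not (skip_second_rewrite and already_steered):
            # Second rewrite: overwrites the gateway MAC with the Pod MAC.
            pkt.dst_mac = CLIENT_POD_MAC
    return pkt


def deliver(pkt):
    """Return the source IP that the client Pod observes on the reply."""
    if pkt.dst_mac == GATEWAY_MAC:
        # Only packets that reach the host netns get reverse NAT applied.
        pkt.src_ip = CLUSTER_IP
    return pkt.src_ip


def reply_source_seen_by_client(skip_second_rewrite):
    # Reply from the remote endpoint, arriving via the tunnel with both marks.
    reply = Packet(
        src_ip=ENDPOINT_IP,
        dst_ip=CLIENT_IP,
        marks={"gatewayCTMark", "macRewriteMark"},
    )
    return deliver(rewrite_dst_mac(reply, skip_second_rewrite))


if __name__ == "__main__":
    print("both rewrites applied:", reply_source_seen_by_client(False))
    print("second rewrite skipped:", reply_source_seen_by_client(True))
```

In this model, applying both rewrites leaves the reply with the endpoint's IP as its source, which matches the "reply from unexpected source" symptom seen with DNS. When the second rewrite is skipped, the reply passes through the host netns and the client sees the ClusterIP.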
To Reproduce
Expected
The access should succeed.
The failure should be caught by CI tests.
Actual behavior
The access failed.
No existing CI test can catch it reliably, because the upstream tests don't run with AntreaProxy disabled and the Antrea-specific e2e tests don't have a dedicated cross-Node Service access case.