[Windows] Fix Pod cannot access endpoints with external IP through ClusterIP Service #1824
/test-all
Codecov Report
@@ Coverage Diff @@
## main #1824 +/- ##
=======================================
Coverage ? 42.26%
=======================================
Files ? 196
Lines ? 16715
Branches ? 0
=======================================
Hits ? 7065
Misses ? 8654
Partials ? 996
c8e2eef to 267accb
/test-all
89c7786 to efc9f5c
I think this change also fixes traffic to Services with external endpoints, besides Services backed by Node IPs. Could you change the commit message for this?
Yes, exactly. Thanks, Jianjun, for your review. I will change the commit message in the next update.
/test-all
245fada to 2915296
Would you change the title of the commit too?
Sure, thanks for the reminder.
/test-all
// - ct_mark is set to 0x21(ServiceCTMark)
// This flow resubmits the packets to the following table to avoid being forwarded
// to the bridge port by default.
flows = append(flows, c.pipeline[conntrackStateTable].BuildFlow(priorityHigh).
How is it different from the first flow created in L711? It seems to be a duplicate.
Here we add a new match field, markTrafficFromUplink. Otherwise, the traffic received from the uplink will hit L1563 and be forwarded to br-int directly.
// Output the non-SNAT packet to the bridge interface directly if it is received from the uplink interface.
c.pipeline[conntrackStateTable].BuildFlow(priorityNormal).
MatchProtocol(binding.ProtocolIP).
MatchRegRange(int(marksReg), markTrafficFromUplink, binding.Range{0, 15}).
Action().Output(int(bridgeOFPort)).
Cookie(c.cookieAllocator.Request(category).Raw()).
Done(),
For simplicity, could we make the L1563 flow low priority, so that the L1552 flow can be normal priority too?
After analyzing the flows, I think that is reasonable. I will give it a try.
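To illustrate why the relative priorities in this thread matter, here is a minimal sketch of priority-based flow lookup. It is a toy model, not Antrea or OVS code: the `packet`, `flow`, `lookup`, and `toyConntrackStateTable` names are hypothetical, the priority values are illustrative only, and the two boolean fields merely stand in for the markTrafficFromUplink register match and the ServiceCTMark ct_mark match.

```go
package main

import (
	"fmt"
	"sort"
)

// packet is a toy stand-in for the metadata a flow can match on.
type packet struct {
	fromUplink bool // stands in for the markTrafficFromUplink register match
	serviceCT  bool // stands in for ct_mark == ServiceCTMark
}

// flow is a toy flow entry: a priority, a match predicate, and an action label.
type flow struct {
	priority int
	match    func(packet) bool
	action   string
}

// lookup returns the action of the highest-priority flow that matches p,
// or "no-match" when no flow matches.
func lookup(table []flow, p packet) string {
	sorted := append([]flow(nil), table...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].priority > sorted[j].priority
	})
	for _, f := range sorted {
		if f.match(p) {
			return f.action
		}
	}
	return "no-match"
}

// Illustrative priority values only; they are assumptions for this sketch.
const (
	toyPriorityNormal = 200
	toyPriorityHigh   = 210
)

// toyConntrackStateTable models the two flows discussed in the thread.
func toyConntrackStateTable() []flow {
	return []flow{
		// Service traffic from the uplink continues to the next table.
		{toyPriorityHigh, func(p packet) bool { return p.fromUplink && p.serviceCT }, "resubmit-next-table"},
		// Other traffic from the uplink goes straight to the bridge port.
		{toyPriorityNormal, func(p packet) bool { return p.fromUplink }, "output-bridge-port"},
	}
}

func main() {
	tbl := toyConntrackStateTable()
	fmt.Println(lookup(tbl, packet{fromUplink: true, serviceCT: true})) // resubmit-next-table
	fmt.Println(lookup(tbl, packet{fromUplink: true}))                  // output-bridge-port
}
```

In this model only the ordering of the two priorities matters, which is why lowering both flows by one level, as suggested above, would leave the result unchanged.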
Cookie(c.cookieAllocator.Request(category).Raw()).
Done())
// If SNAT is needed after DNAT:
// - For new connection: commit to CtZoneSNAT
Are we considering doing all SNAT in this zone later? It seems SNAT is currently performed in CtZone when the connection is not DNATed and in CtZoneSNAT otherwise, which seems a little complex. Or do you plan to unify them when moving to two bridges?
I think we should unify them when moving to two bridges. This PR only handles the DNAT + SNAT case to avoid introducing a big change.
I would agree we should handle all SNAT in a single way. I need to understand the two-bridge proposal better.
Once you have something, could you share? I hope to understand how we are going to organize flows with two bridges, as I am designing flows for SNAT policy, which might be impacted by the two-bridge change.
Sure @jianjuns. Actually, the two-bridge design is just a draft idea for NodePort Service support on Windows and still needs to be investigated.
I agree with you and Quan that handling all SNAT in a single way would be better.
But considering that v0.13.0 is close to release, do you think we could merge the current change first and take the further step (all SNAT in a single ct_zone, or another approach) after v0.13.0?
We can unify SNAT flows in the next release.
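The zone split discussed in this thread can be sketched as follows. This is only an illustration of the behaviour described in the review comment (SNAT committed in CtZone when the connection was not DNATed, and in CtZoneSNAT otherwise). The `ctZone` type and `snatZone` function are hypothetical names, and the zones are represented by their identifiers rather than by their real numeric IDs, which do not appear in this conversation.

```go
package main

import "fmt"

// ctZone names a conntrack zone by the identifier used in the thread.
// The real numeric zone IDs are deliberately not modelled here.
type ctZone string

const (
	zoneDefault ctZone = "CtZone"
	zoneSNAT    ctZone = "CtZoneSNAT"
)

// snatZone returns the zone in which SNAT would be committed for a
// connection, depending on whether the connection was already DNATed.
func snatZone(dnated bool) ctZone {
	if dnated {
		// DNAT already happened in the default zone, so SNAT is tracked separately.
		return zoneSNAT
	}
	return zoneDefault
}

func main() {
	fmt.Println(snatZone(false)) // CtZone
	fmt.Println(snatZone(true))  // CtZoneSNAT
}
```

Unifying SNAT, as proposed for the next release, would amount to this function returning a single zone regardless of its argument.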
When a Pod accesses a cluster Service and the selected endpoint uses the Node IP (hostNetwork mode), the request packets need to be SNAT'd after they have been DNAT'd. On a Windows Node, Antrea applied both DNAT and SNAT in the same ct_zone, which is not supported by OVS. This patch introduces a new ct_zone to track this kind of SNAT'd connection. Fixes: antrea-io#1759 Signed-off-by: Rui Cao <[email protected]>
When a Pod accesses a ClusterIP Service and the IP of the selected endpoint is not in "cluster-cidr", the request packets need to be SNAT'd after they have been DNAT'd. For example, the endpoint Pod may run in hostNetwork mode and the IP of the endpoint is the current Node IP. Currently, on a Windows Node, Antrea applies both DNAT and SNAT in the same ct_zone, which is not supported by OVS. This patch introduces a new ct_zone to track this kind of SNAT'd connection. Fixes: antrea-io#1759 Signed-off-by: Rui Cao <[email protected]>
Signed-off-by: Rui Cao <[email protected]>
/test-all
/test-containerd-networkpolicy
/test-containerd-conformance
…usterIP Service (antrea-io#1824) When a Pod accesses a ClusterIP Service and the IP of the selected endpoint is not in "cluster-cidr", the request packets need to be SNAT'd after they have been DNAT'd. For example, the endpoint Pod may run in hostNetwork mode and the IP of the endpoint is the current Node IP. Currently, on a Windows Node, Antrea applies both DNAT and SNAT in the same ct_zone, which is not supported by OVS. This patch introduces a new ct_zone to track this kind of SNAT'd connection. Fixes: antrea-io#1759 Signed-off-by: Rui Cao <[email protected]>
When a Pod accesses a ClusterIP Service and the IP of the selected
endpoint is not in "cluster-cidr", the request packets need to be
SNAT'd after they have been DNAT'd. For example, the endpoint Pod may
run in hostNetwork mode and the IP of the endpoint is the current
Node IP. Currently, on a Windows Node, Antrea applies both DNAT
and SNAT in the same ct_zone, which is not supported by OVS.
This patch introduces a new ct_zone to track this kind of
SNAT'd connection.
Fixes: #1759
Signed-off-by: Rui Cao [email protected]