[Windows] Fix Pod cannot access endpoints with external IP through ClusterIP Service #1824

ruicao93 · 2021-02-05T14:57:55Z

When a Pod accesses a ClusterIP Service and the IP of the selected
endpoint is not in "cluster-cidr", the request packets need to be
SNAT'd after they have been DNAT'd. For example, the endpoint Pod
may run in hostNetwork and the IP of the endpoint is the current
Node IP. Currently, on a Windows Node, Antrea applies both DNAT
and SNAT in the same ct_zone, which is not supported by OVS.

This patch introduces a new ct_zone so that this kind of SNAT'd
connection is tracked in a different ct_zone.

Fixes: #1759

Signed-off-by: Rui Cao [email protected]
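
The two-zone idea in the description can be pictured with a small toy model. This is only a sketch under stated assumptions, not Antrea or OVS code: the `zone` type, the `commit` method, the connection ID and all IP addresses are hypothetical. The only constraint modeled is the one the description states, namely that a connection cannot get both DNAT and SNAT in the same ct_zone. The zone names `CtZone` and `CtZoneSNAT` are taken from the review discussion.

```go
package main

import "fmt"

// natKind is the type of address translation recorded for a connection.
type natKind string

const (
	dnat natKind = "DNAT"
	snat natKind = "SNAT"
)

// packet is a simplified packet: only the IP addresses matter here.
type packet struct {
	src, dst string
}

// zone is a toy conntrack zone (not an OVS or Antrea type). It stores at
// most one NAT record per connection, which is the limitation the PR
// description attributes to using a single ct_zone.
type zone struct {
	name    string
	records map[string]natKind // keyed by a hypothetical connection ID
}

func newZone(name string) *zone {
	return &zone{name: name, records: map[string]natKind{}}
}

// commit records one NAT of the given kind for connID and returns the
// translated packet. A second NAT for the same connection is rejected.
func (z *zone) commit(connID string, p packet, kind natKind, addr string) (packet, error) {
	if prev, ok := z.records[connID]; ok {
		return p, fmt.Errorf("%s: connection %s already has a %s record", z.name, connID, prev)
	}
	z.records[connID] = kind
	if kind == dnat {
		p.dst = addr
	} else {
		p.src = addr
	}
	return p, nil
}

func main() {
	ctZone := newZone("CtZone")
	ctZoneSNAT := newZone("CtZoneSNAT")

	// Pod IP -> ClusterIP; all addresses are made up for illustration.
	p := packet{src: "10.10.1.5", dst: "10.96.0.10"}

	// DNAT: the ClusterIP is replaced by an endpoint outside cluster-cidr.
	p, _ = ctZone.commit("conn-1", p, dnat, "192.168.1.30")

	// SNAT in the same zone is rejected by this model.
	if _, err := ctZone.commit("conn-1", p, snat, "192.168.1.20"); err != nil {
		fmt.Println("same zone:", err)
	}

	// SNAT in a separate zone succeeds.
	p, err := ctZoneSNAT.commit("conn-1", p, snat, "192.168.1.20")
	fmt.Println("separate zone:", p, err)
}
```

In this model each zone sees only one translation per connection, which is the property the patch relies on.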

ruicao93 · 2021-02-05T15:01:04Z

/test-all

codecov-io · 2021-02-05T15:47:21Z

Codecov Report

❗ No coverage uploaded for pull request base (main@abb6c33).
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1824   +/-   ##
=======================================
  Coverage        ?   42.26%           
=======================================
  Files           ?      196           
  Lines           ?    16715           
  Branches        ?        0           
=======================================
  Hits            ?     7065           
  Misses          ?     8654           
  Partials        ?      996

Flag	Coverage Δ
kind-e2e-tests	`42.26% <0.00%> (?)`

Flags with carried forward coverage won't be shown.

ruicao93 · 2021-02-08T01:46:02Z

/test-all

jianjuns

I think this change also fixes traffic to Services with external endpoints, besides Services backed by Node IPs. Could you change the commit message for this?

pkg/ovs/openflow/ofctrl_builder.go

pkg/agent/openflow/pipeline.go

ruicao93 · 2021-02-08T03:34:31Z

> I think this change also fixes traffic to Services with external endpoints, besides Services backed by Node IPs. Could you change the commit message for this?

Yes, exactly. Thanks for your review, Jianjun. I will change the commit message in the next update.

ruicao93 · 2021-02-08T05:20:44Z

/test-all

jianjuns · 2021-02-08T06:04:27Z

Would you change the title of the commit too?

ruicao93 · 2021-02-08T06:45:47Z

> Would you change the title of the commit too?

Sure, thanks for the reminder.

ruicao93 · 2021-02-08T06:45:59Z

/test-all

tnqn · 2021-02-08T09:14:11Z

pkg/agent/openflow/pipeline.go

+		//   - ct_mark is set to 0x21(ServiceCTMark)
+		// This flow resubmits the packets to the following table to avoid being forwarded
+		// to the bridge port by default.
+		flows = append(flows, c.pipeline[conntrackStateTable].BuildFlow(priorityHigh).


How is it different from the first flow created in L711? It seems to be a duplicate.

Here we add a new match field, markTrafficFromUplink. Otherwise, the traffic received from the uplink will hit L1563 and be forwarded to br-int directly.

// Output the non-SNAT packet to the bridge interface directly if it is received from the uplink interface.
c.pipeline[conntrackStateTable].BuildFlow(priorityNormal).
	MatchProtocol(binding.ProtocolIP).
	MatchRegRange(int(marksReg), markTrafficFromUplink, binding.Range{0, 15}).
	Action().Output(int(bridgeOFPort)).
	Cookie(c.cookieAllocator.Request(category).Raw()).
	Done(),

For simplicity, could we make the L1563 flow low priority, so that the L1552 flow can be normal priority too?

After analyzing the flows, I think that's reasonable. I will give it a try.
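
The priority question in this thread can be illustrated with a toy flow table. This is a minimal sketch, not the Antrea pipeline: the `pkt`, `flow`, `lookup` and `buildTable` names are hypothetical, the two booleans merely stand in for the markTrafficFromUplink mark and the ServiceCTMark match, and the priority numbers are illustrative. It shows that the specific flow only needs a higher priority than the generic uplink flow, so high/normal and normal/low behave the same.

```go
package main

import "fmt"

// pkt carries only the two properties the discussed flows match on.
type pkt struct {
	fromUplink bool // stands in for the markTrafficFromUplink register mark
	serviceCT  bool // stands in for ct_mark == ServiceCTMark
}

// flow is a toy flow entry: a priority, a match predicate and an action label.
type flow struct {
	priority int
	match    func(pkt) bool
	action   string
}

// lookup returns the action of the highest-priority matching flow,
// or "miss" if no flow matches.
func lookup(table []flow, p pkt) string {
	best := -1
	action := "miss"
	for _, f := range table {
		if f.match(p) && f.priority > best {
			best, action = f.priority, f.action
		}
	}
	return action
}

// buildTable returns the specific Service flow and the generic uplink flow
// with the given priorities.
func buildTable(servicePrio, uplinkPrio int) []flow {
	return []flow{
		{servicePrio, func(p pkt) bool { return p.fromUplink && p.serviceCT }, "resubmit"},
		{uplinkPrio, func(p pkt) bool { return p.fromUplink }, "output:bridge"},
	}
}

func main() {
	current := buildTable(210, 200)  // high / normal (illustrative values)
	proposed := buildTable(200, 190) // normal / low (illustrative values)

	svc := pkt{fromUplink: true, serviceCT: true} // Service traffic from the uplink
	other := pkt{fromUplink: true}                // any other uplink traffic

	fmt.Println("current: ", lookup(current, svc), lookup(current, other))
	fmt.Println("proposed:", lookup(proposed, svc), lookup(proposed, other))
}
```

Giving both flows the same priority would not work, because OpenFlow leaves the choice between equal-priority overlapping matches undefined.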

tnqn · 2021-02-08T09:57:21Z

pkg/agent/openflow/pipeline.go

+			Cookie(c.cookieAllocator.Request(category).Raw()).
+			Done())
+		// If SNAT is needed after DNAT:
+		//   - For new connection: commit to CtZoneSNAT


Do we consider doing all SNAT in this zone later? Currently, SNAT seems to be performed in CtZone when the packet is not DNAT'd and in CtZoneSNAT otherwise, which seems a little complex. Or do you plan to unify them when moving to two bridges?

I think we can unify them when moving to two bridges. This PR only handles the DNAT + SNAT case to avoid introducing a big change.

I agree that we should handle all SNAT in a single way. I need to understand the two-bridge proposal better.

Once you have something, could you share? I hope to understand how we are going to organize flows with two bridges, as I am designing flows for SNAT policy, which might be impacted by the two-bridge change.

Sure, @jianjuns. Actually, the two-bridge design is just a draft idea for NodePort Service support on Windows and still needs to be investigated.

I agree with you and Quan: handling all SNAT in a single way would be better.

But considering that v0.13.0 is close to release, do you think we could merge the current change first and take the further step (all SNAT in a single ct_zone, or another approach) after v0.13.0?

We can unify SNAT flows in the next release.
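
The zone choice debated in this thread reduces to a single decision. The sketch below is only an illustration with hypothetical function names: `snatZoneCurrent` follows the behavior described in the review comment (CtZone when the packet is not DNAT'd, CtZoneSNAT otherwise), and `snatZoneUnified` assumes the follow-up would put all SNAT in CtZoneSNAT, which the thread suggests but does not settle.

```go
package main

import "fmt"

const (
	ctZone     = "CtZone"
	ctZoneSNAT = "CtZoneSNAT"
)

// snatZoneCurrent mirrors the behavior described in the review comment:
// SNAT happens in CtZone unless the connection was already DNAT'd.
func snatZoneCurrent(dnated bool) string {
	if dnated {
		return ctZoneSNAT
	}
	return ctZone
}

// snatZoneUnified is the assumed follow-up: every SNAT uses CtZoneSNAT.
func snatZoneUnified(_ bool) string {
	return ctZoneSNAT
}

func main() {
	for _, dnated := range []bool{false, true} {
		fmt.Printf("dnated=%v current=%s unified=%s\n",
			dnated, snatZoneCurrent(dnated), snatZoneUnified(dnated))
	}
}
```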

When a Pod accesses a cluster Service and the selected endpoint uses the Node IP (hostNetwork mode), the request packets need to be SNAT'd after they have been DNAT'd. On a Windows Node, Antrea applied both DNAT and SNAT in the same ct_zone, which is not supported by OVS. This patch introduces a new ct_zone to track this kind of SNAT'd connection in a different ct_zone. Fixes: antrea-io#1759 Signed-off-by: Rui Cao <[email protected]>

ruicao93 · 2021-02-08T16:44:46Z

/test-all

ruicao93 · 2021-02-09T07:33:59Z

/test-containerd-networkpolicy

ruicao93 · 2021-02-09T07:57:08Z

/test-containerd-conformance

vmwclabot added the cla-not-required label Feb 5, 2021

ruicao93 changed the title ~~[WIP][Windows] Fix Pod cannot access k8s API server service issue~~ [WIP][Windows] Fix Pod cannot access k8s API server service Feb 5, 2021

ruicao93 force-pushed the service_api branch 2 times, most recently from c8e2eef to 267accb February 8, 2021 01:20

ruicao93 added this to the Antrea v0.13.0 release milestone Feb 8, 2021

ruicao93 force-pushed the service_api branch from 267accb to 45e23ad February 8, 2021 01:44

ruicao93 changed the title ~~[WIP][Windows] Fix Pod cannot access k8s API server service~~ [Windows] Fix Pod cannot access k8s API server service Feb 8, 2021

ruicao93 requested a review from jianjuns February 8, 2021 01:45

ruicao93 requested a review from tnqn February 8, 2021 01:46

ruicao93 force-pushed the service_api branch 2 times, most recently from 89c7786 to efc9f5c February 8, 2021 03:19

jianjuns reviewed Feb 8, 2021

ruicao93 force-pushed the service_api branch 3 times, most recently from 245fada to 2915296 February 8, 2021 05:36

ruicao93 changed the title ~~[Windows] Fix Pod cannot access k8s API server service~~ [Windows] Fix Pod cannot access endpoints with external IP through cluster service Feb 8, 2021

ruicao93 changed the title ~~[Windows] Fix Pod cannot access endpoints with external IP through cluster service~~ [Windows] Fix Pod cannot access endpoints with external IP through ClusterIP Service Feb 8, 2021

tnqn reviewed Feb 8, 2021

ruicao93 added 3 commits February 9, 2021 00:18

Address comments

b514819

Signed-off-by: Rui Cao <[email protected]>

ruicao93 force-pushed the service_api branch from a709ebe to b514819 February 8, 2021 16:27

jianjuns approved these changes Feb 9, 2021

ruicao93 merged commit 8027293 into antrea-io:main Feb 9, 2021

antoninbas mentioned this pull request Mar 3, 2021

Replace KubeProxy Design Draft (Linux Only) #1931

Closed
