-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NodePort, LoadBalancer and ClusterIP from k8s Node support for AntreaProxy on Linux (implemented by Linux TC) #2239
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2239 +/- ##
==========================================
- Coverage 59.67% 57.91% -1.77%
==========================================
Files 281 284 +3
Lines 22207 23407 +1200
==========================================
+ Hits 13253 13555 +302
- Misses 7546 8398 +852
- Partials 1408 1454 +46
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have an updated design doc to learn the new design with TC?
6bf06af
to
768f6a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still reviewing..
768f6a0
to
801cf15
Compare
801cf15
to
7456a00
Compare
13da96e
to
4fb3b10
Compare
4fb3b10
to
746d246
Compare
0300af5
to
9894344
Compare
06cedcc
to
3e46e1e
Compare
3e46e1e
to
c8fca49
Compare
24b55d6
to
ef7afc9
Compare
@@ -10,6 +10,10 @@ featureGates: | |||
# this flag will not take effect. | |||
# EndpointSlice: false | |||
|
|||
# Enable full Service support in AntreaProxy in antrea-agent. All type of Services can be | |||
# accessed from outside the cluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about: All types of Service traffic will be handled by AntreaProxy?
Maybe list what more are supported compared to disabled.
ef7afc9
to
8c428b7
Compare
8c428b7
to
b1d3305
Compare
95444b8
to
351bf3c
Compare
"antrea.io/antrea/pkg/agent/util" | ||
) | ||
|
||
func getAvailableNodePortIPs(nodePortIPsFromConfig []string, gateway string) (map[int][]net.IP, map[int][]net.IP, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to move it under pkg/agent/util ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. @antoninbas suggested the func could be here, so I moved it here.
pkg/agent/openflow/pipeline.go
Outdated
Done(), | ||
) | ||
} else { | ||
// If externalTrafficPolicy of Service is Local, clients from localhost requires SNAT. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to confirm if this switch is hit only in NodePort Service, and the source and destination are the same Node address? If yes, please add comments, otherwise svcIP
as source might cause misunderstanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for remind, I almost forgot this.
pkg/agent/openflow/pipeline.go
Outdated
DeleteLearned(). | ||
MatchNetworkSrcAsDst(protocol). | ||
SetLearnedSrcMACAsDstMAC(). | ||
LoadReg(int(PortCacheReg), config.HostGatewayOFPort, ofPortRegRange). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the 3 flows in this function has the same learn action, is it possible to use a generic function for the learn action. Then it should be easier to change if needed in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pkg/agent/proxy/proxier.go
Outdated
if p.proxyFullEnabled { | ||
// Install ClusterIP route on k8s node so that ClusterIP can be accessed on k8s node. When thWhen the | ||
// Service is removed, the route does not need to be removed, because the route target IP block is calculated | ||
// every time a new ClusterIP is created. When there are enough ClusterIPs, the Service CIDR can be finally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean Antrea will re-calculate the route target IP block when new ClusterIP added? How shall I understand "when there are enough ClusterIPs"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this explanation can be better:
// Install ClusterIP route on Node so that ClusterIP can be accessed on Node. Every time a new ClusterIP
// is created, the routing target IP block will be recalculated for expansion to be able to route the new
// created ClusterIP. Deleting a ClusterIP will not shrink the target routing IP block. The Service CIDR
// can be finally calculated after creating enough ClusterIPs.
pkg/agent/proxy/proxier_windows.go
Outdated
if err := p.ofClient.InstallServiceFlows(groupID, svcIP, svcPort, protocol, affinityTimeout); err != nil { | ||
return err | ||
// Note that, NodePort Service is not supported on Windows currently. | ||
func (p *proxier) installLoadBalancerService(groupID binding.GroupIDType, loadBalancerIPStrings []string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the implementations for LoadBalancerService are the same on Linux and Windows, maybe move the function to proxier.go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is ok, Windows
has one more function, Linux
will just hit a ni function.
@@ -169,3 +169,102 @@ func GetIPWithFamily(ips []net.IP, addrFamily uint8) (net.IP, error) { | |||
return nil, errors.New("no IP found with IPv4 AddressFamily") | |||
} | |||
} | |||
|
|||
// GetAllNodeIPs gets all Node IP addresses (not including IPv6 link local address). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean IPv4 link local address is included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IPv6 link local is special, its prefix is fe80::/64
. It can be only routed. I think it should be as NodePort IP. For IPv4 link local, you mean 169.254.x.x?
915260a
to
68016f0
Compare
/test-all |
pkg/agent/openflow/client.go
Outdated
@@ -536,7 +549,16 @@ func (c *client) InstallEndpointFlows(protocol binding.Protocol, endpoints []pro | |||
portVal := portToUint16(endpointPort) | |||
cacheKey := generateEndpointFlowCacheKey(endpoint.IP(), endpointPort, protocol) | |||
flows = append(flows, c.endpointDNATFlow(endpointIP, portVal, protocol)) | |||
if endpoint.GetIsLocal() { | |||
|
|||
// If Endpoint network is host network, don't add flow to hairpinSNATFlow table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't we need to SNAT hairpin traffic for host network pod? Is it becasue there is another SNAT flow that will do this? If so, can we unify the flows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This flow will never be hit. Another flow is used to do SNAT for hairpin Service traffic from gateway with a different ct_zone. Existing hairpin SNAT for Service from pod is stateless(just modify source IP without ct commit), I think we don't need unify these two flows.
pkg/agent/openflow/pipeline.go
Outdated
hardTimeout := uint16(0) | ||
// If the SessionAffinity of Service is set, the learned flow should be set with hard timeout. Assumed that the SessionAffinity | ||
// is 300s, if the learned flow is set with idle timeout 60s, the learned flow may be deleted when idle timeout is reached, | ||
// then the response traffic cannot be returned correctly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get why this is specific to the case when sessionAffinity is set. Even if it's not set, isn't the problem there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to the previous design, I forget to remove this.
pkg/agent/openflow/pipeline.go
Outdated
// gateway, so set servicePktMark to let the response packet go back to Antrea gateway. | ||
// Note that, the priority of this flow is lower than above two flows. The purpose is to avoid | ||
// the packet of ClusterIP from Antrea gateway hits the flow. | ||
serviceConnectionTrackCommitTable.BuildFlow(priorityNormal).MatchProtocol(proto). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this can work, it makes the implementation a bit complicated with policy routing involved, so I'm thinking whether this is necessary, want to discuss it here:
- Even it's always expected to preserve client IP for ClusterIP traffic, we still have to SNAT the traffic when it's hairpin case, as well as kube-proxy.
- Is it a real case that a Service is NodePort type and its Endpoints are host network Pod? If a Pod is host network, a Node cannot hold more than one Pod for this Service so the Pods are unlikely managed by a regular controller like Deployment, ReplicationController, the most common case is DaemonSet Pod running some system service like antrea-agent. If the Pods are running in the host network, there is no point to expose them as a NodePort service which just introduces another hop and occupies another port. For example, the Kubernetes service's own Endpoints are the kube-apiserver Pods but the Service's itself is ClusterIP type and typically a loadbalancer will use the Node IP + kube-apiserver port as the backends directly, not via a NodePort.
Given the above reasons, I feel at least for the first version, we could just SNAT the traffic to be simpler. Then multiple cases can be unified as external Endpoints, and we can have single SNAT flow for all of them. Since this is in very early stage, we can still use this policy routing approach when users have real need of preserving client IP for NodePort services with host network Pods as backends. What do you think? @jianjuns @antoninbas @hongliangl @wenyingd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the solution is to use TC to match the pkt mark and redirect traffic - so TC does not work? I would agree policy routing sounds complicated.
I agree your 2) is a valid point. Are we able to check? If it does complicate the implementation, I would second to start with SNAT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the solution is to use TC to match the pkt mark and redirect traffic - so TC does not work? I would agree policy routing sounds complicated.
The implementation of this case has there steps:
Step 1 Mark the first request packet of the connection with pkt mark within OVS pipeline.
Step 2 Mark the first request packet of the connection with ct mark by matching above pkt mark in iptables mangle table.
Step 3 Mark the response packets of the connection with fw mark by matching above ct mark.
Step 4 Packets with above fw mark will be routed to Antrea gateway.
If using TC,
Step 1 is still step 1.
Step 2 cannot be implemented by TC, I didn't found any doc about how to match a pkt mark and make a ct mark.
Step 3 can be implemented by TC. https://man7.org/linux/man-pages/man8/tc-connmark.8.html
Step 4 can be also implemented by TC. https://man7.org/linux/man-pages/man8/tc-fw.8.html
I don't think the implementation with TC is more simple that that with policy routing. In addition, we may need to add more conditions to match the related packets. This design also adds more overhead. For step 4, TC rule will be installed at default interface's egress(after routing decision), then matched packets will be redirected to Antrea gateway. If using policy routing, before routing decision, the output interface is decided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just want not many ways (e.g. TC for request and policy routing for response) to redirect/route the traffic which will be hard to understand/troubleshoot/maintain. I am kindof leaning towards what Quan suggested to do SNAT first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, if we just do SNAT, should we add some comments about the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, definitely we should comment the behavior, and briefly describe why we choose to do SNAT.
pkg/agent/openflow/pipeline.go
Outdated
// Case 4: LoadBalancer, client from remote, Endpoint is on host network, perform SNAT with IP pf serviceGWHairpinIPv4 / serviceGWHairpinIPv6. | ||
// Case 5: LoadBalancer, client from localhost, Endpoint is on pod CIDR. As it is routed via Antrea gateway, its source IP is | ||
// Antrea gateway's IP, SNAT is not needed. | ||
// Case 6: LoadBalancer, client from localhost, Endpoint is on host network, perform SNAT with IP pf serviceGWHairpinIPv4 / serviceGWHairpinIPv6. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think service only determines whether SNAT is required and which IP to SNAT is related to endpoint only, so the comment should be placed to endpoint flow.
Here it could just say NodePort/LoadBalancer traffic should be SNATed.
pkg/agent/proxy/proxier.go
Outdated
if svcInfo.NodeLocalExternal() { | ||
groupIDLocal, _ := p.groupCounter.Get(svcPortName, false) | ||
if err := p.ofClient.UninstallServiceGroup(groupIDLocal); err != nil { | ||
klog.Errorf("Failed to remove flows of Service %v: %v", svcPortName, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
structured logging
pkg/agent/proxy/proxier.go
Outdated
func (p *proxier) installNodePortService(groupID binding.GroupIDType, svcPort uint16, protocol binding.Protocol, affinityTimeout uint16, nodeLocalExternal bool) error { | ||
for _, nodeIPs := range p.nodePortIPMap { | ||
for _, nodeIP := range nodeIPs { | ||
if err := p.ofClient.InstallServiceFlows(groupID, nodeIP, svcPort, protocol, affinityTimeout); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even today, installing all service/endpoints flows takes long time as typically a single flow takes 2ms. I'm a little concerned that the number of the IPs would impact the time and memory. Basically if it originally takes 5s to install all service flows, 10 local IPs will make it 50s, and the memory usage in agent and ovs-vswitchd for service flow will have same impact. Some unusual cases will make this even worse, if we have some external IPs assigned to the Node, or kube-proxy ipvs is enabled at the same time.
Do we have a way to reduce the flows' redundancy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about using a conjunctive flow to reduce the redundancy.
- Every NodePort IP is clause 1, conj_id is N,
- NodePort Port is clause 2, conj_id is N, load another register X, the register stores group number.
- A flow matching conj_id N, and a action of the flow is
group: NXM_NX_REGX[]
.
Unfortunately, above design cannot work. Below explanation is from https://man7.org/linux/man-pages/man7/ovs-actions.7.html
OTHER ACTIONS top
The conjunction action
Syntax:
conjunction(id, k/n)
This action allows for sophisticated ``conjunctive match’’ flows.
Refer to ``Conjunctive Match Fields’’ in ovs-fields(7) for
details.
A flow that has one or more conjunction actions may not have any
other actions except for note actions.
Conformance:
Open vSwitch 2.4 introduced the conjunction action and conj_id
field. They are Open vSwitch extensions to OpenFlow.
@wenyingd , do you have any good ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tnqn I have tried to use conjunction to reduce flow entries, but I failed. The design I tried:
- We need three conjun_id to distinguish NodePort traffic
- externalTrafficPolicy is Cluster
- condition 1: nw_dst (all NodePort IPs)
- condition 2: nodePort port
- actions: set SNAT reg, generate learn flow.
- externalTrafficPolicy is Local, client is from remote
- condition 1: nw_dst (all NodePort IPs)
- condition 2: nodePort port
- actions: generate learn flow.
- externalTrafficPolicy is Local, client is from localhost
- condition 1: nw_dst, nw_src (all NodePort IPs)
- condition 2: nodePort port
- actions: set SNAT reg, generate learn flow.
- externalTrafficPolicy is Cluster
As we can see, the first two cases have the same condition 1, I have no idea how to do. The flow can be added with different priority, but the lower priority one will never be hit.
For Table 41, I have tried to unify the group action, but it seems that action of group cannot be set by a register.
pkg/agent/openflow/pipeline.go
Outdated
@@ -336,6 +352,9 @@ var ( | |||
// Endpoint, still needs to select an Endpoint, or if an Endpoint has already | |||
// been selected and the selection decision needs to be learned. | |||
serviceLearnRegRange = binding.Range{16, 18} | |||
// serviceSNATMarkRange takes a 1-bit range of register serviceSnatReg to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you add these comments here or maybe in tunnelClassifierFlow() if that is the place all SNAT flows are added?
pkg/agent/openflow/pipeline.go
Outdated
// gateway, so set servicePktMark to let the response packet go back to Antrea gateway. | ||
// Note that, the priority of this flow is lower than above two flows. The purpose is to avoid | ||
// the packet of ClusterIP from Antrea gateway hits the flow. | ||
serviceConnectionTrackCommitTable.BuildFlow(priorityNormal).MatchProtocol(proto). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the solution is to use TC to match the pkt mark and redirect traffic - so TC does not work? I would agree policy routing sounds complicated.
I agree your 2) is a valid point. Are we able to check? If it does complicate the implementation, I would second to start with SNAT.
@@ -5,6 +5,10 @@ featureGates: | |||
# Service traffic. | |||
# AntreaProxy: true | |||
|
|||
# Enable full Service support in AntreaProxy in antrea-agent. All type of Services can be | |||
# accessed from outside the cluster. | |||
# AntreaProxyFull: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not have a good sense a Feature should be always enabled finally. But do you think we can use a Feature Gate to isolate the change, and change it to a config option later?
68016f0
to
431e55e
Compare
…Proxy on Linux This PR implements: - The connection request of NodePort whose client is from remote or localhost. - The connection request of LoadBalancer whose client is from remote or localhost. - The connection request of ClusterIP whose client is from localhost. For NodePort support, on each interface whose IP addresses can be NodePort IP addresses, Linux TC is used to redirect the request packets to Antrea gateway. For response packets, on interface Antrea gateway, Linux TC is used to redirect the packets back to the interface where the requests packets are from. For LoadBalancer support, when client is from remote hosts, on default route output interface, Linux TC is used to redirect the request packets to Antrea gateway. For response packets, on interface Antrea gateway,Linux TC is used to redirect the packets back to the default route output interface. When client is from localhost, the request packets are routed to Antrea gateway. For ClusterIP support, the request packets are routed to Antrea gateway. To support the Service traffic of above cases, there are main changes of OVS pipeline. - Add table serviceConntrackCommitTable 106 to perform SNAT for Service traffic. - Modify table hairpinSNATTable ID from 106 to 108. - Modify table serviceHairpinTable ID from 29 to 23. - Add table serviceConntrackTable 24 to transform SNATed connnections. - Add table serviceClassifierTable 35 to classify the Service traffic. - Add table serviceDstMacRewriteTable 75 to rewrite destination MAC address of response Service packets. Signed-off-by: Hongliang Liu <[email protected]>
431e55e
to
12e260c
Compare
a3d7dfc
to
be8a40f
Compare
Signed-off-by: Hongliang Liu <[email protected]>
be8a40f
to
ec20098
Compare
Signed-off-by: Hongliang Liu <[email protected]>
/test-all |
pkg/agent/route/interfaces.go
Outdated
@@ -51,6 +52,24 @@ type Interface interface { | |||
// DeleteSNATRule should delete rule to SNAT outgoing traffic with the mark. | |||
DeleteSNATRule(mark uint32) error | |||
|
|||
// InitService should add the basic TC configuration on Linux. | |||
InitService(nodePortIPMap map[int][]net.IP, isIPv6 bool) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name InitService is too generic. If it is for NodePort only, call it InitNodePortRoutes; if it also handles other types, call it InitServiceProxyRoutes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This func will do:
- Remove stale LoadBalancer/ClusterIP route entries
- Add Linux basic filters for NodePort/LoadBalancer
- Add route entry for routing packets whose destination IP is serviceGWHairpinIP to Antrea gateway.
Maybe call it InitServiceProxyConfig?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InitServiceProxyConfig sounds good to me.
pkg/agent/route/interfaces.go
Outdated
@@ -51,6 +52,24 @@ type Interface interface { | |||
// DeleteSNATRule should delete rule to SNAT outgoing traffic with the mark. | |||
DeleteSNATRule(mark uint32) error | |||
|
|||
// InitService should add the basic TC configuration on Linux. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know existing funcs go this way, but I suggest to change "should add" to "adds". The same for other funcs you added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reminding.
DeleteNodePort(nodePortIPMap map[int][]net.IP, port uint16, protocol binding.Protocol) error | ||
|
||
// AddClusterIPRoute should add route on k8s node for Service ClusterIP. | ||
AddClusterIPRoute(svcIP net.IP, isIPv6 bool) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be consistent with other funcs, I suggest either add "Routes" to other func names (I know they are TC operations, but that needs not be reflected in the RouteClient interface), or remove "Route" from this func name too.
Probably Route -> Routes, as I assume it might not be a single route entry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why there is not a func to remove the routes for a ClusterIP Service?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be consistent with other funcs, I suggest either add "Routes" to other func names (I know they are TC operations, but that needs not be reflected in the RouteClient interface), or remove "Route" from this func name too.
Probably Route -> Routes, as I assume it might not be a single route entry?
I don't know if adding the suffix "Routes" to func name will cause misunderstood.
- For funcs
AddNodePort
/DeleteNodePort
, no route entry will be added/deleted, only TC rules. - For funcs
AddLoadBalancer
/DeleteLoadBalancer
, route entries and TC rules are both added/deleted. - For func
AddClusterIPRoute
, only route entry will be added or updated.
I don't know if "Route" is an appropriate suffix for all the above funcs.
Why there is not a func to remove the routes for a ClusterIP Service?
We don't a func to remove the route for a ClusterIP Service. There is alway one route entry for all ClusterIP Services.
- If there is no route entry when a ClusterIP is added, create a route entry for it.
- If current route entry can route the ClusterIP, there is nothing to do.
- If current route cannot route the ClusterIP, replace the current route with a new route entry , and the new route entry 's destination IP block can route the ClusterIP IP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But how if the ClusterIP CIDR is changed and a ClusterIP is no more valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's no matter. What we wanted is to calculate the IP block of ClusterIP finally as possible, and this IP block is the destination of the only one routing entry for ClusterIP. Adding a ClusterIP may help enlarge the destination IP block(ClusterIP block) of routing entry.
Delete a ClusterIP is ok, as we want just only one routing entry which is used to route ClusterIP traffic. If we add a ClusterIP, then we create a routing entry for it, and when it is deleted, we delete the routing entry, then there will be lots of routing entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Then if we assume CIDR can only be extended we should be ok.
Could you add some comments before the AddClusterRouteIP() implementation in route_linux.go to explain what it does? Actually good to add comments before other RouteClient funcs you added too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll add some comments.
pkg/agent/route/route_linux.go
Outdated
@@ -67,6 +78,9 @@ var ( | |||
// IPTablesSyncInterval is exported so that sync interval can be configured for running integration test with | |||
// smaller values. It is meant to be used internally by Run. | |||
IPTablesSyncInterval = 60 * time.Second | |||
|
|||
serviceGWHairpinIPv4 = net.ParseIP("169.254.169.253") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely should not define the constants multiple times. Please define them in a single file, and import from others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about pkg/agent/config? If in the package, we should add a new file which is only to contain this two variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pkg/agent/config sounds good to me. Maybe adding to node_config.go is fine too, if we want not to add a new file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think node_config.go is okay.
@@ -431,6 +455,24 @@ func (c *Client) restoreIptablesData(podCIDR *net.IPNet, podIPSet string, snatMa | |||
"-j", iptables.MasqueradeTarget, "--random-fully", | |||
}...) | |||
|
|||
if c.proxyFull { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question - why we do not put this in InitService(), or why not put InitService() code here? I think the former way is better to avoid passing/checking proxyFull in route and other code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we put it to InitService
, I think we need to write a lot of duplicated code about iptables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I got the point. Do not have a better idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some comments to explain what rules are restored here when AntreaProxyFull is enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
pkg/agent/route/route_linux.go
Outdated
writeLine(iptablesData, []string{ | ||
"-A", antreaPostRoutingChain, | ||
"-m", "comment", "--comment", `"Antrea: masquerade Service host network Endpoint traffic"`, | ||
"-s", serviceGWHairpinIPv4.String(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually feel bad route code needs to understand AntreaProxy logics and things like Service hairpin IP. Ideally we should isolate the logics, to avoid many components need to understand a feature.
Maybe we can say Route Client and OpenflowClient are special and need have feature specific code. But do you have some ideas? Also add @tnqn and @wenyingd to see what they think.
@@ -431,6 +455,24 @@ func (c *Client) restoreIptablesData(podCIDR *net.IPNet, podIPSet string, snatMa | |||
"-j", iptables.MasqueradeTarget, "--random-fully", | |||
}...) | |||
|
|||
if c.proxyFull { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some comments to explain what rules are restored here when AntreaProxyFull is enabled?
Signed-off-by: Hongliang Liu <[email protected]>
This PR implements:
localhost.
or localhost.
For NodePort support, on each interface whose IP addresses can be
NodePort IP addresses, Linux TC is used to redirect the request packets
to Antrea gateway. For response packets, on interface Antrea gateway,
Linux TC is used to redirect the packets back to the interface where
the requests packets are from.
For LoadBalancer support, when client is from remote hosts, on default
route output interface, Linux TC is used to redirect the request
packets to Antrea gateway. For response packets, on interface Antrea
gateway,Linux TC is used to redirect the packets back to the default
route output interface. When client is from localhost, the request
packets are routed to Antrea gateway.
For ClusterIP support, the request packets are routed to Antrea gateway.
To support the Service traffic of above cases, there are main changes of
OVS pipeline.
Service traffic.
address of response Service packets.
The below cases are supported on:
Signed-off-by: Hongliang Liu [email protected]