Fix NodePort/LoadBalancer issue when proxyAll is enabled #3295
Codecov Report
@@            Coverage Diff             @@
##             main    #3295      +/-   ##
==========================================
- Coverage   59.08%   53.05%   -6.04%
==========================================
  Files         331      462     +131
  Lines       28444    54663   +26219
==========================================
+ Hits        16806    29000   +12194
- Misses       9804    23200   +13396
- Partials     1834     2463     +629
pkg/agent/proxy/proxier.go
Outdated
var localEndpointUpdateList []k8sproxy.Endpoint
// If externalTrafficPolicy of the previous Service is Cluster, a group which only has local Endpoints
// should be installed.
if pSvcInfo != nil && !pSvcInfo.NodeLocalExternal() {
There might be 3 issues:
- If a Service's policy is Local from the beginning, the above code won't create a local group, so adding code here will lead to inconsistent behavior.
- The code duplicates the block under needUpdateEndpoints.
- There is no cleanup when the policy is changed from Local to Cluster.
I think we could just set needUpdateEndpoints to true if the policy changes, just like when SessionAffinityType changes; then the existing code can take care of installing the local group.
For cleanup, it should uninstall the local group when NodeLocalExternal is false and a local group ID is found.
I agree with the idea of just setting needUpdateEndpoints to true if the policy changes; this will be easier to understand. We also need a cleanup when the policy changes from Local to Cluster.
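As a rough illustration of the approach agreed on in this thread, the following self-contained Go sketch models the two decisions: refresh Endpoints whenever the affinity type or externalTrafficPolicy differs from the installed Service, and remove the local-only group whenever the policy is not Local but a local group ID is still recorded. The types `serviceInfo` and `plan`, the function `decide`, and the handling of a brand-new Service are assumptions made for this example; they are not Antrea's actual code.

```go
package main

import "fmt"

// serviceInfo is a hypothetical stand-in for the proxier's ServiceInfo.
// Only the two properties discussed in this review are modeled.
type serviceInfo struct {
	sessionAffinity   string
	nodeLocalExternal bool // true when externalTrafficPolicy is Local
}

// plan describes what one sync round should do for a Service.
type plan struct {
	updateEndpoints     bool
	uninstallLocalGroup bool
}

// decide compares the desired Service (cur) with the installed one (prev, nil if
// none). localGroupRecorded reports whether a local-only group ID is still allocated.
func decide(cur serviceInfo, prev *serviceInfo, localGroupRecorded bool) plan {
	var p plan
	if prev == nil {
		// Assumption: a Service seen for the first time always needs its Endpoints installed.
		p.updateEndpoints = true
	} else {
		p.updateEndpoints = prev.sessionAffinity != cur.sessionAffinity ||
			prev.nodeLocalExternal != cur.nodeLocalExternal
	}
	// Cleanup depends only on the current state, not on the previous Service.
	p.uninstallLocalGroup = !cur.nodeLocalExternal && localGroupRecorded
	return p
}

func main() {
	cluster := serviceInfo{sessionAffinity: "None", nodeLocalExternal: false}
	local := serviceInfo{sessionAffinity: "None", nodeLocalExternal: true}

	fmt.Printf("Cluster -> Local: %+v\n", decide(local, &cluster, false))
	fmt.Printf("Local -> Cluster: %+v\n", decide(cluster, &local, true))
	fmt.Printf("unchanged:        %+v\n", decide(local, &local, true))
}
```

Because the cleanup condition in this sketch looks at the current policy and the recorded group ID rather than at the previous Service, it stays true across rounds until the uninstall succeeds.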
Please file an issue with steps to reproduce the problem and link the PR to it.
aa411cd to 8802d1b
/test-all-features-conformance
Added.
/test-e2e
/test-networkpolicy
/test-windows-e2e
/test-integration
pkg/agent/proxy/proxier.go
Outdated
@@ -370,7 +370,7 @@ func (p *proxier) installServices() {
  pSvcInfo = installedSvcPort.(*types.ServiceInfo)
  needRemoval = serviceIdentityChanged(svcInfo, pSvcInfo) || (svcInfo.SessionAffinityType() != pSvcInfo.SessionAffinityType())
  needUpdateService = needRemoval || (svcInfo.StickyMaxAgeSeconds() != pSvcInfo.StickyMaxAgeSeconds())
- needUpdateEndpoints = pSvcInfo.SessionAffinityType() != svcInfo.SessionAffinityType()
+ needUpdateEndpoints = pSvcInfo.SessionAffinityType() != svcInfo.SessionAffinityType() || pSvcInfo.NodeLocalExternal() != svcInfo.NodeLocalInternal()
Suggested change:
- needUpdateEndpoints = pSvcInfo.SessionAffinityType() != svcInfo.SessionAffinityType() || pSvcInfo.NodeLocalExternal() != svcInfo.NodeLocalInternal()
+ needUpdateEndpoints = pSvcInfo.SessionAffinityType() != svcInfo.SessionAffinityType() || pSvcInfo.NodeLocalExternal() != svcInfo.NodeLocalExternal()
Could you add an e2e test to verify it?
pkg/agent/proxy/proxier.go
Outdated
@@ -170,15 +170,15 @@ func (p *proxier) removeStaleServices() {
  if svcInfo.NodeLocalExternal() {
  groupIDLocal, _ := p.groupCounter.Get(svcPortName, true)
  if err := p.ofClient.UninstallServiceGroup(groupIDLocal); err != nil {
- klog.ErrorS(err, "Failed to remove flows of Service", "Service", svcPortName)
+ klog.ErrorS(err, "Failed to remove Group for Service with only local Endpoints", "Service", svcPortName)
Suggested change:
- klog.ErrorS(err, "Failed to remove Group for Service with only local Endpoints", "Service", svcPortName)
+ klog.ErrorS(err, "Failed to remove Group of local Endpoints for Service", "Service", svcPortName)
pkg/agent/proxy/proxier.go
Outdated
  continue
  }
  p.groupCounter.Recycle(svcPortName, true)
  }
  // Remove Service group which has all Endpoints.
  groupID, _ := p.groupCounter.Get(svcPortName, false)
  if err := p.ofClient.UninstallServiceGroup(groupID); err != nil {
- klog.ErrorS(err, "Failed to remove flows of Service", "Service", svcPortName)
+ klog.ErrorS(err, "Failed to remove Group of Service with all Endpoints", "Service", svcPortName)
Suggested change:
- klog.ErrorS(err, "Failed to remove Group of Service with all Endpoints", "Service", svcPortName)
+ klog.ErrorS(err, "Failed to remove Group of all Endpoints for Service", "Service", svcPortName)
pkg/agent/proxy/proxier.go
Outdated
continue
// Uninstall the group with only local Endpoints when Service externalTrafficPolicy is changed from Local
// to Cluster.
} else if !svcInfo.NodeLocalInternal() && pSvcInfo != nil && pSvcInfo.NodeLocalExternal() {
Suggested change:
- } else if !svcInfo.NodeLocalInternal() && pSvcInfo != nil && pSvcInfo.NodeLocalExternal() {
+ } else if !svcInfo.NodeLocalExternal() && pSvcInfo != nil && pSvcInfo.NodeLocalExternal() {
But doesn't "else if" already mean it?
8802d1b to 17629d7
/test-all-features-conformance
pkg/agent/proxy/proxier.go
Outdated
continue
}

} else if !svcInfo.NodeLocalExternal() && pSvcInfo != nil && pSvcInfo.NodeLocalExternal() {
See comment in #3295 (comment).
Also, you didn't choose the latter approach in https://github.com/antrea-io/antrea/pull/3295/files#r803823027. What if there is a transient error while processing the Service in the round during which the policy is changed from Local to Cluster? Could the code take care of cleaning up this group in the next round? Will pSvcInfo be Local or Cluster?
Updated. BTW, I have verified that calling UninstallServiceGroup to uninstall a group which doesn't exist on OVS returns nil. I had expected it to return an error.
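To make the transient-error concern concrete, here is a small Go sketch of a cleanup step that is retried every round. It relies on the behavior reported above, that uninstalling a group which does not exist returns nil. `fakeOFClient`, `cleanupLocalGroup`, and the `localIDs` map are hypothetical stand-ins written for this illustration, not the real OpenFlow client or group counter.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeOFClient is a hypothetical in-memory stand-in for the OpenFlow client.
type fakeOFClient struct {
	groups   map[uint32]bool
	failNext bool // simulates a single transient failure
}

// UninstallServiceGroup mimics the behavior reported in the thread:
// removing a group that does not exist is not an error.
func (c *fakeOFClient) UninstallServiceGroup(id uint32) error {
	if c.failNext {
		c.failNext = false
		return errors.New("transient error")
	}
	delete(c.groups, id)
	return nil
}

// cleanupLocalGroup runs once per sync round. It keys off the current policy and
// the recorded local group ID, so a failed round is simply retried next time.
func cleanupLocalGroup(c *fakeOFClient, localIDs map[string]uint32, svc string, nodeLocalExternal bool) error {
	id, ok := localIDs[svc]
	if nodeLocalExternal || !ok {
		return nil // nothing to clean up
	}
	if err := c.UninstallServiceGroup(id); err != nil {
		return err // keep the ID so the next round retries
	}
	delete(localIDs, svc)
	return nil
}

func main() {
	c := &fakeOFClient{groups: map[uint32]bool{7: true}, failNext: true}
	ids := map[string]uint32{"default/svc:http": 7}

	fmt.Println("round 1:", cleanupLocalGroup(c, ids, "default/svc:http", false))
	fmt.Println("round 2:", cleanupLocalGroup(c, ids, "default/svc:http", false))
	fmt.Println("groups left:", len(c.groups), "local IDs left:", len(ids))
}
```

In this sketch the first round fails and leaves the recorded ID in place; the second round removes the group and then forgets the ID.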
test/e2e/proxy_test.go
Outdated
time.Sleep(3 * time.Second)

for idx, node := range nodes {
	agentName, err := data.getAntreaPodOnNode(node)
check the err below
Updated.
3a0d481 to 4d4da0f
4d4da0f to 838a79d
/test-all-features-conformance
pkg/agent/proxy/proxier_test.go
Outdated
@@ -624,8 +629,7 @@ func testClusterIPRemoval(t *testing.T, svcIP net.IP, epIP net.IP, isIPv6 bool)
  mockRouteClient.EXPECT().AddClusterIPRoute(svcIP).Times(1)
  mockOFClient.EXPECT().UninstallServiceFlows(svcIP, uint16(svcPort), bindingProtocol).Times(1)
  mockOFClient.EXPECT().UninstallEndpointFlows(bindingProtocol, gomock.Any()).Times(1)
- mockOFClient.EXPECT().UninstallServiceGroup(groupID).Times(1)

+ mockOFClient.EXPECT().UninstallServiceGroup(gomock.Any()).AnyTimes()
Could it be more accurate? It should be called twice?
Yes, you are right. Updated.
test/e2e/framework.go
Outdated
@@ -1528,6 +1528,20 @@ func (data *TestData) createAgnhostNodePortService(serviceName string, affinity,
  return data.createService(serviceName, testNamespace, 8080, 8080, map[string]string{"app": "agnhost"}, affinity, nodeLocalExternal, corev1.ServiceTypeNodePort, ipFamily)
  }

+ func (data *TestData) updateAgnhostNodePortServiceExternalTrafficPolicy(serviceName string, nodeLocalExternal bool) (*corev1.Service, error) {
I think the method could just be named updateServiceExternalTrafficPolicy to be generic. Otherwise, following this style, we would have to define other methods such as updateNginxNodePortServiceExternalTrafficPolicy and updateAgnhostLoadBalancerServiceExternalTrafficPolicy in the future.
Thanks, updated.
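The point of the generic name can be shown with a small Go sketch in which a single helper serves any Service regardless of its backend or type. The `service` and `serviceStore` types below are hypothetical in-memory stand-ins for the Kubernetes API objects and client used by the real test framework; only the helper's name and parameters come from the thread.

```go
package main

import "fmt"

// trafficPolicy mirrors the two externalTrafficPolicy values, "Cluster" and "Local".
type trafficPolicy string

const (
	policyCluster trafficPolicy = "Cluster"
	policyLocal   trafficPolicy = "Local"
)

// service is a hypothetical, heavily reduced stand-in for a Service object.
type service struct {
	name                  string
	externalTrafficPolicy trafficPolicy
}

// serviceStore is a hypothetical in-memory replacement for the API client.
type serviceStore map[string]*service

// updateServiceExternalTrafficPolicy works for any Service by name, so no
// per-backend or per-type variants of the helper are needed.
func (s serviceStore) updateServiceExternalTrafficPolicy(serviceName string, nodeLocalExternal bool) (*service, error) {
	svc, ok := s[serviceName]
	if !ok {
		return nil, fmt.Errorf("service %q not found", serviceName)
	}
	if nodeLocalExternal {
		svc.externalTrafficPolicy = policyLocal
	} else {
		svc.externalTrafficPolicy = policyCluster
	}
	return svc, nil
}

func main() {
	store := serviceStore{
		"agnhost-nodeport":   {name: "agnhost-nodeport", externalTrafficPolicy: policyCluster},
		"nginx-loadbalancer": {name: "nginx-loadbalancer", externalTrafficPolicy: policyCluster},
	}
	for _, name := range []string{"agnhost-nodeport", "nginx-loadbalancer"} {
		svc, err := store.updateServiceExternalTrafficPolicy(name, true)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(svc.name, svc.externalTrafficPolicy)
	}
}
```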
test/e2e/proxy_test.go
Outdated
}

// Function of checking the number of occurrences of the target IP in given OVS group output.
checkOutputGroup := func(groupOutput string, podIP *net.IP, expectedCount int) {
Why doesn't it check the traffic and the returned client IP directly, instead of checking the OVS groups? I think the former is more end to end and can find issues that the latter check cannot detect.
Agreed. Updated.
When proxyAll is enabled and a NodePort/LoadBalancer Service is created with externalTrafficPolicy set to Cluster, only an OVS group with all Endpoints is installed. If the externalTrafficPolicy of the Service is then changed from Cluster to Local, an OVS group with only local Endpoints should also be installed, but it is not. This patch fixes the issue that the OVS group with only local Endpoints is not installed when the externalTrafficPolicy of a Service is changed from Cluster to Local.
Signed-off-by: Hongliang Liu <[email protected]>
838a79d to c479ecd
/test-e2e
LGTM, one question
@@ -146,6 +146,14 @@ func probeClientIPFromPod(data *TestData, pod string, baseUrl string) (string, e
  return host, err
  }

+ func reverseStrs(strs []string) []string {
Out of curiosity, why does it need to reverse the URLs?
This is a little confusing to read.
For example, suppose there are two Nodes: A is 192.168.1.1 and B is 192.168.1.2. portStr is the port number of a NodePort Service.
The test URLs are generated by the following code:
nodeIPs := []string{controlPlaneNodeIPv4(), workerNodeIPv4(1)}
...
var urls []string
for _, nodeIP := range nodeIPs {
	urls = append(urls, net.JoinHostPort(nodeIP, portStr))
}
To make Node A connect to Node B's NodePort and Node B connect to Node A's NodePort in a for loop, the slice should be reversed.
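The pairing described above can be tried out with the following self-contained Go sketch. The body of `reverseStrs` is a guess at what such a helper does, since only its signature appears in the diff, and the port value `30080` is a made-up NodePort; the Node addresses are the example ones from the explanation.

```go
package main

import (
	"fmt"
	"net"
)

// reverseStrs returns a reversed copy of strs. This body is an assumption;
// only the function signature is visible in the diff.
func reverseStrs(strs []string) []string {
	out := make([]string, len(strs))
	for i, s := range strs {
		out[len(strs)-1-i] = s
	}
	return out
}

func main() {
	nodeIPs := []string{"192.168.1.1", "192.168.1.2"} // Node A, Node B
	portStr := "30080"                                // hypothetical NodePort

	var urls []string
	for _, nodeIP := range nodeIPs {
		urls = append(urls, net.JoinHostPort(nodeIP, portStr))
	}

	// With two Nodes, reversing the slice pairs each Node with the other Node's URL.
	remoteURLs := reverseStrs(urls)
	for i, nodeIP := range nodeIPs {
		fmt.Printf("from %s connect to %s\n", nodeIP, remoteURLs[i])
	}
	// Output:
	// from 192.168.1.1 connect to 192.168.1.2:30080
	// from 192.168.1.2 connect to 192.168.1.1:30080
}
```

Note that the reversal only yields a strict cross-Node pairing for two Nodes; with an odd number of Nodes the middle one would be paired with itself.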
Thanks for the explanation.
/test-all-features-conformance
/test-e2e
LGTM
/test-e2e
@hongliangl Can you backport this to previous versions if applicable?
Fix #3301