-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support ProxyTerminatingEndpoints in AntreaProxy #4607
Support ProxyTerminatingEndpoints in AntreaProxy #4607
Conversation
4d056d0
to
2182b34
Compare
Codecov Report
@@ Coverage Diff @@
## main #4607 +/- ##
==========================================
- Coverage 69.93% 69.92% -0.01%
==========================================
Files 400 403 +3
Lines 59457 60259 +802
==========================================
+ Hits 41582 42137 +555
- Misses 15077 15303 +226
- Partials 2798 2819 +21
*This pull request uses carry forward flags. Click here to find out more.
|
2182b34
to
9becf81
Compare
9becf81
to
a030f1c
Compare
a030f1c
to
0a54517
Compare
a565f69
to
c9fe386
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits.
9dca8c6
to
361b805
Compare
/test-all |
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall
pkg/agent/proxy/proxier.go
Outdated
if internalPolicyLocal { | ||
endpoints = localEndpoints | ||
} else { | ||
endpoints = clusterEndpoints | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't endpoints just allReachableEndpoints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ClusterEndpoints may contain remote Endpoints that aren't in localEndpoints, while localEndpoints may contain
terminating or topologically-unavailable local endpoints that aren't in clusterEndpoints. So we have to merge
the two lists, that's allReachableEndpoints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get how it's related to this code, and
- When the Service is ClusterIP, if the internal policy is local, why allReachableEndpoints ever contains non local endpoints? if the internal policy is Cluster, why allReachableEndpoints doesn't contain all cluster endpoints?
- If allReachableEndpoints is not the parameter of the single
InstallServiceGroup
call, why we install use it as the parameter ofInstallEndpointFlows
? If there are different endpoints, what are they installed for?
pkg/agent/proxy/proxier.go
Outdated
if bothPolicyLocal { | ||
endpoints = localEndpoints | ||
} else { | ||
endpoints = clusterEndpoints | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
docs/feature-gates.md
Outdated
|
||
### ProxyTerminatingEndpoints | ||
|
||
`ProxyTerminatingEndpoints` enables ProxyTerminatingEndpoints support in AntreaProxy. When ProxyTerminatingEndpoints is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: if the support is controlled by feature gate and disabled by default, could Antrea pass latest K8s conformance test? I remember you mentioned latest conformance requires some EndpointSlice related feature to be supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll verify that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have verified that with Zhengsheng:
- It is not conformance test, it is sig-network test.
- For sig-network test in Kubernetes 1.26, EndpointSliceTerminatingCondition is GA, and terminating Endpoints will be included in EndpointSlice anyway, then we need to filter out terminating Endpoints when ProxyTerminatingEndpoints is not enabled. In current code, we don't have such code to filter out terminating Endpoints when EndpointSlice is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then if we merge it as is, the test case won't pass unless the ProxyTerminatingEndpoints feature is enabled?
If so, I feel not really necessary to add this feature gate, the code isolated by the feature gate is very few and no real risk, and proxy terminating endpoints has been proved an acceptable behavior in kube-proxy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @jianjuns for input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to pass corresponding sig-network tests about ProxyTerminatingEndpoints, we need to enable both this feature and proxyAll.
Why? If proxyAll is not running, kube-proxy must be running, right? How could kube-proxy fail the test?
I think we might need a feature gate since like TopologyAwareHints in AntreaProxy has a feature gateway, ProxyTerminatingEndpoints should be consistent with that.
I don't feel such consistency is meaningful. We do have many small features that are added directly, like you did in #2792. The rule to add a feature gate should be related to the code/feature's risk and maturity, instead of whether there is one in K8s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? If proxyAll is not running, kube-proxy must be running, right? How could kube-proxy fail the test?
Sorry, I missed something: as you said, the tests will always pass with kube-proxy. The code of ProxyTerminatingEndpoints in AntreaProxy takes effect when proxyAll is enabled and no kube-proxy.
I don't feel such consistency is meaningful. We do have many small features that are added directly, like you did in #2792. The rule to add a feature gate should be related to the code/feature's risk and maturity, instead of whether there is one in K8s.
If so, do you think we could also remove the feature gate TopologyAwareHints in Antrea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then there should be no problem if we just remove ProxyTerminatingEndpoints featuregate regardless of the configuration?
I didn't evaluate the risk of TopologyAwareHints but at least it's not worth to change what has been added, which would just cause more confusion. We should consider promoting it to Beta and GA instead according to it's maturity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have anymore problem if we just remove ProxyTerminatingEndpoints. Then with proxyAll enabled and without kube-proxy, AntreaProxy can pass the follow sig-network test cases:
[It] [sig-network] Services should fallback to terminating endpoints when there are no ready endpoints with internalTrafficPolicy=Cluster [Feature:ProxyTerminatingEndpoints]
[It] [sig-network] Services should fallback to local terminating endpoints when there are no ready endpoints with internalTrafficPolicy=Local [Feature:ProxyTerminatingEndpoints]
[It] [sig-network] Services should fallback to terminating endpoints when there are no ready endpoints with externallTrafficPolicy=Cluster [Feature:ProxyTerminatingEndpoints]
[It] [sig-network] Services should fail health check node port if there are only terminating endpoints [Feature:ProxyTerminatingEndpoints]
The test that Antrea cannot pass is:
[It] [sig-network] Services should fallback to local terminating endpoints when there are no ready endpoints with externalTrafficPolicy=Local [Feature:ProxyTerminatingEndpoints]
The root cause is that AntreaProxy doesn't select any remote Endpoints as backend when externalTrafficPolicy is Local (in this test, no local Endpoint is available in a sub test), while kube-proxy could select remote Endpoints as backend when externalTrafficPolicy is Local and client is from in-cluster. I'll add another PR to support this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the featuregate.
Signed-off-by: Hongliang Liu <[email protected]>
8c2d9a3
to
d4955ed
Compare
/test-all |
e2e failed on known flaky test. |
No, that's a new error. I think it's introduced by #4654. The TrafficControl port was not deleted somehow, and antrea-agent failed to set no flood for it after restart. I meant the one in https://github.com/antrea-io/antrea/actions/runs/4312092832/jobs/7522929072 |
Anyway it's not related to this PR. /skip-e2e |
Signed-off-by: Hongliang Liu <[email protected]>
Signed-off-by: Hongliang Liu [email protected]