Support ProxyTerminatingEndpoints in AntreaProxy #4607

Merged
merged 1 commit into from
Mar 2, 2023

hongliangl
Contributor

Signed-off-by: Hongliang Liu [email protected]

@hongliangl hongliangl force-pushed the 20230118-endpointslice-enhance-v2 branch from 4d056d0 to 2182b34 Compare February 7, 2023 02:52
@hongliangl hongliangl marked this pull request as ready for review February 7, 2023 02:52

codecov bot commented Feb 7, 2023

Codecov Report

Merging #4607 (8c2d9a3) into main (cf90cfa) will decrease coverage by 0.01%.
The diff coverage is 96.15%.

@@            Coverage Diff             @@
##             main    #4607      +/-   ##
==========================================
- Coverage   69.93%   69.92%   -0.01%     
==========================================
  Files         400      403       +3     
  Lines       59457    60259     +802     
==========================================
+ Hits        41582    42137     +555     
- Misses      15077    15303     +226     
- Partials     2798     2819      +21     
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| e2e-tests | 38.29% <ø> (-0.05%) ⬇️ | Carriedforward from 361b805 |
| integration-tests | 34.38% <0.00%> (-0.20%) ⬇️ | Carriedforward from 361b805 |
| kind-e2e-tests | 47.69% <80.12%> (+0.59%) ⬆️ | |
| unit-tests | 59.84% <95.94%> (+0.06%) ⬆️ | Carriedforward from 361b805 |

*This pull request uses carry forward flags. Click here to find out more.

| Impacted Files | Coverage Δ |
|---|---|
| pkg/features/antrea_features.go | 100.00% <ø> (ø) |
| pkg/agent/proxy/proxier.go | 76.48% <91.30%> (+0.40%) ⬆️ |
| pkg/agent/proxy/topology.go | 99.04% <100.00%> (+17.22%) ⬆️ |
| ...nt/apiserver/handlers/serviceexternalip/handler.go | 29.62% <0.00%> (-22.23%) ⬇️ |
| pkg/agent/cniserver/ipam/antrea_ipam.go | 63.02% <0.00%> (-12.73%) ⬇️ |
| pkg/agent/route/route_linux.go | 66.19% <0.00%> (-5.47%) ⬇️ |
| pkg/controller/networkpolicy/tier.go | 53.84% <0.00%> (-4.62%) ⬇️ |
| pkg/controller/ipam/antrea_ipam_controller.go | 75.25% <0.00%> (-3.02%) ⬇️ |
| ...gent/controller/networkpolicy/status_controller.go | 79.16% <0.00%> (-2.50%) ⬇️ |
| .../flowexporter/connections/conntrack_connections.go | 81.42% <0.00%> (-2.39%) ⬇️ |
| ... and 32 more | |

@hongliangl hongliangl force-pushed the 20230118-endpointslice-enhance-v2 branch from 2182b34 to 9becf81 Compare February 8, 2023 03:11
@hongliangl hongliangl added area/proxy Issues or PRs related to proxy functions in Antrea action/release-note Indicates a PR that should be included in release notes. labels Feb 9, 2023
@hongliangl hongliangl added this to the Antrea v1.11 release milestone Feb 9, 2023
@hongliangl hongliangl force-pushed the 20230118-endpointslice-enhance-v2 branch from 9becf81 to a030f1c Compare February 17, 2023 00:24
docs/feature-gates.md Outdated
@hongliangl hongliangl force-pushed the 20230118-endpointslice-enhance-v2 branch from a030f1c to 0a54517 Compare February 20, 2023 07:38
docs/feature-gates.md Outdated
docs/feature-gates.md Outdated
docs/feature-gates.md Outdated
@hongliangl hongliangl force-pushed the 20230118-endpointslice-enhance-v2 branch 2 times, most recently from a565f69 to c9fe386 Compare February 23, 2023 03:04
Contributor

@jianjuns jianjuns left a comment

LGTM

Contributor

@jianjuns jianjuns left a comment

A few nits.

pkg/agent/proxy/proxier.go Outdated
pkg/agent/proxy/topology.go Outdated
pkg/agent/proxy/topology.go Outdated
pkg/agent/proxy/topology.go Outdated
@hongliangl hongliangl force-pushed the 20230118-endpointslice-enhance-v2 branch 2 times, most recently from 9dca8c6 to 361b805 Compare February 24, 2023 00:28
@hongliangl
Contributor Author

/test-all

@hongliangl
Contributor Author

/test-all

Member

@tnqn tnqn left a comment

LGTM overall

Comment on lines 472 to 476
if internalPolicyLocal {
	endpoints = localEndpoints
} else {
	endpoints = clusterEndpoints
}
Member

Isn't endpoints just allReachableEndpoints?

Contributor Author

clusterEndpoints may contain remote Endpoints that aren't in localEndpoints, while localEndpoints may contain
terminating or topologically-unavailable local Endpoints that aren't in clusterEndpoints. So we have to merge
the two lists; the result is allReachableEndpoints.
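
The merge described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Endpoint` type, its `Addr` field, and the `mergeEndpoints` helper are hypothetical names, not the actual code in pkg/agent/proxy.

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified stand-in for the proxy's endpoint type.
type Endpoint struct {
	Addr        string // "IP:port", assumed here to identify an endpoint uniquely
	Local       bool
	Terminating bool
}

// mergeEndpoints returns the union of the two lists, keeping first-seen order
// and dropping endpoints whose address has already been added.
func mergeEndpoints(clusterEndpoints, localEndpoints []Endpoint) []Endpoint {
	seen := make(map[string]struct{}, len(clusterEndpoints)+len(localEndpoints))
	merged := make([]Endpoint, 0, len(clusterEndpoints)+len(localEndpoints))
	for _, list := range [][]Endpoint{clusterEndpoints, localEndpoints} {
		for _, ep := range list {
			if _, ok := seen[ep.Addr]; ok {
				continue
			}
			seen[ep.Addr] = struct{}{}
			merged = append(merged, ep)
		}
	}
	return merged
}

func main() {
	clusterEndpoints := []Endpoint{
		{Addr: "10.0.1.5:80"},              // remote endpoint, not in localEndpoints
		{Addr: "10.0.0.7:80", Local: true}, // local endpoint, present in both lists
	}
	localEndpoints := []Endpoint{
		{Addr: "10.0.0.7:80", Local: true},
		{Addr: "10.0.0.8:80", Local: true, Terminating: true}, // not in clusterEndpoints
	}
	for _, ep := range mergeEndpoints(clusterEndpoints, localEndpoints) {
		fmt.Println(ep.Addr)
	}
}
```

Deduplicating by address matters because a ready local endpoint normally appears in both lists, and installing flows for it twice would be redundant.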

Member

I don't get how that's related to this code, and:

  1. When the Service is ClusterIP and the internal policy is Local, why would allReachableEndpoints ever contain non-local endpoints? If the internal policy is Cluster, why wouldn't allReachableEndpoints contain all cluster endpoints?
  2. If allReachableEndpoints is not the parameter of the single InstallServiceGroup call, why do we use it as the parameter of InstallEndpointFlows? If the endpoints are different, what are they installed for?

Comment on lines 491 to 495
if bothPolicyLocal {
	endpoints = localEndpoints
} else {
	endpoints = clusterEndpoints
}
Member

ditto


### ProxyTerminatingEndpoints

`ProxyTerminatingEndpoints` enables ProxyTerminatingEndpoints support in AntreaProxy. When ProxyTerminatingEndpoints is
Member

Question: if the support is controlled by a feature gate and disabled by default, could Antrea pass the latest K8s conformance tests? I remember you mentioned that the latest conformance tests require some EndpointSlice-related feature to be supported.

Contributor Author

I'll verify that.

Contributor Author

I have verified that with Zhengsheng:

  • It is not a conformance test; it is a sig-network test.
  • For the sig-network tests in Kubernetes 1.26, EndpointSliceTerminatingCondition is GA, so terminating Endpoints are always included in EndpointSlices. We therefore need to filter out terminating Endpoints when ProxyTerminatingEndpoints is not enabled. The current code does not filter out terminating Endpoints when EndpointSlice is enabled.
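
The filtering mentioned in the second point might look something like the sketch below. It is a minimal illustration, not the code in this PR: the `Endpoint` type and the `filterTerminating` helper are hypothetical, and the boolean parameter stands in for however the feature would actually be switched on or off.

```go
package main

import "fmt"

// Endpoint is a hypothetical stand-in carrying the EndpointSlice conditions.
type Endpoint struct {
	Addr        string
	Ready       bool
	Serving     bool
	Terminating bool
}

// filterTerminating drops terminating endpoints unless proxying to them is
// enabled, in which case the list is returned unchanged.
func filterTerminating(endpoints []Endpoint, proxyTerminating bool) []Endpoint {
	if proxyTerminating {
		return endpoints
	}
	kept := make([]Endpoint, 0, len(endpoints))
	for _, ep := range endpoints {
		if !ep.Terminating {
			kept = append(kept, ep)
		}
	}
	return kept
}

func main() {
	endpoints := []Endpoint{
		{Addr: "10.0.0.7:80", Ready: true, Serving: true},
		{Addr: "10.0.0.8:80", Serving: true, Terminating: true},
	}
	fmt.Println(len(filterTerminating(endpoints, false)))
	fmt.Println(len(filterTerminating(endpoints, true)))
}
```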

Member

@tnqn tnqn Mar 2, 2023

Then if we merge it as is, the test case won't pass unless the ProxyTerminatingEndpoints feature is enabled?
If so, I don't think it's really necessary to add this feature gate: the code isolated by the feature gate is very small and carries no real risk, and proxying to terminating endpoints has proved to be acceptable behavior in kube-proxy.

Member

cc @jianjuns for input

Member

> If we want to pass the corresponding sig-network tests about ProxyTerminatingEndpoints, we need to enable both this feature and proxyAll.

Why? If proxyAll is not running, kube-proxy must be running, right? How could kube-proxy fail the test?

> I think we might need a feature gate: since TopologyAwareHints in AntreaProxy has a feature gate, ProxyTerminatingEndpoints should be consistent with that.

I don't feel such consistency is meaningful. We do have many small features that are added directly, like you did in #2792. The rule to add a feature gate should be related to the code/feature's risk and maturity, instead of whether there is one in K8s.

Contributor Author

@hongliangl hongliangl Mar 2, 2023

> Why? If proxyAll is not running, kube-proxy must be running, right? How could kube-proxy fail the test?

Sorry, I missed something: as you said, the tests will always pass with kube-proxy. The ProxyTerminatingEndpoints code in AntreaProxy takes effect when proxyAll is enabled and there is no kube-proxy.

> I don't feel such consistency is meaningful. We do have many small features that are added directly, like you did in #2792. The rule to add a feature gate should be related to the code/feature's risk and maturity, instead of whether there is one in K8s.

If so, do you think we could also remove the TopologyAwareHints feature gate in Antrea?

Member

@tnqn tnqn Mar 2, 2023

Then there should be no problem if we just remove the ProxyTerminatingEndpoints feature gate, regardless of the configuration?

I didn't evaluate the risk of TopologyAwareHints, but at the very least it's not worth changing what has already been added, which would just cause more confusion. We should instead consider promoting it to Beta and GA according to its maturity.

Contributor Author

I don't think there will be any more problems if we just remove the ProxyTerminatingEndpoints feature gate. Then, with proxyAll enabled and without kube-proxy, AntreaProxy can pass the following sig-network test cases:

[It] [sig-network] Services should fallback to terminating endpoints when there are no ready endpoints with internalTrafficPolicy=Cluster [Feature:ProxyTerminatingEndpoints]
[It] [sig-network] Services should fallback to local terminating endpoints when there are no ready endpoints with internalTrafficPolicy=Local [Feature:ProxyTerminatingEndpoints]
[It] [sig-network] Services should fallback to terminating endpoints when there are no ready endpoints with externallTrafficPolicy=Cluster [Feature:ProxyTerminatingEndpoints]
[It] [sig-network] Services should fail health check node port if there are only terminating endpoints [Feature:ProxyTerminatingEndpoints]

The test that Antrea cannot pass is:

[It] [sig-network] Services should fallback to local terminating endpoints when there are no ready endpoints with externalTrafficPolicy=Local [Feature:ProxyTerminatingEndpoints]

The root cause is that AntreaProxy doesn't select any remote Endpoints as backends when externalTrafficPolicy is Local (in one sub-test of this test, no local Endpoint is available), while kube-proxy can select remote Endpoints as backends when externalTrafficPolicy is Local and the client is in-cluster. I'll open another PR to support this later.
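
For reference, the fallback behavior that these test names describe (use ready endpoints, and only when there are none fall back to terminating ones that are still serving) could be sketched as below. This is an illustrative assumption about the selection rule, not the AntreaProxy or kube-proxy implementation; `Endpoint` and `selectBackends` are hypothetical names, and the traffic-policy and in-cluster client handling discussed above is deliberately left out.

```go
package main

import "fmt"

// Endpoint is a hypothetical stand-in carrying the EndpointSlice conditions.
type Endpoint struct {
	Addr        string
	Ready       bool
	Serving     bool
	Terminating bool
}

// selectBackends prefers ready endpoints. If there are none, it falls back to
// endpoints that are terminating but still serving.
func selectBackends(endpoints []Endpoint) []Endpoint {
	var ready, fallback []Endpoint
	for _, ep := range endpoints {
		switch {
		case ep.Ready:
			ready = append(ready, ep)
		case ep.Serving && ep.Terminating:
			fallback = append(fallback, ep)
		}
	}
	if len(ready) > 0 {
		return ready
	}
	return fallback
}

func main() {
	// No ready endpoints: only the serving, terminating endpoint is selected.
	endpoints := []Endpoint{
		{Addr: "10.0.0.8:80", Serving: true, Terminating: true},
		{Addr: "10.0.0.9:80", Terminating: true}, // terminating and no longer serving
	}
	for _, ep := range selectBackends(endpoints) {
		fmt.Println(ep.Addr)
	}
}
```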

Contributor Author

Removed the featuregate.

@hongliangl hongliangl force-pushed the 20230118-endpointslice-enhance-v2 branch from 8c2d9a3 to d4955ed Compare March 2, 2023 08:39
@tnqn
Member

tnqn commented Mar 2, 2023

/test-all

@hongliangl
Contributor Author

e2e failed on known flaky test.

@tnqn
Member

tnqn commented Mar 2, 2023

> e2e failed on known flaky test.

No, that's a new error. I think it's introduced by #4654. The TrafficControl port was not deleted somehow, and antrea-agent failed to set no flood for it after restart.

I meant the one in https://github.com/antrea-io/antrea/actions/runs/4312092832/jobs/7522929072

@tnqn
Member

tnqn commented Mar 2, 2023

Anyway it's not related to this PR. /skip-e2e

@tnqn tnqn merged commit 6339388 into antrea-io:main Mar 2, 2023
@hongliangl hongliangl deleted the 20230118-endpointslice-enhance-v2 branch March 3, 2023 01:54
jainpulkit22 pushed a commit to urharshitha/antrea that referenced this pull request Apr 28, 2023