-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
service topologies: add install flag for EndpointSlices #4740
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great so far!
charts/linkerd2/values.yaml
Outdated
# enables the use of EndpointSlice informers for the destination service; | ||
# EndpointSliceMode should be set to true only if EndpointSlice feature gate is on; | ||
# the feature is still experimental. | ||
enableEndpointSliceMode: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Mode" feels a bit extraneous. what do you think about just "enableEndpointSlices"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it reads much better! :)
cli/cmd/install.go
Outdated
@@ -49,6 +49,7 @@ type ( | |||
disableH2Upgrade bool | |||
disableHeartbeat bool | |||
cniEnabled bool | |||
endpointSliceModeEnabled bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a validation that causes linkerd install
to error out if endpointSlices are enabled but the resource type doesn't exist on the cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I have added it in. I ended up creating a different function to test it, since we have to initialise a k8s client to check if the resource type exists. The only point of concern is if I make the validation in the right place -- let me know if it should be done in a different place/phase.
@@ -479,7 +481,7 @@ func (sp *servicePublisher) subscribe(srcPort Port, hostname string, listener En | |||
} | |||
port, ok := sp.ports[key] | |||
if !ok { | |||
port = sp.newPortPublisher(srcPort, hostname) | |||
port = sp.newPortPublisher(srcPort, hostname, enableEndpointSlices) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I'd structure this is that I'd make enableEndpointsSlices
a field on servicePublisher and on portPublisher. That way we don't need to pass this boolean through the subscribe
methods. Instead, when calling the new
methods, we can pass in the enableEndpointsSlices
from the parent struct.
@@ -75,11 +67,30 @@ func Main(args []string) { | |||
} | |||
} | |||
|
|||
enableEndpointSlices := global.GetEndpointSliceEnabled() | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should call EndpointSliceAccess
and error out eagerly if enableEndpointsSlices
is set but we don't have access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've resolved this, after we initialise the K8s.API if enableEndpointSlices=true
but the resource is not registered we have a call to log.Fatal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we probably want to move the EndpointSliceAccess check up to before initializing the k8s API. If EndpointSlices are not present on the cluster then initializing the k8s API will hang at "Waiting for caches to sync" instead of getting to the fatal log line.
pkg/k8s/authz.go
Outdated
@@ -109,7 +109,11 @@ func ServiceProfilesAccess(k8sClient kubernetes.Interface) error { | |||
// access to EndpointSlice resources. | |||
//TODO: Uncomment function and change return type once EndpointSlices | |||
// are supported and made opt-in through install flag | |||
func EndpointSliceAccess(k8sClient kubernetes.Interface) bool { | |||
func EndpointSliceAccess(k8sClient kubernetes.Interface, enableEndpointSlice bool) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's leave the second parameter off of this method. This should just check if we have access.
@@ -149,7 +151,7 @@ func NewEndpointsWatcher(k8sAPI *k8s.API, log *logging.Entry) *EndpointsWatcher | |||
UpdateFunc: func(_, obj interface{}) { ew.addService(obj) }, | |||
}) | |||
|
|||
if !pkgK8s.EndpointSliceAccess(k8sAPI.Client) { | |||
if !pkgK8s.EndpointSliceAccess(k8sAPI.Client, ew.enableEndpointSlices) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we check for access in the main method as I suggest below, we can remove all of these checks in this file and instead treat the ew.enableEndpointSlices
as authoritative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing suggestion, it reads much better and makes matters less complicated. It also simplifies writing tests; I have uncommented and split up the unit test cases for EndpointSlices since they are no longer coupled to EndpointSliceAccess
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good! but I was wondering why not have a cli flag for destination service to enable this feature? and we can have a helm condition to toggle the same in the charts.
Was thinking that this would make it easier to make the feature enabled by default at destination, once the feature is enabled by default in k8s. WDYT?
I like @Pothulapati's approach, and I think he is right; it will be easier to ensure backwards compatibility when The only downside that I can think of at this point in time is having to wire up each component to use the |
A flag to the destination controller was my first thought too! But I think the current approach where the destination controller reads the value out of the configmap has some advantages:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good 👍
Please add the enableEndpointSlices
entry into charts/linkerd2/templates/_config.tpl
so that it gets picked up when installing through Helm 😉
cli/cmd/install.go
Outdated
if err = validateEndpointSlicesFeature(options.enableEndpointSlices); err != nil { | ||
return nil, nil, errors.New("--enableEndpointSlice enabled but resource non-existent in cluster") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is hiding the error returned by validateEndpointSlicesFeature()
, which might not necessarily be that the resource is non-existent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @alpeb thanks for the comments! I haven't used helm much so the charts/linkerd2/templates/_config.tpl
file escaped me! Will run some manual tests with helm installations to make sure everything works as intended 👍
For validateEndpointSlicesFeature()
I have changed the returned error by chaining --enableEndpointSlice true not supported:
with whatever error validateEndpointSlicesFeature
returns. I could also take another crack at validateEndpointSlicesFeature()
and return some more customised errors instead of chaining them if you think that'd be a better approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Matei207 chaining errors like that works for me 👍
Update: after a few failed integration tests, @alpeb kindly pointed out to me that there was a missing comma in the configmap which made installing/updating through helm error out. Also, as a result of of the Update 2: added another event message to the list of exclusions; seems to be a known issue with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
I haven't tested this manually yet but the code looks great (one small style suggestion). Since this flag is defaulted to false I think this is safe to merge. I'm looking forward to doing some end-to-end testing of this on main.
cli/cmd/install.go
Outdated
@@ -690,6 +699,19 @@ func (options *installOptions) validate() error { | |||
return nil | |||
} | |||
|
|||
func validateEndpointSlicesFeature(enableEndpointSlices bool) error { | |||
if !enableEndpointSlices { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally I prefer to let the calling code control things like directly rather than passing instructions into the function. In this case, I'd remove the enableEndpointsSlices argument and instead call this function like:
if enableEndpointSlices {
if err := validateEndpointsSlicesFeature(); err != nil {
return ....
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see why that's a better choice :) Thanks for signalling this, I've changed the code, squashed the commits and pushed.
…ndpointslices. Fix some helm template issues Signed-off-by: Alex Leong <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Matei207 , this looks good to me 👍
You told me in slack that you still have to correct something. If that's the case, could you also add a space after the colon when concatenating the error, to avoid Error: --enableEndpointSlice=true not supported:EndpointSlice not supported
?
EndpointSlices have been made opt-in due to their experimental nature. This PR introduces a new install flag 'enableEndpointSlices' that will allow adopters to specify in their cli install or helm install step whether they would like to use endpointslices as a resource in the destination service, instead of the endpoints k8s resource. Signed-off-by: Matei David <[email protected]>
EndpointSlices have been made opt-in due to their experimental nature. This PR introduces a new install flag 'enableEndpointSlices' that will allow adopters to specify in their cli install or helm install step whether they would like to use endpointslices as a resource in the destination service, instead of the endpoints k8s resource. Signed-off-by: Matei David <[email protected]> Signed-off-by: Eric Solomon <[email protected]>
* Small PR that uncomments the `EndpointSliceAcess` method and cleans up left over todos in the destination service. * Based on the past three PRs related to `EndpointSlices` (#4663 #4696 #4740); they should now be functional (albeit prone to bugs) and ready to use. Signed-off-by: Matei David <[email protected]>
Following the conclusion of EndpointSlice PR
How
What: Make EndpointSlices opt-in through helm values/cli install values.
Helm: update the values file to include a global
enableEndpointSliceMode
value, along with comments explaining both what the value is for, and that slices are still "experimental". I have also added a conditional in the destination rbac template, so if slices are not enabled rbac is not rendered -- my rationale here is that we shouldn't have any slice specific resources created if they are not going to be in use.CLI: first instance was to add the flag and the values so it can be parsed. Another thing that I did for the cli was to add the boolean in the config map (more reasoning for this in a later section). The
enableEndpointSliceMode
flag takes aftercniEnabled
andsmiMetrics.Enabled
.Destination Service: the approach I went with is storing the flag value in the configmap, each component mounts the config in a volume -- when the destination service is started up, we pass the flag value to
NewEndpointsWatcher
. This way, only theEndpointsWatcher
part of the destination service will know of the flag and condition existence. Unfortunately, this also meant that I had to put the value in the struct, and add it as an argument to a few functions so it can be passed down the call stack.Tests: I have not added any unit tests yet for the flag. One more important point of consideration is the changes in tests for the destination service since
EndpointsWatcher
now needs theEndpointSlice
boolean when created, I had to change the tests to reflect this. Tests that had nothing to do with slices had the watcher initialised with a false, while tests that make use of slices have it set to true. Note that setting just theendpointsSliceEnabled
to true will not mean endpoint slices can be used, slices, in order to be used, have to:Approaches considered
There are a few ways to do this. I came up with three approaches and then asked @alpeb to give me his opinion, given his extensive experience with flags and the cli.
Approach 1: my initial idea was to use the
linkerd-configmap
inEndpointSliceAccess
, it looks a lot like the approach I ended up using. Initially, I considered to pass down the controller namespace, in theEndpointSliceAccess
function I would have fetched the configmap, check the value and return early in caseEndpointSlices
are disabled. This is an infeasible solution though; I found getting the controller namespace passed to the function is quite hard -- ultimately the solution would have looked like what it does now.Approach 2: the second approach I thought of was to not use a configmap and instead rely on rbac. In
EndpointSliceAccess
I would pass a (hardcoded) component name, since rbac is not namespaced and we know the name ahead of time we can hard code the values and get the rbac for our component throughclient-go
. If the rbac does not include permissions for our component to list/get slices, then we can't use them. The downside to this is that it looks a bit hacky in nature, also, it would mean every time we want to use slices somewhere we have to change that components rbac meaning more PRs and a bit of tech debt (?).Approach 3: use a global value, read the boolean from the configmap, set the global value to the boolean in the
EndpointsWatcher
. Passed on this because global state is not exciting.Feedback and notes:
Signed-off-by: Matei David [email protected]