Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP 4770: EndpointSlice Controller Flexibility #4771
base: master
Are you sure you want to change the base?
KEP 4770: EndpointSlice Controller Flexibility #4771
Changes from 6 commits
e8c1bd6
f2b955b
77ffcef
51fd477
77dd0b2
e00d1b0
89a2e54
b7dd3da
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true? What if you define a Service without a selector and then do your own reconciliation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your comment, I clarified that it is not natively possible while using the Service Selector field (The KEP is about this, the fact to use the service as close as possible to how a user would define it for the Kubernetes default network).
About this topic, there are details in this conversation: #4771 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(if it's "N/A" you can just remove the section, since it's "(Optional)")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove them later, if anyone has any concern that some are in fact applicable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the biggest risk is that we could introduce significant bugs because this is making significant changes to the code behind a critical GA controller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also another risk, controllers that rely on Services that consume the Pods directly instead of the EndpointSlices (cc @bowei @swetharepakula as the ingress-gce does this for NEG) will need to be modified for these services.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This counts as a point in favor of "use an empty selector". But also, that just seems broken. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robscott @danwinship @thockin The mutability here is a friction point, as this is "the same" as removing the selector on a Service , something we didn't reach a consensus yet and this is defining a behavior that is inconsistent with current state kubernetes/kubernetes#118376.
I'm leaning more to avoid to deal with Updates on Services, specially because I can not understand why one Service with one selector managed by the cluster endpoint slice controller will switch to an external one slice controller, it seems that should be better handled by deleting and recreating the Service.
This will be straight forward to implement, just validating on the Service API that this special label can not be added or mutated after the Service has been created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the annotation is mutable (which it has to be) then we have to deal with changes to it. Even if kcm tries to ignore changes, what happens if you restart kcm after changing the annotation?
I feel like 118376 was mostly waiting for Rob to bless a solution, and then we all stopped thinking about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if I miss something. To me, 118376 is different issue than the one describe in this KEP.
Here is how I see the implementation for this KEP: The service informer passed to the EndpointSlice Controller will have a selector to not include the services that are labeled with
service.kubernetes.io/endpoint-controller-name
. This means the controller will consider adding the label as if the service has been removed (which should already be covered as of now: when you remove a service, the EndpointSlices for it are removed).If KCM is restarting, then the EndpointSlice controller will see the previous EndpointSlice(s) and enqueue the service associated (service name is in the EndpointSlice annotation). The service will be considered as non-existing (as the informer is not selecting labeled service), so the EndpointSlice(s) will be deleted.
Here is the implementation: kubernetes/kubernetes@master...LionelJouin:kubernetes:custom-endpointslice-controller
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The similarity between this and 118376 is that in both cases someone (potentially) has a Service with endpoints that have been generated in a particular way, and then later they want to change the Service so that the endpoints will be generated in a different way. Antonio was suggesting that if we couldn't figure out a solution for 118376, then we won't be able to come up with a solution for this either (and so we should forbid the case where you change things after the endpoints have already been generated), and I was disputing that we couldn't figure out a solution for 118376.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw this comment: kubernetes/kubernetes#103576 (comment). The behavior that I see with this KEP is different. To me, the first interpretation seems more logical for this KEP, so if we add the label, controllers are disabled for the service, and objects (Endpoints and EndpointSlices) previously created by the KCM are deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean label instead of annotation ... why it has to be mutable?
I see, so this is using this property of the informers so the controller "sees" a Delete event when the label is added, with 118376 the problem is that it "sees" an Update event ... I can not think of any edge cases from the top of my head, I think it may work ...
@robscott WDYT? maybe should we modify the controllers to process the selector removal as a delete event too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There has to be some e2e conformance of these out of tree controllers, since is an external controller adding an endpointslice, how do we validate these out of tree controllers are generating the endpointslices accordingly ? what if these controllers does not respect the selectors and generete the endpointslices anyway?
@robscott I really like to hear your thoughts on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting thought. I'd assumed that the purpose of this KEP was to build controllers that would intentionally have different behavior than the default in-tree EndpointSlice controller. In some cases that might be to select a different IP for a Pod, but in other cases that might be to select an entirely different type of resource, for example a custom backend type.
This does certainly make it more likely that unexpected variations of EndpointSlices will exist, which is certainly something to watch out for. Since the outset of the EndpointSlice API anyone has been able to create EndpointSlices, and sometimes unexpected variations have led to bugs in controllers consuming those EndpointSlices, but I don't think that's necessarily new with this KEP, just potentially a more common set of edge cases that we'll need to be sure that controller authors are aware of.
For example,
nodeName
is optional in EndpointSlices, but some controllers likely assume it exists because the EndpointSlice controllers in k/k generally populate it. The same is largely true for zone and hostname (see podToEndpoint for a reference).I think this risk is unavoidable with this KEP, so I'd like to see it called out + a corresponding commitment to improved documentation around this as a requirement for graduating past alpha. Unfortunately I can't think of a way to write conformance tests that would help us here though.
/cc @swetharepakula @gauravkghildiyal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial reaction to this KEP is that we should not do it. This alternative is better for a bunch of reasons, but mostly because it already exists.
We have a way of expressing "don't generate endpoints for me" - leave the selector blank. Fortunately for you, it means exactly what you want.
The argument of inconsistency across controllers doesn't hold a lot of water for me. If you specify a pod selector, you're going to get pods as endpoints. If you don't specify it, you get something else. IMO, being able to specify that something else in your own schema is an advantage, not "fragmentation". Suppose you need an expression that is more flexible than the legacy selector-map (e.g. modern selectors can use "In" and "NotIn" and other operators).
This is not an accidental side-effect that I am trying to shoe-horn you into, it's literally why this mechanism was designed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The empty selector feature exists to cover use-cases that are not covered by the existing selector, i.e. selecting backends that are not selectable via labels. For example (from the Service documentation), selecting backends in a different namespace/cluster, selecting backends that are running outside of a Kubernetes, or as you mentioned, selecting backends with a more flexible expression covered with modern selectors. In these cases I agree the empty selector must be used and the users must define their own way to select backends, via, for example, annotations.
I realized that the goals and non-goals are not clear enough, so I fixed them in the latest commit. The goal is to disable the Kubernetes EndpointSlice controller and allow the user to re-use the selector field from the Service Spec. Custom EndpointSlice Controller should be able to replicate the usage of the selector, by selecting workloads by labels (labels specified in the selector field) but the difference with the Kubernetes EndpointSlice Controller is the compilation of these EndpointSlices, in this KEP the main use case is to select different IPs than the Pod.Status.PodIPs.
A well-known label would also provide an explicit and standard way to inform which component is managing the EndpointSlices for a particular service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still struggling with this one. I spoke with Rob a little, since he seemed "pro". The argument that carried the most weight for me was that there are (or might be?) cases where the alternative controller is doing exactly what the "real" controller does but extracting some different information (e.g. the multi-network case). In that case, you have to specify the network(s) somewhere right? Another example, albeit hypothetical, was something doing different topology hinting.
I'm not sure how I feel about this, still. It seems "simple" to implement, but I am worried about the tail of side-effects.
As a thought experiment - what if you could specify a function, e.g. CEL, which took a pod as argument and returned a list of IPs to use -- would that eliminate the need for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct, multi-network is my primary use case, I have personally no plan, at least for now, to do something different such as topology hinting. But, multi-network also means more than just picking the IP for the pods, it also means the network must be attached and up for the selected pods.
Yes, I need to specify which network(s) must be used for the selected pods. Having this label (
service.kubernetes.io/endpoint-controller-name
) will give me freedom to choose the way to specify those networks. For my case, I am currently thinking of using Services together with Gateway API (I made a PoC about it: https://github.com/LionelJouin/l-3-4-gateway-api-poc). Gateway in Gateway API allows me to specify new Service-Proxies (Secondary Network Service (load-balancing) acts directly in the Gateway instance) and attach networks via the Gateway.Spec.Infrastructure field, this way I can also specify which networks must be selected. Here is an example on how the Service/Gateway could look like: https://github.com/LionelJouin/l-3-4-gateway-api-poc/blob/main/examples/kpng-gateway-api.yaml#L59-L62.If I understand correctly, CEL would select a specific field based on what the user would specify via a new field in the Service. I think this would be more complex to maintain, more difficult to use, and less flexible. As of today, the most used solution is Multus, the secondary networks are attached and reported via pod annotation in a json format, I guess it would be quite challenging to match the IPs for a particular network. This could have worked with the Multi-Network KEP changes since the Secondary Network IPs would have been reported in Pod.Status. Now the discussion for multi-network is about DRA and ResourceClaim.Status to report the Secondary Network IPs, I am not sure how CEL could fit here.
For reference, at the beginning, another goal for this KEP was to improve the Kubernetes EndpointSlice Reconciler to make it generic enough so it can be re-used for Secondary Network IPs (No matter if Multus, KEP#3698 or DRA is used). This way, the custom EndpointSlice Controllers for secondary networks would be quite easy to implement. I kept it in the commit history (removed in this commit) and I am still planning to propose again these changes somehow in a better way in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a big no for me, if someone wants to create EndpointSlices controllers out of tree is ok to share some common types and logic but adding more load to the core maintainers just to remove from the custom endpointslice controllers maintainers is simply unfair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this, the idea with this part (generic EndpointSlice Reconciler) of the KEP was to solve what is stated in the readme of EndpointSlice Reconciler:
This EndpointSlice reconciler library is not sufficiently generic to be used by the EndpointSlice Mirroring controller...
. But also keeping what is written for compatibility:There are NO compatibility guarantees for this repository, yet. It is in direct support of Kubernetes, so branches will track Kubernetes and be compatible with that repo.
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should - I pretty much think must - document the "don't specify a selector, then do your own thing" option. It is an alternative and we shouldn't pretend otherwise.