Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP 4770: EndpointSlice Controller Flexibility #4771

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

LionelJouin
Copy link

  • One-line PR description: KEP-4770: Create KEP document (Generic reconciler and disabling controller on particular services)
  • Other comments:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 19, 2024
@k8s-ci-robot
Copy link
Contributor

Welcome @LionelJouin!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Jul 19, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @LionelJouin. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 19, 2024

## Summary

This proposal adds a new well-known label `service.kubernetes.io/endpoint-controller-name` to Kubernetes Services. This label disables the default Kubernetes EndpointSlice controller for the services where this label is applied and delegates the control of EndpointSlices to a custom EndpointSlice controller.
Copy link
Member

@aojea aojea Jul 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have the label managed-by already on the endpointslice controller and the ipaddress, I think we should be consistent and use the same here

rep -r managed-by staging/src/k8s.io/api
staging/src/k8s.io/api/discovery/v1/well_known_labels.go:       LabelManagedBy = "endpointslice.kubernetes.io/managed-by"
staging/src/k8s.io/api/discovery/v1beta1/well_known_labels.go:  LabelManagedBy = "endpointslice.kubernetes.io/managed-by"
staging/src/k8s.io/api/networking/v1alpha1/well_known_labels.go:        LabelManagedBy = "ipaddress.kubernetes.io/managed-by"
staging/src/k8s.io/api/networking/v1beta1/well_known_labels.go: LabelManagedBy = "ipaddress.kubernetes.io/managed-by"

Also, should we add this label to each Service (if it already does not have the label set) to indicate that is managed by the default endpoint slice controller? I kind of think this will be better

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any name is fine to me. For reference, the service.kubernetes.io/service-proxy-name is not added to the service. As highlighted in #4771 (comment), a default controller name would have to be handled for the kubernetes controller manager.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I added this comment just for opening the debate, I'd like to know if there are more thoughts on this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think endpointslice.kubernetes.io/managed-by is probably fitting here. I'm not entirely sure whether we want this to apply to Endpoints as well though, if so, that could change what's best here.


This proposal adds a new well-known label `service.kubernetes.io/endpoint-controller-name` to Kubernetes Services. This label disables the default Kubernetes EndpointSlice controller for the services where this label is applied and delegates the control of EndpointSlices to a custom EndpointSlice controller.

Additionally, this KEP aims to give more flexibility for the EndpointSlice Reconciler module to allow users to reconcile EndpointSlices with any type of resources (Currently only Service/Pods is supported). For example, Endpoints could be supported for the Kubernetes EndpointSlice Mirroring Controller.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part I can not really understand it, maybe is explained later ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, "Endpoints could be supported for the Kubernetes EndpointSlice Mirroring Controller" is particularly confusing to me.


### Well-Known Label

The kube-controller-manager should pass to the Endpoints, EndpointSlice and EndpointSlice Mirroring Controllers an informer selecting services that are not labelled with `service.kubernetes.io/endpoint-controller-name`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://github.com/kubernetes/enhancements/pull/4771/files#r1685502488 , I think there is 2 things to flesh out here:

  1. label name
  2. default behavior, do we add a label with the default controller name or we assume no-label means default controller

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, kube-proxy has no default controller name, non-labelled services are managed by kube-proxy.

Copy link
Member

@aojea aojea Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah,, but those are decision from the past, if we have to do it again, will we do it the same?
I don't have much feedback from this feature beside using it to workaround cilium conformance tests :)
I think that least we should evaluate the different options and see pros and cons and decide

@aojea
Copy link
Member

aojea commented Jul 20, 2024

did a first pass, this KEP is missing some more explanation on the high level details, is too low level, it describe perfectly the changes you want to do in the reconciler logic but is not clear the motivations and how is this planned to be used, some more high level information for the reviewers to understand the changes will help ... an example of how these externals controllers plan to use it will be interesting .... an example of how an external endpoint slice controller consuming this library will be used per example

@LionelJouin
Copy link
Author

did a first pass, this KEP is missing some more explanation on the high level details, is too low level, it describe perfectly the changes you want to do in the reconciler logic but is not clear the motivations and how is this planned to be used, some more high level information for the reviewers to understand the changes will help ... an example of how these externals controllers plan to use it will be interesting .... an example of how an external endpoint slice controller consuming this library will be used per example

Thank you for the review. Should I only add high level overview, or should I remove too low level details too? Do we have to provide a way to use the EndpointSlice Reconciler for external EndpointSlice Controller consumers? The example for the usage in the EndpointSlice and EndpointSlice Mirroring Controllers are there already, I was thinking maybe those are sufficient. Also, the usage might change in the future as written in the EndpointSlice Reconciler readme: There are NO compatibility guarantees for this repository, yet. It is in direct support of Kubernetes, so branches will track Kubernetes and be compatible with that repo..

@aojea
Copy link
Member

aojea commented Jul 23, 2024

I think is fine to keep it the way you have it, just expand , I'm trying to highlight that without understanding the motivations and the expectations of what your proposing is hard to fully understand all the details

@aojea
Copy link
Member

aojea commented Jul 23, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 8, 2024
@LionelJouin
Copy link
Author

As discussed during the sig-network meeting on 1st August 2024 (Kubernetes SIG Network meeting 20240801), I removed the part making the EndpointSlice Reconciler generic. This could still be re-introduced later as part of a future improvement of this KEP or via a completely new KEP. For reference, the commits containing the description of these changes are still available. A large part of the implementation is also available here: kubernetes/kubernetes#126304

Otherwise the part about the label, the question is about whether or not to keep the possibility to add/remove/update the label without re-creating the service. To me, keeping the mutability of the label simplifies the implementation as the EndpointSlice Controller would consider the service as deleted if the label is added (as in the implementation we should change the informer passed to the controller to not select labeled services). So, the deletion of the EndpointSlices should already be covered by other tests (Service creation/deletion).

For the use-cases on the mutability of the label, we could say the service must be switched from a controller to another, to, for example, switch between network providers (Multus to Multi-Network), but I am not sure if this will really be used, so I haven't added it.

Another argument about keeping the mutability was that it will probably be used together with service.kubernetes.io/service-proxy-name, and this label can be mutated.


As of now, a service can be delegated to a custom Service-Proxy/Gateway if the label `service.kubernetes.io/service-proxy-name` is set. Introduced in [KEP-2447](https://github.com/kubernetes/enhancements/issues/2447), this allows custom Service-Proxies/Gateways to implement services in different ways to address different purposes / use-cases. However, the EndpointSlices attached to this service will still be reconciled in the same way as any other service. Addressing more purposes / use-cases, for example, different pod IP addresses, is therefore not natively possible.

Delegating EndpointSlice control would allow custom controllers to define their own criteria for pod availability, selecting different pod IPs than the pod.status.PodIPs and more. As a reference implementation, and since [KEP-3685](https://github.com/kubernetes/enhancements/issues/3685), the reconciler logic used by Kubernetes can be reused by custom EndpointSlice controllers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should explain what KEP-3685 is rather than just referring to it by number, but I think this is mostly talking about the withdrawn plan to extend the existing reconciler anyway, so probably it should have been deleted?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I added the short description of the KEP. No, I think this is an additional argument as an implementation is required when using the label, so the EndpointSlice Reconciler in staging could be used as reference.

* Change / Replace / Deprecate the existing behavior of the Kubernetes EndpointSlice, EndpointSlice Mirroring and Endpoints controllers.
* Introduce additional supported types of the EndpointSlice controllers/Reconciler as part of Kubernetes.
* Modify the Service / EndpointSlice / Endpoints Specs.
* Add new features to the EndpointSlice Mirroring Controller.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful as a clarification to add "Changing the behavior of kube-proxy" as a Non-Goal: Kube-proxy doesn't need to care who creates the EndpointSlices for a Service or what logic they use to do so. It just knows that, unless the service has a service-proxy-name label indicating otherwise, it should just create rules pointing to its endpoints. Right?

And likewise cloud load balancers?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I added it. Yes, this applies to any service (described in the next section: "When set on a service, no matter the service specs...")


`service.kubernetes.io/endpoint-controller-name` will be added as a well-known label applying on the Service object. When set on a service, no matter the service specs, the Endpoint, EndpointSlice, and EndpointSlice Mirroring controllers for that service will be disabled, thus Endpoints and EndpointSlices for this service will not be created by the Kubernetes Controller Manager. If the label is not set, the Endpoint, EndpointSlice, and EndpointSlice Mirroring controllers will be enabled for that service and the Endpoints and EndpointSlices will be handled as of today.

The Kubernetes Controller Manager will implement this label both at object creation and on dynamic addition/removal/updates of this label.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The Kubernetes Controller Manager will implement this label both at object creation and on dynamic addition/removal/updates of this label.
The endpoint-related controllers will obey this label both at object creation and on dynamic addition/removal/updates of this label.

Comment on lines 101 to 103
As a Cloud Native Network Function (CNF) vendor, some of my services are handled by custom Service-Proxies/Gateways over secondary networks provided by, for example, [Multus](https://github.com/k8snetworkplumbingwg/multus-cni). IPs configured in the service and registered by the EndpointSlice controller must be only the secondary IPs provided by the secondary network provider (e.g. [Multus](https://github.com/k8snetworkplumbingwg/multus-cni)).

Therefore, it must be possible to disable the default Kubernetes Endpoints and EndpointSlice Controller for certain services and use a specialized EndpointSlice reconciler implementation to create a controller for secondary network providers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear where you're using "service" to mean v1.Service and where you're using it in a generic sense...

How is the Service configured? Is it type: ClusterIP? Does it have the service-proxy-name label set?

Who is calling the Service and from where? (Pods that share the same secondary network? Pods that don't share the same secondary network? Nodes? Cluster-external hosts?) What is the expectation for if the "wrong" people try to connect to it?

(You have both "for example, [Multus]" and "e.g. [Multus]" there btw. Second one can probably go away.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always meant the Kubernetes service, I added it for clarity, thank you.
I also added the mention to the service-proxy label. But I think the rest of the configuration of the service doesn't really matter, it could be both internal service traffic or traffic originating from outside of the cluster. I am not sure we should try to fully describe secondary networks and services here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think the rest of the configuration of the service doesn't really matter

Ideally, user stories should help the reader understand why you made some decisions and not others. But they can only do that if the user stories are sufficiently clear and evocative.

(And you especially don't want your user stories to be making the reviewers think about things like "but how will type: LoadBalancer services work if the load balancer doesn't have access to the secondary network?")

Comment on lines +105 to +107
### Notes/Constraints/Caveats (Optional)

N/A
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(if it's "N/A" you can just remove the section, since it's "(Optional)")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove them later, if anyone has any concern that some are in fact applicable.


###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

N/A
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these aren't N/A.

e.g., if a cluster is making use of this feature, and you roll back to a version of kcm that doesn't know about the label, then that kcm will start overwriting your custom endpoints. So that's something we need to warn users about.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it for disabling and re-enabling the feature, I will have to check the testing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test in the integration tests for when the feature gate is enabled and disabled.


### Well-Known Label

The kube-controller-manager will pass to the Endpoints, EndpointSlice and EndpointSlice Mirroring Controllers an informer selecting services that are not labeled with `service.kubernetes.io/endpoint-controller-name`. Thus, if the label is added to an existing service (by updating the service), the service with the label will be considered as a deleted service for the controllers, and the Endpoints and EndpointSlices will be deleted. If a Service is created with the label, the controllers will not be informed about it, so the Endpoints and EndpointSlices will not be created. If the label is removed from an existing service (by updating the service), the service with the label will be considered as a newly created service for the controllers, and the Endpoints and EndpointSlices will be created.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robscott @danwinship @thockin The mutability here is a friction point, as this is "the same" as removing the selector on a Service , something we didn't reach a consensus yet and this is defining a behavior that is inconsistent with current state kubernetes/kubernetes#118376.

I'm leaning more to avoid to deal with Updates on Services, specially because I can not understand why one Service with one selector managed by the cluster endpoint slice controller will switch to an external one slice controller, it seems that should be better handled by deleting and recreating the Service.

This will be straight forward to implement, just validating on the Service API that this special label can not be added or mutated after the Service has been created

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the annotation is mutable (which it has to be) then we have to deal with changes to it. Even if kcm tries to ignore changes, what happens if you restart kcm after changing the annotation?

I feel like 118376 was mostly waiting for Rob to bless a solution, and then we all stopped thinking about it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I miss something. To me, 118376 is different issue than the one describe in this KEP.

Here is how I see the implementation for this KEP: The service informer passed to the EndpointSlice Controller will have a selector to not include the services that are labeled with service.kubernetes.io/endpoint-controller-name. This means the controller will consider adding the label as if the service has been removed (which should already be covered as of now: when you remove a service, the EndpointSlices for it are removed).

If KCM is restarting, then the EndpointSlice controller will see the previous EndpointSlice(s) and enqueue the service associated (service name is in the EndpointSlice annotation). The service will be considered as non-existing (as the informer is not selecting labeled service), so the EndpointSlice(s) will be deleted.

Here is the implementation: kubernetes/kubernetes@master...LionelJouin:kubernetes:custom-endpointslice-controller

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The similarity between this and 118376 is that in both cases someone (potentially) has a Service with endpoints that have been generated in a particular way, and then later they want to change the Service so that the endpoints will be generated in a different way. Antonio was suggesting that if we couldn't figure out a solution for 118376, then we won't be able to come up with a solution for this either (and so we should forbid the case where you change things after the endpoints have already been generated), and I was disputing that we couldn't figure out a solution for 118376.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw this comment: kubernetes/kubernetes#103576 (comment). The behavior that I see with this KEP is different. To me, the first interpretation seems more logical for this KEP, so if we add the label, controllers are disabled for the service, and objects (Endpoints and EndpointSlices) previously created by the KCM are deleted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the annotation is mutable (which it has to be)

I think you mean label instead of annotation ... why it has to be mutable?

if we add the label, controllers are disabled for the service, and objects (Endpoints and EndpointSlices) previously created by the KCM are deleted

Here is the implementation: kubernetes/kubernetes@master...LionelJouin:kubernetes:custom-endpointslice-controller

I see, so this is using this property of the informers so the controller "sees" a Delete event when the label is added, with 118376 the problem is that it "sees" an Update event ... I can not think of any edge cases from the top of my head, I think it may work ...

@robscott WDYT? maybe should we modify the controllers to process the selector removal as a delete event too?

Signed-off-by: Lionel Jouin <[email protected]>
# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
- name: EndpointControllerNameWellKnownLabel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest another name, ExternalEndpointSlicesController or similar

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it to ExternalEndpointController (to be aligned with the label service.kubernetes.io/endpoint-controller-name)

- Initial unit, integration and e2e tests completed and enabled.

#### Beta

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There has to be some e2e conformance of these out of tree controllers, since is an external controller adding an endpointslice, how do we validate these out of tree controllers are generating the endpointslices accordingly ? what if these controllers does not respect the selectors and generete the endpointslices anyway?

@robscott I really like to hear your thoughts on this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting thought. I'd assumed that the purpose of this KEP was to build controllers that would intentionally have different behavior than the default in-tree EndpointSlice controller. In some cases that might be to select a different IP for a Pod, but in other cases that might be to select an entirely different type of resource, for example a custom backend type.

This does certainly make it more likely that unexpected variations of EndpointSlices will exist, which is certainly something to watch out for. Since the outset of the EndpointSlice API anyone has been able to create EndpointSlices, and sometimes unexpected variations have led to bugs in controllers consuming those EndpointSlices, but I don't think that's necessarily new with this KEP, just potentially a more common set of edge cases that we'll need to be sure that controller authors are aware of.

For example, nodeName is optional in EndpointSlices, but some controllers likely assume it exists because the EndpointSlice controllers in k/k generally populate it. The same is largely true for zone and hostname (see podToEndpoint for a reference).

I think this risk is unavoidable with this KEP, so I'd like to see it called out + a corresponding commitment to improved documentation around this as a requirement for graduating past alpha. Unfortunately I can't think of a way to write conformance tests that would help us here though.

/cc @swetharepakula @gauravkghildiyal

@aojea
Copy link
Member

aojea commented Aug 21, 2024

overall this LGTM, there may be some edges to flesh out, but this feature looks clear and scoped now

@maiqueb
Copy link

maiqueb commented Aug 22, 2024

/cc

@k8s-ci-robot
Copy link
Contributor

@maiqueb: GitHub didn't allow me to request PR reviews from the following users: maiqueb.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@LionelJouin
Copy link
Author

Thank you @aojea for the review. Some of your comments might be resolved, but I am not sure, could you mark them as resolved if they are, so it's more clear what I still need to answer and fix:

Rename feature-gate from EndpointControllerNameWellKnownLabel to ExternalEndpointController

Signed-off-by: Lionel Jouin <[email protected]>
Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm -1 on this KEP, because I don't see why it is better than the existing mechanism (empty selector).

spec: {}
```

This alternative potentially leads to confusion among users and inconsistency in how services are managed as each implementation is using its own annotation (see the nokia/danm and projectcalico/vpp-dataplane examples), leading to a fragmented approach.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial reaction to this KEP is that we should not do it. This alternative is better for a bunch of reasons, but mostly because it already exists.

We have a way of expressing "don't generate endpoints for me" - leave the selector blank. Fortunately for you, it means exactly what you want.

The argument of inconsistency across controllers doesn't hold a lot of water for me. If you specify a pod selector, you're going to get pods as endpoints. If you don't specify it, you get something else. IMO, being able to specify that something else in your own schema is an advantage, not "fragmentation". Suppose you need an expression that is more flexible than the legacy selector-map (e.g. modern selectors can use "In" and "NotIn" and other operators).

This is not an accidental side-effect that I am trying to shoe-horn you into, it's literally why this mechanism was designed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The empty selector feature exists to cover use-cases that are not covered by the existing selector, i.e. selecting backends that are not selectable via labels. For example (from the Service documentation), selecting backends in a different namespace/cluster, selecting backends that are running outside of a Kubernetes, or as you mentioned, selecting backends with a more flexible expression covered with modern selectors. In these cases I agree the empty selector must be used and the users must define their own way to select backends, via, for example, annotations.

I realized that the goals and non-goals are not clear enough, so I fixed them in the latest commit. The goal is to disable the Kubernetes EndpointSlice controller and allow the user to re-use the selector field from the Service Spec. Custom EndpointSlice Controller should be able to replicate the usage of the selector, by selecting workloads by labels (labels specified in the selector field) but the difference with the Kubernetes EndpointSlice Controller is the compilation of these EndpointSlices, in this KEP the main use case is to select different IPs than the Pod.Status.PodIPs.

A well-known label would also provide an explicit and standard way to inform which component is managing the EndpointSlices for a particular service.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still struggling with this one. I spoke with Rob a little, since he seemed "pro". The argument that carried the most weight for me was that there are (or might be?) cases where the alternative controller is doing exactly what the "real" controller does but extracting some different information (e.g. the multi-network case). In that case, you have to specify the network(s) somewhere right? Another example, albeit hypothetical, was something doing different topology hinting.

I'm not sure how I feel about this, still. It seems "simple" to implement, but I am worried about the tail of side-effects.

As a thought experiment - what if you could specify a function, e.g. CEL, which took a pod as argument and returned a list of IPs to use -- would that eliminate the need for this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct, multi-network is my primary use case, I have personally no plan, at least for now, to do something different such as topology hinting. But, multi-network also means more than just picking the IP for the pods, it also means the network must be attached and up for the selected pods.

Yes, I need to specify which network(s) must be used for the selected pods. Having this label (service.kubernetes.io/endpoint-controller-name) will give me freedom to choose the way to specify those networks. For my case, I am currently thinking of using Services together with Gateway API (I made a PoC about it: https://github.com/LionelJouin/l-3-4-gateway-api-poc). Gateway in Gateway API allows me to specify new Service-Proxies (Secondary Network Service (load-balancing) acts directly in the Gateway instance) and attach networks via the Gateway.Spec.Infrastructure field, this way I can also specify which networks must be selected. Here is an example on how the Service/Gateway could look like: https://github.com/LionelJouin/l-3-4-gateway-api-poc/blob/main/examples/kpng-gateway-api.yaml#L59-L62.

If I understand correctly, CEL would select a specific field based on what the user would specify via a new field in the Service. I think this would be more complex to maintain, more difficult to use, and less flexible. As of today, the most used solution is Multus, the secondary networks are attached and reported via pod annotation in a json format, I guess it would be quite challenging to match the IPs for a particular network. This could have worked with the Multi-Network KEP changes since the Secondary Network IPs would have been reported in Pod.Status. Now the discussion for multi-network is about DRA and ResourceClaim.Status to report the Secondary Network IPs, I am not sure how CEL could fit here.

For reference, at the beginning, another goal for this KEP was to improve the Kubernetes EndpointSlice Reconciler to make it generic enough so it can be re-used for Secondary Network IPs (No matter if Multus, KEP#3698 or DRA is used). This way, the custom EndpointSlice Controllers for secondary networks would be quite easy to implement. I kept it in the commit history (removed in this commit) and I am still planning to propose again these changes somehow in a better way in the future.

Copy link
Member

@aojea aojea Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, at the beginning, another goal for this KEP was to improve the Kubernetes EndpointSlice Reconciler to make it generic enough so it can be re-used for Secondary Network IPs (No matter if Multus, KEP#3698 or DRA is used). This way, the custom EndpointSlice Controllers for secondary networks would be quite easy to implement. I kept it in the commit history (removed in this commit) and I am still planning to propose again these changes somehow in a better way in the future.

This is a big no for me, if someone wants to create EndpointSlices controllers out of tree is ok to share some common types and logic but adding more load to the core maintainers just to remove from the custom endpointslice controllers maintainers is simply unfair.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a big no for me, if someone wants to create EndpointSlices controllers out of tree is ok to share some common types and logic but adding more load to the core maintainers just to remove from the custom endpointslice controllers maintainers is simply unfair.

I agree with this, the idea with this part (generic EndpointSlice Reconciler) of the KEP was to solve what is stated in the readme of EndpointSlice Reconciler: This EndpointSlice reconciler library is not sufficiently generic to be used by the EndpointSlice Mirroring controller.... But also keeping what is written for compatibility: There are NO compatibility guarantees for this repository, yet. It is in direct support of Kubernetes, so branches will track Kubernetes and be compatible with that repo..

@thockin thockin self-assigned this Sep 10, 2024
Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to proceed? Time is running for KEPs.

@robscott @aojea @danwinship @shaneutt


The existing behavior will be kept by default, and the Kubernetes EndpointSlice Controller will not manage the Services with the label. This ensures services without the label to continue to be managed as usual.

This will have no effect on other EndpointSlice controller implementations since they will not be influenced by the presence of this label.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This counts as a point in favor of "use an empty selector". But also, that just seems broken. :)

spec: {}
```

This alternative potentially leads to confusion among users and inconsistency in how services are managed as each implementation is using its own annotation (see the nokia/danm and projectcalico/vpp-dataplane examples), leading to a fragmented approach.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still struggling with this one. I spoke with Rob a little, since he seemed "pro". The argument that carried the most weight for me was that there are (or might be?) cases where the alternative controller is doing exactly what the "real" controller does but extracting some different information (e.g. the multi-network case). In that case, you have to specify the network(s) somewhere right? Another example, albeit hypothetical, was something doing different topology hinting.

I'm not sure how I feel about this, still. It seems "simple" to implement, but I am worried about the tail of side-effects.

As a thought experiment - what if you could specify a function, e.g. CEL, which took a pod as argument and returned a list of IPs to use -- would that eliminate the need for this?

@shaneutt
Copy link
Member

shaneutt commented Sep 17, 2024

I'm generally in favor of this one, since I see it as door opening for a lot of implementations that don't want to be a complete replacement, and I think this optionality is very good for users at a reasonably low maintenance cost.

/approve
/hold

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LionelJouin, shaneutt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 17, 2024
@thockin
Copy link
Member

thockin commented Sep 17, 2024

Let's hold off on merge until everyone has a minute to think about it :)

@shaneutt shaneutt added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 17, 2024
@aojea
Copy link
Member

aojea commented Sep 17, 2024

I'm not in favor of merging because we are constantly bitten by edge problems with Endpoints and EndpointSlices, more recently kubernetes/kubernetes#127370 , this just adds another permutation to maintain in an already complicated area and there are alternatives, like completely replacing the endpointslice controller on the kube-controller-manager or empty selector.

However, in case we move forward, I expect beta criteria to have at least 2 implementations of these endpointslices controller out of tree in use, adding features is expensive, at least we need to be sure that the cost we are paying is worth it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/network Categorizes an issue or PR as relevant to SIG Network. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants