-
Notifications
You must be signed in to change notification settings - Fork 529
Enable LeaderElect for federation controller #394
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: fisherxu If they are not already assigned, you can assign the PR to them by writing The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
charts/federation-v2/values.yaml
Outdated
@@ -2,7 +2,7 @@ | |||
# This is a YAML-formatted file. | |||
# Declare variables to be passed into your templates. | |||
|
|||
## Configuration values for federation v2 controllermanager statefulset. | |||
## Configuration values for federation v2 controllermanager deployment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check #245 (comment) for why we are using sts
here.
/cc @marun
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for reminding @gyliu513 :) I have checked the comment, and that's in the absence of leader election. We need deploy multi instances with leader election, and to the multi instances of controller, deployment should be fine like kube-controller-manager.
I think this kind of feature deserves some discussion, maybe even a design document, before implementation is proposed. Edit: specifically - I'd like to have the use cases this is intended to target clearly documented and agreed upon before work is merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see an e2e test of leader election that ensures the code adding by this PR is exercised and won't rot. I'm happy to assist if you would find that useful. I think that will likely involve adding a minimal 3rd test run alongside managed and managed to ensure test isolation.
cmd/controller-manager/main.go
Outdated
RetryPeriod: retryPeriod, | ||
Callbacks: leaderelection.LeaderCallbacks{ | ||
OnStartedLeading: run, | ||
OnStoppedLeading: func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The leader election callback would appear to ensure that a master is started, but how will a master be stopped if it loses the leadership? It appears that a master once started would never be stopped since there is nothing that closes the stop channel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When starting leadership, the leaderelection.RunOrDie will create the stop channel, and close it when loss of leadership.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That suggests that the stop channel for the non-leader elect path be created in the if !enableLeaderElection
so that it's clear that it won't be used outside of that block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marun Have updated, now we can pass the stopChan through the ctx to the run func. I have checked in my environment, it works fine :) PTAL.
cmd/controller-manager/main.go
Outdated
Callbacks: leaderelection.LeaderCallbacks{ | ||
OnStartedLeading: run, | ||
OnStoppedLeading: func() { | ||
glog.Fatalf("leaderelection lost") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On loss of leadership, it seems like it would be sufficient to send on the stop channel passed to run.
@fisherxu Do you intend to follow up on this PR? It's been dormant for over a month. |
@fisherxu Thank you for following up! I'm ok with upgrading to the latest client-go if it's possible. I think you mentioned on slack that upgrading client-go may depend on kubebuilder upgrading? |
@marun I have checked today. the controller-runtime (only used in e2e now) use the v0.1.1 version which is generated by kubebuilder(v1.0.4) , the controller-runtime(v0.1.1) is not compatible with client-go(kubernetes-1.12.0), so can we upgrade it to v0.1.8 manuall as in #530 , although it's generated by kubebuilder(v1.0.4)? If we upgrade the kubebuilder to 1.0.5 or .6, project maybe will have big changes... but kubebuilder(v1.0.4) can only support up to client-go(kubernetes-1.12.0). |
@fisherxu What changes to the project do you expect with kubebuilder > 1.04? fedv2 doesn't use any kb-generated controllers, we use it to bootstrap new types and maintain the generated deepcopy, clients/listers/informers, and crd yaml. @shashidharatd has agreed to work on removing the need for generated clients/listers/informers, so only the deepcopy and crd yaml will remain. |
@marun @shashidharatd The func |
@fisherxu Does that mean that running |
@marun Agree, also desirable to wait until the generated client is removed from the tree. |
@fisherxu Regarding testing the changes in this PR, I suggest writing a managed (fixture is in-process and test-managed) e2e test in which 2 controller managers are started in-process (using etcd fixture), one is killed, and the test validates that the second controller is elected leader. |
Also, removal of the generated client is complicated by the apparent lack of support of the |
Sure, Let's close this pr in favour of #632. |
Thanks @fisherxu |
Enable LeaderElect for federation controller, then can deployed as multiple instances.
/cc @marun @font @irfanurrehman @shashidharatd