Skip to content
This repository has been archived by the owner on Apr 25, 2023. It is now read-only.

Enable LeaderElect for federation controller #394

Closed
wants to merge 2 commits into from
Closed

Enable LeaderElect for federation controller #394

wants to merge 2 commits into from

Conversation

fisherxu
Copy link
Contributor

@fisherxu fisherxu commented Nov 6, 2018

Enable LeaderElect for federation controller, then can deployed as multiple instances.
/cc @marun @font @irfanurrehman @shashidharatd

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 6, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fisherxu
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: irfanurrehman

If they are not already assigned, you can assign the PR to them by writing /assign @irfanurrehman in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 6, 2018
@@ -2,7 +2,7 @@
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

## Configuration values for federation v2 controllermanager statefulset.
## Configuration values for federation v2 controllermanager deployment.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check #245 (comment) for why we are using sts here.

/cc @marun

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for reminding @gyliu513 :) I have checked the comment, and that's in the absence of leader election. We need deploy multi instances with leader election, and to the multi instances of controller, deployment should be fine like kube-controller-manager.

@marun
Copy link
Contributor

marun commented Nov 6, 2018

I think this kind of feature deserves some discussion, maybe even a design document, before implementation is proposed.

Edit: specifically - I'd like to have the use cases this is intended to target clearly documented and agreed upon before work is merged.

@fisherxu
Copy link
Contributor Author

fisherxu commented Nov 7, 2018

I'd like to have the use cases this is intended to target clearly documented

@marun Has raised the issue #402 to describe some use cases, and we can have some discussion there :)
And split the deployment to another PR #404.

Copy link
Contributor

@marun marun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see an e2e test of leader election that ensures the code adding by this PR is exercised and won't rot. I'm happy to assist if you would find that useful. I think that will likely involve adding a minimal 3rd test run alongside managed and managed to ensure test isolation.

RetryPeriod: retryPeriod,
Callbacks: leaderelection.LeaderCallbacks{
OnStartedLeading: run,
OnStoppedLeading: func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The leader election callback would appear to ensure that a master is started, but how will a master be stopped if it loses the leadership? It appears that a master once started would never be stopped since there is nothing that closes the stop channel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When starting leadership, the leaderelection.RunOrDie will create the stop channel, and close it when loss of leadership.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That suggests that the stop channel for the non-leader elect path be created in the if !enableLeaderElection so that it's clear that it won't be used outside of that block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marun Have updated, now we can pass the stopChan through the ctx to the run func. I have checked in my environment, it works fine :) PTAL.

Callbacks: leaderelection.LeaderCallbacks{
OnStartedLeading: run,
OnStoppedLeading: func() {
glog.Fatalf("leaderelection lost")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On loss of leadership, it seems like it would be sufficient to send on the stop channel passed to run.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 19, 2018
@marun
Copy link
Contributor

marun commented Dec 17, 2018

@fisherxu Do you intend to follow up on this PR? It's been dormant for over a month.

@fisherxu
Copy link
Contributor Author

fisherxu commented Dec 19, 2018

@marun Sorry for the delay, and I think this PR should depend on this PR #57932, to pass stop channel to run function.
And this PR is merged in client-go v1.12.0, so now the client-go we used(v1.10.1) really can't pass the stop channel.
Do we have any plane to upgrade the client-go? :)

@marun
Copy link
Contributor

marun commented Dec 20, 2018

@fisherxu Thank you for following up!

I'm ok with upgrading to the latest client-go if it's possible. I think you mentioned on slack that upgrading client-go may depend on kubebuilder upgrading?

@fisherxu
Copy link
Contributor Author

fisherxu commented Dec 20, 2018

@marun I have checked today.
the code(pkg/client/clientset/versioned/typed/GROUP/VERSION/fake) generated by kubebuilder(v1.0.4) is not compatible with client-go(kubernetes-1.13.0), but is compatible with client-go(kubernetes-1.12.0).

the controller-runtime (only used in e2e now) use the v0.1.1 version which is generated by kubebuilder(v1.0.4) , the controller-runtime(v0.1.1) is not compatible with client-go(kubernetes-1.12.0), so can we upgrade it to v0.1.8 manuall as in #530 , although it's generated by kubebuilder(v1.0.4)?

If we upgrade the kubebuilder to 1.0.5 or .6, project maybe will have big changes... but kubebuilder(v1.0.4) can only support up to client-go(kubernetes-1.12.0).

@marun
Copy link
Contributor

marun commented Jan 2, 2019

@fisherxu What changes to the project do you expect with kubebuilder > 1.04? fedv2 doesn't use any kb-generated controllers, we use it to bootstrap new types and maintain the generated deepcopy, clients/listers/informers, and crd yaml. @shashidharatd has agreed to work on removing the need for generated clients/listers/informers, so only the deepcopy and crd yaml will remain.

@fisherxu
Copy link
Contributor Author

fisherxu commented Jan 3, 2019

@marun @shashidharatd The func NewPatchSubresourceAction in pkg/client/clientset/versioned/typed/GROUP/VERSION/fake generayed by kubebuilder(v1.0.4) (line for example) don't pass the PatchType parameter. But in the latest client-go1.13.+ this func need the PatchType parameter. So upgrade client-go to 1.13 will make the CI failed. Have any suggestion here? :)

@marun
Copy link
Contributor

marun commented Jan 3, 2019

@fisherxu Does that mean that running kubebuilder generate with a version of kubebuilder > 1.04 will fix the problem? If it's more complicated than that, it may be desirable to wait until the generated client is removed from the tree.

@fisherxu
Copy link
Contributor Author

fisherxu commented Jan 7, 2019

If it's more complicated than that, it may be desirable to wait until the generated client is removed from the tree.

@marun Agree, also desirable to wait until the generated client is removed from the tree.

@marun
Copy link
Contributor

marun commented Jan 24, 2019

@fisherxu Regarding testing the changes in this PR, I suggest writing a managed (fixture is in-process and test-managed) e2e test in which 2 controller managers are started in-process (using etcd fixture), one is killed, and the test validates that the second controller is elected leader.

@marun
Copy link
Contributor

marun commented Jan 24, 2019

Also, removal of the generated client is complicated by the apparent lack of support of the Watch method in controller-runtime's generic client. Do you know of an alternative way of watching without a generated client (without having to adopt the new and largely untested controller scheme of the newer versions of kubebuilder)? Or would it be possible to update the generated clients to be compatible?

@shashidharatd
Copy link
Contributor

Hi @fisherxu, Shall we close this pr in favour of #632 ?

@fisherxu
Copy link
Contributor Author

fisherxu commented Mar 7, 2019

Sure, Let's close this pr in favour of #632.

@fisherxu fisherxu closed this Mar 7, 2019
@shashidharatd
Copy link
Contributor

Thanks @fisherxu

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants