Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

⚠️ Remove defaulting for leader election ID #446

Conversation

alvaroaleman
Copy link
Member

With the current defaulting for the leader election ID, there is a clash
as soon as two controller that do not have it explicitly configured run
in the same namespace or have the same namespace configured for leader
election.

This is especially bad since there is no logging about the lock being
held by a different controller, so from a users perspective this looks
like the controller just froze.

Fixes #445

/assign @DirectXMan12
/cc @JoelSpeed

@k8s-ci-robot k8s-ci-robot added the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label May 25, 2019
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 25, 2019
With the current defaulting for the leader election ID, there is a clash
as soon as two controller that do not have it explicitly configured run
in the same namespace or have the same namespace configured for leader
election.

This is especially bad since there is no logging about the lock being
held by a different controller, so from a users perspective this looks
like the controller just froze.
@alvaroaleman alvaroaleman force-pushed the remove-defaulting-for-leader-election branch from d25b1ee to 687d7fc Compare May 25, 2019 17:57
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label May 25, 2019
Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I do wonder if we should also remove the namespace defaulting too, sounds like it could have the same problem

@JoelSpeed
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 26, 2019
@DirectXMan12
Copy link
Contributor

/hold

till we resolve the discussion on #445 (I'm happy to be corrected on that issue, just want to finish up that discussion first)

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 4, 2019
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 2, 2019
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 2, 2019
@fejta-bot
Copy link

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Copy link
Contributor

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@DirectXMan12 DirectXMan12 reopened this Nov 5, 2019
@DirectXMan12
Copy link
Contributor

I don't think we meant to close this...

Copy link
Contributor

@pires pires left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

As I already mentioned I'm good with increasing the cognitive load if it means the adopter is cognizant, and sole responsible, of the issue(s) that may arise if resource names collide.

@k8s-ci-robot
Copy link
Contributor

@pires: changing LGTM is restricted to collaborators

In response to this:

/lgtm

As I already mentioned I'm good with increasing the cognitive load if it means the adopter is cognizant, and sole responsible, of the issue(s) that may arise if resource names collide.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@alvaroaleman
Copy link
Member Author

@DirectXMan12 re #445 (comment): Can we add this change to the breaking-next milestone?

@DirectXMan12 DirectXMan12 added this to the breaking-next milestone Jan 6, 2020
@DirectXMan12
Copy link
Contributor

Added. I think we've got a breaking one coming up due to #749, which I think we need in for supportability reasons.

@alexeldeib
Copy link
Contributor

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 21, 2020
@shawn-hurley
Copy link

@DirectXMan12 bump to get this in as #749 merged

Copy link
Member

@vincepri vincepri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be ready to go.

/hold cancel
/lgtm
/assign @gerred @DirectXMan12

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 5, 2020
@DirectXMan12
Copy link
Contributor

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 5, 2020
@gerred
Copy link
Contributor

gerred commented Feb 5, 2020

/approve

@k8s-ci-robot k8s-ci-robot merged commit 552b43d into kubernetes-sigs:master Feb 5, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, DirectXMan12, gerred

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [DirectXMan12,gerred]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -143,15 +143,16 @@ var _ = Describe("manger.Manager", func() {
m, err := New(cfg, Options{
LeaderElection: true,
LeaderElectionNamespace: "default",
LeaderElectionID: "test-leader-election-id",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, this test should have a different name now. I'll fix it elsewhere.

wallrj added a commit to wallrj/etcd-cluster-operator that referenced this pull request Feb 14, 2020
wallrj added a commit to improbable-eng/etcd-cluster-operator that referenced this pull request Feb 17, 2020
* Update to controller-runtime-0.5.0
* And K8S 1.17 libraries
* No need to set --advertise-address This has now been fixed upstream.
* Controller-runtime-0.5.0 requires the leaderElectionID to be set

See kubernetes-sigs/controller-runtime#446
sophieliu15 added a commit to sophieliu15/multi-tenancy that referenced this pull request Mar 6, 2020
Reason for upgrade:
The new version uses [DynamicRESTMapper](https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/client/apiutil/dynamicrestmapper.go) as default `RESTMapper` for controller runtime manager. `DynamicRESTMapper` will "reload the delegated `meta.RESTMapper` on a cache miss" (see kubernetes-sigs/controller-runtime#554), which can solve problem that we need to restart HNC after adding a new CRD to create corresponding object reconciler when using controller-runtime v0.2.2 (see details in kubernetes-retired#488).

Incompatibility issues addressed in this PR:

- Upgrade go version in `go.mod` and `Dockerfile` to 1.13. The [errors.As](https://golang.org/pkg/errors/#As) in [dynamicrestmapper.go](https://github.com/kubernetes-sigs/controller-runtime/blob/bfc982769615817ee15db8a543662652d900d27b/pkg/client/apiutil/dynamicrestmapper.go#L48) requires go 1.13.
- A higher version for k8s.io/cli-runtime and k8s.io/client-go are required after upgrading controller-runtime to v0.5.0
- Version changes of other packages in `go.mod` are updated automatically.
- serializer.DirectCodecFactory was renamed to [serializer.WithoutConversionCodecFactory](https://godoc.org/k8s.io/apimachinery/pkg/runtime/serializer#WithoutConversionCodecFactory) after k8s.io/apimachinery 1.16 (see [here](kubernetes/apimachinery@ed8af17), and [here](kubernetes/apimachinery@4fac835))
- Default [LeaderElectionID](https://github.com/kubernetes-sigs/controller-runtime/blob/bfc982769615817ee15db8a543662652d900d27b/pkg/leaderelection/leader_election.go#L46) in controller-runtime manager is removed after kubernetes-sigs/controller-runtime#446.
- [NewlineReporter](https://godoc.org/sigs.k8s.io/controller-runtime/pkg/envtest/printer) is moved from `sigs.k8s.io/controller-runtime/pkg/envtest/` to `https://godoc.org/sigs.k8s.io/controller-runtime/pkg/envtest/printer` by [this](kubernetes-sigs/controller-runtime@748f55d#diff-42de1d59fbe8f8b90154f01dd01f5748) commit.
- In controller-runtime v0.2.2, if a resource does not exist, a
controller cannot be created successfully. After update
controller-runtime to v0.5.0, a controller can be created without error.
However, when the `Reconcile` method is triggered, there will be an
error complaining the resource does not exist. Therefore, we will
explicitly check if a resource exists before creating the corresponding
object reconciler in `createObjectReconciler` in `hnc_config.go` (see
details in kubernetes-sigs/controller-runtime#840)

Tested:

- Unit tests.
- Went through [demo script](https://docs.google.com/document/d/1tKQgtMSf0wfT3NOGQx9ExUQ-B8UkkdVZB6m4o3Zqn64/edit#) to make sure HNC behaves as expected on a GKE cluster.
- Manually test if the PR solves the restart problem as described in kubernetes-retired#488 with following workflow:
     - Install HNC
     - Install a new CRD
     - Config the new type in `config` singleton

Before this PR, corresponding object reconciler for the new type will not be created unless we restart HNC. After the change, corresponding object reconciler can be created and it reconciles objects of the new type as expected without restarting HNC.

This partly solve:  kubernetes-retired#488
steved added a commit to dominodatalab/forge that referenced this pull request Jun 26, 2020
kubernetes-sigs/controller-runtime#446 removed
the default and #30 updated kubebuilder to v0.5.0 when this was
released.
steved added a commit to dominodatalab/forge that referenced this pull request Jul 1, 2020
kubernetes-sigs/controller-runtime#446 removed
the default and #30 updated kubebuilder to v0.5.0 when this was
released.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove defaulting for leader election ID
10 participants