⚠️ Remove defaulting for leader election ID #446
With the current defaulting for the leader election ID, there is a clash as soon as two controllers that do not have it explicitly configured run in the same namespace or have the same namespace configured for leader election. This is especially bad since there is no logging about the lock being held by a different controller, so from a user's perspective this looks like the controller just froze.
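The clash described above can be sketched with a small, self-contained Go model. This is not controller-runtime code: `Options` is a trimmed-down stand-in, and `lockKey`, `withDefaultID`, `validate`, the default ID string, and the error text are all hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Options is a hypothetical, trimmed-down stand-in for the manager options
// discussed in this PR; it is not the controller-runtime type.
type Options struct {
	LeaderElection          bool
	LeaderElectionNamespace string
	LeaderElectionID        string
}

// lockKey identifies the lock object that controllers compete for.
func lockKey(o Options) string {
	return o.LeaderElectionNamespace + "/" + o.LeaderElectionID
}

// withDefaultID models the old behaviour: an empty ID is silently replaced
// by a shared default (the default value here is made up).
func withDefaultID(o Options) Options {
	if o.LeaderElectionID == "" {
		o.LeaderElectionID = "shared-default-id"
	}
	return o
}

// validate models the behaviour proposed in this PR: when leader election is
// enabled, an empty ID is reported as an error instead of being defaulted.
func validate(o Options) error {
	if o.LeaderElection && o.LeaderElectionID == "" {
		return errors.New("LeaderElectionID must be configured")
	}
	return nil
}

func main() {
	a := Options{LeaderElection: true, LeaderElectionNamespace: "default"}
	b := Options{LeaderElection: true, LeaderElectionNamespace: "default"}

	// With defaulting, two unrelated controllers end up on the same lock.
	fmt.Println("same lock with defaulting:", lockKey(withDefaultID(a)) == lockKey(withDefaultID(b)))

	// Without defaulting, the missing ID surfaces at construction time.
	fmt.Println("validation error:", validate(a))

	// Explicit, distinct IDs avoid the clash.
	a.LeaderElectionID = "controller-a"
	b.LeaderElectionID = "controller-b"
	fmt.Println("same lock with explicit IDs:", lockKey(a) == lockKey(b))
}
```

The point of failing early is that a missing ID becomes a visible startup error rather than a controller that silently waits on a lock held by someone else.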
Looks good. I do wonder if we should remove the namespace defaulting too; it sounds like it could have the same problem.
/lgtm
/hold till we resolve the discussion on #445 (I'm happy to be corrected on that issue, just want to finish up that discussion first)
Issues go stale after 90d of inactivity.
Stale issues rot after 30d of inactivity.
Rotten issues close after 30d of inactivity.
@fejta-bot: Closed this PR.
I don't think we meant to close this...
/lgtm
As I already mentioned, I'm good with increasing the cognitive load if it means the adopter is cognizant of, and solely responsible for, the issue(s) that may arise if resource names collide.
@pires: changing LGTM is restricted to collaborators.
@DirectXMan12 re #445 (comment): Can we add this change to the
Added. I think we've got a breaking one coming up due to #749, which I think we need in for supportability reasons.
/remove-lifecycle rotten
@DirectXMan12 bump to get this in as #749 merged
This should be ready to go.
/hold cancel
/lgtm
/assign @gerred @DirectXMan12
/approve
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: alvaroaleman, DirectXMan12, gerred
@@ -143,15 +143,16 @@ var _ = Describe("manger.Manager", func() {
m, err := New(cfg, Options{
LeaderElection: true,
LeaderElectionNamespace: "default",
LeaderElectionID: "test-leader-election-id",
lol, this test should have a different name now. I'll fix it elsewhere.
* Update to controller-runtime-0.5.0
* And K8S 1.17 libraries
* No need to set --advertise-address
  This has now been fixed upstream.
* Controller-runtime-0.5.0 requires the leaderElectionID to be set
  See kubernetes-sigs/controller-runtime#446
Reason for upgrade: The new version uses [DynamicRESTMapper](https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/client/apiutil/dynamicrestmapper.go) as the default `RESTMapper` for the controller-runtime manager. `DynamicRESTMapper` will "reload the delegated `meta.RESTMapper` on a cache miss" (see kubernetes-sigs/controller-runtime#554). This can solve the problem that, with controller-runtime v0.2.2, we need to restart HNC after adding a new CRD before the corresponding object reconciler can be created (see details in kubernetes-retired#488).

Incompatibility issues addressed in this PR:
- Upgrade the Go version in `go.mod` and `Dockerfile` to 1.13. The [errors.As](https://golang.org/pkg/errors/#As) call in [dynamicrestmapper.go](https://github.com/kubernetes-sigs/controller-runtime/blob/bfc982769615817ee15db8a543662652d900d27b/pkg/client/apiutil/dynamicrestmapper.go#L48) requires Go 1.13.
- Higher versions of k8s.io/cli-runtime and k8s.io/client-go are required after upgrading controller-runtime to v0.5.0.
- Version changes of other packages in `go.mod` are updated automatically.
- serializer.DirectCodecFactory was renamed to [serializer.WithoutConversionCodecFactory](https://godoc.org/k8s.io/apimachinery/pkg/runtime/serializer#WithoutConversionCodecFactory) after k8s.io/apimachinery 1.16 (see [here](kubernetes/apimachinery@ed8af17), and [here](kubernetes/apimachinery@4fac835)).
- The default [LeaderElectionID](https://github.com/kubernetes-sigs/controller-runtime/blob/bfc982769615817ee15db8a543662652d900d27b/pkg/leaderelection/leader_election.go#L46) in the controller-runtime manager was removed by kubernetes-sigs/controller-runtime#446.
- [NewlineReporter](https://godoc.org/sigs.k8s.io/controller-runtime/pkg/envtest/printer) was moved from `sigs.k8s.io/controller-runtime/pkg/envtest/` to `sigs.k8s.io/controller-runtime/pkg/envtest/printer` by [this](kubernetes-sigs/controller-runtime@748f55d#diff-42de1d59fbe8f8b90154f01dd01f5748) commit.
- In controller-runtime v0.2.2, if a resource does not exist, a controller cannot be created successfully. After updating controller-runtime to v0.5.0, a controller can be created without error. However, when the `Reconcile` method is triggered, there will be an error complaining that the resource does not exist. Therefore, we explicitly check whether a resource exists before creating the corresponding object reconciler in `createObjectReconciler` in `hnc_config.go` (see details in kubernetes-sigs/controller-runtime#840).

Tested:
- Unit tests.
- Went through the [demo script](https://docs.google.com/document/d/1tKQgtMSf0wfT3NOGQx9ExUQ-B8UkkdVZB6m4o3Zqn64/edit#) to make sure HNC behaves as expected on a GKE cluster.
- Manually tested whether the PR solves the restart problem described in kubernetes-retired#488 with the following workflow:
  - Install HNC
  - Install a new CRD
  - Configure the new type in the `config` singleton

  Before this PR, the corresponding object reconciler for the new type is not created unless we restart HNC. After the change, the corresponding object reconciler is created and reconciles objects of the new type as expected without restarting HNC.

This partly solves: kubernetes-retired#488
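The "reload on a cache miss" idea, and why it relies on `errors.As` (Go 1.13 or later), can be illustrated with a minimal standard-library sketch. It does not use or reproduce the controller-runtime implementation: `noMatchError`, `reloadingMapper`, and its fields are hypothetical names, and a plain map stands in for the delegated mapper.

```go
package main

import (
	"errors"
	"fmt"
)

// noMatchError is a hypothetical error type that signals a cache miss.
type noMatchError struct{ kind string }

func (e *noMatchError) Error() string { return "no match for kind " + e.kind }

// reloadingMapper is a hypothetical mapper that caches a kind-to-resource
// table and rebuilds it from load when a lookup misses.
type reloadingMapper struct {
	known   map[string]string
	load    func() map[string]string
	reloads int
}

// lookup consults only the cached table and wraps misses with %w so that
// callers can detect them through errors.As.
func (m *reloadingMapper) lookup(kind string) (string, error) {
	if r, ok := m.known[kind]; ok {
		return r, nil
	}
	return "", fmt.Errorf("lookup failed: %w", &noMatchError{kind: kind})
}

// ResourceFor retries exactly once after a reload if the first lookup
// failed with a cache miss; other results are returned unchanged.
func (m *reloadingMapper) ResourceFor(kind string) (string, error) {
	r, err := m.lookup(kind)
	var miss *noMatchError
	if errors.As(err, &miss) {
		m.known = m.load()
		m.reloads++
		return m.lookup(kind)
	}
	return r, err
}

func main() {
	// installed stands in for the set of types the cluster currently serves.
	installed := map[string]string{"Pod": "pods"}
	m := &reloadingMapper{
		known: map[string]string{"Pod": "pods"},
		load: func() map[string]string {
			snapshot := make(map[string]string, len(installed))
			for k, v := range installed {
				snapshot[k] = v
			}
			return snapshot
		},
	}

	// A new type is installed after the mapper was built.
	installed["Widget"] = "widgets"

	fmt.Println(m.ResourceFor("Widget")) // found after one reload
	fmt.Println(m.ResourceFor("Gadget")) // still unknown after a reload
}
```

In this model a newly installed type becomes visible on the first lookup that misses, which mirrors why the restart described in kubernetes-retired#488 should no longer be needed.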
kubernetes-sigs/controller-runtime#446 removed the default and #30 updated kubebuilder to v0.5.0 when this was released.
With the current defaulting for the leader election ID, there is a clash
as soon as two controllers that do not have it explicitly configured run
in the same namespace or have the same namespace configured for leader
election.
This is especially bad since there is no logging about the lock being
held by a different controller, so from a user's perspective this looks
like the controller just froze.
Fixes #445
/assign @DirectXMan12
/cc @JoelSpeed