✨ Add installControllers and uninstallControllers for leaderElectionStopping #3132

Merged (1 commit) on May 30, 2024


@ramramu3433 ramramu3433 commented May 16, 2024

Summary

At times, workspace creation got stuck in the scheduling phase and never recovered. The root cause we found: when the controllers are started via leader election and leadership is lost once, queue processing is never restored. We observed this as a workqueue depth that keeps increasing.

  • Initialize informers before starting the controllers and informers, and stop the controllers when the pod is no longer the leader.
  • Reinitialize them when the pod is elected leader again.
  • Move the openapi controller to a separate post-start hook so that it is not stopped when the pod loses the leader election.

Related issue(s)

Fixes #

Release Notes

 Fix sequencing of controllers/informers start and leader election 

@kcp-ci-bot kcp-ci-bot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. dco-signoff: no Indicates the PR's author has not signed the DCO. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 16, 2024
@kcp-ci-bot
Contributor

Hi @ramramu3433. Thanks for your PR.

I'm waiting for a kcp-dev member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kcp-ci-bot kcp-ci-bot added dco-signoff: yes Indicates the PR's author has signed the DCO. and removed dco-signoff: no Indicates the PR's author has not signed the DCO. labels May 16, 2024
@embik
Member

embik commented May 16, 2024

/ok-to-test

Please restore the PR description format and give a release note!

@kcp-ci-bot kcp-ci-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 16, 2024
3 review threads on pkg/server/server.go (outdated, resolved)
@embik
Member

embik commented May 16, 2024

Thank you for the contribution @ramramu3433! I've left some comments, overall this looks like a very promising approach to resolving the issue at hand.

2 review threads on pkg/server/server.go (outdated, resolved)
@ramramu3433
Author

/test all

@ramramu3433 ramramu3433 force-pushed the fix-leader-election branch 2 times, most recently from f0d93e2 to 6c857ab Compare May 19, 2024 05:29
4 review threads on pkg/server/server.go and 2 on pkg/server/controllers.go (outdated, resolved)
@ramramu3433 ramramu3433 requested review from sttts and embik May 24, 2024 04:57
Member

@embik embik left a comment


Code looks good to me now, but please update the PR description before this can be approved and lgtm'd: #3132 (comment).

@kcp-ci-bot kcp-ci-bot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 24, 2024
@ramramu3433 ramramu3433 changed the title ✨ Add installControllers and uninstallControllers for leaderElectionStopping ✨ Fix controller start issues with leader election state changes May 24, 2024
@ramramu3433 ramramu3433 requested a review from embik May 24, 2024 14:06
@kcp-ci-bot kcp-ci-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels May 24, 2024
@ramramu3433 ramramu3433 changed the title ✨ Fix controller start issues with leader election state changes ✨ Add installControllers and uninstallControllers for leaderElectionStopping May 24, 2024
Member

@embik embik left a comment


/lgtm

Will leave final approval to others who have reviewed this.

@kcp-ci-bot kcp-ci-bot added the lgtm Indicates that a PR is ready to be merged. label May 25, 2024
@kcp-ci-bot
Contributor

LGTM label has been added.

Git tree hash: 04ca1446bdc0bcd2bbee2e61dc1bb65e8e7548ab

Member

@palnabarun palnabarun left a comment


/lgtm

@mjudeikis
Contributor

/approve

@kcp-ci-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mjudeikis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kcp-ci-bot kcp-ci-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 30, 2024
@kcp-ci-bot kcp-ci-bot merged commit 3fb4c44 into kcp-dev:main May 30, 2024
24 checks passed
@embik
Member

embik commented Aug 27, 2024

/kind bug

@kcp-ci-bot kcp-ci-bot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 27, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has signed the DCO. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

7 participants