Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support br-federation-manager managing resources in specific-namespace tidb clusters #5410

Merged

Conversation

michaelmdeng
Copy link
Contributor

@michaelmdeng michaelmdeng commented Nov 21, 2023

What problem does this PR solve?

Our infra setup deploys a given TiDB cluster across multiple k8s clusters, so we use federation to manage br resources across these k8s clusters. Additionally, we deploy multiple of these TiDB clusters in a single k8s cluster across different namespaces -- for ease of deployment/upgrade and least-access, we want to deploy the br-federation-manager control plane independently for each of these TiDB clusters to manage volume backups/restores in separate namespaces.

What is changed and how does it work?

This PR applies two sets of changes that enable br-federation-manager (bfm) to run in a namespace-scoped way.

  1. Support clusterScoped in helm charts. When clusterScoped=false, charts generate manifests for Roles and RoleBindings in the specified namespace rather than ClusterRoles and ClusterRoleBindings.
  2. Support clusterScoped in the bfm executable. When -cluster-scoped=false, the manager will listen to changes in br resources only within the specified namespace rather than cluster-wide.

I modeled these changes after how clusterScoped is used in the controller-manager helm charts and executable.

These changes only affect how the control-plane access resources in its namespace and does not affect how bfm manages data-plane resources in other k8s clusters using provided kubeconfig. Managing data-plane resources continues to be done through .spec.clusters.tcNamespace in the federation CRDs.

Code changes

  • Has Go code change
  • Has CI related scripts change

Tests

  • Unit test
  • E2E test
  • Manual test

Manual testing to set up bfm in a namespace-scoped way on a control plane cluster and validate it attempts to reconcile namespaced br resources and does not attempt to reconcile br resources in other namespaces.

  1. Set up a Kubernetes cluster for testing.
  2. Apply CRDs
cd $TIDB_OPERATOR_REPO
kubectl create -f manifests/crd.yaml
kubectl apply -f manifests/federation-crd.yaml
  1. Set up two namespaces, one for the bfm control-plane and one other (representing other namespaces for other tidb clusters)
apiVersion: v1
kind: Namespace
metadata:
  name: cluster1
---
apiVersion: v1
kind: Namespace
metadata:
  name: cluster2
  1. Set up bfm kubeconfig following directions here. This can be mocked as we won't be testing managing data-plane resources

  2. Build image and install namespace-scoped bfm

cd $TIDB_OPERATOR_REPO
make build
cd images/br-federation-manager
docker build -t {bfm-image} .

cd $TIDB_OPERATOR_REPO
helm install --namespace cluster1 br-federation-manager charts/br-federation --set image={bfm-image} --set clusterScoped=false
  1. Create an empty backup in cluster1 namespace control plane.
apiVersion: federation.pingcap.com/v1alpha1
kind: VolumeBackup
metadata:
  name: basic-volumebackup
spec: {}
  1. Validate that deployment attempts to reconcile this backup.
...
2023-11-21T16:32:34.313084051Z I1121 16:32:34.312972       1 fed_volume_backup_controller.go:176] VolumeBackup object cluster1/cluster1-volumebackup enqueue
2023-11-21T16:32:34.410123093Z I1121 16:32:34.410044       1 backup_manager.go:53] sync VolumeBackup cluster1/cluster1-volumebackup
2023-11-21T16:32:34.410132551Z I1121 16:32:34.410078       1 backup_manager.go:120] VolumeBackup cluster1/cluster1-volumebackup set status running
2023-11-21T16:32:34.410134801Z I1121 16:32:34.410090       1 fed_volume_backup_controller.go:126] Finished syncing VolumeBackup "cluster1/cluster1-volumebackup" (78.25µs)
...
  1. Repeat validation for VolumeRestore and VolumeBackupSchedule

  2. Create br resources in other namespace.

apiVersion: federation.pingcap.com/v1alpha1
kind: VolumeBackup
metadata:
  name: cluster2-volumebackup
  namespace: cluster2
spec: {}
---
apiVersion: federation.pingcap.com/v1alpha1
kind: VolumeRestore
metadata:
  name: cluster2-volumerestore
  namespace: cluster2
spec: {}
  1. Validate that bfm does not register those changes and does not attempt to reconcile those resources.
  • No code

Side effects

  • Breaking backward compatibility
  • Other side effects:

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Release Notes

Please refer to Release Notes Language Style Guide before writing the release note.


@sre-bot
Copy link
Contributor

sre-bot commented Nov 21, 2023

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

ti-chi-bot bot commented Nov 21, 2023

Welcome @michaelmdeng! It looks like this is your first PR to pingcap/tidb-operator 🎉

@ti-chi-bot ti-chi-bot bot added the size/M label Nov 21, 2023
@codecov-commenter
Copy link

codecov-commenter commented Nov 23, 2023

Codecov Report

Merging #5410 (63f115c) into master (8ebc21f) will increase coverage by 5.75%.
The diff coverage is 9.09%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5410      +/-   ##
==========================================
+ Coverage   61.43%   67.19%   +5.75%     
==========================================
  Files         228      239      +11     
  Lines       29093    32875    +3782     
==========================================
+ Hits        17873    22090    +4217     
+ Misses       9463     8988     -475     
- Partials     1757     1797      +40     
Flag Coverage Δ
e2e 46.12% <0.00%> (?)
unittest 61.40% <9.09%> (-0.04%) ⬇️

@BornChanger
Copy link
Contributor

/cc @WangLe1321 @csuzhangxc

Copy link
Contributor

@WangLe1321 WangLe1321 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot bot added the lgtm label Nov 27, 2023
@BornChanger
Copy link
Contributor

/retest

@BornChanger
Copy link
Contributor

/run-pull-e2e-kind-br

@BornChanger
Copy link
Contributor

/test-pull-e2e-kind-serial

@ti-chi-bot ti-chi-bot bot removed the lgtm label Nov 27, 2023
@BornChanger
Copy link
Contributor

/retest

2 similar comments
@BornChanger
Copy link
Contributor

/retest

@BornChanger
Copy link
Contributor

/retest

@michaelmdeng
Copy link
Contributor Author

@BornChanger just pushed a fix to the failing tests, can I get some help rerunning?

@BornChanger
Copy link
Contributor

/retest

@BornChanger
Copy link
Contributor

@michaelmdeng please sign the CLA as required.

@BornChanger
Copy link
Contributor

image

@BornChanger
Copy link
Contributor

/retest

`clusterScoped` controls whether the manager should control resources
cluster-wide or only scoped to the specific namespace
@michaelmdeng
Copy link
Contributor Author

@michaelmdeng please sign the CLA as required.

woops, used the wrong email in the commit, should be fixed now

@BornChanger
Copy link
Contributor

/run-pull-e2e-kind-across-kubernetes

@WangLe1321
Copy link
Contributor

/run-pull-e2e-kind-br

@WangLe1321
Copy link
Contributor

/run-pull-e2e-kind

@WangLe1321
Copy link
Contributor

/run-pull-e2e-kind-across-kubernetes

@WangLe1321
Copy link
Contributor

/run-pull-e2e-kind

@WangLe1321
Copy link
Contributor

/run-pull-e2e-kind-basic

@WangLe1321
Copy link
Contributor

/run-pull-e2e-kind-serial

@ti-chi-bot ti-chi-bot bot added the lgtm label Dec 1, 2023
Copy link
Contributor

ti-chi-bot bot commented Dec 1, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BornChanger, WangLe1321

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [BornChanger,WangLe1321]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Contributor

ti-chi-bot bot commented Dec 1, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-11-24 06:19:25.661358538 +0000 UTC m=+557994.326584733: ☑️ agreed by WangLe1321.
  • 2023-11-24 06:19:40.224198575 +0000 UTC m=+558008.889424771: ✖️🔁 reset by ti-chi-bot[bot].
  • 2023-11-27 03:19:16.254858274 +0000 UTC m=+806384.920084470: ☑️ agreed by BornChanger.
  • 2023-11-27 14:16:08.670247675 +0000 UTC m=+845797.335473871: ✖️🔁 reset by BornChanger.
  • 2023-12-01 06:54:00.245699502 +0000 UTC m=+1164868.910925682: ☑️ agreed by WangLe1321.

@WangLe1321
Copy link
Contributor

/merge

Copy link
Contributor

ti-chi-bot bot commented Dec 1, 2023

@WangLe1321: We have migrated to builtin LGTM and approve plugins for reviewing.

Please use /approve when you want approve this pull request.

The changes announcement: LGTM plugin changes

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot bot merged commit 6c87ec1 into pingcap:master Dec 1, 2023
11 checks passed
@WangLe1321
Copy link
Contributor

/cherry-pick release-1.5

@ti-chi-bot
Copy link
Member

@WangLe1321: new pull request created to branch release-1.5: #5434.

In response to this:

/cherry-pick release-1.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

csuzhangxc pushed a commit that referenced this pull request Dec 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants