tidb-controller-manager panics if tidb-operator chart is configured without any cluster permissions #4990

Closed
maeb opened this issue Apr 26, 2023 · 3 comments · Fixed by #5058

Comments


maeb commented Apr 26, 2023

Bug Report

What version of Kubernetes are you using?
Client Version: v1.25.8
Kustomize Version: v4.5.7
Server Version: v1.25.6

What version of TiDB Operator are you using?
v1.4.4

What did you do?
Configured the tidb-operator Helm chart with the following values:

clusterScoped: false
features:
  - AdvancedStatefulSet=false
  - StableScheduling=false
  - AutoScaling=false

controllerManager:
  clusterPermissions:
    nodes: false
    persistentvolumes: false
    storageclasses: false

scheduler:
  create: false

What did you expect to see?
Since it is not explicitly stated anywhere that running tidb-operator without any cluster permissions is unsupported, I expected it to work. The chart values only document:

# clusterPermissions are some cluster scoped permissions that will be used even if `clusterScoped: false`.
# the default value of these fields is `true`. if you want them to be `false`, you MUST set them to `false` explicitly.
clusterPermissions:
  nodes: true
  persistentvolumes: true
  storageclasses: true

What did you see instead?

$ kubectl logs -n my-namespace tidb-controller-manager-88f75cd85-rd44n 
I0426 07:52:53.818830       1 features.go:126] feature gates: map[AdvancedStatefulSet:false AutoScaling:false StableScheduling:false VolumeModifying:false]
I0426 07:52:53.818890       1 version.go:38] Welcome to TiDB Operator.
I0426 07:52:53.818900       1 version.go:39] TiDB Operator Version: version.Info{GitVersion:"v1.4.4", GitCommit:"4b1c55c9aabf0410e276efcb460e37488f203b66", GitTreeState:"clean", BuildDate:"2023-03-13T06:41:33Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}
I0426 07:52:53.818929       1 main.go:76] FLAG: --V="false"
I0426 07:52:53.818940       1 main.go:76] FLAG: --add_dir_header="false"
I0426 07:52:53.818946       1 main.go:76] FLAG: --alsologtostderr="false"
I0426 07:52:53.818952       1 main.go:76] FLAG: --auto-failover="true"
I0426 07:52:53.818958       1 main.go:76] FLAG: --cluster-permission-node="false"
I0426 07:52:53.818964       1 main.go:76] FLAG: --cluster-permission-pv="false"
I0426 07:52:53.818969       1 main.go:76] FLAG: --cluster-permission-sc="false"
I0426 07:52:53.818974       1 main.go:76] FLAG: --cluster-scoped="false"
I0426 07:52:53.818980       1 main.go:76] FLAG: --detect-node-failure="false"
I0426 07:52:53.818985       1 main.go:76] FLAG: --dm-master-failover-period="5m0s"
I0426 07:52:53.818992       1 main.go:76] FLAG: --dm-worker-failover-period="5m0s"
I0426 07:52:53.818997       1 main.go:76] FLAG: --features="AdvancedStatefulSet=false,AutoScaling=false,StableScheduling=false,VolumeModifying=false"
I0426 07:52:53.819016       1 main.go:76] FLAG: --kube-client-burst="0"
I0426 07:52:53.819024       1 main.go:76] FLAG: --kube-client-qps="0"
I0426 07:52:53.819039       1 main.go:76] FLAG: --leader-lease-duration="15s"
I0426 07:52:53.819047       1 main.go:76] FLAG: --leader-renew-deadline="10s"
I0426 07:52:53.819053       1 main.go:76] FLAG: --leader-retry-period="2s"
I0426 07:52:53.819058       1 main.go:76] FLAG: --log_backtrace_at=":0"
I0426 07:52:53.819069       1 main.go:76] FLAG: --log_dir=""
I0426 07:52:53.819075       1 main.go:76] FLAG: --log_file=""
I0426 07:52:53.819081       1 main.go:76] FLAG: --log_file_max_size="1800"
I0426 07:52:53.819088       1 main.go:76] FLAG: --logtostderr="true"
I0426 07:52:53.819093       1 main.go:76] FLAG: --pd-failover-period="5m0s"
I0426 07:52:53.819105       1 main.go:76] FLAG: --pod-hard-recovery-period="24h0m0s"
I0426 07:52:53.819111       1 main.go:76] FLAG: --resync-duration="30s"
I0426 07:52:53.819116       1 main.go:76] FLAG: --selector=""
I0426 07:52:53.819121       1 main.go:76] FLAG: --skip_headers="false"
I0426 07:52:53.819127       1 main.go:76] FLAG: --skip_log_headers="false"
I0426 07:52:53.819132       1 main.go:76] FLAG: --stderrthreshold="2"
I0426 07:52:53.819139       1 main.go:76] FLAG: --test-mode="false"
I0426 07:52:53.819144       1 main.go:76] FLAG: --tidb-backup-manager-image="pingcap/tidb-backup-manager:v1.4.4"
I0426 07:52:53.819151       1 main.go:76] FLAG: --tidb-discovery-image="pingcap/tidb-operator:v1.4.4"
I0426 07:52:53.819156       1 main.go:76] FLAG: --tidb-failover-period="5m0s"
I0426 07:52:53.819162       1 main.go:76] FLAG: --tiflash-failover-period="5m0s"
I0426 07:52:53.819168       1 main.go:76] FLAG: --tikv-failover-period="5m0s"
I0426 07:52:53.819173       1 main.go:76] FLAG: --v="2"
I0426 07:52:53.819180       1 main.go:76] FLAG: --version="false"
I0426 07:52:53.819185       1 main.go:76] FLAG: --vmodule=""
I0426 07:52:53.819191       1 main.go:76] FLAG: --workers="5"
I0426 07:52:53.819223       1 main.go:93] HELM_RELEASE environment variable not set
I0426 07:52:53.841205       1 dependences.go:320] no permission for nodes, skip creating node lister
I0426 07:52:53.841241       1 dependences.go:325] no permission for persistent volumes, skip creating pv lister
I0426 07:52:53.841249       1 dependences.go:330] no permission for storage classes, skip creating sc lister
I0426 07:52:53.842690       1 leaderelection.go:243] attempting to acquire leader lease  my-namespace/tidb-controller-manager...
I0426 07:52:53.850854       1 leaderelection.go:253] successfully acquired lease my-namespace/tidb-controller-manager
I0426 07:52:53.853322       1 upgrader.go:109] Upgrader: APIGroup apps.pingcap.com is not registered, skip checking Advanced Statfulset
I0426 07:52:53.854012       1 reflector.go:207] Starting reflector *v1alpha1.Backup (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:53.854058       1 reflector.go:207] Starting reflector *v1alpha1.TidbInitializer (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:53.854019       1 reflector.go:207] Starting reflector *v1alpha1.TidbNGMonitoring (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:53.854094       1 reflector.go:207] Starting reflector *v1alpha1.TidbCluster (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:53.854102       1 reflector.go:207] Starting reflector *v1alpha1.DMCluster (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:53.854021       1 reflector.go:207] Starting reflector *v1alpha1.BackupSchedule (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:53.854135       1 reflector.go:207] Starting reflector *v1alpha1.TidbClusterAutoScaler (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:53.854012       1 reflector.go:207] Starting reflector *v1alpha1.TidbMonitor (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:53.854074       1 reflector.go:207] Starting reflector *v1alpha1.TidbDashboard (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:53.854021       1 reflector.go:207] Starting reflector *v1alpha1.Restore (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
W0426 07:52:53.855129       1 backup_tracker.go:82] list backups error backups.pingcap.com is forbidden: User "system:serviceaccount:my-namespace:tidb-controller-manager" cannot list resource "backups" in API group "pingcap.com" at the cluster scope, will retry
I0426 07:52:54.155190       1 reflector.go:207] Starting reflector *v1.Pod (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:54.155232       1 reflector.go:207] Starting reflector *v1.Service (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:54.155241       1 reflector.go:207] Starting reflector *v1.Secret (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:54.155242       1 reflector.go:207] Starting reflector *v1.StatefulSet (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:54.155332       1 reflector.go:207] Starting reflector *v1.Ingress (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:54.155193       1 reflector.go:207] Starting reflector *v1.Endpoints (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:54.155190       1 reflector.go:207] Starting reflector *v1.Deployment (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:54.155298       1 reflector.go:207] Starting reflector *v1.Job (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:54.155430       1 reflector.go:207] Starting reflector *v1.PersistentVolumeClaim (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
W0426 07:52:54.256226       1 backup_tracker.go:82] list backups error backups.pingcap.com is forbidden: User "system:serviceaccount:my-namespace:tidb-controller-manager" cannot list resource "backups" in API group "pingcap.com" at the cluster scope, will retry
I0426 07:52:54.256288       1 reflector.go:207] Starting reflector *v1.ConfigMap (30s) from k8s.io/[email protected]/tools/cache/reflector.go:156
I0426 07:52:54.356627       1 main.go:195] cache of informer factories sync successfully
I0426 07:52:54.356675       1 tidb_dashboard_controller.go:79] Starting tidb-dashboard controller
I0426 07:52:54.356728       1 backup_controller.go:78] Starting backup controller
I0426 07:52:54.356742       1 backup_schedule_controller.go:70] Starting backup schedule controller
I0426 07:52:54.356730       1 restore_controller.go:71] Starting restore controller
I0426 07:52:54.356763       1 tidb_initializer_controller.go:69] Starting tidbinitializer controller
I0426 07:52:54.356777       1 dm_cluster_controller.go:94] Starting dmcluster controller
I0426 07:52:54.356803       1 tidb_monitor_controller.go:65] Starting tidbmonitor controller
I0426 07:52:54.356820       1 pod_control.go:113] Starting tidbcluster pod controller
I0426 07:52:54.356824       1 tidb_ng_monitoring_controller.go:78] Starting tidbngmonitor controller
I0426 07:52:54.356807       1 tidb_cluster_controller.go:105] Starting tidbcluster controller
W0426 07:52:54.455729       1 backup_tracker.go:82] list backups error backups.pingcap.com is forbidden: User "system:serviceaccount:my-namespace:tidb-controller-manager" cannot list resource "backups" in API group "pingcap.com" at the cluster scope, will retry
E0426 07:52:54.475184       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 635 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x2a88e40?, 0x48c9990})
	k8s.io/[email protected]/pkg/util/runtime/runtime.go:74 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001b98e70?})
	k8s.io/[email protected]/pkg/util/runtime/runtime.go:48 +0x75
panic({0x2a88e40, 0x48c9990})
	runtime/panic.go:884 +0x212
github.com/pingcap/tidb-operator/pkg/manager/volumes.getStorageClass(...)
	github.com/pingcap/tidb-operator/pkg/manager/volumes/pod_vol_modifier.go:286
github.com/pingcap/tidb-operator/pkg/manager/volumes.(*podVolModifier).GetDesiredVolumes(0x0?, 0xc000a00000, {0x2e586c1, 0x2})
	github.com/pingcap/tidb-operator/pkg/manager/volumes/pod_vol_modifier.go:182 +0x9b
github.com/pingcap/tidb-operator/pkg/manager/volumes.SyncVolumeStatus({0x34df500, 0xc000a9b200}, {0x34cb688, 0xc00057b990}, 0xc000a00000, {0x2e586c1, 0x2})
	github.com/pingcap/tidb-operator/pkg/manager/volumes/sync_volume_status.go:33 +0xad
github.com/pingcap/tidb-operator/pkg/manager/member.(*pdMemberManager).syncTidbClusterStatus(0xc0012936e0, 0xc000a00000, 0xc000ca6000)
	github.com/pingcap/tidb-operator/pkg/manager/member/pd_member_manager.go:424 +0x1165
github.com/pingcap/tidb-operator/pkg/manager/member.(*pdMemberManager).syncPDStatefulSetForTidbCluster(0xc0012936e0, 0xc000a00000)
	github.com/pingcap/tidb-operator/pkg/manager/member/pd_member_manager.go:204 +0x345
github.com/pingcap/tidb-operator/pkg/manager/member.(*pdMemberManager).Sync(0xc0012936e0, 0xc000a00000)
	github.com/pingcap/tidb-operator/pkg/manager/member/pd_member_manager.go:105 +0xd1
github.com/pingcap/tidb-operator/pkg/controller/tidbcluster.(*defaultTidbClusterControl).updateTidbCluster(0xc0000365a0, 0xc000a00000)
	github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_control.go:182 +0x169
github.com/pingcap/tidb-operator/pkg/controller/tidbcluster.(*defaultTidbClusterControl).UpdateTidbCluster(0xc0000365a0, 0xc000a00000)
	github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_control.go:114 +0xb0
github.com/pingcap/tidb-operator/pkg/controller/tidbcluster.(*Controller).syncTidbCluster(...)
	github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_controller.go:166
github.com/pingcap/tidb-operator/pkg/controller/tidbcluster.(*Controller).sync(0xc000df4f90, {0xc00099c030, 0x18})
	github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_controller.go:162 +0x278
github.com/pingcap/tidb-operator/pkg/controller/tidbcluster.(*Controller).processNextWorkItem(0xc000df4f90)
	github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_controller.go:129 +0x108
github.com/pingcap/tidb-operator/pkg/controller/tidbcluster.(*Controller).worker(0xc00184a6a0?)
	github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_controller.go:117 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
	k8s.io/[email protected]/pkg/util/wait/wait.go:155 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x34ac6a0, 0xc001156030}, 0x1, 0xc00014e180)
	k8s.io/[email protected]/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
	k8s.io/[email protected]/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
	k8s.io/[email protected]/pkg/util/wait/wait.go:90 +0x25
created by github.com/pingcap/tidb-operator/pkg/controller/tidbcluster.(*Controller).Run
	github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_controller.go:109 +0x157
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x2615cfb]

The culprit is that the StorageClassLister:

StorageClassLister: scLister,

is accessed but it is nil because it was never initialized:

if cliCfg.HasSCPermission() {
	scLister = kubeInformerFactory.Storage().V1().StorageClasses().Lister()
} else {
	klog.Info("no permission for storage classes, skip creating sc lister")
}
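
For illustration, here is a minimal, self-contained sketch of the failure mode (this is not the operator's code; the stand-in struct is an assumption, and only the client-go lister type and its Get method are real): calling any method through the never-created lister dereferences a nil interface and panics with the same error as the trace above.

package main

import (
	storagelisters "k8s.io/client-go/listers/storage/v1"
)

// deps mimics the operator's dependencies struct: the lister field stays nil
// when the chart is installed with the storageclasses permission disabled.
type deps struct {
	StorageClassLister storagelisters.StorageClassLister
}

func main() {
	d := &deps{} // StorageClassLister is never initialized

	// Any call through the nil lister panics with
	// "invalid memory address or nil pointer dereference".
	_, _ = d.StorageClassLister.Get("gp2")
}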

csuzhangxc (Member) commented:

Thanks for your report. We should fix this panic, probably similarly to:

// check whether the storage class support
if p.deps.StorageClassLister != nil {
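
For example, a guard along those lines might look like this (a simplified sketch with stand-in types, not the actual fix that landed in #5058):

package volumes

import (
	storagev1 "k8s.io/api/storage/v1"
	storagelisters "k8s.io/client-go/listers/storage/v1"
)

// Simplified stand-ins for the operator's dependencies and pod volume modifier.
type Dependencies struct {
	StorageClassLister storagelisters.StorageClassLister
}

type podVolModifier struct {
	deps *Dependencies
}

// getStorageClass looks up the StorageClass by name, but returns nil instead of
// panicking when the lister was never created (no cluster permission) or when
// no storage class name is set.
func (p *podVolModifier) getStorageClass(name *string) (*storagev1.StorageClass, error) {
	if p.deps.StorageClassLister == nil || name == nil {
		return nil, nil
	}
	return p.deps.StorageClassLister.Get(*name)
}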

maeb (Author) commented Apr 28, 2023

Maybe it would work if the VolumeModifying feature were set to false. I haven't tested it yet.

csuzhangxc (Member) commented:

So far, VolumeModifying is mainly used to change IOPS and throughput online for AWS EBS volumes.
