CompositeController is not working when deployed before child CRDs #86

misohu · 2023-09-07T09:50:21Z

Bug Description

This bug occurs in the kfp-profile-controller use case, which deploys a CompositeController. That controller needs to create a PodDefault CR. When the PodDefault CRD has not been created before the CompositeController is deployed, metacontroller reports errors:

$ kubectl -n kubeflow logs metacontroller-operator-charm-0
{"level":"error","ts":1694079442.6362295,"logger":"controller-runtime.manager.controller.composite-metacontroller","msg":"Reconciler error","name":"kubeflow-pipelines-profile-controller","namespace":"","error":"can't find child resource \"poddefaults\" in kubeflow.org/v1alpha1","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214"}

If we then deploy the CRD manually, the metacontroller pod gets stuck and does not recognize the change. The pod stays in active status without reporting any new logs, and the CompositeController object still reports that it cannot find the PodDefault CRD:

$ kubectl describe compositecontroller  kubeflow-pipelines-profile-controller
Name:         kubeflow-pipelines-profile-controller
Namespace:    
Labels:       app.juju.is/created-by=kfp-profile-controller
              app.kubernetes.io/instance=kfp-profile-controller-kubeflow
              kubernetes-resource-handler-scope=secrets-and-compositecontroller
Annotations:  <none>
API Version:  metacontroller.k8s.io/v1alpha1
Kind:         CompositeController
Metadata:
  Creation Timestamp:  2023-09-07T08:48:13Z
  Generation:          1
  Managed Fields:
    API Version:  metacontroller.k8s.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          f:app.kubernetes.io/instance:
          f:kubernetes-resource-handler-scope:
      f:spec:
        f:childResources:
        f:generateSelector:
        f:hooks:
          f:sync:
            f:webhook:
              f:url:
        f:parentResource:
          f:apiVersion:
          f:resource:
        f:resyncPeriodSeconds:
    Manager:         lightkube
    Operation:       Apply
    Time:            2023-09-07T08:48:13Z
  Resource Version:  2023
  UID:               f7c28af5-1930-4254-9dfd-9c39ca405b63
Spec:
  Child Resources:
    API Version:  v1
    Resource:     secrets
    Update Strategy:
      Method:     InPlace
    API Version:  v1
    Resource:     configmaps
    Update Strategy:
      Method:     OnDelete
    API Version:  apps/v1
    Resource:     deployments
    Update Strategy:
      Method:     InPlace
    API Version:  v1
    Resource:     services
    Update Strategy:
      Method:     InPlace
    API Version:  kubeflow.org/v1alpha1
    Resource:     poddefaults
    Update Strategy:
      Method:         InPlace
  Generate Selector:  true
  Hooks:
    Sync:
      Webhook:
        URL:  http://kfp-profile-controller.kubeflow/sync
  Parent Resource:
    API Version:          v1
    Resource:             namespaces
  Resync Period Seconds:  3600
Events:
  Type     Reason       Age                  From            Message
  ----     ------       ----                 ----            -------
  Warning  CreateError  74s (x4 over 2m26s)  metacontroller  Cannot create new controller: can't find child resource "poddefaults" in kubeflow.org/v1alpha1

The only way to recover is to manually delete the metacontroller pod:

kubectl delete po -n kubeflow metacontroller-operator-charm-0

This can be a problem in production deployments, where the PodDefault CRD may be deployed later than metacontroller.
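Until this is fixed, a deployment script could check that every child resource of a CompositeController is already served by the cluster before applying it, and wait or fail otherwise. A minimal sketch, assuming the served resources have been collected elsewhere (for example from `kubectl api-resources`) into a dict keyed by API version; `missing_child_resources` is a hypothetical helper, not part of any charm or metacontroller API.

```python
# Hypothetical pre-deploy check: report which child resources of a
# CompositeController are not yet served by the cluster.

def missing_child_resources(child_resources, served):
    """
    child_resources: list of (api_version, resource) pairs, as listed under
        spec.childResources of the CompositeController.
    served: dict mapping api_version -> set of resource names the cluster
        currently serves (assumed to be collected elsewhere).
    Returns the missing entries as "api_version/resource" strings.
    """
    missing = []
    for api_version, resource in child_resources:
        if resource not in served.get(api_version, set()):
            missing.append(f"{api_version}/{resource}")
    return missing


# Child resources of kubeflow-pipelines-profile-controller, taken from the
# `kubectl describe` output above.
children = [
    ("v1", "secrets"),
    ("v1", "configmaps"),
    ("apps/v1", "deployments"),
    ("v1", "services"),
    ("kubeflow.org/v1alpha1", "poddefaults"),
]

# Served resources before the PodDefault CRD has been applied.
served = {"v1": {"secrets", "configmaps", "services"}, "apps/v1": {"deployments"}}

print(missing_child_resources(children, served))
# ['kubeflow.org/v1alpha1/poddefaults']
```

An empty result means every child resource is available and the CompositeController can be applied safely under this check.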

To Reproduce

Deploy the charms:

juju deploy metacontroller-operator --channel latest/edge --trust
juju deploy kfp-profile-controller --channel latest/edge --trust
juju deploy minio --channel latest/edge --trust
juju relate minio:object-storage kfp-profile-controller:object-storage

Create a test namespace:

kubectl create ns test
kubectl label ns test --overwrite pipelines.kubeflow.org/enabled=true

Check the metacontroller logs for errors (it is also stuck at this point):

kubectl -n kubeflow logs metacontroller-operator-charm-0 --timestamps

Manually deploy the CRD from https://github.com/canonical/admission-webhook-operator/blob/main/src/templates/crds.yaml.j2:

kubectl apply -f file_with_crd_yaml.yaml

Metacontroller now no longer responds, and no PodDefault CRs are created in the test namespace:

kubectl -n kubeflow logs metacontroller-operator-charm-0 --timestamps
kubectl get poddefaults -n test

To recover, delete the pod and let the StatefulSet recreate it:

kubectl delete po -n kubeflow metacontroller-operator-charm-0
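The report does not identify the root cause. One hypothesis consistent with the symptoms (the error persists after the CRD is applied, and a pod restart clears it) is that the list of available API resources is read once and not refreshed. The toy model below illustrates that hypothesis only; `FakeCluster`, `SnapshotResolver`, and `RefreshingResolver` are hypothetical names and do not reflect metacontroller's actual code.

```python
# Toy model (hypothetical names, NOT metacontroller code) of why a restart
# could clear a "can't find child resource" error: one resolver snapshots the
# cluster's API resources once, the other re-reads them on every lookup.

class FakeCluster:
    """Stands in for the API server's discovery endpoint."""

    def __init__(self):
        # Maps "group/version" -> set of resource names served there.
        self.resources = {"v1": {"secrets", "configmaps", "services"},
                          "apps/v1": {"deployments"}}

    def discover(self):
        # Return a copy so callers cannot mutate cluster state.
        return {gv: set(names) for gv, names in self.resources.items()}

    def install_crd(self, group_version, resource):
        self.resources.setdefault(group_version, set()).add(resource)


class SnapshotResolver:
    """Reads discovery once at startup and never refreshes it."""

    def __init__(self, cluster):
        self.snapshot = cluster.discover()

    def can_find(self, group_version, resource):
        return resource in self.snapshot.get(group_version, set())


class RefreshingResolver:
    """Re-reads discovery on every lookup."""

    def __init__(self, cluster):
        self.cluster = cluster

    def can_find(self, group_version, resource):
        return resource in self.cluster.discover().get(group_version, set())


cluster = FakeCluster()
stale = SnapshotResolver(cluster)
fresh = RefreshingResolver(cluster)

# The PodDefault CRD is applied after both resolvers have started.
cluster.install_crd("kubeflow.org/v1alpha1", "poddefaults")

print(stale.can_find("kubeflow.org/v1alpha1", "poddefaults"))  # False
print(fresh.can_find("kubeflow.org/v1alpha1", "poddefaults"))  # True
# "Restarting" rebuilds the snapshot, which now includes the CRD.
print(SnapshotResolver(cluster).can_find("kubeflow.org/v1alpha1", "poddefaults"))  # True
```

Under this model, the snapshot resolver behaves like the stuck pod, and recreating it corresponds to the recovery step above.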

Environment

microk8s v1.24.17
juju version 2.9.44-ubuntu-amd64
metacontroller version v2.0.4

Relevant log output

Model     Controller  Cloud/Region        Version  SLA          Timestamp
kubeflow  uk8sx       microk8s/localhost  2.9.44   unsupported  11:49:17+02:00

App                      Version                Status  Scale  Charm                    Channel      Rev  Address         Exposed  Message
kfp-profile-controller                          active      1  kfp-profile-controller                  0  10.152.183.55   no       
kubeflow-profiles                               active      1  kubeflow-profiles        latest/edge  321  10.152.183.205  no       
metacontroller-operator                         active      1  metacontroller-operator                 1  10.152.183.124  no       
minio                    res:oci-image@1755999  active      1  minio                    latest/edge  231  10.152.183.58   no       

Unit                        Workload  Agent  Address      Ports              Message
kfp-profile-controller/0*   active    idle   10.1.137.14                     
kubeflow-profiles/0*        active    idle   10.1.137.15                     
metacontroller-operator/0*  active    idle   10.1.137.9                      
minio/0*                    active    idle   10.1.137.13  9000/TCP,9001/TCP  

Relation provider     Requirer                               Interface       Type     Message
minio:object-storage  kfp-profile-controller:object-storage  object-storage  regular

Additional context

No response


phoevos mentioned this issue Sep 7, 2023

fix: Patch KFP Profile Controller service port canonical/kfp-operators#318

Merged

NohaIhab added the bug Something isn't working label Nov 24, 2023

kimwnasptd mentioned this issue Jul 4, 2024

UATs are failing with 401 in one-click installation canonical/bundle-kubeflow#951

Closed
