Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable dynamic creation for admission hooks and update dependencies #1450

Conversation

andreyvelich
Copy link
Member

@andreyvelich andreyvelich commented Feb 26, 2021

Fixes: #1405.

This PR introduces new mechanism to get certificate for webhooks.
I updated YAMLs for our webhooks.
I added initContainer to Katib controller which executes cert-generator.sh script.
This script creates CertificateSigningRequest, katib-webhook-cert secret and patches webhooks configurations with appropriate caBundle.
Since we have katib-webhook-cert secret in the manifest, cleanup process should delete everything.

So we don't need to deploy cert-manager for Katib.

@gaocegege @johnugeorge @yanniszark @kuikuikuizzZ @knkski What do you think about this approach ?

Also I updated controller-runtime to v0.8.2 and k8s.io deps to v0.20.4. That requires some changes:

  • Change some packages location
  • Change the arguments for client calls (List, Get, etc.)
  • In the newer Kubernetes versions we can't add owner reference for cluster-scoped objects (e.g. PV) with namespace-scoped object (e.g. Suggestion). Thus, I have to disable owner reference for the PV which is created when Experiment has FromVolume resume policy. For that reason, I added PersistentVolumeReclaimPolicy: Delete for the PV and once PVC is garbage collected, PV should also be deleted.
  • I removed PyTorch operator from the dependencies because of this problem.

I still need to make some tests and create new image for cert generator.
It would be great if you can start to review this.

/cc @gaocegege @johnugeorge

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@andreyvelich
Copy link
Member Author

I named the ValidatingWebhookConfiguration with katib.kubeflow.org which contains validator.experiment.katib.kubeflow.org webhook.

I named the MutatingWebhookConfiguration with katib.kubeflow.org which contains defaulter.experiment.katib.kubeflow.org and mutator.pod.katib.kubeflow.org webhooks.

Please let me know what do you think about that naming ?

@kuikuikuizzZ
Copy link

@andreyvelich
Copy link
Member Author

andreyvelich commented Feb 26, 2021

Is the secret.yaml in manifest still needed? https://github.com/kubeflow/katib/blob/master/manifests/v1beta1/katib-controller/secret.yaml @andreyvelich

Basically this secret is needed only to clean-up all resources after removing Katib from the cluster.
We will include it to the kustomization file.

I think users can also use this secret to create their own certs (We might provide this functionality also).

@kuikuikuizzZ
Copy link

Is the secret.yaml in manifest still needed? https://github.com/kubeflow/katib/blob/master/manifests/v1beta1/katib-controller/secret.yaml @andreyvelich

Basically this secret is needed only to clean-up all resources after removing Katib from the cluster.
We will include it to the kustomization file.

I think users can also use this secret to create their own certs (We might provide this functionality also).

ok, SGTM

@andreyvelich
Copy link
Member Author

@tenzen-y @imilos @zuiurs @trog-levrai I tested this change on GKE cluster with v1.19.7-gke.1500 version and the new build is working for me. I didn't receive error that you noticed here: #1395.

It would be great to see how it works on your platforms since you were facing problems with CommonName before.

/cc @gaocegege @johnugeorge

@andreyvelich
Copy link
Member Author

Also I switched to certificates.k8s.io/v1 from certificates.k8s.io/v1beta1 since v1beta1 will be not supported soon.

Thus, I had to update signerName: kubernetes.io/kube-apiserver-client in CSR and RBAC according to this doc: https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/#kubernetes-signers.

@andreyvelich
Copy link
Member Author

I downgraded CertificateSigningRequest to v1beta1 since we are using v1.18 Kubernetes version for our CI.
cc @PatrickXYS.

The test seems to be working.
I removed WIP status, please take a look.

@andreyvelich
Copy link
Member Author

/hold for reviews

@andreyvelich andreyvelich changed the title [WIP] Disable dynamic creation for admission hooks and update dependencies Disable dynamic creation for admission hooks and update dependencies Feb 26, 2021
@PatrickXYS
Copy link
Member

@andreyvelich https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#api-change-2

I see, you can use 1.19 EKS for test now, we delivered it a few days ago

@PatrickXYS
Copy link
Member

Ah looks like I need to make changes in kubeflow/testing https://github.com/kubeflow/testing/blob/master/images/aws-scripts/create-eks-cluster.sh#L29

And build new images, let me know if you need my help @andreyvelich

@andreyvelich
Copy link
Member Author

Thank you for this information @PatrickXYS!
I think it's better to leave v1beta1 version for CSR for now.
I believe many of our users are still using k8s < 1.19

@PatrickXYS
Copy link
Member

@andreyvelich Good assumption, we definitely need to provide backward compatiability. And staying 1.18 should be a good way to go for now

@trog-levrai
Copy link

trog-levrai commented Feb 27, 2021

I installed katib with kubeflow on a minikube setup for testing only. I'm afraid I'm not sure how to test it, I tried downloading this branch and doing make deploy but I got the following error log:

bash scripts/v1beta1/deploy.sh
++ dirname scripts/v1beta1/deploy.sh
+ SCRIPT_ROOT=scripts/v1beta1/../..
+ cd scripts/v1beta1/../..
+ kubectl get validatingwebhookconfigurations katib-validating-webhook-config
NAME                              WEBHOOKS   AGE
katib-validating-webhook-config   1          9d
+ kubectl delete validatingwebhookconfigurations katib-validating-webhook-config
validatingwebhookconfiguration.admissionregistration.k8s.io "katib-validating-webhook-config" deleted
+ kubectl get mutatingwebhookconfigurations katib-mutating-webhook-config
NAME                            WEBHOOKS   AGE
katib-mutating-webhook-config   2          9d
+ kubectl delete mutatingwebhookconfigurations katib-mutating-webhook-config
mutatingwebhookconfiguration.admissionregistration.k8s.io "katib-mutating-webhook-config" deleted
+ kubectl apply -f manifests/v1beta1
namespace/kubeflow configured
+ kubectl apply -f manifests/v1beta1/katib-controller
Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
customresourcedefinition.apiextensions.k8s.io/experiments.kubeflow.org configured
customresourcedefinition.apiextensions.k8s.io/suggestions.kubeflow.org configured
customresourcedefinition.apiextensions.k8s.io/trials.kubeflow.org configured
configmap/katib-config configured
clusterrole.rbac.authorization.k8s.io/katib-controller configured
serviceaccount/katib-controller configured
clusterrolebinding.rbac.authorization.k8s.io/katib-controller configured
Warning: resource secrets/katib-controller is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
secret/katib-controller configured
service/katib-controller configured
configmap/trial-template configured
The Deployment "katib-controller" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app":"katib-controller"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
make: *** [Makefile:28: deploy] Error 1

PS not an expert with K8S, sorry if the answer is obvious. I also assume I should also delete the secret I created as a workaround to make sure the default init works properly ?

@andreyvelich
Copy link
Member Author

@trog-levrai Thank you for the testing!

First of all, you have to properly uninstall Katib from your minikube clutser.
Run undeploy.sh from the master branch.

Then, try to use my branch to install updated Katib manifests.
Note that you have to change images for the Katib components.
To build them run build.sh script.
After updating the images, run deploy.sh script.

@tenzen-y
Copy link
Member

@tenzen-y @imilos @zuiurs @trog-levrai I tested this change on GKE cluster with v1.19.7-gke.1500 version and the new build is working for me. I didn't receive error that you noticed here: #1395.

It would be great to see how it works on your platforms since you were facing problems with CommonName before.

/cc @gaocegege @johnugeorge

@andreyvelich

I tested your changed on K8s v1.19.8 created by kubeadm.
Then, I got the following errors.

  • Pods in kubectlow
$ kubectl get pods -n kubeflow
NAME                                READY   STATUS                  RESTARTS   AGE
katib-controller-5b7d756869-kj44l   0/1     Init:CrashLoopBackOff   5          11m
katib-db-manager-6895b946bc-g9bwr   1/1     Running                 0          11m
katib-mysql-bd448c8b5-pxvbh         1/1     Running                 0          11m
  • cert-generator's logs
$ kubectl logs -n kubeflow $(kubectl get pods -n kubeflow -l app=katib-controller -ojsonpath='{.items[*].metadata.name}') -c cert-generator
INFO: Creating certs in tmpdir /tmp/tmp.cHLBnl Generating RSA private key, 2048 bit long modulus (2 primes)................................................................+++++..........................................+++++e is 65537 (0x010001)INFO: Creating CSR: katib-controller.kubeflow 
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
NAME                        AGE     SIGNERNAME                            REQUESTOR                                         CONDITION
katib-controller.kubeflow   6m16s   kubernetes.io/kube-apiserver-client   system:serviceaccount:kubeflow:katib-controller   Approved,Failed
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
certificatesigningrequest.certificates.k8s.io "katib-controller.kubeflow" deleted
WARN: Previous CSR was found and removed.
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
certificatesigningrequest.certificates.k8s.io/katib-controller.kubeflow created
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
NAME                        AGE   SIGNERNAME                            REQUESTOR                                         CONDITION
katib-controller.kubeflow   0s    kubernetes.io/kube-apiserver-client   system:serviceaccount:kubeflow:katib-controller   Pending
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
certificatesigningrequest.certificates.k8s.io/katib-controller.kubeflow approved
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
ERROR: After approving csr katib-controller.kubeflow, the signed certificate did not appear on the resource. Giving up after 1 minute.
  • certificatesigningrequests
$ kubectl get certificatesigningrequests.certificates.k8s.io 
NAME                        AGE   SIGNERNAME                            REQUESTOR                                         CONDITION
katib-controller.kubeflow   9s    kubernetes.io/kube-apiserver-client   system:serviceaccount:kubeflow:katib-controller   Approved,Failed
  • certificatesigningrequests status
$ kubectl get certificatesigningrequests.certificates.k8s.io katib-controller.kubeflow -ojsonpath='{.status.conditions}' | jq .
[
  {
    "lastTransitionTime": "2021-02-28T10:54:40Z",
    "lastUpdateTime": "2021-02-28T10:54:40Z",
    "message": "This CSR was approved by kubectl certificate approve.",
    "reason": "KubectlApprove",
    "status": "True",
    "type": "Approved"
  },
  {
    "lastTransitionTime": "2021-02-28T10:54:40Z",
    "lastUpdateTime": "2021-02-28T10:54:40Z",
    "message": "invalid usage for client certificate: server auth",
    "reason": "SignerValidationFailure",
    "status": "True",
    "type": "Failed"
  }
]

I think that server auth isn't included in usages in kubernetes.io/kube-apiserver-client.

https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/#kubernetes-signers

  1. Permitted key usages - must include ["client auth"]. Must not include key usages beyond ["digital signature", "key encipherment", "client auth"].
  signerName: kubernetes.io/kube-apiserver-client
  usages:
  - digital signature
  - key encipherment
  - server auth

Thanks.

@andreyvelich
Copy link
Member Author

@tenzen-y @imilos @zuiurs @trog-levrai I tested this change on GKE cluster with v1.19.7-gke.1500 version and the new build is working for me. I didn't receive error that you noticed here: #1395.
It would be great to see how it works on your platforms since you were facing problems with CommonName before.
/cc @gaocegege @johnugeorge

@andreyvelich

I tested your changed on K8s v1.19.8 created by kubeadm.
Then, I got the following errors.

  • Pods in kubectlow
$ kubectl get pods -n kubeflow
NAME                                READY   STATUS                  RESTARTS   AGE
katib-controller-5b7d756869-kj44l   0/1     Init:CrashLoopBackOff   5          11m
katib-db-manager-6895b946bc-g9bwr   1/1     Running                 0          11m
katib-mysql-bd448c8b5-pxvbh         1/1     Running                 0          11m
  • cert-generator's logs
$ kubectl logs -n kubeflow $(kubectl get pods -n kubeflow -l app=katib-controller -ojsonpath='{.items[*].metadata.name}') -c cert-generator
INFO: Creating certs in tmpdir /tmp/tmp.cHLBnl Generating RSA private key, 2048 bit long modulus (2 primes)................................................................+++++..........................................+++++e is 65537 (0x010001)INFO: Creating CSR: katib-controller.kubeflow 
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
NAME                        AGE     SIGNERNAME                            REQUESTOR                                         CONDITION
katib-controller.kubeflow   6m16s   kubernetes.io/kube-apiserver-client   system:serviceaccount:kubeflow:katib-controller   Approved,Failed
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
certificatesigningrequest.certificates.k8s.io "katib-controller.kubeflow" deleted
WARN: Previous CSR was found and removed.
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
certificatesigningrequest.certificates.k8s.io/katib-controller.kubeflow created
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
NAME                        AGE   SIGNERNAME                            REQUESTOR                                         CONDITION
katib-controller.kubeflow   0s    kubernetes.io/kube-apiserver-client   system:serviceaccount:kubeflow:katib-controller   Pending
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
certificatesigningrequest.certificates.k8s.io/katib-controller.kubeflow approved
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
Warning: certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
ERROR: After approving csr katib-controller.kubeflow, the signed certificate did not appear on the resource. Giving up after 1 minute.
  • certificatesigningrequests
$ kubectl get certificatesigningrequests.certificates.k8s.io 
NAME                        AGE   SIGNERNAME                            REQUESTOR                                         CONDITION
katib-controller.kubeflow   9s    kubernetes.io/kube-apiserver-client   system:serviceaccount:kubeflow:katib-controller   Approved,Failed
  • certificatesigningrequests status
$ kubectl get certificatesigningrequests.certificates.k8s.io katib-controller.kubeflow -ojsonpath='{.status.conditions}' | jq .
[
  {
    "lastTransitionTime": "2021-02-28T10:54:40Z",
    "lastUpdateTime": "2021-02-28T10:54:40Z",
    "message": "This CSR was approved by kubectl certificate approve.",
    "reason": "KubectlApprove",
    "status": "True",
    "type": "Approved"
  },
  {
    "lastTransitionTime": "2021-02-28T10:54:40Z",
    "lastUpdateTime": "2021-02-28T10:54:40Z",
    "message": "invalid usage for client certificate: server auth",
    "reason": "SignerValidationFailure",
    "status": "True",
    "type": "Failed"
  }
]

I think that server auth isn't included in usages in kubernetes.io/kube-apiserver-client.

https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/#kubernetes-signers

  1. Permitted key usages - must include ["client auth"]. Must not include key usages beyond ["digital signature", "key encipherment", "client auth"].
  signerName: kubernetes.io/kube-apiserver-client
  usages:
  - digital signature
  - key encipherment
  - server auth

Thanks.

Make sense, thank you for the testing @tenzen-y!
I changed signerName to kubernetes.io/kubelet-serving to make sure that all usages are supported.
Please can you double check on your side if that CSR is working ?

@andreyvelich
Copy link
Member Author

/retest

1 similar comment
@andreyvelich
Copy link
Member Author

/retest

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, thank you for the testing @tenzen-y!
I changed signerName to kubernetes.io/kubelet-serving to make sure that all usages are supported.
Please can you double check on your side if that CSR is working ?

@andreyvelich
I checked your changes.
Then, I got subject organization is not system:nodes errors.

$ # pods in kubeflow
$ kubectl get pods -n kubeflow 
NAME                                READY   STATUS                  RESTARTS   AGE
katib-controller-5b7d756869-6rcbj   0/1     Init:CrashLoopBackOff   5          11m
katib-db-manager-6895b946bc-jsblq   1/1     Running                 0          11m
katib-mysql-bd448c8b5-szbdt         1/1     Running                 0          11m
$ # certificatesigningrequests status
$ kubectl get certificatesigningrequests.certificates.k8s.io katib-controller.kubeflow -ojsonpath='{.status.conditions}' | jq .
[
  {
    "lastTransitionTime": "2021-03-01T01:42:20Z",
    "lastUpdateTime": "2021-03-01T01:42:20Z",
    "message": "This CSR was approved by kubectl certificate approve.",
    "reason": "KubectlApprove",
    "status": "True",
    "type": "Approved"
  },
  {
    "lastTransitionTime": "2021-03-01T01:42:20Z",
    "lastUpdateTime": "2021-03-01T01:42:20Z",
    "message": "subject organization is not system:nodes",
    "reason": "SignerValidationFailure",
    "status": "True",
    "type": "Failed"
  }
]

It looks like common name and organizations are not correct.

    1. Permitted subjects - organizations are exactly ["system:nodes"], common name starts with "system:node:".

https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/#kubernetes-signers

It correctly worked when I fixed it.

$ # certificatesigningrequests
$ kubectl get certificatesigningrequests.certificates.k8s.io 
NAME                        AGE    SIGNERNAME                      REQUESTOR                                         CONDITION
katib-controller.kubeflow   112s   kubernetes.io/kubelet-serving   system:serviceaccount:kubeflow:katib-controller   Approved,Issued
# certificatesigningrequests status
$ kubectl get certificatesigningrequests.certificates.k8s.io katib-controller.kubeflow -ojsonpath='{.status.conditions}' | jq .
[
  {
    "lastTransitionTime": "2021-03-01T02:17:24Z",
    "lastUpdateTime": "2021-03-01T02:17:24Z",
    "message": "This CSR was approved by kubectl certificate approve.",
    "reason": "KubectlApprove",
    "status": "True",
    "type": "Approved"
  }
]
$ # deploy tpe-example.yaml
$ kubectl apply -f examples/v1beta1/tpe-example.yaml 
experiment.kubeflow.org/tpe-example created
$ kubectl get pods -n kubeflow 
NAME                                READY   STATUS    RESTARTS   AGE
katib-controller-5b7d756869-5gsbv   1/1     Running   0          3m51s
katib-db-manager-6895b946bc-x26wm   1/1     Running   0          3m52s
katib-mysql-bd448c8b5-zhmhj         1/1     Running   0          3m51s
katib-ui-5dbbdc6596-b6bb2           1/1     Running   0          3m40s
tpe-example-5nljhpbf-6prx9          2/2     Running   0          24s
tpe-example-b5wm8n2t-bfrpr          2/2     Running   0          24s
tpe-example-d2mscqgn-fnxvt          2/2     Running   0          24s
tpe-example-tpe-77dcbf7548-msk67    1/1     Running   0          44s

Thanks.

hack/cert-generator.sh Outdated Show resolved Hide resolved
Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreyvelich

It looks like that would be better to use environment variables in hack/cert-generator.sh instead of fixing the namespace.

manifests/v1beta1/katib-controller/katib-controller.yaml Outdated Show resolved Hide resolved
hack/cert-generator.sh Outdated Show resolved Hide resolved
@johnugeorge
Copy link
Member

/lgtm

@andreyvelich
Copy link
Member Author

Thanks everyone to help on this PR!
/hold cancel

@google-oss-robot google-oss-robot merged commit 12e7f1e into kubeflow:master Mar 6, 2021
knkski added a commit to knkski/katib that referenced this pull request Mar 9, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
knkski added a commit to knkski/katib that referenced this pull request Mar 9, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
knkski added a commit to knkski/katib that referenced this pull request Mar 9, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
knkski added a commit to knkski/katib that referenced this pull request Mar 9, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
knkski added a commit to knkski/katib that referenced this pull request Mar 9, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
knkski added a commit to knkski/katib that referenced this pull request Mar 10, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
knkski added a commit to knkski/katib that referenced this pull request Mar 10, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
knkski added a commit to knkski/katib that referenced this pull request Mar 10, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
knkski added a commit to knkski/katib that referenced this pull request Mar 10, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
knkski added a commit to knkski/katib that referenced this pull request Mar 10, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
knkski added a commit to knkski/katib that referenced this pull request Mar 10, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
knkski added a commit to knkski/katib that referenced this pull request Mar 10, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
knkski added a commit to knkski/katib that referenced this pull request Mar 10, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
knkski added a commit to knkski/katib that referenced this pull request Mar 10, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
knkski added a commit to knkski/katib that referenced this pull request Mar 11, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
knkski added a commit to knkski/katib that referenced this pull request Mar 11, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
knkski added a commit to knkski/katib that referenced this pull request Mar 11, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
google-oss-robot pushed a commit that referenced this pull request Mar 12, 2021
Updates Docker image to include changes from #1450, and updates
operator to latest version of operator framework.
andreyvelich pushed a commit to andreyvelich/katib that referenced this pull request Mar 12, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
andreyvelich pushed a commit to andreyvelich/katib that referenced this pull request Mar 12, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
andreyvelich pushed a commit to andreyvelich/katib that referenced this pull request Mar 12, 2021
Updates Docker image to include changes from kubeflow#1450, and updates
operator to latest version of operator framework.
@andreyvelich andreyvelich deleted the issue-1405-disable-dynamic-webhooks branch October 6, 2021 00:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Disable dynamic webhook creation in the Katib controller