
KServe WebHook throws 500 due to loading root certificates #2553

Closed
kimwnasptd opened this issue Oct 16, 2023 · 9 comments · Fixed by #2627

Comments

@kimwnasptd
Member

kimwnasptd commented Oct 16, 2023

@DnPlas bumped into this in #2552
https://github.com/kubeflow/manifests/actions/runs/6535263237/job/17747221819

This could be a transient issue that requires waiting before applying the rest of the manifests, or simply reapplying them. I'll try to reproduce it locally to see whether it's indeed a race condition.
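If the failure were only a race, the "just reapply" option could be sketched as a retry loop with exponential backoff. This is a minimal sketch, assuming a hypothetical `apply_manifests` callable that raises on failure (for example, a wrapper around `kubectl apply`); it is not the script the CI actually runs.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def apply_with_retries(
    apply_manifests: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call a hypothetical `apply_manifests` until it succeeds or attempts run out.

    The delay doubles after each failure (exponential backoff), giving
    CRDs and webhook servers time to become ready between attempts.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return apply_manifests()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= 2
    raise ValueError("attempts must be at least 1")
```

A retry loop only papers over races; a deterministic misconfiguration fails the same way on every attempt.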

cc @yuzisun

certificate.cert-manager.io/serving-cert unchanged
issuer.cert-manager.io/selfsigned-issuer unchanged
mutatingwebhookconfiguration.admissionregistration.k8s.io/inferenceservice.serving.kserve.io configured
validatingwebhookconfiguration.admissionregistration.k8s.io/clusterservingruntime.serving.kserve.io configured
validatingwebhookconfiguration.admissionregistration.k8s.io/inferencegraph.serving.kserve.io configured
validatingwebhookconfiguration.admissionregistration.k8s.io/inferenceservice.serving.kserve.io configured
validatingwebhookconfiguration.admissionregistration.k8s.io/servingruntime.serving.kserve.io configured
validatingwebhookconfiguration.admissionregistration.k8s.io/trainedmodel.serving.kserve.io configured
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
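The error chain above suggests that the API server received a webhook `caBundle` that was non-empty but did not contain a decodable PEM block. Assuming the webhook client is built the way client-go's transport code builds TLS configs, empty CA data falls back to the system roots, while non-empty data with no PEM block yields "unable to parse bytes as PEM block". The Python function below is a simplified model of that check, not the actual Go implementation; `load_root_certificates` is a hypothetical name.

```python
import re
from typing import Optional

# Matches a PEM block such as "-----BEGIN CERTIFICATE----- ... -----END CERTIFICATE-----".
PEM_BLOCK = re.compile(
    rb"-----BEGIN ([A-Z0-9 ]+)-----\r?\n.*?\r?\n-----END \1-----", re.DOTALL
)


def load_root_certificates(ca_bundle: bytes) -> Optional[bytes]:
    """Simplified model of how a webhook client treats caBundle bytes.

    - empty bytes: no custom roots (the real client falls back to system roots)
    - non-empty bytes without any PEM block: fails like the log above
    - otherwise: accepted (the real client also parses the certificates,
      which this model skips)
    """
    if not ca_bundle:
        return None
    if PEM_BLOCK.search(ca_bundle) is None:
        raise ValueError(
            "unable to load root certificates: unable to parse bytes as PEM block"
        )
    return ca_bundle
```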
@kimwnasptd
Member Author

I tried to re-run this action, but it looks like it had already been re-triggered.

@kimwnasptd
Member Author

kimwnasptd commented Oct 16, 2023

I see the following errors a bit further up in the log. Maybe we are missing some manifests? It looks like a ClusterServingRuntime CRD should have been applied:

resource mapping not found for name: "kserve-lgbserver" namespace: "" from "STDIN": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
validatingwebhookconfiguration.admissionregistration.k8s.io/trainedmodel.serving.kserve.io created
ensure CRDs are installed first
resource mapping not found for name: "kserve-mlserver" namespace: "" from "STDIN": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
ensure CRDs are installed first
resource mapping not found for name: "kserve-paddleserver" namespace: "" from "STDIN": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
ensure CRDs are installed first
resource mapping not found for name: "kserve-pmmlserver" namespace: "" from "STDIN": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
ensure CRDs are installed first
resource mapping not found for name: "kserve-sklearnserver" namespace: "" from "STDIN": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
ensure CRDs are installed first
resource mapping not found for name: "kserve-tensorflow-serving" namespace: "" from "STDIN": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
ensure CRDs are installed first
resource mapping not found for name: "kserve-torchserve" namespace: "" from "STDIN": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
ensure CRDs are installed first
resource mapping not found for name: "kserve-tritonserver" namespace: "" from "STDIN": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
ensure CRDs are installed first
resource mapping not found for name: "kserve-xgbserver" namespace: "" from "STDIN": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
ensure CRDs are installed first

EDIT: The above is incorrect. We explicitly wait for the CRD to be established and then re-apply the manifests:
https://github.com/kubeflow/manifests/blob/master/tests/gh-actions/install_kserve.sh#L8
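Waiting for a CRD to be "established" means checking its `Established` status condition. A minimal Python sketch of that check, assuming the CRD object has already been fetched as a dict (for example from `kubectl get crd clusterservingruntimes.serving.kserve.io -o json`); `is_established`, `wait_until_established`, and `fetch_crd` are hypothetical helpers, not part of the linked script.

```python
import time
from typing import Any, Callable, Dict


def is_established(crd: Dict[str, Any]) -> bool:
    """Return True if the CRD reports the condition Established=True."""
    conditions = (crd.get("status") or {}).get("conditions") or []
    return any(
        c.get("type") == "Established" and c.get("status") == "True"
        for c in conditions
    )


def wait_until_established(
    fetch_crd: Callable[[], Dict[str, Any]],
    timeout: float = 60.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll a hypothetical `fetch_crd` until the CRD is established or time runs out."""
    deadline = clock() + timeout
    while not is_established(fetch_crd()):
        if clock() >= deadline:
            raise TimeoutError("CRD was not established in time")
        sleep(interval)
```

From the command line, `kubectl wait --for=condition=established` performs the same check against a `crd/<name>` resource.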

@kimwnasptd
Member Author

@yuzisun @DnPlas I found the bug. In v0.11.1, two of the webhooks reference an incorrect cert-manager Certificate:

  1. The clusterservingruntime.serving.kserve.io ValidatingWebhookConfiguration https://github.com/kserve/kserve/blob/eea3ae94bd5db268b7a01b2bec87cdd5dd7d668f/install/v0.11.1/kserve_kubeflow.yaml#L21371
  2. The servingruntime.serving.kserve.io ValidatingWebhookConfiguration https://github.com/kserve/kserve/blob/eea3ae94bd5db268b7a01b2bec87cdd5dd7d668f/install/v0.11.1/kserve_kubeflow.yaml#L21467

Like the rest of the webhooks, those two should instead carry the annotation cert-manager.io/inject-ca-from: kubeflow/serving-cert. After making this change, the installation proceeds as expected. This means we'll need a new patch release of KServe.
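A check like the following could catch this kind of regression before a release manifest is applied. It is a minimal sketch, assuming the manifest (e.g. `kserve_kubeflow.yaml`) has already been parsed into a list of dicts; `find_bad_webhooks` and `set_ca_injection` are hypothetical helpers, and `kubeflow/serving-cert` is the expected value quoted above.

```python
from typing import Any, Dict, Iterable, List, Optional

ANNOTATION = "cert-manager.io/inject-ca-from"
WEBHOOK_KINDS = {"ValidatingWebhookConfiguration", "MutatingWebhookConfiguration"}


def find_bad_webhooks(
    docs: Iterable[Optional[Dict[str, Any]]], expected: str = "kubeflow/serving-cert"
) -> List[str]:
    """Return names of webhook configurations whose CA-injection annotation
    is missing or points at a different cert-manager Certificate."""
    bad = []
    for doc in docs:
        if not doc or doc.get("kind") not in WEBHOOK_KINDS:
            continue  # skip empty YAML documents and non-webhook resources
        metadata = doc.get("metadata") or {}
        annotations = metadata.get("annotations") or {}
        if annotations.get(ANNOTATION) != expected:
            bad.append(metadata.get("name", "<unnamed>"))
    return bad


def set_ca_injection(
    docs: Iterable[Optional[Dict[str, Any]]], expected: str = "kubeflow/serving-cert"
) -> None:
    """Set the CA-injection annotation in place on every webhook configuration."""
    for doc in docs:
        if not doc or doc.get("kind") not in WEBHOOK_KINDS:
            continue
        metadata = doc.get("metadata") or {}
        annotations = metadata.get("annotations") or {}
        annotations[ANNOTATION] = expected
        metadata["annotations"] = annotations
        doc["metadata"] = metadata
```

With PyYAML, `list(yaml.safe_load_all(f))` on the manifest file would produce the input; the sketch itself avoids that dependency so it stays self-contained.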

@kimwnasptd
Member Author

In v0.11.0 those had the correct values. Maybe the culprit is kserve/kserve#3083?

@DnPlas
Contributor

DnPlas commented Oct 17, 2023

Just FYI, the KServe WG is working on a fix at the moment: kserve/kserve#3188.

@umka1332


@kimwnasptd FYI: KServe 0.11.2 has been released with this fix.

@juliusvonkohout
Member

I'll put this on the agenda.

@juliusvonkohout
Member

This should be closed by #2627

/close


@juliusvonkohout: Closing this issue.



4 participants