KServe and cert-manager webhooks are failing #2660

Closed
biswajit-9776 opened this issue Mar 20, 2024 · 27 comments

Comments

@biswajit-9776
Contributor

biswajit-9776 commented Mar 20, 2024

While installing Kubeflow with the command:

while ! kustomize build example | awk '!/well-defined/' | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done

Some webhooks could not be reached:

Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.218.186:443: connect: connection refused
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.218.186:443: connect: connection refused
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.218.186:443: connect: connection refused
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.218.186:443: connect: connection refused
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.218.186:443: connect: connection refused
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.218.186:443: connect: connection refused
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.218.186:443: connect: connection refused
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.218.186:443: connect: connection refused
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.218.186:443: connect: connection refused
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.218.186:443: connect: connection refused
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.218.186:443: connect: connection refused
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterservingruntime.kserve-webhook-server.validator": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block
[biswa@fedora manifests]$ sudo kubectl get endpoints -n cert-manager cert-manager-webhook
NAME                   ENDPOINTS          AGE
cert-manager-webhook   10.244.0.8:10250   108m

The KServe webhook issue was previously encountered in #2553. Should the changes made in #2627 have prevented this error? As for the cert-manager webhook, #2585 reported "no route to host", while mine fails with "connection refused". It could be a Kubernetes-level issue or a deeper networking stack issue, as described in https://cert-manager.io/docs/troubleshooting/webhook/#cause-2-eks-on-a-custom-cni
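
The "unable to parse bytes as PEM block" message indicates that the API server could not parse the CA bundle configured for the KServe webhook, which may simply mean the bundle had not been injected yet when the resources were applied. A minimal sketch for checking this follows. The helper name `ca_bundle_looks_like_pem` is hypothetical, and the webhook configuration name in the usage comment is an assumption; list the real names with `kubectl get validatingwebhookconfigurations`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a base64-encoded caBundle on stdin and succeeds
# only if it decodes to something that contains a PEM certificate header.
ca_bundle_looks_like_pem() {
  base64 -d 2>/dev/null | grep -q -- '-----BEGIN CERTIFICATE-----'
}

# Usage against a live cluster (the configuration name is an assumption):
#   kubectl get validatingwebhookconfiguration clusterservingruntime.serving.kserve.io \
#     -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | ca_bundle_looks_like_pem \
#     && echo "caBundle looks like PEM" \
#     || echo "caBundle is empty or not PEM yet"
```

If the bundle is empty or not PEM, checking the cert-manager cainjector logs is a reasonable next step before retrying the apply.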

kustomize version:

v5.3.0

The pods in my cluster are:

[biswa@fedora manifests]$ sudo kubectl get pods -A
NAMESPACE            NAME                                                              READY   STATUS              RESTARTS       AGE
auth                 dex-5d8fffb998-qq49q                                              1/1     Running             0              94m
cert-manager         cert-manager-5b8f9b9d96-l7vj7                                     1/1     Running             0              94m
cert-manager         cert-manager-cainjector-54f68bfb64-m6x5f                          1/1     Running             0              94m
cert-manager         cert-manager-webhook-f6c8487d6-9x6x4                              1/1     Running             0              94m
istio-system         cluster-local-gateway-7bd9cffcb5-thdkb                            1/1     Running             0              94m
istio-system         configure-kubernetes-oidc-issuer-jwks-in-requestauthenticasxnfl   0/1     Completed           0              94m
istio-system         istio-ingressgateway-666f789ccb-wcqdc                             1/1     Running             0              94m
istio-system         istiod-6cd8c6c59c-htqzn                                           1/1     Running             0              94m
knative-eventing     eventing-controller-688dc8df9f-9fxpp                              1/1     Running             0              94m
knative-eventing     eventing-webhook-8c6cc5bc7-789xh                                  1/1     Running             0              94m
knative-serving      activator-55cd894f6c-dr9q4                                        1/1     Running             8 (36m ago)    94m
knative-serving      autoscaler-76748895b9-shk8t                                       2/2     Running             0              56m
knative-serving      controller-76dcf67d5-7tb5w                                        2/2     Running             0              56m
knative-serving      domain-mapping-f5d4dbc56-pbz5q                                    2/2     Running             0              56m
knative-serving      domainmapping-webhook-6f67684cd8-nlnsf                            2/2     Running             0              55m
knative-serving      net-istio-controller-7bb6fb5f58-tklxs                             2/2     Running             0              55m
knative-serving      net-istio-webhook-7d8476f6-svcjf                                  2/2     Running             0              55m
knative-serving      webhook-d5cbdf855-bzmsx                                           2/2     Running             0              55m
kube-system          coredns-565d847f94-cd9dp                                          1/1     Running             0              96m
kube-system          coredns-565d847f94-lc62z                                          1/1     Running             0              96m
kube-system          etcd-kubeflow-control-plane                                       1/1     Running             0              96m
kube-system          kindnet-qzthr                                                     1/1     Running             0              96m
kube-system          kube-apiserver-kubeflow-control-plane                             1/1     Running             0              96m
kube-system          kube-controller-manager-kubeflow-control-plane                    1/1     Running             0              96m
kube-system          kube-proxy-9zct2                                                  1/1     Running             0              96m
kube-system          kube-scheduler-kubeflow-control-plane                             1/1     Running             0              96m
kubeflow             admission-webhook-deployment-6cf44ffbdb-5m86s                     0/1     ContainerCreating   0              55m
kubeflow             cache-server-7d94c87787-88m4h                                     0/2     Init:0/1            0              55m
kubeflow             centraldashboard-965564b75-6frpk                                  2/2     Running             0              55m
kubeflow             jupyter-web-app-deployment-757976b798-7ngdb                       0/2     Pending             0              55m
kubeflow             katib-controller-64bf8db8bd-nfn2k                                 0/1     ContainerCreating   0              55m
kubeflow             katib-db-manager-6d6885765-tqldd                                  1/1     Running             7 (40m ago)    55m
kubeflow             katib-mysql-db6dc68c-q7hbt                                        1/1     Running             0              55m
kubeflow             katib-ui-64b8f8d78c-vxttm                                         2/2     Running             0              55m
kubeflow             kserve-controller-manager-6df96f6d7c-wwxct                        0/2     ContainerCreating   0              55m
kubeflow             kserve-models-web-app-99849d9f7-rmfhk                             2/2     Running             0              55m
kubeflow             kubeflow-pipelines-profile-controller-59ccbd47b9-7875s            1/1     Running             0              55m
kubeflow             metacontroller-0                                                  1/1     Running             0              94m
kubeflow             metadata-envoy-deployment-5cbbb86fc9-pwpbw                        1/1     Running             0              55m
kubeflow             metadata-grpc-deployment-784b8b5fb4-rqw94                         1/2     CrashLoopBackOff    10 (49s ago)   55m
kubeflow             metadata-writer-844bd5d486-nm2j6                                  2/2     Running             4 (69s ago)    55m
kubeflow             minio-65dff76b66-brflp                                            0/2     Pending             0              55m
kubeflow             ml-pipeline-6c7c86f666-qbs65                                      0/2     PodInitializing     0              55m
kubeflow             ml-pipeline-persistenceagent-85c485f86f-j8qwx                     0/2     PodInitializing     0              55m
kubeflow             ml-pipeline-scheduledworkflow-6448c96f4f-98997                    0/2     PodInitializing     0              55m
kubeflow             ml-pipeline-ui-6db56c647b-b6ksz                                   0/2     Pending             0              55m
kubeflow             ml-pipeline-viewer-crd-5df88b6956-kpt68                           0/2     Pending             0              55m
kubeflow             ml-pipeline-visualizationserver-6d49897f85-p9msj                  0/2     Pending             0              55m
kubeflow             mysql-c999c6c8-phg5s                                              0/2     Pending             0              55m
kubeflow             notebook-controller-deployment-9ffdf65d7-bsn6b                    0/2     PodInitializing     0              55m
kubeflow             profiles-deployment-cbf679dbd-qwskr                               0/3     PodInitializing     0              55m
kubeflow             pvcviewer-controller-manager-d66667b49-mhn4n                      0/3     Pending             0              55m
kubeflow             tensorboard-controller-deployment-7444dc8fcd-gxvfr                0/3     Pending             0              55m
kubeflow             tensorboards-web-app-deployment-78f7c694bf-tp8z9                  0/2     Pending             0              55m
kubeflow             training-operator-69575765df-v9hl4                                1/1     Running             0              55m
kubeflow             volumes-web-app-deployment-6dfccd897d-xklf7                       0/2     Pending             0              55m
kubeflow             workflow-controller-f65c9d9b4-m4f9k                               0/2     PodInitializing     0              55m
local-path-storage   local-path-provisioner-684f458cdd-nvs75                           1/1     Running             0              96m
oauth2-proxy         oauth2-proxy-58d95869bf-5n6l5                                     1/1     Running             0              94m
oauth2-proxy         oauth2-proxy-58d95869bf-684pn                                     1/1     Running             0              94m
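
The "connection refused" errors above mean the API server could not connect to the cert-manager webhook Service at the moment the resources were applied. One way to narrow this down is to wait until the webhook Deployment reports Available before retrying. This is only a sketch: the function name is hypothetical, and the namespace and Deployment name are inferred from the pod listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until the cert-manager webhook Deployment is
# Available, or fail once the timeout (default 180s) expires.
wait_for_cert_manager_webhook() {
  local timeout="${1:-180s}"
  kubectl wait --for=condition=Available deployment/cert-manager-webhook \
    --namespace cert-manager --timeout="${timeout}"
}

# Usage: apply the manifests only once the webhook is up.
#   wait_for_cert_manager_webhook 300s && \
#     kustomize build example | kubectl apply -f -
```

If the Deployment is already Available and the error persists, the cause probably lies elsewhere, for example in networking or in the resource limits discussed later in this thread.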
@juliusvonkohout
Member

juliusvonkohout commented Apr 3, 2024

Can you try with the master branch as well? Please also check whether your install command matches the one in the master branch readme.md, and follow the installation instructions with Kind as closely as possible.

juliusvonkohout changed the title from "K-Serve and cert-manager webhooks were failed to be called" to "KServe and cert-manager webhooks are failing" on Apr 3, 2024
@dnapier

dnapier commented Apr 3, 2024

I was able to resolve this by increasing the resources allocated to the machine. Was getting capped out by CPU, maybe you're facing similar?

@biswajit-9776
Contributor Author

Can you try with the master branch as well? Please also check whether your install command is up to date in the master branch readme.md and follow the installation instructions with Kind as close as possible.

Hey @juliusvonkohout, yes my local machine's master branch is up to date.

@biswajit-9776
Contributor Author

@dnapier Hi, I tried to increase CPU resources in the --kubeconfig file but it says there is no resources field in v1alpha4.Node. Could you please tell me what you tried?

@dnapier

dnapier commented Apr 4, 2024

When I ran kubectl describe nodes, the cpu resources were maxed out. This was being done in a VM, so I simply added more cores to the machine. If you're doing the same and the core speeds are being limited by the host, you could raise the limit as well, but that was not the case for me.

[image]

I encountered another issue following this which was the activator of knative-serving crashing, but I do not believe that is related to the error you're seeing here.
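
For anyone wanting to run the same check, a small sketch that pulls the CPU request percentage out of the `kubectl describe nodes` output. The helper name is hypothetical, and it assumes the usual "Allocated resources" table layout, where the third column of the `cpu` row holds the request percentage.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl describe nodes` output on stdin and
# prints the CPU request percentage from each "Allocated resources" section.
cpu_request_percent() {
  awk '
    /^Allocated resources:/ { in_section = 1; next }
    in_section && $1 == "cpu" {
      pct = $3                  # third column looks like "(99%)"
      gsub(/[()%]/, "", pct)    # strip parentheses and percent sign
      print pct
      in_section = 0
    }
  '
}

# Usage:
#   kubectl describe nodes | cpu_request_percent
```

A value close to 100 means new pods cannot be scheduled for lack of CPU, which would match the many Pending pods in the listing above.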

@juliusvonkohout
Member

@dnapier Hi, I tried to increase CPU resources in the --kubeconfig file but it says there is no resources field in v1alpha4.Node. Could you please tell me what you tried?

CC @diegolovison then

@diegolovison
Contributor

Are you using kind with docker ?

@ALPHA-1503

ALPHA-1503 commented Apr 9, 2024

Hello guys, I'm facing the same issues. I have to deploy Kubeflow for an Internship project and I have the same problem with Kubeflow v1.8
kustomize version : v5.3.0
cert-manager version : v0.12.1

After : "while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done" I get this error

[Screenshot 2024-04-09 151931]

My Kubernetes cluster is running with Tanzu.

@juliusvonkohout
Member

Please just test with Kind as explained in the readme.md in the master branch, to make sure that it is not a Kubernetes issue of your own cluster.

@dnapier

dnapier commented Apr 9, 2024

Are you using kind with docker ?

Sorry, I didn't catch that this was addressed to me. Yes in my case, I am using kind with docker. Debian 12 host.

@diegolovison
Contributor

How much CPU and memory do you have available?
Were you strictly following https://github.com/kubeflow/manifests/#installation ?

@dnapier

dnapier commented Apr 9, 2024

12GB of memory on the system, 8 core processor (Intel(R) Xeon(R) E5-2620).

And yes I was strictly following the installation instructions.

@ALPHA-1503

Please just test with Kind as explained in the readme.md in the master branch, to make sure that it is not a Kubernetes issue of your own cluster.

I already tested the v1.8 on minikube and I'm facing the same issue...

@diegolovison
Contributor

12GB of memory on the system, 8 core processor (Intel(R) Xeon(R) E5-2620).

I believe you will need to have more resources. I have 20 cores and 36GB of memory

minikube and I'm facing the same issue...

I wasn't able to make it work on Minikube. Only with kind

@ALPHA-1503

I've just attempted to install it using a local kind cluster, but it didn't work. I'm encountering another issue...
[image: issue-kind-kf]

@dnapier

dnapier commented Apr 10, 2024

I've just attempted to install it using a local kind cluster, but it didn't work. I'm encountering another issue... [image: issue-kind-kf]

That's the exact issue I'm facing, which @diegolovison suggests is caused by a lack of available resources. I'm working on doubling my memory to 24GB to test whether that resolves it. Will update asap.

@ALPHA-1503

Interesting.... I managed to install v1.8 on Minikube just now. I'm curious why it's working now. My suspicion is that I might encounter issues installing it on my Tanzu Cluster, perhaps due to a cluster-related problem.

@dnapier

dnapier commented Apr 10, 2024

Interesting.... I managed to install v1.8 on Minikube just now. I'm curious why it's working now. My suspicion is that I might encounter issues installing it on my Tanzu Cluster, perhaps due to a cluster-related problem.

Do you mind sharing your cpu/memory for comparison?

@ALPHA-1503

8 Cores/16G

@juliusvonkohout
Member

minikube with podman worked for me with 16 GB if you strip down the example distribution a bit. Otherwise you might need 32 GB. @diegolovison, we should add the memory and core requirements at the top of the installation instructions with kind.

@diegolovison
Contributor

Do you believe that 32 GB and 20 cores?

@juliusvonkohout
Member

Do you believe that 32 GB and 20 cores?

I do not understand your question.

@diegolovison
Contributor

Should we document that 32 GB of RAM and 20 CPU cores are the minimum required to install Kubeflow locally?

@dnapier

dnapier commented Apr 15, 2024

Should we document that 32 GB of RAM and 20 CPU cores are the minimum required to install Kubeflow locally?

Not that I have a say here, but I think that's a great idea.

@juliusvonkohout
Member

juliusvonkohout commented Apr 15, 2024

I would go with 16 cores and 32 GB of memory as the recommendation. Or are you sure that 16 cores are not enough? It is possible to get by with far less, but that is then left up to the end user.
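
A preflight check along these lines could be run before installing. This is a sketch only: the function name is hypothetical, the defaults are just the 16 cores / 32 GB suggested above, and the usage comment assumes a Linux host with `nproc` and `/proc/meminfo`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given core count and memory (in GiB)
# meet the thresholds; defaults are the 16 cores / 32 GB suggested above.
meets_recommendation() {
  local cores="$1" mem_gib="$2" min_cores="${3:-16}" min_mem_gib="${4:-32}"
  [ "$cores" -ge "$min_cores" ] && [ "$mem_gib" -ge "$min_mem_gib" ]
}

# Usage on a Linux host (MemTotal in /proc/meminfo is reported in kB):
#   cores="$(nproc)"
#   mem_gib="$(awk '/^MemTotal:/ { print int($2 / 1024 / 1024 + 0.5) }' /proc/meminfo)"
#   meets_recommendation "$cores" "$mem_gib" \
#     || echo "Below the suggested 16 cores / 32 GB; the full example may not start" >&2
```

MemTotal usually reads slightly below the installed RAM, so the usage comment rounds to the nearest GiB. The third and fourth arguments allow lower thresholds for a stripped-down install.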

@diegolovison
Contributor

Ok. Sounds good

@juliusvonkohout
Member

@biswajit-9776 Please retry with the latest master branch and readme. If you still encounter problems, please open a new issue with our new template.
