Webhook server could not answer in time #30
Comments
There is a lag between deploying cert-manager and its ability to issue certificates. The steps outlined here verify that cert-manager is operational: https://cert-manager.io/docs/installation/verify/#manual-verification Can you add this step to your deployment to see if that solves the issue? We automated this wait in the following script: https://github.com/vertica/vertica-kubernetes/blob/main/scripts/wait-for-cert-manager-ready.sh |
Hi @spilchen , |
How did you run these two steps in your repro?
|
Attached helm charts: Deploy steps:
Ready state check:
Based on the output, the operator and the webhook were ready. |
This is an issue with the operator-sdk framework we are using. A PR that went into controller-runtime will help alleviate this (kubernetes-sigs/controller-runtime#1588). It provides a true health check that makes sure the webhook server is up and running. It only landed in controller-runtime in July, in the v0.9.3 release; for comparison, the framework we currently use is on v0.7.2. There was also a minor fix for it in the v0.9.6 release. So I'm a bit reluctant to move up controller-runtime to pick this up, as I don't want to destabilize things. Is this a super urgent problem that needs to be fixed? |
Do you mean that /readyz gives a false positive response? That could be a serious issue. I have to try some scenarios and I will get back to you. |
The /readyz probe just tells you whether the pod is running. It doesn't tell you whether the webhook port is being listened on. The listener is set up shortly after the pod starts. Until that happens, there is a timing window in which any incoming webhook request fails. |
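To illustrate the gap described above, here is a minimal sketch of a check that waits until a TCP port actually accepts connections, rather than trusting the pod's ready state. The function name `wait_for_port` and its parameters are hypothetical and not part of the operator or its scripts; the host and port of the webhook service would have to be filled in for a real cluster. A successful TCP connect only shows that something is listening; it does not prove the webhook's TLS setup is valid.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made before `timeout`
    seconds have elapsed. `interval` is both the per-attempt connect
    timeout and the pause between attempts.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection; success means
            # a listener is bound to the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, unreachable, or timed out: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would run something like `wait_for_port(webhook_host, webhook_port)` before creating the first VerticaDB resource, where both arguments are placeholders for the webhook service's address.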
Hi @spilchen, |
So I would like to keep this issue open until the controller-runtime version is upgraded. It is not so urgent. |
The e2e tests hit a failure because we had tried to create a VerticaDB before the webhook was fully up. This is a known issue (#30) with the webhook. We are going to work around this for now by adding a wait script when the tests issue make deploy.
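An alternative to waiting up front is to retry the create step until the webhook answers. The following is only a sketch of that idea, not the wait script used by the tests: `retry` is a hypothetical helper, and `action` stands in for whatever call creates the VerticaDB (for example a wrapper around a kubectl invocation).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 10, delay: float = 2.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `action` until it succeeds or `attempts` calls have failed.

    Waits `delay` seconds between failed attempts and re-raises the
    last error if every attempt fails. `sleep` is injectable so the
    helper can be exercised without real waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as err:  # e.g. the webhook did not answer in time
            last_error = err
            if attempt < attempts:
                sleep(delay)
    assert last_error is not None
    raise last_error
```

Retrying blindly on every exception is a simplification; a real script would likely retry only on the webhook timeout error and fail fast on validation errors.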
Is it possible to work around this issue in helm? My use case: Installing a VerticaDB resource in a helm chart. I'd like to install verticadb-operator via a dependency, but doing so seems to trigger this issue. A clean installation fails with the following error:
|
It isn't helm based, but we have a script that works around this issue, which we use in our development environment (scripts/wait-for-webhook.sh). However, we are in the process of upgrading the Go packages in #165. This will bring in a new controller-runtime that properly implements a health check and should resolve this issue. |
Thanks. I already tried polling the webhook in a pre-install job. However, the way helm merges chart dependencies causes my job to run before any operator resources are deployed, and then stalls until timeout. |
We are still seeing cases where the |
I see that as well. Our current solution is to run the aforementioned pre-install job in our chart and install both charts from Terraform sequentially instead of helm dependencies. |
REPRODUCTION
kubectl -n <namespace> apply -f <path-to-the-attached-file>
The strange thing was that the webhook received the Vertica descriptor and sent a response (check the logs below). I guess there was a timeout between the webhook and the Kubernetes API server, because I had to wait at least 1.5 sec to get any response, but I do not know how to check this to give more details. If it was a timeout, then the cause could be incorrect connection pool handling in the webhook or a wrong configuration in kube-rbac-proxy.
webhook manager container log: