[injector] TLS handshake error from masters #351

Closed
rrondeau opened this issue Oct 7, 2020 · 11 comments
Labels
area/connect (Related to Connect service mesh, e.g. injection), type/question (Question about product, ideally should be pointed to discuss.hashicorp.com)

Comments

@rrondeau
Contributor

rrondeau commented Oct 7, 2020

Hi

I upgraded my GKE cluster from Kubernetes 1.15 to 1.16 on Monday.
The cluster version is the only thing I have changed since last week.
We have been using Consul injection for more than a year and I never had these logs/problems before.
Since this upgrade, I'm seeing a lot of TLS handshake errors in my Consul injector logs.

Sample of logs:

consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020-10-07T15:01:27.646Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020-10-07T15:01:34.726Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020/10/07 15:01:35 http: TLS handshake error from 172.16.0.36:53730: remote error: tls: bad certificate
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020/10/07 15:01:35 http: TLS handshake error from 172.16.0.36:53732: remote error: tls: bad certificate
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020-10-07T15:01:46.511Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020-10-07T15:02:27.519Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020-10-07T15:02:28.025Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020-10-07T15:02:28.236Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020-10-07T15:02:28.418Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020-10-07T15:02:57.409Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020-10-07T15:03:16.152Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020/10/07 15:03:16 http: TLS handshake error from 172.16.0.36:56438: remote error: tls: bad certificate
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8 sidecar-injector 2020-10-07T15:03:24.350Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020-10-07T15:04:18.638Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8 sidecar-injector 2020-10-07T15:04:18.998Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8 sidecar-injector 2020-10-07T15:04:19.069Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8 sidecar-injector 2020-10-07T15:04:19.176Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020-10-07T15:04:19.561Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8 sidecar-injector 2020/10/07 15:04:19 http: TLS handshake error from 172.16.0.36:39268: remote error: tls: bad certificate
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020-10-07T15:04:19.624Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8 sidecar-injector 2020-10-07T15:04:25.206Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8 sidecar-injector 2020-10-07T15:04:27.380Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020-10-07T15:04:27.675Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020-10-07T15:04:52.090Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8 sidecar-injector 2020-10-07T15:05:20.753Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8 sidecar-injector 2020-10-07T15:05:20.943Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020/10/07 15:06:58 http: TLS handshake error from 172.16.0.36:33998: remote error: tls: bad certificate
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020/10/07 15:06:58 http: TLS handshake error from 172.16.0.36:34004: remote error: tls: bad certificate
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8 sidecar-injector 2020-10-07T15:06:58.977Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8 sidecar-injector 2020-10-07T15:06:59.327Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8 sidecar-injector 2020-10-07T15:07:04.919Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8 sidecar-injector 2020-10-07T15:07:05.103Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8 sidecar-injector 2020-10-07T15:07:12.350Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8 sidecar-injector 2020/10/07 15:07:12 http: TLS handshake error from 172.16.0.36:43820: remote error: tls: bad certificate
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp sidecar-injector 2020-10-07T15:07:18.622Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8 sidecar-injector 2020-10-07T15:07:19.286Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8 sidecar-injector 2020-10-07T15:08:13.099Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s

Status of my injectors:

~$ kubectl -n consul get pods -l component=connect-injector
NAME                                                         READY   STATUS    RESTARTS   AGE
consul-connect-injector-webhook-deployment-66b7c5cc4-c5wlp   1/1     Running   0          31h
consul-connect-injector-webhook-deployment-66b7c5cc4-pmdv8   1/1     Running   0          2d5h

We are using Consul 1.8.4, deployed with the HashiCorp consul-helm chart, in a GKE cluster.

Do you have any idea how to resolve this?

ishustava added the area/connect and type/question labels on Oct 7, 2020
@ishustava
Contributor

Hey @rrondeau

Thanks for this issue!

Could you share some additional info with us:

  • Your Helm values
  • Your Helm chart version

Are you running two instances of the connect injector, or is one of them an old instance?

ishustava added the waiting-reply label on Oct 7, 2020
@lkysow
Member

lkysow commented Oct 7, 2020

Is it also possible that there's some health checker hitting this endpoint?

@lkysow
Member

lkysow commented Oct 7, 2020

Also, do you know what has the IP 172.16.0.36?

@rrondeau
Contributor Author

rrondeau commented Oct 7, 2020

> Also, do you know what has the IP 172.16.0.36?

Yep, this is the IP address of the Kubernetes master API of my GKE cluster.

> Is it also possible that there's some health checker hitting this endpoint?

Nope, I don't think so.

> Hey @rrondeau
>
> Thanks for this issue!
>
> Could you share some additional info with us:
>
> * Your helm values
> * Your helm chart version?
>
> Are you running two instances of the connect injector? or is one of them an old instance?

We are using the latest Helm chart version, 0.24.1, and we have always had 2 injectors.

My values look like this: https://gist.github.com/rrondeau/cee35a97152737632b0541f56b2648c5

@ishustava
Contributor

Running two injectors is likely the issue here, and I'm surprised you haven't seen problems with it in the past. We don't allow setting replicas for the connect injector (it's always hard-coded to 1), but this is definitely something on our backlog to fix soon!

Running two or more injectors is not supported because each instance generates its own CA and a cert for the connect-inject webhook, and then calls the Kubernetes API to update the webhook configuration with that CA. When you run two instances, each one generates its own CA and each one overwrites the webhook config. As a result, only one of the instances will be healthy: the one that called the Kubernetes API last. If you check the logs of that healthy instance, you will likely see no errors there.
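To make this last-writer-wins failure concrete, here is a minimal Go sketch that uses only the standard library's crypto/x509. It is an illustration under stated assumptions, not the injector's actual code: the injector and webhookConfig types, the issue, newInjector, patch and verify helpers, and the service name consul-connect-injector-svc.consul.svc are all hypothetical. Each simulated replica creates its own CA and serving cert and writes its CA into a stand-in for the webhook's caBundle; the API server's check is modelled as verifying a replica's cert against that single CA. A Go TLS client that fails verification aborts the handshake with a bad_certificate alert, which is consistent with the "remote error: tls: bad certificate" lines in the logs above.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// webhookHost is a hypothetical in-cluster DNS name for the injector Service.
const webhookHost = "consul-connect-injector-svc.consul.svc"

// injector models one replica that bootstraps its own CA and serving cert.
type injector struct {
	name string
	ca   *x509.Certificate // this replica's private CA
	leaf *x509.Certificate // the cert this replica serves to the API server
}

// issue signs tmpl with parent/parentKey, or self-signs it when parent is nil.
func issue(tmpl, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	if parent == nil {
		parent, parentKey = tmpl, key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// newInjector mimics a replica starting up: a fresh CA plus a leaf cert it signs.
func newInjector(name string) (*injector, error) {
	now := time.Now()
	ca, caKey, err := issue(&x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: name + "-ca"},
		NotBefore:             now.Add(-time.Minute),
		NotAfter:              now.Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}, nil, nil)
	if err != nil {
		return nil, err
	}
	leaf, _, err := issue(&x509.Certificate{
		SerialNumber: big.NewInt(2),
		DNSNames:     []string{webhookHost},
		NotBefore:    now.Add(-time.Minute),
		NotAfter:     now.Add(time.Hour),
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}, ca, caKey)
	if err != nil {
		return nil, err
	}
	return &injector{name: name, ca: ca, leaf: leaf}, nil
}

// webhookConfig stands in for the caBundle of the MutatingWebhookConfiguration.
type webhookConfig struct{ caBundle *x509.Certificate }
// patch mimics a replica writing its CA into the webhook config: last writer wins.
func (w *webhookConfig) patch(inj *injector) { w.caBundle = inj.ca }
// verify mimics the API server checking inj's cert against the single caBundle.
func (w *webhookConfig) verify(inj *injector) error {
	roots := x509.NewCertPool()
	roots.AddCert(w.caBundle)
	_, err := inj.leaf.Verify(x509.VerifyOptions{DNSName: webhookHost, Roots: roots})
	return err
}

func main() {
	cfg := &webhookConfig{}
	a, errA := newInjector("injector-a")
	b, errB := newInjector("injector-b")
	if errA != nil || errB != nil {
		panic("could not create test certificates")
	}
	cfg.patch(a) // replica a starts first and registers its CA
	cfg.patch(b) // replica b starts later and overwrites a's CA
	for _, inj := range []*injector{a, b} {
		if err := cfg.verify(inj); err != nil {
			fmt.Printf("%s rejected: %v\n", inj.name, err)
		} else {
			fmt.Printf("%s accepted\n", inj.name)
		}
	}
}
```

Running it should report injector-a as rejected (its cert chains to a CA that is no longer in the bundle) and injector-b as accepted. Swapping the order of the two patch calls flips which replica is rejected, which is why the outcome depends on which pod happened to start last.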

There is a workaround if you want to run two instances, though: you can provide your own certs to the connect injector by setting the connectInject.certs properties.

@rrondeau
Contributor Author

rrondeau commented Oct 7, 2020

OK, I don't know how it worked before, but it did!
I did not see any hints in the docs about this limitation :|
I will try with only one injector, or with the workaround, tomorrow.
I made a PR in the Helm repo a while ago (hashicorp/consul-helm#338), but nobody mentioned the limitation there either.

Thanks for the help!

@ishustava
Contributor

> I did not see any hints in the docs about this limitation :|

Sorry about that! We only discovered it recently ourselves and haven't updated the docs yet 😞

Let us know if running only one instance or using the workaround fixes this for you.

> I made a PR in the helm repo a while ago hashicorp/consul-helm#338 but nobody mentioned the limitation either.

Thanks for this PR! We'll likely use it as a base, but we'll also need to fix the cert issue. We've already added a separate webhook cert manager (it hasn't been merged to master or released yet), but we still need to hook it up to the injector. No need to update your PR, though!

Sorry again for the inconvenience and for this requirement not being very clear!

ishustava removed the waiting-reply label on Oct 7, 2020
@rrondeau
Contributor Author

rrondeau commented Oct 8, 2020

@ishustava Thanks for all the info and the help; I don't have any TLS errors now.
I scaled down the deployment and will wait for the real fix.

Thanks again for all your hard work !

rrondeau closed this as completed on Oct 8, 2020
@jeanmorais
Contributor

Hey @ishustava,

I realized that the option to set the number of replicas for the injector has already been implemented.

However, is running more than one replica fully supported now? Errors like "remote error: tls: bad certificate" shouldn't happen anymore, right?

Sorry for posting on an issue that was closed so long ago. I would appreciate it if you could clarify.

@lkysow
Member

lkysow commented Oct 5, 2021

Hi Jean, yes, it should be fixed. Are you still seeing that? If so, please open a new issue with the relevant fields filled in and we can look into it!

@jeanmorais
Contributor

@lkysow I haven't seen this error anymore; I just wanted to confirm.

Thanks for the fast response!


4 participants