Rejected peer communication in K8S cluster with TLS and 3.2+ #8268
Can you double-check the etcd version? This seems to be a duplicate of #8206; the fix #8223 is included in 3.2.3. /cc @heyitsanthony
should have been fixed in 40468ab... @cehoffman what does
I just double-checked the test again and it is definitely using 3.2.3. First member log
2nd member
There's a DNS SAN on that cert but no IP SAN, so all it can do is check the DNS.
I've tried it with both DNS only and IP with DNS. This case uses only DNS because that was the last change I tried, and it was working fine with 3.1.10, so I didn't switch it back. I used the same config to confirm it still failed with 3.2.3. I can add IPs, but again, IPs are not useful since I don't know for sure which IPs I'll have ahead of time when used with etcd-operator. The cluster still fails, but the peer now rejects the connection from the initial member.
@cehoffman is the issue now that you want to disable SAN checking, or is it that etcd is failing to confirm that the DNS SAN resolves to the incoming connection's address? What's in the DNS records?
I believe SAN checking when only DNS names are specified is the desired behavior. DNS resolution is working, and all the members follow the name pattern generically.
It seems to me that the etcd peer server is rejecting communication from peer clients because it always wants to use the connecting client's IP for validation against the peer certificate. Running in Kubernetes with etcd-operator, it only seems possible for a peer client to verify it is talking to a valid peer server, not for a peer server to validate the address of a peer client, unless the peer client also passes along the DNS name it can be reached at.
I'm having a similar issue with both 3.2.2 and 3.2.3. My cluster comes up healthy, but when I kill an etcd pod and a new one gets scheduled, I see this error from all living peers, and the new pod just dies because it can't resume state.
Is this issue definitely fixed? I've just tried to update my cluster to v3.2.9 from v3.1.5, but after updating the first node, I received these same errors.
The IP lookup seems to be based on PTR queries. If your PTR records don't resolve properly, this will fail. I'm seeing this in an environment where we don't manipulate PTR records.
@pires thanks for the reply, PTR records of course... should have seen that. I don't have those, unfortunately; it's private address space in an AWS VPC. Using only a PTR lookup seems a bit fragile. I would have thought doing a lookup on the wildcard entries from the certs would be better, since you'd only need this info when establishing connections for the first time with new nodes. Maybe I'm missing something. In the meantime I might have to disable peer cert auth if I want to upgrade to v3.2.x, which I'd really prefer not to.
Client auth should be fine. It's peer auth that's failing.
My bad, s/client/peer/; I've been doing that all afternoon.
I'm having the same problems. I'm using a VPC in AWS, so I cannot create PTR records. What are the alternatives? The only ones I see are to disable peer auth, or to not upgrade to 3.2.x at all (since 3.1.x works).
The behaviour I see is a bit weird. On a single-node etcd cluster, peer authentication is fine:
but right after the second node joins, the first node starts rejecting peers with the same certificate.
The connection is made from a kube-hosted pod to a static pod on the host network, through a kube service.
This is using etcd-operator 0.4.1 in K8S 1.6.7 to create the cluster with TLS. When creating the cluster with etcd 3.1.8, the TLS certificates used for peer communication work without issue. Bumping the cluster to 3.2+ (3.2 and 3.2.3 tested) results in a failed cluster, and provisioning a new cluster on 3.2 fails with the same certificates. I am using the current method of generating the etcd certificates from terraform-installer.
The heart of the issue seems to be that 3.2 rejects the IP address of the connecting peer because it doesn't match any of the DNS names in the certificate. Given that the IP addresses of the etcd members in a K8S deploy are unknown until provision time, the peer check seems overly restrictive: IP matching should only occur when the certificate explicitly provides IPs to validate. I modified my generation to not include IPs in the certificate and it still fails. When providing the exact IPs that will be used in the certificate, it also fails because it is still looking at the DNS names instead of the IP addresses in the certificate.