Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcdmain, pkg: Support peer and client TLS auth based on SAN fields. #10614

Merged
merged 5 commits into from
Jul 19, 2019

Conversation

jmillikin-stripe
Copy link
Contributor

Etcd currently supports validating peers based on their TLS certificate's
CN field. The current best practice for creation and validation of TLS
certs is to use the Subject Alternative Name (SAN) fields instead, so that
a certificate might be issued with a unique CN and its logical
identities in the SANs.

This commit extends the peer validation logic to use Go's
(*"crypto/x509".Certificate).ValidateHostname function for name
validation, which allows SANs to be used for peer access control.

In addition, it allows name validation to be enabled on clients as well.
This is used when running Etcd behind an authenticating proxy, or as
an internal component in a larger system (like a Kubernetes master).

(See issue #8262 and PR #8616 for additional context)

@jingyih
Copy link
Contributor

jingyih commented Apr 15, 2019

cc @wenjiaswe

@@ -270,6 +270,11 @@ The security flags help to [build a secure etcd cluster][security].
+ default: ""
+ env variable: ETCD_CLIENT_CRL_FILE

### --client-cert-allowed-name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allowed-san?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When checking whether a given hostname is valid for a TLS certificate, it's typical to not mandate which particular field the name comes from. The implementation of (*x509.Certificate).VerifyHostname() uses SAN if any are set, and falls back to CN.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. The implementation checks CN first, then checks parsed SAN, right? Can we make this clear?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hard to infer this behavior from the flag and current help message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the verification logic is something like this:

func checkName(name string) bool {
  if len(SAN) == 0 {
    return validHostname(CN) && matches(CN, name)
  }
  for _, sanEntry := range SAN {
    if matches(sanEntry, name) {
      return true
    }
  }
  return false
}

The full behavior is described by RFC 6125 section 6.4, which is too long to describe here.

@xiang90
Copy link
Contributor

xiang90 commented Jun 27, 2019

@jmillikin-stripe

sorry for the long delay. this pr looks good to me. but can we change allowed-name to allowed-san or something more specific?

@wenjiaswe
Copy link
Contributor

@xiang90 If I understood right, x509's logic is, if a SSL Certificate has a Subject Alternative Name (SAN) field, then SSL clients are supposed to ignore the Common Name value and seek a match in the SAN list. As in RFC 6125, Appendix B.2.:

   If a subjectAltName extension of type dNSName is present, that MUST
   be used as the identity.  Otherwise, the (most specific) Common Name
   field in the Subject field of the certificate MUST be used.  Although
   the use of the Common Name is existing practice, it is deprecated and
   Certification Authorities are encouraged to use the dNSName instead.

I think in this PR, only one of --peer-cert-allowed-cn and --peer-cert-allowed-name can be set at a time. If --peer-cert-allowed-cn is set, then CommonName is used and it follows the old behavior, if new --peer-cert-allowed-name is used, then it follows "current best practice" to check SAN and ignore CN if SAN is set, otherwise check CN. @jmillikin-stripe did I understand correctly?

If that's the case, maybe we could use --peer-cert-allowed-hostname instead of --peer-cert-allowed-name to match (*x509.Certificate).VerifyHostname() function name?

@jmillikin-stripe
Copy link
Contributor Author

@wenjiaswe Yep, that all matches my understanding. I'd be fine with renaming the new flags to -hostname.

pkg/transport/listener.go Outdated Show resolved Hide resolved
pkg/transport/listener.go Outdated Show resolved Hide resolved
@jmillikin-stripe
Copy link
Contributor Author

Stray debugging prints removed, sorry about that.

@jmillikin-stripe
Copy link
Contributor Author

The test failure in ["TARGET=linux-amd64-integration-4-cpu"] looks flaky -- can an etcd maintainer re-try that test?

@jingyih
Copy link
Contributor

jingyih commented Jul 10, 2019

The test failure in ["TARGET=linux-amd64-integration-4-cpu"] looks flaky -- can an etcd maintainer re-try that test?

=== RUN TestBalancerUnderBlackholeKeepAliveWatch
--- FAIL: TestBalancerUnderBlackholeKeepAliveWatch (5.73s)
black_hole_test.go:83: took too long to receive watch events

This is a known flaky test, do not worry about it.

@jingyih
Copy link
Contributor

jingyih commented Jul 10, 2019

The semaphoreci tests timed out after 20m. Can you rebase to master? It was adjusted to 30m.

Etcd currently supports validating peers based on their TLS certificate's
CN field. The current best practice for creation and validation of TLS
certs is to use the Subject Alternative Name (SAN) fields instead, so that
a certificate might be issued with a unique CN and its logical
identities in the SANs.

This commit extends the peer validation logic to use Go's
`(*"crypto/x509".Certificate).ValidateHostname` function for name
validation, which allows SANs to be used for peer access control.

In addition, it allows name validation to be enabled on clients as well.
This is used when running Etcd behind an authenticating proxy, or as
an internal component in a larger system (like a Kubernetes master).
@xiang90
Copy link
Contributor

xiang90 commented Jul 10, 2019

LGTM if all failures are known.

@jmillikin-stripe
Copy link
Contributor Author

The semaphore test is consistently failing with a timeout in the new test. Please don't merge until I can debug it.

@jmillikin-stripe jmillikin-stripe force-pushed the cert-allowed-san-flags branch 2 times, most recently from 683068f to 59072f2 Compare July 10, 2019 04:57
@codecov-io
Copy link

Codecov Report

Merging #10614 into master will decrease coverage by 1.03%.
The diff coverage is 23.07%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10614      +/-   ##
==========================================
- Coverage   63.35%   62.31%   -1.04%     
==========================================
  Files         398      398              
  Lines       37496    37507      +11     
==========================================
- Hits        23754    23371     -383     
- Misses      12144    12555     +411     
+ Partials     1598     1581      -17
Impacted Files Coverage Δ
etcdmain/config.go 82.88% <100%> (+0.15%) ⬆️
pkg/transport/listener.go 49.54% <9.09%> (-1.64%) ⬇️
clientv3/balancer/grpc1.7-health.go 5.23% <0%> (-53.78%) ⬇️
client/client.go 42.81% <0%> (-41.18%) ⬇️
client/keys.go 72.86% <0%> (-18.6%) ⬇️
auth/store.go 45.83% <0%> (-13.96%) ⬇️
clientv3/namespace/watch.go 87.87% <0%> (-6.07%) ⬇️
clientv3/leasing/cache.go 87.22% <0%> (-4.45%) ⬇️
proxy/grpcproxy/watcher.go 89.79% <0%> (-4.09%) ⬇️
proxy/grpcproxy/register.go 80.55% <0%> (-2.78%) ⬇️
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eb7dd97...95f3138. Read the comment docs.

@jmillikin-stripe
Copy link
Contributor Author

SemaphoreCI is green now. I've discovered an interesting behavior in the Go TLS library: the error message returned from a failed handshake is not deterministic. When both sides of a TLS handshake distrust the other, the error races between "client certificate authentication failed" and "remote error: tls: bad certificate" depending on which side sends a NACK frame.

The test for CN tends toward the second error, and the new test tends toward the first.

@jmillikin-stripe
Copy link
Contributor Author

Gentle ping

@xiang90 xiang90 merged commit 9a69aa1 into etcd-io:master Jul 19, 2019
@jmillikin-stripe jmillikin-stripe deleted the cert-allowed-san-flags branch July 22, 2019 02:26
@jmillikin-stripe
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants