Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for configuring TLS ServerName for health checks #9475

Merged
merged 2 commits into from
Mar 17, 2021

Conversation

cbroglie
Copy link
Contributor

@cbroglie cbroglie commented Dec 29, 2020

Some TLS servers require SNI, but the Golang HTTP client doesn't
include it in the ClientHello when connecting to an IP address. This
change adds a new TLSServerName field to health check definitions to
optionally set it. This fixes #9473.

@hashicorp-cla
Copy link

hashicorp-cla commented Dec 29, 2020

CLA assistant check
All committers have signed the CLA.

@jsosulska jsosulska added theme/certificates Related to creating, distributing, and rotating certificates in Consul theme/health-checks Health Check functionality type/enhancement Proposed improvement or new feature labels Jan 25, 2021
Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!

I think this change looks good, and the implementation looks both correct and complete. Nice work!

The only concern I have with this change is the refactor in http_decode_test.go. I appreciate the desire to reduce some of the boilerplate here, but I think these tests are already a quite difficult to understand, and with the reduced boilerplate I think unfortunately it makes then even harder to understand.

I suspect another way we can approach these tests is by defining two JSON inputs that set all the fields, one with all CamelCase keys, one with all snake_case keys, and use those two inputs to test the permutations. There will need to be one other test case for the one field that has more then 2 options, but that's probably ok.

I don't think we need to make that change in this PR, but I think it would be good to use the existing test structure until we can make that change. Alternatively if you wanted to try out that approach first and then introduce this additional field, that would be great as well.

We could also start that approach with only the new field being added, and later on come back and add the other fields as we remove the old test cases.

What do you think about that approach?

if !c.enableAgentTLSForChecks() {
return &tls.Config{
InsecureSkipVerify: skipVerify,
ServerName: serverName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a change in behaviour here where serviceName will be defaulted to the server or node name, where as previously it was not. I'm not sure if this is likely to cause problems or not.

Was this change intentional or incidental? I think if this part of the change isn't required it would be good to maintain the previous behaviour.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did make this change intentionally: the client may not care to verify which cert the server is returning, but the server may still need the SNI to determine which cert to use.

I'm not sure if this is likely to cause problems or not.

I don't have a good sense of this either. I'd argue my change is good for consistency of SNI handling, but it's not required for my use case and I can revert it if we're worried about the behavior change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into this some more to see if I can figure out how most servers handle this field. This change may very well be safe.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JFTR, this broke the use-case of using Vault as a CA for Nomad services.

The service health checks used to verify the certificate against the service IP so we put it into ip_sans. As of Consul 1.10 it checks against the unqualified hostname so they all got a mismatch and I had to change all my Nomad plans to use $alt_names := env "attr.unique.consul.name" | printf "alt_names=%s" -}} and re-deploy.

I'm surprised this didn't break more people's clusters but everybody is probably running with TLS verification off anyways. ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe #10481 is reporting this same issue, and should be fixed by #10490

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inclusion of the Consul server/ node name in the ServerName field of a client TLS config used exclusively for health checking doesn't appear to be a desirable behavior. I've opened #17481 to fix this and provide an explanation.

@cbroglie
Copy link
Contributor Author

I appreciate the desire to reduce some of the boilerplate here, but I think these tests are already a quite difficult to understand, and with the reduced boilerplate I think unfortunately it makes then even harder to understand.

Yep that's fair, I was just optimizing for future authors adding new fields. I'll revert my changes and take a look at either implementing your suggestion or just carrying forward the existing copy/paste.

Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, thank you for working on this!

@@ -188,6 +188,11 @@ The table below shows this endpoint's support for
The value can be further limited for all checks of a given agent using the
`check_output_max_size` flag in the agent.

- `TLSServerName` `(string: "")` - Specifies an optional string used to set the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These docs looks good. I think we also need to document this in website/content/docs/discovery/checks.mdx , which is linked from the config docs.

@vercel
Copy link

vercel bot commented Feb 25, 2021

@cbroglie is attempting to deploy a commit to the HashiCorp Team on Vercel.

A member of the Team first needs to authorize it.

@cbroglie
Copy link
Contributor Author

@dnephin I believe I've made all the requested changes and rebased, let me know if anything is missing

Some TLS servers require SNI, but the Golang HTTP client doesn't
include it in the ClientHello when connecting to an IP address. This
change adds a new TLSServerName field to health check definitions to
optionally set it. This fixes hashicorp#9473.
@vercel vercel bot temporarily deployed to Preview – consul March 16, 2021 22:22 Inactive
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging March 16, 2021 22:22 Inactive
Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This looks good to me.

I noticed that CI was not running on this PR. I believe that is because you have CircleCI enabled on your fork, so the jobs are running as part of your project and for some reason are not being linked back to github.

Once https://app.circleci.com/pipelines/github/cbroglie/consul looks good, I think this is ready to merge.

@dnephin dnephin merged commit cacd7cc into hashicorp:master Mar 17, 2021
@hashicorp-ci
Copy link
Contributor

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/338341.

@cbroglie cbroglie deleted the tls-server-name branch March 17, 2021 02:33
@cbroglie
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/certificates Related to creating, distributing, and rotating certificates in Consul theme/health-checks Health Check functionality type/enhancement Proposed improvement or new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support configuring ServerName for HTTP health checks
7 participants