Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLS passthrough fails if Client Hello is fragmented in multiple TCP packets. #11424

Open
Dirbaio opened this issue Jun 3, 2024 · 4 comments · May be fixed by #11843
Open

TLS passthrough fails if Client Hello is fragmented in multiple TCP packets. #11424

Dirbaio opened this issue Jun 3, 2024 · 4 comments · May be fixed by #11843
Labels
kind/support Categorizes issue or PR as a support question. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@Dirbaio
Copy link
Contributor

Dirbaio commented Jun 3, 2024

Since a few days ago, GitHub sends the Client Hello fragmented in a few TCP packets of 1 byte, then one TCP packet with the rest of the Client Hello. I have no idea why, it seems to be random, affecting about 5% of the webhook deliveries.

See this wireshark capture: there's a few 1-byte packets, then the packet with id 5793 sends the rest of the Client Hello:

screenshot-2024-06-03_22-23-27

The TCP proxy is doing a single TCP Read() call to read the Client Hello, here.

length, err := conn.Read(data)

This read is receiving only the 1st byte. This causes parser.GetHostname(data) to fail, which causes the TCP connection to be incorrectly routed to nginx, instead of to the TLS passthrough destination. nginx-ingress doesn't have the certificate for this host, causing the TLS handshake to fail.

I'm not sure how to fix this. Seems the proxy should read in a loop until it's received the whole Client Hello?

@Dirbaio Dirbaio added the kind/bug Categorizes issue or PR as related to a bug. label Jun 3, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jun 3, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If Ingress contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@longwuyuan
Copy link
Contributor

@Dirbaio the tcpdump and its analysis looks like a super significant and highly effective debug piece of information.

But this being the githib issues section for the project, its better if the issue report is descriptive and detailed.

The template of a new bug report will show you questions that are asked by the project. Please look at that template. Then edit your issue description and answer those questions. It will help.

Github's response component in a client HELO and the ingress nginx controller added with a TCP proxy sort of require a deep diving expert to even comprehend what problem you are reporting. Reproducing the problem is also a guess work based on the info provided.

/remove-kind bug
/kind support
/triage needs-information

@k8s-ci-robot k8s-ci-robot added kind/support Categorizes issue or PR as a support question. triage/needs-information Indicates an issue needs more information in order to work on it. and removed kind/bug Categorizes issue or PR as related to a bug. labels Jun 4, 2024
@alshain
Copy link

alshain commented Jun 20, 2024

We are impacted by this through a variation of tldr.fail/

Due to Chrome now enabling hybridized Kyber support by default, ClientHello is always fragmented and our users run into this bug every now and then with requests failing randomly with cert errors.

(Unlikely) If the initial page load fails, the user is greeted by a cert error.

(Much more likely) However, if a background request fails, the webpage will just randomly misbehave: CSS might be missing, some JavaScript might not load, a JS-triggered request might fail etc. These cert errors are only visible in the browser console.

Copy link

This is stale, but we won't close it automatically, just bare in mind the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or request to prioritize this, please reach #ingress-nginx-dev on Kubernetes Slack.

@github-actions github-actions bot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jul 21, 2024
@maxl99 maxl99 linked a pull request Aug 21, 2024 that will close this issue
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/support Categorizes issue or PR as a support question. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. triage/needs-information Indicates an issue needs more information in order to work on it.
Projects
Development

Successfully merging a pull request may close this issue.

4 participants