Success rate, throughput, and latency issues with HTTP/1 #1353

Closed
siggy opened this issue Jul 19, 2018 · 5 comments · Fixed by linkerd/linkerd2-proxy#26

Comments

@siggy (Member) commented Jul 19, 2018

With linkerd2-proxy, we observed an 80% success rate and high latency when testing HTTP/1.

Test environment

  • compares 3 configurations
    • linkerd2-proxy:git-565c1dad
    • linkerd1 1.4.5
    • baseline (no proxy)
  • HTTP/1
  • 1000 qps total
  • 10 connections
  • slow-cooker frontend
  • helloworld backend

Proxy metrics:
https://gist.github.com/siggy/2708cdff73c3e25463d80fc10feac45a

Kubernetes config:
https://gist.github.com/siggy/21ecc89162c23f1690baf29ab4cd2b5a

Seeing lots of these in the proxy log:

ERR! proxy={server=in listen=0.0.0.0:4143 remote=127.0.0.1:52052} linkerd2_proxy turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500

Steps to reproduce

  1. Deploy

    kubectl apply -f https://gist.githubusercontent.com/siggy/21ecc89162c23f1690baf29ab4cd2b5a/raw/100493dc1e4fd2181c4f474fa7b4c52116dc71bd/linkerd2-h1.yaml
  2. Observe in Grafana

    kubectl -n linkerd2-h1 port-forward $(kubectl -n linkerd2-h1 get po --selector=app=grafana -o jsonpath='{.items[*].metadata.name}') 3000:3000
    

(Screenshot: Grafana dashboard, 2018-07-19 1:16 PM)

@seanmonstar (Contributor) commented:

After enabling a bunch of logs, I noticed this:

http/1 client error: an error occurred trying to connect: Cannot assign requested address (os error 99)

Which is interesting! The "trying to connect" part means the error came from the C: Connect piece, for which we create a custom implementation in the proxy in transport::connect.

Digging deeper...
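
For reference, os error 99 is EADDRNOTAVAIL: the kernel couldn't assign a free local address/port for the outgoing connection. A minimal sketch (the loopback target here is hypothetical) of where that message comes from at the plain-TCP level:

```rust
// Minimal sketch: where "Cannot assign requested address" (os error 99,
// EADDRNOTAVAIL) surfaces from a plain TCP connect. The target address is
// hypothetical; it only stands in for the proxy's outbound connect.
use std::net::TcpStream;

fn main() {
    match TcpStream::connect("127.0.0.1:7777") {
        Ok(_stream) => println!("connected"),
        // When the kernel has no free local ephemeral port to assign
        // (e.g. under heavy connection churn), connect fails with os error 99.
        Err(e) => eprintln!("connect error: {} (os error {:?})", e, e.raw_os_error()),
    }
}
```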

@hawkw (Contributor) commented Jul 19, 2018

Cannot assign requested address (os error 99)

Huh, is SO_REUSEADDR not being set in some cases, or something?
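
For context on the question, a minimal sketch (assuming the net2 crate's TcpBuilder API) of what setting SO_REUSEADDR on the inbound listener would look like:

```rust
// Minimal sketch, assuming the net2 crate: SO_REUSEADDR lets a listener
// rebind an address that still has sockets lingering in TIME_WAIT.
extern crate net2;

use net2::TcpBuilder;
use std::io;
use std::net::TcpListener;

fn bind_reuse(addr: &str) -> io::Result<TcpListener> {
    let builder = TcpBuilder::new_v4()?;
    builder.reuse_address(true)?; // SO_REUSEADDR
    builder.bind(addr)?;
    builder.listen(1024)
}

fn main() -> io::Result<()> {
    // 0.0.0.0:4143 is the inbound listen address from the log above.
    let listener = bind_reuse("0.0.0.0:4143")?;
    println!("listening on {}", listener.local_addr()?);
    Ok(())
}
```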

@seanmonstar (Contributor) commented Jul 19, 2018

Well, this error occurs when connecting, so we don't set that option at all. But it does suggest that a lot of connection churn is happening, and many connections are sitting in TIME_WAIT. There was a patch to hyper that should significantly reduce this; the upgrade is in linkerd/linkerd2-proxy#24.
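
As a quick way to confirm that on a node, a minimal sketch (Linux-only; it reads /proc/net/tcp, where state 06 is TIME_WAIT) that counts sockets currently stuck in TIME_WAIT:

```rust
// Minimal sketch (Linux-only): count sockets in TIME_WAIT by scanning
// /proc/net/tcp. The fourth whitespace-separated column is the socket
// state; "06" corresponds to TCP_TIME_WAIT.
use std::fs;
use std::io;

fn main() -> io::Result<()> {
    let tcp = fs::read_to_string("/proc/net/tcp")?;
    let time_wait = tcp
        .lines()
        .skip(1) // skip the header row
        .filter(|line| line.split_whitespace().nth(3) == Some("06"))
        .count();
    println!("sockets in TIME_WAIT: {}", time_wait);
    Ok(())
}
```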

@seanmonstar (Contributor) commented:

It turns out the real problem was that every single one of these requests resulted in a new connection. Some optimizations had been added to hyper to reduce the number of operations needed when the size of a body was known, but because of those optimizations the internal read state wasn't polled to the end, so hyper assumed the body wasn't wanted and had to close the connection. The fix to hyper was merged in hyperium/hyper#1610; a new PR for the proxy is incoming!
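
To illustrate the contract the fix restores, a minimal sketch (hyper 0.12 / futures 0.1-era API assumed; the target URI is hypothetical): hyper only returns a connection to its pool once the response body has been polled to the end, so draining the body is what lets the next request reuse the connection:

```rust
// Minimal sketch, hyper 0.12 / futures 0.1-era API assumed: a connection is
// only checked back into hyper's pool after the response body is read to
// the end; dropping the body early forces the connection to be closed.
extern crate futures;
extern crate hyper;

use futures::{Future, Stream};
use hyper::{Client, Uri};

fn main() {
    let client = Client::new();
    // Hypothetical target, standing in for the helloworld backend.
    let uri: Uri = "http://helloworld:7777/".parse().unwrap();

    let fut = client
        .get(uri)
        .and_then(|res| {
            // Drain the body to completion so the connection can be reused.
            res.into_body().concat2()
        })
        .map(|body| println!("read {} bytes", body.len()))
        .map_err(|e| eprintln!("request error: {}", e));

    hyper::rt::run(fut);
}
```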

@siggy (Member, Author) commented Jul 30, 2018

Confirmed: tested with Linkerd2 v18.7.2, and the issue is no longer observable.
