-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x/net/http2: investigate data race in Transport Request.Body reuse when GetBody is not defined #25009
Comments
Does this happen only with http2 ? Is it possible to for you to write a quick http2 server in Go and make this entire issue to be self-contained and fully reproducible ? |
@agnivade Yes, it's fine with http1.1. Here is a quite simple server:
And nginx is needed. |
I don't understand. The whole point of having a Go server was to have the repro self-contained. If this is a client issue, why do we need nginx ? Is the Go client-server not enough to cause the repro ? |
@agnivade Nginx is needed because it will check the body length against the header And please note that in order to make this issue reproducible, |
The point of having a Go server is to allow us to reproduce the issue without nginx. You can simply add the equivalent check in the Go server so that it's behavior is same as nginx. |
Thank you @ictar for the report, we'll investigate it. @agnivade thank you for handling this. I just wanted to add that I can relate to @ictar's need to have nginx for a repro(many companies that I know heavily use it) and that the question of absolving the issue only because Go's servers can't reproduce it might be tricky. This is because many companies will usually put nginx or some other web server as their frontend to their web apps, and from there proxy the content to polyglot services on their backend e.g Go, Node.js, Python(etc) web server. Interoperability between protocols and servers matters a lot and discrepancies between Go clients talking to various servers creep in :) |
Yes I understand. I was trying to isolate the issue. I did not mean to say to try to reproduce the issue with a plain Go HTTP server. Rather I meant to write equivalent code in Go to mimic nginx behavior so that we hit the bug. If that is hard to do, then we obviously need nginx. But if a simple check of verifying the content-length header and the body shows the issue, then we theoretically don't need nginx. Kind of like the test case that one will write if this indeed is a bug. |
Seems like just verifying content-length to body length does not show the issue and we need nginx for that. Also does not reproduce if |
/cc @bradfitz , @tombergan |
@ictar I just got sometime this evening to try to reproduce this:
and I ran it 50 times in a row $ curl -i -k -X POST https://localhost:9889
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Fri, 15 Jun 2018 06:45:44 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 12
Connection: keep-alive
Hello world!
$ curl -i -k -X POST https://localhost:9889
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Fri, 15 Jun 2018 06:45:46 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 12
Connection: keep-alive
Hello world! but no failures and the client and server look like this Clientpackage main
import (
"bytes"
"crypto/tls"
"errors"
"fmt"
"io/ioutil"
"net/http"
"sync"
"time"
)
type client struct {
base string
hc *http.Client
}
func newClient(baseURL string) *client {
return &client{
base: baseURL,
hc: &http.Client{
Timeout: 5 * time.Second,
Transport: &http.Transport{
MaxIdleConnsPerHost: 100,
MaxIdleConns: 100,
IdleConnTimeout: 90 * time.Second,
ExpectContinueTimeout: 5 * time.Second,
TLSClientConfig: &tls.Config{
InsecureSkipVerify: true,
},
},
},
}
}
func (c *client) send(content []byte) error {
if content == nil {
return errors.New("nil content")
}
body := new(bytes.Buffer)
body.Write(content)
start := time.Now()
req, err := http.NewRequest("POST", c.base, body)
if err != nil {
return fmt.Errorf("Send %+v (from %s)", err, time.Since(start))
}
req.Header = make(http.Header)
req.Header.Set("Content-Type", "application/json")
resp, err := c.hc.Do(req)
if err != nil {
return fmt.Errorf("Send: HTTPClient.Do: %+v (from %s)", err, time.Since(start))
}
defer resp.Body.Close()
blob, err := ioutil.ReadAll(resp.Body)
if err != nil {
return fmt.Errorf("Send: Read: %+v (from %s)", err, time.Since(start))
}
fmt.Printf("Send: Resp: %s\n", blob)
return nil
}
func main() {
client := newClient("https://localhost:9889/")
content := []byte(`{"test":1}`)
var wg sync.WaitGroup
for i := 0; i < 100; i++ {
wg.Add(1)
go func(i int) {
defer wg.Done()
if err := client.send(content); err != nil {
fmt.Printf("#%d: err: %v\n", i, err)
}
}(i)
}
wg.Wait()
} server.gopackage main
import (
"io/ioutil"
"log"
"net/http"
)
func main() {
addr := ":9789"
log.Printf("Serving at: %s", addr)
http.HandleFunc("/", helloWorld)
if err := http.ListenAndServeTLS(addr, "cert.pem", "key.pem", nil); err != nil {
log.Fatalf("Failed to serve: %v", err)
}
}
func helloWorld(w http.ResponseWriter, r *http.Request) {
_, _ = ioutil.ReadAll(r.Body)
defer r.Body.Close()
w.Write([]byte("Hello world!"))
} @ictar with the setup that I've provided above, could you please try it locally too and see if this can reproduce the bug? Thank you. |
With a minor edit to client.go to ensure we've configured the transport for HTTP2 with --- old.go 2018-06-14 23:55:32.030299882 -0700
+++ client.go 2018-06-14 23:55:10.526187144 -0700
@@ -9,6 +9,8 @@
"net/http"
"sync"
"time"
+
+ "golang.org/x/net/http2"
)
type client struct {
@@ -17,19 +19,22 @@
}
func newClient(baseURL string) *client {
+ tr := &http.Transport{
+ MaxIdleConnsPerHost: 100,
+ MaxIdleConns: 100,
+ IdleConnTimeout: 90 * time.Second,
+ ExpectContinueTimeout: 5 * time.Second,
+ TLSClientConfig: &tls.Config{
+ InsecureSkipVerify: true,
+ },
+ }
+
+ http2.ConfigureTransport(tr)
return &client{
base: baseURL,
hc: &http.Client{
- Timeout: 5 * time.Second,
- Transport: &http.Transport{
- MaxIdleConnsPerHost: 100,
- MaxIdleConns: 100,
- IdleConnTimeout: 90 * time.Second,
- ExpectContinueTimeout: 5 * time.Second,
- TLSClientConfig: &tls.Config{
- InsecureSkipVerify: true,
- },
- },
+ Timeout: 5 * time.Second,
+ Transport: tr,
},
}
} |
@ictar would you mind sharing your nginx.conf adapted from the one I posted above? |
*rather @ernado |
@odeke-em |
@odeke-em |
Awesome, thank you @ernado for the repro via docker-compose I am going to update the title a little though as this seems to happen a few times and not always |
The http2_max_requests nginx directive affects the error rate. If I set it high enough (more than total requests is done by the client) there are no errors. Somehow relates to #18112. I've updated my go-issue-25009 with |
Yeah, I wondered and noticed that too that the |
Interesting point: if go1.7 is used, I'm getting an error on http request:
(tried setting request.GetBody with no effect). Maybe that could help. |
With such a low |
The h2 retry logic is here for reference: https://github.com/golang/net/blob/master/http2/transport.go#L422 |
@rs , I'm passing |
Also if we use separate $ CONCURRENT=0 ./test.sh
# works fine
$ CONCURRENT=1 ./test.sh
# same 400 Bad Request Also you can set |
@ernado You are correct that the retry does not reset the body. If you look at the retry, it sends a 0 byte body instead of a 1 byte body. |
From https://trac.nginx.org/nginx/ticket/1250#comment:4
Can be related to that issue. @fraenkel
Also I've described why we the body is not reset in You can try the following patch to force the correct body reset: --- a/src/net/http/transport.go
+++ b/src/net/http/transport.go
@@ -477,9 +477,8 @@ func (t *Transport) roundTrip(req *Request) (*Response, error) {
}
testHookRoundTripRetried()
- // Rewind the body if we're able to. (HTTP/2 does this itself so we only
- // need to do it for HTTP/1.1 connections.)
- if req.GetBody != nil && pconn.alt == nil {
+ // Rewind the body if we're able to.
+ if req.GetBody != nil {
newReq := *req
var err error
newReq.Body, err = req.GetBody() |
Your testcase does not replicate the issue. You are trying to mimic the behavior but you aren't. The problem is that the request is retried and the body isn't reset so nginx sends back a 400 since it expected 1 byte of data and got 0. The fix should be in the http2 code and not http which is what you have done. |
You are absolutely right, but the test case still triggers the invalid retry, isn't it? The client retries with request which body is not reset properly.
Yes, it is exactly what I was trying to describe.
I've provided both possible http2 and http "fixes", and they are described in #25009 (comment) |
Why aren't you using the req.GetBody func? The is how you provide a body on retries. |
Because it is already used under the hood when body is *bytes.Buffer |
So what is currently there should work because the newReq is returned and retried in the loop. Setting it back on the original request is racy, but the outcome should be the same. Let me see if I can reproduce this and dig as to why the newReq isn't working which is what should be working. |
Also I've tried both of my "patches", they are forcing the correct body reset which results in 200 OK from nginx (and passed test case). |
NewReq is not working because we are not using it, we are failing to get connection from the pool and returning from http2 retry loop. |
I agree that you found the bug. The fix while it works doesn't seem right. |
@ernado Your fix in http/transport.go looks correct. We still would need a test case to go along with the change. |
I've implemented a test case for transport and currently preparing the patch. |
Change https://golang.org/cl/131755 mentions this issue: |
Gentle ping for @bradfitz @ianlancetaylor |
Have you found the solution to this issue? I encountered the same problem where I'd get back a statuscode 400 from nginx sometimes when making a lot of concurrent calls. At some point I'd even get a 'broken pipe' error after which I'd have to restart the client. Using an environment variable (GODEBUG="http2client=0") fixes the issue, but I don't like this solution. |
@kaspergyselinck what version of go are you using? This problem should be fixed in go1.12. |
I can confirm that my go-issue-25009 repro still fails on go1.12 and go1.13 for some reason. |
Change https://golang.org/cl/210123 mentions this issue: |
@ernado hello, i have meet the same solution, have you fix it ? |
Hey @Centhero, I'm trying to fix in https://golang.org/cl/210123, but it seems like fix is not as trivial as I expected, sorry. |
What version of Go are you using (
go version
)?$ go version
go version go1.10.1 linux/amd64
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (
go env
)?GOARCH="amd64"
GOBIN=""
GOCACHE="/home/test/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GORACE=""
GOROOT="/home/test/go"
GOTMPDIR=""
GOTOOLDIR="/home/test/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build614182611=/tmp/go-build -gno-record-gcc-switches"
and nginx version: nginx/1.13.3
What did you do?
Here is my client:
And here is my nginx configure of http2:
What did you expect to see?
I expect all requests are successfully.
What did you see instead?
But I got 400 occasionally and found
client prematurely closed stream: only 0 out of 10 bytes of request body received
line in nginx log.I used
GODEBUG
to try to figure out, and found data not sent afterEND_HEADERS
:That's the reason why nginx responsed
400
directly.The text was updated successfully, but these errors were encountered: