Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix SplitHTTP race condition when creating new sessions #3533

Merged
merged 3 commits into from
Jul 17, 2024
Merged

Fix SplitHTTP race condition when creating new sessions #3533

merged 3 commits into from
Jul 17, 2024

Conversation

yuhan6665
Copy link
Member

It seems SH can't receive a short message too fast

It seems SH can't receive a short message too fast
@yuhan6665
Copy link
Member Author

@mmmray I made some progress. I think it is a unit test only issue that is unlikely in real use.
I see a race condition between server read

packet, more := <-h.pushedPackets

and client push packet to the channel

It seems server read must happen before client push. But can't understand why the other way would hang..

@RPRX
Copy link
Member

RPRX commented Jul 14, 2024

先不要给 main push 新的 commit,等待 #3530

@RPRX RPRX mentioned this pull request Jul 15, 2024
@mmmray
Copy link
Collaborator

mmmray commented Jul 15, 2024

@yuhan6665 yeah I don't understand it either. there is a test that writes before read: https://github.com/XTLS/Xray-core/blob/13c2d5071c5e1e6789635d5f205e685ae92ece08/transport/internet/splithttp/upload_queue_test.go

is this hanging test locally reproducible?

@yuhan6665
Copy link
Member Author

@yuhan6665 yeah I don't understand it either. there is a test that writes before read: https://github.com/XTLS/Xray-core/blob/13c2d5071c5e1e6789635d5f205e685ae92ece08/transport/internet/splithttp/upload_queue_test.go

is this hanging test locally reproducible?

I removed the 5 seconds delay and was running the test for 100 times -count 100

@RPRX
Copy link
Member

RPRX commented Jul 15, 2024

I think it is a unit test only issue that is unlikely in real use.

听起来新版不必等这个 PR

@mmmray
Copy link
Collaborator

mmmray commented Jul 15, 2024

This appears to be some kind of issue with connection reuse of HTTP/1.1

If I comment out this line, everything works:

globalDialerMap[dialerConf{dest, streamSettings}] = client

Alternatively, this patch can be applied:

toggle
diff --git a/transport/internet/splithttp/client.go b/transport/internet/splithttp/client.go
index dd48ee6..c79e233 100644
--- a/transport/internet/splithttp/client.go
+++ b/transport/internet/splithttp/client.go
@@ -118,7 +118,7 @@ func (c *DefaultDialerClient) SendUploadRequest(ctx context.Context, url string,
        }
        req.Header = c.transportConfig.GetRequestHeader()

-       if c.isH2 {
+       if c.isH2 || true {
                resp, err := c.upload.Do(req)
                if err != nil {
                        return err
diff --git a/transport/internet/splithttp/dialer.go b/transport/internet/splithttp/dialer.go
index 3a7ce64..0d5a1b5 100644
--- a/transport/internet/splithttp/dialer.go
+++ b/transport/internet/splithttp/dialer.go
@@ -109,7 +109,7 @@ func getHTTPClient(ctx context.Context, dest net.Destination, streamSettings *in
                }

                // we use uploadRawPool for that
-               uploadTransport = nil
+               uploadTransport = downloadTransport
        }

        client := &DefaultDialerClient{

So the code that uses uploadRawPool directly is faulty somehow. I have not observed this kind of issue in prod, and in fact I am not able to reproduce this reliably even with count=100. So the bug should be fixed (if it can be figured out... I already spent a lot of time figuring out HTTP/1.1) but I would not release 1.8.19 because of it 😅

@yuhan6665
Copy link
Member Author

@mmmray maybe it is different environment but I tried two fixes you mentioned, it still hangs occasionally here
/usr/local/go/bin/go test -timeout 30s -run ^Test_listenSHAndDial$ github.com/xtls/xray-core/transport/internet/splithttp -count 100

@vrnobody
Copy link
Contributor

vrnobody commented Jul 17, 2024

我发现两个问题:

第一,upsertSession()有小概率为同一个sessionId生成多个uploadQueue,所以应该给它加个锁

func (h *requestHandler) upsertSession(sessionId string) *httpSession {

第二,unit test里面不能有time.Sleep()之类的函数,所以要注释掉下面这行。

<-time.After(time.Second * 5)

修改后的代码: vrnobody@1ee424f (已删除 2024-07-18)

关于第二点,我也不知道为什么,但下面这个unit test有可能失败。非常奇怪。
go test -run ^Test_Time_Sleep$ -count=100 -timeout=30s ./transport/internet/splithttp/splithttp_test.go

func Test_Time_Sleep(t *testing.T) {
	start := time.Now()
	fmt.Println("sleeping...")
	time.Sleep(5 * time.Second)
	dur := time.Since(start).Milliseconds()
	fmt.Println("wakeup after", dur, "ms")
}

@mmmray
Copy link
Collaborator

mmmray commented Jul 17, 2024

@vrnobody I cherry-picked your commit into this branch because it looks perfectly correct to me. I also don't really understand where the sleep 5s comes from, in fact it was copied from other transports

@mmmray
Copy link
Collaborator

mmmray commented Jul 17, 2024

With this change it seems the tests pass even with -count 1000. @yuhan6665 are you seeing the same thing?

@RPRX
Copy link
Member

RPRX commented Jul 17, 2024

听起来新版可以等这个 PR

@mmmray mmmray changed the title Try to fix infinite conn read in SH tests Fix SplitHTTP race condition when creating new sessions Jul 17, 2024
@mmmray
Copy link
Collaborator

mmmray commented Jul 17, 2024

Since there is a need to bring out a new version now, I rebranded the PR to be about @vrnobody's fix. Credit in the changelog should be given to yuhan and vrnobody.

The sleeps in the testsuite seem unnecessary to me now, so I removed them.

I think this PR can be merged if @yuhan6665 can confirm the testsuite no longer hangs on his machine too.

@RPRX
Copy link
Member

RPRX commented Jul 17, 2024

我觉得无论 tests 修好没,至少 SplitHTTP 都需要这个 fix,所以先合了,credit 给你们三位,感谢!

@RPRX RPRX merged commit 02cd3b8 into main Jul 17, 2024
36 checks passed
@yuhan6665
Copy link
Member Author

测试无问题 感谢 @vrnobody

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants