Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reverseproxy: Close hijacked conns on reload/quit #4895

Merged
merged 7 commits into from
Sep 2, 2022
Merged

reverseproxy: Close hijacked conns on reload/quit #4895

merged 7 commits into from
Sep 2, 2022

Conversation

mholt
Copy link
Member

@mholt mholt commented Jul 17, 2022

Previously, if Caddy was proxying WebSockets, and the client closed the connection, Caddy would properly proxy the Close message to the server, and vice-versa if the server closed the connection to the client. This is all fine. However, we didn't really take care of cleaning up these hijacked connections in the event of a config reload (where Caddy needs to close the connection).

Hijacked connections don't get closed by the standard library when we call http.Server.Shutdown(), so we have to do that ourselves.

We also send a Close control message to both ends of WebSocket connections. I have tested this many times in my dev environment with consistent success, although the variety of scenarios was limited.

Should close #4762.

This could use a lot of testing in production under heavy traffic. For example, I don't know if the 'graceful' WebSocket close feature works properly with concurrent write (i.e. the proxy is still actively copying a response from the backend while our proxy also injects its Close frame).

I thought of many ways to implement this, and I opted to start with the simplest that worked. Please try it out, and report any issues with it, thanks!

We also send a Close control message to both ends of
WebSocket connections. I have tested this many times in
my dev environment with consistent success, although
the variety of scenarios was limited.
@mholt mholt added the help wanted 🆘 Extra attention is needed label Jul 17, 2022
@mholt mholt added this to the 2.x milestone Jul 17, 2022
@mholt mholt self-assigned this Jul 17, 2022
Copy link
Member

@francislavoie francislavoie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet! Looks just as I'd expect.

I'll try to find some time this week to test it out on an app of mine with websockets to see how it behaves.

modules/caddyhttp/reverseproxy/streaming.go Outdated Show resolved Hide resolved
modules/caddyhttp/reverseproxy/streaming.go Outdated Show resolved Hide resolved
@francislavoie
Copy link
Member

FYI @dunglas in case you have any input here regarding Server-Sent Events. Is it safe to just straight up close SSE connections, or should we be sending somekind of message to either end to terminate cleanly? I assume no because it's just a regular HTTP request, but I just want to make sure.

@dunglas
Copy link
Collaborator

dunglas commented Jul 17, 2022

Just closing the connection is good enough, there is no need for a specific message. The client will automatically try to reconnect (which is the expected behavior).

Event stream requests can be redirected using HTTP 301 and 307 redirects as with normal HTTP requests. Clients will reconnect if the connection is closed; a client can be told to stop reconnecting using the HTTP 204 No Content response code.

@WeidiDeng
Copy link
Member

I don't think sync.map delete usage is correct here, per issue and my personal experience with v2ray account management which uses sync.Map. If using delete to remove entry, memory will continue to grow and ultimately lead to oom. Once I stress tested v2ray by continuing adding 1k users and removing them, 2 or 3 rounds v2ray will oom, and with 30 minutes stops in between them to allow gc to kick in.

That github issue is closed with stating keys will be removed when calling loadanddelete. I never tried it, because I developed my own caddy plugin to handle proxy traffic.

@mholt
Copy link
Member Author

mholt commented Jul 21, 2022

@WeidiDeng Yikes, good to know. I'm not totally clear on the issue either, but given the documentation for sync.Map and your experience and links you posted, I'm going to err on the safe side and choose to use a regular map with sync.Mutex instead.

This branch is ready for testing!

@WeidiDeng
Copy link
Member

Recently, I am using forward_auth to develop my own authorization for ttyd, which uses websocket for communication. I use ttyd to manage server including caddy configuration. Since caddy will always restart all apps when configuration changes, I wonder in this case, closing websocket is actually unneccessary. But current caddy
module lifecycle doesn't permit this case.

@francislavoie
Copy link
Member

Yeah. We'd love to keep websocket connections open across config reloads, but the problem is that we don't have a way to relate things in the old config to the new config.

For example, what if the reverse_proxy config changes, e.g. upstreams change and the one the websocket connection was using is removed? We have to close the connection to not leak them. And since we have no way to identify a particular proxy in the config (you might have more than one proxy), we can't safely try to compare the old vs new upstreams. Also that wouldn't work for the dynamic upstreams feature either.

@WeidiDeng
Copy link
Member

I see this doesn't follow caddyhttp.App's GracePeriod, perhaps when cleaning up, there need to be a delay. caddyhttp.ServerCtxKey can be used on a http.request to extract the relevant caddyhttp.App instance. And when 0, clean up immediately.

But this is semantically different from http app's GracePeriod, which when zero, means waiting for request to complete indefinitely.

@mholt mholt modified the milestones: 2.x, v2.6.0 Jul 25, 2022
@mholt mholt removed the help wanted 🆘 Extra attention is needed label Aug 4, 2022
@mholt
Copy link
Member Author

mholt commented Aug 4, 2022

@WeidiDeng

I see this doesn't follow caddyhttp.App's GracePeriod, perhaps when cleaning up, there need to be a delay.

True, we don't honor the grace period for hijacked connections. That's probably an area for improvement in a future PR. I am not sure how complicated that may be.

@mholt mholt modified the milestones: v2.6.0, v2.6.0-beta.1 Sep 2, 2022
@mholt mholt merged commit 66476d8 into master Sep 2, 2022
@mholt
Copy link
Member Author

mholt commented Sep 2, 2022

This has reportedly been working very well.

@mholt mholt modified the milestones: v2.6.0-beta.1, v2.6.0 Sep 13, 2022
@mholt mholt deleted the websockets branch September 17, 2022 04:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

reverseproxy: Cleaner close of websocket connections (and SSE)
4 participants