gzip compression leaking half closed sockets and consuming excess CPU #327
Comments
Hmm, thanks, I'll have a look. That indeed doesn't look healthy.
Are you sure these are stale handles? Does the list grow over time?
The list does grow over time. It corresponds with the CPU load increasing, which seems to happen in big increments. This graph shows it increasing roughly daily. If I strace one of the threads, it seems to be constantly cycling through the FDs doing reads. I guess that's what's consuming all the CPU.
All these FDs show up in lsof with "can't identify protocol"
and in /proc/17207/fd
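One way to check whether the list of socket handles grows over time is to sample the process's fd directory periodically. The following is only a minimal sketch, assuming a Linux /proc filesystem; the helper name socket_fds is made up for illustration and is not part of carbon-c-relay.

```python
import os


def socket_fds(pid):
    """Return {fd_number: link_target} for the socket fds of `pid`.

    Assumes Linux, where /proc/<pid>/fd holds one symlink per open fd
    and sockets show up as targets of the form "socket:[<inode>]".
    """
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listdir() and readlink().
            continue
        if target.startswith("socket:["):
            result[int(name)] = target
    return result
```

Calling socket_fds(17207) at intervals and comparing the sizes of the returned dictionaries would show whether the number of socket handles keeps increasing.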
Can you share your config? That might give me some clues.
OK, what I worked out was that one of the source carbon-c-relays sending to the receiver was passing through a haproxy TCP proxy (due to some firewall/network challenges). I was recently able to replace that proxy with a carbon-c-relay, and the problem went away. So the flow was:
data sources -> carbon-c-relay -> haproxy TCP proxy -> carbon-c-relay
I'm not sure why that caused the symptoms that I was seeing, but I'm going to close this issue as it's not a problem now.
Hmm, ok, so it is the closing behaviour of haproxy that the relay doesn't handle very well. I'll see if I can find something using that approach.
Ensure EOF is visible to the main dispatcher loop, else we keep trying to read and decompress, but in vain, just as in issue #327.
I think I found the culprit, thanks for your details/analysis.
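The idea in the commit message above, making EOF visible to the loop that drives the reads, can be illustrated with a small sketch. This is not the relay's actual code (carbon-c-relay is written in C); GzipStreamReader and drain are hypothetical names, and the sketch assumes a blocking socket carrying a gzip stream. The point is that an empty result from the decompressor is ambiguous (it may just need more input), so the wrapper records EOF explicitly and the loop checks that flag instead of retrying the read forever.

```python
import socket
import zlib


class GzipStreamReader:
    """Hypothetical wrapper: reads gzip data from a socket, returns plain bytes."""

    def __init__(self, sock):
        self.sock = sock
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        self.decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        self.eof = False

    def read(self, n=4096):
        if self.eof:
            return b""
        raw = self.sock.recv(n)
        if not raw:
            # The peer closed its side: record EOF so the caller can see it,
            # rather than returning "no data yet" indefinitely.
            self.eof = True
            return self.decomp.flush()
        # May legitimately be empty if the decompressor needs more input.
        return self.decomp.decompress(raw)


def drain(reader):
    """Stand-in for a dispatcher loop: stop on the EOF flag, not on empty reads."""
    chunks = []
    while not reader.eof:
        chunks.append(reader.read())
    return b"".join(chunks)
```

If the loop instead treated every empty read as "try again later", a half-closed connection would be polled indefinitely, which matches the symptom of threads cycling through FDs doing reads.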
When I use the gzip compression option between two carbon-c-relays (version 3.2), the receiver eventually starts consuming excessive CPU (all CPU cores on a 24-core host).
On the receiving side, I see sockets in lsof output with "can't identify protocol".
These sockets are not shown in netstat.
I am also seeing some corrupt metrics when this happens. If I switch back to not using gzip, then these issues go away.