Memory leaks due to un-GCable event handlers #194
Hello alkoclick, nice to hear from you. ;-) I always saw your name when looking at some kweb tickets / contributions and wondered what you are using kweb for. Just out of curiosity: what's the average request rate your service receives? And what will be the actual value of

As I have used kweb extensively over the last half a year, I have some 5 deployments of kweb applications already running, some of them in production, but currently only on very light traffic. However, I expect the traffic to grow over time, so I'm really interested in having a solution for that issue.

Best regards, Frank
Hey folks (and especially Frank),

More good news! Here's our memory consumption today:

You'll notice that it generally follows a similar pattern to the one above... but at only 30% of the memory footprint, capping out at ~600 MB compared to the previous 2.1 GB.

What are we running? A modified version of Kweb 0.8.5 (we're stuck there due to #186 and #189), with the patch created in #193 on top of it. For some background, this runs in a Kubernetes environment, where the host allocates 6 GB of memory to K8s. The Kweb container has today served 14203 requests (though, I'll disappoint you, 9367 of those were healthchecks). We are using

From my current viewpoint, the correct client state timeout value for most applications should be 1, 2, 4, or 12 hours, depending on usage patterns. The default should definitely not be 48 hours; that's very high. 4 hours is probably a good balance if your clients have long sessions, and 1-2 hours are pretty effective if you have a lot of clients on shorter sessions.
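The timeout values discussed above could be made configurable along these lines. A minimal sketch in Java, assuming a hypothetical system property name (`kweb.clientStateTimeoutHours`) that is not part of Kweb's actual API:

```java
import java.time.Duration;

// Hypothetical sketch: reading a client-state timeout with a 4-hour default
// (the value suggested in this thread), overridable via a system property.
// The property name is illustrative, not Kweb's real configuration.
public class ClientStateTimeout {
    static Duration clientStateTimeout() {
        String raw = System.getProperty("kweb.clientStateTimeoutHours");
        long hours = (raw != null) ? Long.parseLong(raw) : 4L;
        if (hours <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        return Duration.ofHours(hours);
    }
}
```

Deployments serving many short sessions would set the property to 1 or 2; long-session apps could leave the default.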
Yeah, agree - I'm not sure why I set the client state timeout so high initially, but 4 hours seems much more reasonable. If there are still memory leaks, it might be that some of the client state should be stored in a Map with weak keys or values, but that would depend on what is leaking.
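The weak-keyed map idea can be sketched with the JDK's `WeakHashMap`; `ClientState` and the map here are illustrative stand-ins, not Kweb internals. Note that actual collection still depends on when the GC runs:

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch of storing client state keyed by a session object held weakly:
// once nothing else strongly references the key, the entry becomes eligible
// for GC automatically, with no explicit cleanup call needed.
// ClientState is a stand-in class, not part of Kweb.
public class WeakClientStates {
    static class ClientState {
        final String id;
        ClientState(String id) { this.id = id; }
    }

    // WeakHashMap holds its keys weakly; synchronizedMap adds thread safety.
    static final Map<Object, ClientState> STATES =
            Collections.synchronizedMap(new WeakHashMap<>());
}
```

The caveat is that values must not strongly reference their own keys, or the entries never become collectable.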
… state timeout to 20 minutes, but the timeout is reset every time the client state is accessed.
Good evening, IMHO the actual config value depends on the specific use case of kweb. E.g. consider some "monitor" page showing the status of a system or similar. There you won't have any user interaction, and the session will run pretty much forever. In those situations I'd say that the 48-hour timeout is a good setting. Such a system could be implemented quite nicely with kweb: on the server side you watch / listen for state changes, and when they occur, you simply update the UI in the session. Without kweb, all the client-side update infrastructure / polling / making sure that the client is alive / ... is quite some work to get right.

Back to the topic: maybe if I have an hour or a half, I'd write some kind of load test to see if there is a need for the map with weak keys you mentioned @sanity. I'd expect a Selenium test to be the most effective. A simple HTTP call without a corresponding websocket + session will most likely not trigger any problems.
Thanks for the comment. You make a good point about very long-running pages like monitor pages; we need to support that. I recently committed a change that uses a Guava cache to expire RemoteClientStates (RCSs) 4 hours after they were last read. I've also made it remove the client states when the WebSocket terminates, although that means a page refresh is required if the websocket is broken (whereas previously the client could just reconnect and continue). Perhaps the best approach would be:
Thoughts?
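The expire-after-read behaviour described above (Guava's `expireAfterAccess`) can be approximated with a small stdlib sketch. The injectable clock is only there to make expiry testable; none of these names come from Kweb:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Minimal stdlib approximation of expire-after-access semantics: entries are
// dropped if not read within the TTL, and every read resets the expiry clock
// (as described in the thread). Not Kweb's implementation, just a sketch.
public class ExpiringCache<K, V> {
    private static class Entry<V> {
        final V value;
        long lastAccess;
        Entry(V value, long lastAccess) { this.value = value; this.lastAccess = lastAccess; }
    }

    private final long ttlMillis;
    private final LongSupplier now; // injectable clock for deterministic tests
    private final Map<K, Entry<V>> map = new HashMap<>();

    public ExpiringCache(long ttlMillis, LongSupplier now) {
        this.ttlMillis = ttlMillis;
        this.now = now;
    }

    public synchronized V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        long t = now.getAsLong();
        if (t - e.lastAccess > ttlMillis) { map.remove(key); return null; }
        e.lastAccess = t; // reading resets the expiry clock
        return e.value;
    }

    public synchronized void put(K key, V value) {
        map.put(key, new Entry<>(value, now.getAsLong()));
    }
}
```

A real cache would also evict lazily-expired entries in the background, which Guava handles during other cache operations.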
I'm a bit cautious about this tbh; it sounds good on paper, but I'm not sure if it'll have any unintended consequences. We'll find out, I guess :D I like the proposal you've made, but I feel like it requires some technical work, and there should be something simpler. So my question is: is there any heartbeat in place? Because if we put a heartbeat in place for active clients, say once per minute or so, then we can use it to detect whether a session is still alive. It would probably use the existing WS channels, which means it would set the "last read" time recently enough that cleanup will not remove these long-running clients. So in practice:
P.S. Note that we've largely addressed the crux of the issue here (#194), so we should consider moving this discussion to a new issue.
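The heartbeat idea above could be sketched as follows; the registry and reaper are hypothetical names, and a real implementation would piggyback on the existing WS channels as suggested rather than introduce a separate mechanism:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Sketch of the heartbeat proposal: clients ping periodically, and a reaper
// removes any client whose last heartbeat is older than the allowed gap.
// Illustrative names only; the clock is injectable for deterministic testing.
public class HeartbeatRegistry {
    private final long maxSilenceMillis;
    private final LongSupplier now;
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    public HeartbeatRegistry(long maxSilenceMillis, LongSupplier now) {
        this.maxSilenceMillis = maxSilenceMillis;
        this.now = now;
    }

    // Called whenever a heartbeat (or any other message) arrives from a client.
    public void heartbeat(String clientId) {
        lastSeen.put(clientId, now.getAsLong());
    }

    // Remove stale clients and return their ids so server-side state
    // (event handlers, RemoteClientStates) can be freed.
    public List<String> reap() {
        long cutoff = now.getAsLong() - maxSilenceMillis;
        List<String> stale = new ArrayList<>();
        lastSeen.forEach((id, seen) -> { if (seen < cutoff) stale.add(id); });
        stale.forEach(lastSeen::remove);
        return stale;
    }

    public boolean isAlive(String clientId) {
        return lastSeen.containsKey(clientId);
    }
}
```

Long-running monitor pages stay alive indefinitely as long as their heartbeats keep arriving, while genuinely dead sessions are cleaned up within one silence window.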
I'm not married to the change, it's just in

Yeah, I did think about a heartbeat. We don't currently have one at the "kweb" level, but there is one at the Ktor websocket level (which I assume will ensure that a dropped websocket connection is detected). I like your proposal; I think it's simpler and shouldn't drop connections prematurely (in the event of a temporary websocket disconnection). The only thing is that a heartbeat message will probably require a new message type, which may conflict with @Derek52's work in #190 (although not necessarily too much of a headache).
That looks super useful, I think we can base a solution on that
@alkoclick I believe that the
@alkoclick Just wanted to check in and see whether this was still an issue?
Hey @sanity, I'm aware of the update and I will post back here once I can confirm this is resolved, but the multitude of breaking changes in recent versions of Kweb makes upgrading significantly more effortful
@alkoclick Yeah, the breaking changes are unfortunate, but I'm trying to get the API stable before the 1.0.0 release; after that we will be a lot more conservative. I think we're past the worst of it now, and the remaining pre-1.0.0 items shouldn't require breaking changes.
Going to close this for now, please reopen if there are still issues.
Just a note that
Hey folks! Long time no see!
I come bearing some good news and some bad news. Let's start with the bad: a memory leak.
Describe the bug
Here's our memory usage in Kweb, over a day:
It's honestly not great. The 04:00 drop on the left is there because we have a nightly restart in place for the container, since just GCing wasn't good enough.
To Reproduce
Run any Kweb app for more than 24 hours. You'll start noticing leftovers from client sessions in memory.
Expected behavior
Kweb allows the GC to clean up disconnected clients faster and does not keep references to callbacks on
Summary of our current understanding
Here's what the GC chain looks like:
You'll notice that the reference that is kept is actually a listener.
Suggested paths forward
Remove listeners from the client state properly
I honestly don't have too much insight into this. I really doubt these handlers should stay open after a client disconnects, so I suspect one of the cleanups is not running properly.
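One way the missing cleanup could work is per-client listener bookkeeping: every handler a client registers is also recorded, so disconnect can detach them all and leave no strong references behind. A sketch with illustrative names (none of these classes are Kweb's):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Sketch: the session tracks every handler it registered on the bus, so a
// single disconnect() call can unsubscribe all of them. Without this, the
// bus keeps strong references to handlers (and whatever they capture)
// long after the client is gone - the leak pattern described above.
public class ListenerCleanup {
    static class EventBus {
        private final Map<String, List<Consumer<String>>> listeners = new HashMap<>();

        void subscribe(String event, Consumer<String> h) {
            listeners.computeIfAbsent(event, k -> new ArrayList<>()).add(h);
        }

        void unsubscribe(String event, Consumer<String> h) {
            List<Consumer<String>> l = listeners.get(event);
            if (l != null) l.remove(h);
        }

        int listenerCount(String event) {
            return listeners.getOrDefault(event, List.of()).size();
        }
    }

    static class ClientSession {
        private final EventBus bus;
        private final List<Map.Entry<String, Consumer<String>>> registered = new ArrayList<>();

        ClientSession(EventBus bus) { this.bus = bus; }

        void on(String event, Consumer<String> h) {
            bus.subscribe(event, h);
            registered.add(Map.entry(event, h)); // remember for teardown
        }

        // Called when the websocket closes: detach everything we registered.
        void disconnect() {
            registered.forEach(e -> bus.unsubscribe(e.getKey(), e.getValue()));
            registered.clear();
        }
    }
}
```

If Kweb already has an equivalent teardown path, the leak would point to it not firing (or not covering event handlers) on disconnect.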
Allow configuring client cleanup timeout
So this is the good news, I guess. While I might misunderstand the functionality here, I believe that enabling faster client cleanups can partially alleviate this problem, and this is a good value to have configurable in the long term. I've implemented this in PR #193