[bug] aquatic-ws - Memory Leak #169

Closed
SilentBot1 opened this issue Jan 5, 2024 · 9 comments

@SilentBot1

After leaving an instance of aquatic-ws @ e2a3211 running on an Ubuntu 22.04.3 LTS machine with this configuration, there appears to be a memory leak in the aquatic_ws process: memory usage increases over the span of multiple days until the process crashes after exhausting all available system memory (in my case ~2.4 GB free, out of 4 GB total).

An example of peer counts, message throughput and usage can be seen in the following image:
[image: graph of peer counts, message throughput and memory usage]

The leak appears to be related to peer count in some way: memory usage grows faster under high load and more slowly under low load, but nonetheless increases over time.

I plan to disable metrics to verify whether they could be the cause and will report back if that alleviates the issue, but I thought it best to open the issue first and provide updates as I troubleshoot.
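
For anyone who wants to quantify the growth rate themselves, a minimal standalone sketch along these lines (not part of aquatic; the program name and the 60-second polling interval are arbitrary) can log the tracker process's resident memory from /proc on Linux:

```rust
// rss-logger: periodically print the resident set size (VmRSS) of a process.
use std::{env, fs, thread, time::Duration};

fn rss_kib(pid: u32) -> Option<u64> {
    // /proc/<pid>/status contains a "VmRSS:   <n> kB" line on Linux.
    let status = fs::read_to_string(format!("/proc/{pid}/status")).ok()?;
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))?
        .split_whitespace()
        .nth(1)?
        .parse()
        .ok()
}

fn main() {
    let pid: u32 = env::args()
        .nth(1)
        .expect("usage: rss-logger <pid>")
        .parse()
        .expect("pid must be a number");

    loop {
        match rss_kib(pid) {
            Some(kib) => println!("VmRSS: {kib} KiB"),
            None => println!("process {pid} not found or not readable"),
        }
        thread::sleep(Duration::from_secs(60));
    }
}
```

Running it as `rss-logger $(pidof aquatic_ws)` and plotting the output alongside peer counts makes it easy to compare growth rates between runs.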

@greatest-ape
Owner

greatest-ape commented Jan 6, 2024

Thanks. I’ve opened an issue in the repository of glommio, the async runtime that I’m using. But it would still be great to see the results without metrics. I suspect that https://docs.rs/metrics-exporter-prometheus/latest/metrics_exporter_prometheus/ doesn’t free memory (for peer clients and peer id prefixes), and I would be interested in seeing whether that is indeed the case, and if so, how much of the aquatic leak comes from metrics.
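
For illustration, the pattern I have in mind looks roughly like this (a minimal sketch with hypothetical metric and label names, assuming the metrics 0.22-style macro API, not aquatic's actual code):

```rust
use metrics::gauge;
use metrics_exporter_prometheus::PrometheusBuilder;

fn main() {
    // Install the Prometheus recorder; with default features this also
    // spawns an HTTP listener serving the scrape endpoint.
    PrometheusBuilder::new()
        .install()
        .expect("failed to install Prometheus recorder");

    // Every distinct label value registers a new time series in the
    // recorder's registry. If label values track churning peers (client
    // strings, peer id prefixes), the suspicion is that those series are
    // never dropped when the corresponding peers disconnect, so the
    // registry only ever grows.
    for client in ["BitComet 2.01", "WebTorrent 1.9.7", "libtorrent 2.0"] {
        gauge!("peers_by_client", "client" => client).set(1.0);
    }
}
```

If the recorder keeps one series per label value it has ever seen, the memory held by the exporter grows with the number of distinct clients and peer id prefixes rather than with the number of currently connected peers.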

@SilentBot1
Author

Thanks for looking into this and raising an issue with glommio. It seems like it may be a difficult one to solve. It's not too critical for me at the moment, as a restart once every 5-7 days isn't too bad, but if/as usage increases, I imagine restarts will only become more frequent.

Here is what memory usage looks like for the first 5 hours after a restart, once with statistics enabled and once with them disabled:

Statistics Enabled:

[image: memory usage graph, statistics enabled]

Statistics Disabled:

[image: memory usage graph, statistics disabled]

During the statistics-enabled timeframe, 137k connections were made to the tracker; during the statistics-disabled timeframe, 175k connections were made. This possibly explains the difference between the two at the end of the 5 hours, as total usage ended up higher even with statistics disabled.

It looks like any memory leaking from metrics_exporter_prometheus for peer clients/prefixes is entirely masked by the glommio issue at the moment.

@greatest-ape
Owner

Great, thanks. Yes, it might unfortunately take a while to fix this.

@greatest-ape
Owner

Actually, I came up with an idea to possibly circumvent the issue. In local testing, it seems to fix the leak. Could you please try out the latest commit on master?

@SilentBot1
Author

Thank you, I have just updated and restarted - will keep you posted.

@SilentBot1
Author

Just to provide another update at the 5-hour mark: things are looking a whole lot better:
[image: memory usage graph after updating to the fix]

I will note that the metrics now appear to be undulating after the restart:
[image: undulating peer count metrics after the restart]

After looking into this further, it appears to affect only BitComet peers (which don't actually support WebTorrent and only pull stats from WebSocket trackers), as they don't appear to keep a single socket open continuously:

[image: peer counts broken down by client, showing BitComet fluctuation]

@greatest-ape
Owner

greatest-ape commented Jan 8, 2024

Excellent!

The undulating BitComet counts are somewhat strange, but from your description this seems to be caused by the clients acting in a nonstandard way rather than by the tracker.

@SilentBot1
Author

Just to provide a further update, it seems like things have continued to work as expected with the provided fix from the 7th onwards:

[image: memory usage remaining stable from the 7th onwards]

I'll close the issue, as the fix you've implemented has resolved the leak, though if you would like to keep it open to track the underlying glommio issue, feel free to re-open it.

Thanks again for your help.

@greatest-ape
Owner

Great! Thanks for the detailed reports and for trying out the fix.
