This repository has been archived by the owner on Aug 11, 2020. It is now read-only.

Memory Fragmentation (WIP) #386

Open
splitice opened this issue May 15, 2020 · 9 comments

Comments

@splitice

splitice commented May 15, 2020

  • Version: latest / 5baab3f
  • Platform: Linux Debian ARMv7

What steps will reproduce the bug?

A long-running QUIC client regularly opening bidirectional streams, with streams initiated from both endpoints.

How often does it reproduce? Is there a required condition?

24-48hrs for maximum effect

What is the expected behavior?

Memory sitting around 20-30 MB (max old space size of 30 MB). This is the usage of the same process with the QUIC server components compiled out, using TCP sockets in place of QUIC for the same protocol.

What do you see instead?

Memory peaking at 100MB+ (OOM on test device)

Additional information

QUIC (most likely memory allocated by ngtcp2) seems to create a high rate of memory fragmentation when using the default glibc malloc under real-world conditions. While 100 MB of RAM is likely not an issue in server applications, in the low-end space it is significant. Additionally, because this RAM is allocated by an external allocator rather than V8, it is not part of the Node.js memory pool and will grow unrestricted by parameters such as max old space size.

The usage appears to be fragmentation rather than a leak; however, I have not entirely ruled a leak out.
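As a rough illustration of the suspected failure mode (a standalone sketch; the block size and counts are made up and none of this is taken from the node/quic code), glibc cannot return freed small blocks to the OS while surviving allocations remain interleaved through the arena:

```cpp
// Fragmentation demo: allocate many small blocks, free every other one,
// then look at what glibc actually hands back to the OS.
#include <malloc.h>   // malloc_stats() -- glibc-specific
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

int main() {
  constexpr std::size_t kBlocks = 200000;
  constexpr std::size_t kSize   = 192;   // roughly "small signalling packet" sized
  std::vector<void*> blocks(kBlocks);

  for (std::size_t i = 0; i < kBlocks; i++)
    blocks[i] = std::malloc(kSize);

  // Free every other block: half the memory is now unused, but the arena
  // cannot shrink because the surviving blocks are scattered through it.
  for (std::size_t i = 0; i < kBlocks; i += 2)
    std::free(blocks[i]);

  std::puts("--- after freeing every other block ---");
  malloc_stats();   // "in use bytes" drops, "system bytes" stays high
  return 0;
}
```

If the QUIC path is doing something similar at a much larger scale, it would explain the growth without there being an actual leak.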

Currently I'm testing with jemalloc to see if it exhibits more sane behaviour; the turnaround time on replication means I'll be testing this over the coming week. If jemalloc results in sane memory usage, that supports a fragmentation situation rather than a leak. So far, with a runtime of 2 hours, this seems to be supported.

splitice changed the title from "Memory Fragmentation" to "Memory Fragmentation (WIP)" on May 15, 2020
@jasnell
Member

jasnell commented May 15, 2020

Loving these issues. Keep them coming if you can. Will be digging back in on the quic code early next week.

@splitice
Author

jemalloc with narenas=1 shows none of the excessive growth (confirmed over 48hrs).

That supports this being fragmentation over a leak.

I'm not really sure what the next steps here are.

@jasnell
Member

jasnell commented May 18, 2020

We'll need to reproduce the issue. We're not doing much special around allocations here, so it's going to take a bit to figure out and nail down. Having as much information as possible on your test case and the data you've collected would be a great start.

@splitice
Author

@jasnell do you have a generic server and client example that exercises the streams API (say, opening a stream and reading from and writing to it)? Perhaps leaving something like that running for a few days would be a good test.

This may just be a case of glibc being derpy and fragmenting with smaller allocations. Normally with TCP-based streams this wouldn't be seen, but I imagine there are a lot of small allocations involved in dealing with small signalling packets in QUIC.

@jasnell
Member

jasnell commented May 18, 2020

Yes, there are many small allocations and reallocs that occur frequently. That definitely could be the cause. We could look into making that more efficient, and maybe even use a slab allocator for much of it. Hmm. Ok, that gives me an idea where to start. Thank you
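For concreteness, here is a minimal sketch of the kind of fixed-size slab/pool allocator being suggested (illustrative only; the class name, block counts and growth policy are assumptions, not the node or ngtcp2 implementation):

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Fixed-size slab allocator: carves large chunks from the system allocator
// and hands out equal-sized blocks from an intrusive free list.
class Slab {
 public:
  explicit Slab(std::size_t block_size, std::size_t blocks_per_chunk = 1024)
      : block_size_(block_size < sizeof(void*) ? sizeof(void*) : block_size),
        blocks_per_chunk_(blocks_per_chunk) {}

  ~Slab() {
    for (void* chunk : chunks_) ::operator delete(chunk);
  }

  void* Allocate() {
    if (free_list_ == nullptr) Grow();
    void* block = free_list_;
    free_list_ = *static_cast<void**>(block);
    return block;
  }

  void Free(void* block) {
    // The block goes back on the free list; the memory stays owned by the
    // slab, so the system allocator never sees individual small frees.
    *static_cast<void**>(block) = free_list_;
    free_list_ = block;
  }

 private:
  void Grow() {
    // block_size_ should be a multiple of alignof(std::max_align_t) if the
    // blocks will hold arbitrary structs.
    char* chunk = static_cast<char*>(::operator new(block_size_ * blocks_per_chunk_));
    chunks_.push_back(chunk);
    for (std::size_t i = 0; i < blocks_per_chunk_; i++) {
      void* block = chunk + i * block_size_;
      *static_cast<void**>(block) = free_list_;
      free_list_ = block;
    }
  }

  std::size_t block_size_;
  std::size_t blocks_per_chunk_;
  void* free_list_ = nullptr;
  std::vector<void*> chunks_;
};
```

Because blocks of a given size class are recycled in place, the per-packet churn never reaches glibc at all, which is what keeps the arenas from fragmenting.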

@splitice
Author

@jasnell It might also be worth pushing for some malloc-options tuning in Node.js. That would be a platform-specific solution, though (it may not even matter on Windows; who knows how its malloc behaves).

I did a quick search of the Node.js issues and surprisingly found no discussion of this. Despite Node.js being a mostly single-threaded daemon, glibc keeps 2-8x the number of CPU cores' worth of arenas in play. Even accounting for the I/O threads this seems excessive and would likely contribute to fragmentation.

I'd need to test whether just tuning malloc's narenas would resolve the fragmentation. glibc is extremely prone to fragmentation, jemalloc much less so. A slab allocator, however, would be great if the allocation patterns are suitable :)
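For reference, capping the arena count doesn't require a custom allocator: glibc honours the MALLOC_ARENA_MAX environment variable, and the same knob is reachable from code via mallopt(). A minimal sketch (the early-startup placement is an assumption; it has to run before threads start creating extra arenas):

```cpp
#include <malloc.h>  // mallopt(), M_ARENA_MAX -- glibc-specific

int main() {
  // Limit glibc to a single malloc arena, equivalent to exporting
  // MALLOC_ARENA_MAX=1 before launching the process. Fewer arenas means
  // less free space stranded per arena, traded against more lock
  // contention between allocation-heavy threads.
  mallopt(M_ARENA_MAX, 1);

  // ... rest of process startup ...
  return 0;
}
```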

@jasnell
Member

jasnell commented May 18, 2020

I'm fairly certain that it's the reallocs that we're doing here. I'm going to start there tomorrow and see where we get.
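As a sketch of the general shape of that kind of fix (purely illustrative; none of this is the actual node/quic buffer code): growing buffers geometrically, and reusing them across packets, turns many tiny reallocs into a handful of larger ones that glibc copes with far better.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Append-only byte buffer that grows geometrically, so realloc() is called
// O(log n) times instead of once per small append.
struct GrowBuf {
  uint8_t* data = nullptr;
  std::size_t len = 0;
  std::size_t cap = 0;

  void Append(const uint8_t* src, std::size_t n) {
    if (len + n > cap) {
      std::size_t new_cap = cap ? cap * 2 : 256;
      while (new_cap < len + n) new_cap *= 2;
      // One larger realloc instead of many small ones.
      void* grown = std::realloc(data, new_cap);
      if (grown == nullptr) throw std::bad_alloc();
      data = static_cast<uint8_t*>(grown);
      cap = new_cap;
    }
    std::memcpy(data + len, src, n);
    len += n;
  }

  ~GrowBuf() { std::free(data); }
};
```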

@jasnell
Member

jasnell commented May 18, 2020

possible PR fix in #388

@splitice
Author

splitice commented May 21, 2020

Additional information: jemalloc is not a solution to this problem, not because it doesn't solve the fragmentation issues, but because it introduces its own compatibility problems with Node.js.

It appears Node.js is prone to lockups when running with jemalloc. I've been seeing lockups on projects not using QUIC (although my Node.js build does include the QUIC patches), especially if narenas is reduced (e.g. to 1). This doesn't appear to be specific to the QUIC support, however. I intend to do some testing on x86 with NodeSource builds to make sure it's not ARM-specific or introduced by the patch, and then report it over on the Node project. Usually a lockup like that indicates destructor or memory-access issues; jemalloc tends to surface those as lockups.

In the interim, if it's of interest to you (or anyone else who comes across this issue), the lockups look like:

#0  0xb6d0e524 in __libc_do_syscall () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1  0xb6d0c152 in __lll_lock_wait () from /lib/arm-linux-gnueabihf/libpthread.so.0
#2  0xb6d0698c in pthread_mutex_lock () from /lib/arm-linux-gnueabihf/libpthread.so.0
#3  0xb6f28e82 in ?? () from /usr/lib/arm-linux-gnueabihf/libjemalloc.so.2
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

and

#0  0xb6f0d354 in ?? () from /usr/lib/arm-linux-gnueabihf/libjemalloc.so.2
Backtrace stopped: Cannot access memory at address 0xf8

The behaviour looks similar to jemalloc/jemalloc#1392 (https://bugs.openjdk.java.net/browse/JDK-8215355), which appears to have been an issue with stack trace iteration (http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8215355/01/webrev/hotspot.patch), possibly an assumption that doesn't hold true with a non-glibc malloc. I'm not sure if V8 does anything similar.
