
Documenting memory behaviours and leak detections #1518

Open
wcandillon opened this issue Sep 13, 2024 · 20 comments

@wcandillon

Regarding React Native Skia and WebGPU, I have often been confused about the observable memory allocations in these libraries. But even worse, we are genuinely struggling to explain these behaviors to our clients (see #982), who believe that our libraries are leaking memory.

I think it’s fair to say that many developers, in various situations, are under the impression that Hermes is leaking memory, even though it is not.

The goal of this issue is to create a documentation page that explains the current observable memory behavior (so we can share that information with our clients) and shows how to factually check whether a library/app is leaking memory.

Is there interest in such an initiative?

Here’s a recent example: As mentioned in #982, I was under the impression that RN WebGPU was leaking memory. I spent a substantial amount of time debugging, only to confirm that all objects are eventually deleted. Yet the memory reported as allocated never goes back down and keeps growing. The demo on iOS in dev mode goes like this:
We allocate a bunch of objects, and after a certain time they are all deleted. However, the allocated memory in the performance monitor remains the same. In the second cycle, many more objects are allocated before deletion (thus, allocated memory increases), and this exact pattern continues, creating almost exponential memory growth over time.
I assumed this behavior in dev mode is "okay." However, our clients are experiencing very similar behavior, and I’m not sure what to tell them since I’m not even sure myself what’s going on.
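
For context, here is a minimal sketch of the kind of check that can confirm the JS wrappers really are collected. It is illustrative only: createResource() is a hypothetical stand-in for whatever allocates the suspect objects, and it assumes the runtime supports WeakRef.

```ts
// Illustrative sketch only: `createResource()` is a hypothetical stand-in
// for whatever allocates the suspect objects (e.g. a WebGPU wrapper).
declare function createResource(): object;

const refs: WeakRef<object>[] = [];

export function allocateBatch(count: number): void {
  for (let i = 0; i < count; i++) {
    refs.push(new WeakRef(createResource()));
  }
}

// Poll periodically: if the alive count eventually drops to 0 after the app
// releases the objects, the JS wrappers are being collected, and any growth
// shown in the performance monitor is coming from somewhere else.
setInterval(() => {
  const alive = refs.filter((r) => r.deref() !== undefined).length;
  console.log(`alive wrappers: ${alive} / ${refs.length}`);
}, 5000);
```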

I hope this makes sense.

@tmikov
Contributor

tmikov commented Sep 15, 2024

We are definitely interested in providing documentation about the observable memory behavior.

However, what you are describing doesn't really align with that; first we would like to understand the behavior itself. None of these things are supposed to happen: memory should not remain the same after data is freed, and allocations should not accumulate exponentially. We have not actually observed what you are describing and don't have an explanation for it.

In #982 we asked for some data. It would be very useful to see the values returned by getInstrumentedStats() in sequence, frame by frame, so we can see how they change, along with the number of alive host objects at each point.
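
Something along these lines would be enough to collect that data. This is only a sketch: HermesInternal is an internal, unsupported API, and the alive host object count would have to come from the library's own (hypothetical) instrumentation.

```ts
// Sketch: sample HermesInternal.getInstrumentedStats() once per frame.
// HermesInternal is internal and unsupported, so guard for its absence.
type HermesStats = Record<string, number>;

const samples: Array<{ t: number; stats: HermesStats }> = [];

function sampleFrame(): void {
  const hermes = (global as any).HermesInternal;
  if (hermes?.getInstrumentedStats) {
    samples.push({ t: Date.now(), stats: hermes.getInstrumentedStats() });
    // A library-specific "alive host objects" counter (hypothetical) could
    // be recorded here alongside the VM stats.
  }
  requestAnimationFrame(sampleFrame);
}
requestAnimationFrame(sampleFrame);

// Dump the collected samples as JSON when the run is over.
export const dumpSamples = (): string => JSON.stringify(samples, null, 2);
```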

BTW, there is documentation on "Memory Profiles", although some minor details might be out of date: https://github.com/facebook/hermes/blob/main/doc/MemoryProfilers.md

@wcandillon
Author

This is a great question. In #982, once I noticed that there were indeed no leaks, I wasn't sure which data points I could provide that would actually be useful to you. Would a log of getInstrumentedStats() exported as JSON or CSV work?

Please let me know what kind of scenario, input, and output data would be ideal for you.
I hope we can eventually close this gap between perceived and actual behaviour.

@neildhar
Contributor

The behaviour you're observing is pretty unexpected, so we're eager to learn more.

  1. Could you confirm which tool you're using to see the growing memory consumption, and the numbers that you are seeing?
  2. Is the memory being reported for the entire app, or memory that is being specifically managed by Hermes?
  3. Are you seeing this only in dev mode, or are you also seeing elevated memory consumption in release builds?
  4. Sharing the result of HermesInternal.getInstrumentedStats() over the period you're seeing the memory growth would be very useful. It would allow us to see the breakdown of the portion of the heap that is being managed by Hermes. A JSON file would be good.

We have also previously seen the allocator on macOS/iOS retain a large amount of excess memory even after allocations are freed, but I would not expect memory consumption to be increasing exponentially over time.

@wcandillon
Author

Thank you for following up on this.

  1. I am looking at the performance monitor of the React Native app and the Instruments panel in Xcode.
  2. This is in dev mode only; it sounds like it might be worth trying in release mode? If this is a dev-mode-only behaviour, we would still need a way to explain it to our users, but that would seem fair.
  3. Should I just run a demo and log HermesInternal.getInstrumentedStats() every minute or so and have it as a CSV file?

@neildhar
Contributor

looking at the performance monitor of the React Native app and the Instruments panel in Xcode

To confirm, you're seeing elevated memory consumption in both the performance monitor and in the instruments panel?

sounds like it might be worth trying in release mode?

It would be interesting to see the release mode results. A disparity between dev and release modes would definitely be an interesting data point.

Should I just run a demo and log HermesInternal.getInstrumentedStats() every minute or so and have it as a CSV file?

Yes, that would be fine. Depending on the time scale over which you see the rise in memory consumption, you may want to log more frequently than that so we have enough visibility into the memory pattern. Having extra data will only help.
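
As a rough sketch (not a prescribed format), something like this logs the stats every few seconds and accumulates CSV rows, taking the column names from whatever keys the stats object returns at runtime:

```ts
// Sketch: periodically sample the stats and accumulate CSV rows that can be
// dumped to a file or printed to the console and saved as .csv.
const rows: string[] = [];
let header: string[] | null = null;

setInterval(() => {
  const stats = (global as any).HermesInternal?.getInstrumentedStats?.();
  if (!stats) return;
  if (!header) {
    header = ['timestamp', ...Object.keys(stats)];
    rows.push(header.join(','));
  }
  rows.push([Date.now(), ...header.slice(1).map((k) => stats[k])].join(','));
}, 5000); // log more often if the growth happens quickly

// Call this when the run is over, e.g. from a debug button or the console.
export const getCsv = (): string => rows.join('\n');
```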

@dev-john-nguyen

dev-john-nguyen commented Oct 4, 2024

Hoping this helps. I believe this issue is related to our app's frequent crashes. Shopify/react-native-skia#2675

HermesInternal.getInstrumentedStats() every second before crash.
hermes_stats.txt

Crash logs
[two screenshots of the crash logs]

@neildhar
Contributor

neildhar commented Oct 4, 2024

Hey @dev-john-nguyen, according to the Hermes stats, Hermes is only managing 20MB of memory. Could you share more about why you believe Hermes is responsible for the elevated memory consumption?

@neildhar
Contributor

neildhar commented Oct 4, 2024

@dev-john-nguyen If your app has multiple Hermes runtime instances, could you share the output of getInstrumentedStats from each instance? The attached logs show that Hermes is using fairly little memory, but it is possible another Hermes instance in the app is consuming more.

@dev-john-nguyen

@neildhar I'm not entirely sure if Hermes is the cause of the elevated memory consumption, but I’d like to help identify the cause and hopefully help us work towards a solution.

As far as I know, the app doesn’t spin up multiple Hermes runtime instances, unless it’s happening dynamically in certain scenarios I’m unaware of. Is there a way to confirm whether multiple Hermes runtime instances are running in the app?

@tmikov
Contributor

tmikov commented Oct 4, 2024

@dev-john-nguyen there are two possible ways I can think of that this could be related to Hermes:

  1. If native memory associated with JS objects is not reported using setExternalMemoryPressure(), the Hermes GC wouldn't know about it and it wouldn't attempt to collect it.
  2. Hypothetically, a second Hermes runtime could be the owner of all of this memory. If that were the case, we would still have to investigate why it isn't being freed, but at least we would be looking at the correct runtime. Is this likely? I don't know. We know that many apps these days use a second runtime with Reanimated, for example.

One way to know when a Hermes runtime is created is to put a breakpoint with a native debugger in facebook::hermes::makeHermesRuntime().

@dev-john-nguyen

dev-john-nguyen commented Oct 4, 2024

So it looks like Reanimated is spinning up another Hermes runtime instance.

[screenshot]

Here's getInstrumentedStats for both instances until the crash. (Edited: posted the right stats)

reanimated_hermes_status.txt
hermes_stats.txt

@tmikov
Contributor

tmikov commented Oct 4, 2024

@dev-john-nguyen thanks for collecting this data so quickly! Unfortunately, it looks like neither of the runtimes owns a lot of memory. You can look at js_allocatedBytes and js_heapSize for the JS memory, and at js_externalBytes for the native memory known to the VM. As you can see, the values are pretty small.
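
For reference, a small snippet that prints just those three fields in a readable form (a sketch using the same HermesInternal.getInstrumentedStats() call discussed above):

```ts
// Print the Hermes-managed JS memory and the external memory known to the VM, in MB.
const toMB = (bytes: number): string => (bytes / (1024 * 1024)).toFixed(1);

export function logHermesFootprint(): void {
  const s = (global as any).HermesInternal?.getInstrumentedStats?.();
  if (!s) return;
  console.log(
    `JS allocated: ${toMB(s.js_allocatedBytes)} MB, ` +
      `JS heap size: ${toMB(s.js_heapSize)} MB, ` +
      `external known to VM: ${toMB(s.js_externalBytes)} MB`
  );
}
```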

Ignoring the possibility of a catastrophic bug in Hermes (this is always theoretically possible, but I don't really see evidence for it), the only other explanation is that there is native memory that is not reported to the VM.

I think you can use Instruments to see where this memory is being allocated. See this link (and particularly the "check the Malloc Stack box in the Diagnostics area" part): https://developer.apple.com/documentation/xcode/gathering-information-about-memory-use .

@tmikov
Contributor

tmikov commented Oct 5, 2024

@dev-john-nguyen another experiment you should run is to invoke gc() periodically (every second?). If it helps, it would prove pretty conclusively that these are not leaks, but just external memory not being reported. If it makes no difference, these are actual leaks.

You should probably do it separately on the two runtimes.
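
A minimal sketch of how that experiment could be wired up, assuming Reanimated's runOnUI is available and that both runtimes expose a global gc() (the calls are guarded since that is not guaranteed):

```ts
import { runOnUI } from 'react-native-reanimated';

// Force a collection every second on both runtimes and watch whether the
// app-level memory numbers react.
setInterval(() => {
  (global as any).gc?.(); // main JS runtime

  runOnUI(() => {
    'worklet';
    (global as any).gc?.(); // Reanimated UI runtime
  })();
}, 1000);
```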

@fatiherdogan01

Hi, @dev-john-nguyen
I've concluded that this crash is related to the glog package. I had 11K crashes in production.
I manually deleted the LOG(INFO) calls in node_modules/react-native/ReactCommon/jsiexecutor/jsireact/JSIExecutor.cpp, and now I have 0 crashes in production.
Of course, there are memory leaks in various packages, but for now I had to solve the problem with the glog package.
I've been trying to solve this crash for about 2 weeks. The problem was actually simple, but I was focused on other things.

Can you verify this?

@dev-john-nguyen

@tmikov I generated a memory graph report, and it appears to have detected a few leaks related to Hermes. I'm not very familiar with the tool, but let me know if you need more information, and I can try to provide it.

[two screenshots of the memory graph report]

@dev-john-nguyen

@fatiherdogan01
This provides a temporary fix for the bad access issue I encountered, but it doesn't address the underlying problem. The app isn't releasing memory, so it will eventually be terminated once it consumes too much.

@dev-john-nguyen

dev-john-nguyen commented Oct 7, 2024

@tmikov I invoked gc every second on both runtimes and it didn't help much. It pretty much still shows the same behavior: memory usage drops slightly and then plateaus.

[screenshot]

@tmikov
Contributor

tmikov commented Oct 7, 2024

@dev-john-nguyen So, let's recap:

  • You reported that your app's memory usage goes up to 1.9GB shortly before it crashes.
  • At the same time we saw that the memory directly managed by Hermes is about 20MB.
  • Lastly, you tried running gc() periodically with no change.

I think this is very strong evidence that this is not memory managed or leaked by Hermes, directly or indirectly.

So, if Hermes is not allocating the 1.9GB, you need to identify who is. 1.9GB is quite a lot of data, so hopefully it should not be that difficult.

By looking at all allocations before the crash, you should be able to get some idea of what the majority are. Are they malloc() for example?

(Again, there is always a theoretical possibility that Hermes has a catastrophic bug, so we can never completely exclude that, but either way we need to find out where the data is allocated)

@trungledangAxonActive

trungledangAxonActive commented Oct 9, 2024

My app is using almost 1GB (growing from 400MB to 1GB), and js_allocatedBytes and js_heapSize for the JS memory, as well as js_externalBytes, also seem far too high. I am just hiding and showing the same markers on the map (I'm using Google Maps; it also had a memory leak, which I resolved, but memory still increases), and RAM keeps going up.
image

{
  "js_VMExperiments": 0,
  "js_allocatedBytes": 168924512,
  "js_externalBytes": 10627489,
  "js_gcCPUTime": 41.120000000000665,
  "js_gcTime": 44.91600000000184,
  "js_heapSize": 188743680,
  "js_mallocSizeEstimate": 0,
  "js_markStackOverflows": 0,
  "js_numGCs": 7547,
  "js_totalAllocatedBytes": 29481105976,
  "js_vaSize": 188743680
}

One thing I notice in Xcode is that CoreFoundation's memory footprint increases the more I use the app.

[two screenshots: memory footprint when first using the app vs. after a while of use]

@wcandillon
Author

The comments in this issue seem to support the idea that it would be more scalable to build comprehensive documentation to support library developers, rather than trying to address every question individually.
It might also be more scalable to document that library developers should use Nitro modules for building JSI-based APIs, since Nitro modules aim to encompass all the best practices on that topic. It would be hard, if not impossible, for another library developer to catch up with these best practices, in my opinion (e.g. native state, setExternalMemoryPressure).

On my side, I am in the process of migrating to Nitro modules and will report back on the observable behaviors I encounter.
