After upgrading to version 1.8.0, the async function `loadModelFromUrl` is not completing when using large models #31

felladrin · 2024-05-13T22:08:06Z

Something interesting occurred while upgrading to version 1.8.0. Previously, it had been throwing an "Out of Memory" error, but that issue has now been resolved. However, a new problem has surfaced: the async function `loadModelFromUrl` does not complete. It appears to be stuck in a state where it neither resolves nor rejects. It's possible that the error is being caught partway through the process and not propagated to the caller.

This issue can be reproduced with models that are too large to fit into the device's memory. It works perfectly fine with smaller models.
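For anyone hitting this, a caller-side guard can at least turn the silent hang into a catchable error. The sketch below is a generic timeout wrapper, not part of Wllama; `withTimeout` and the timeout value in the usage comment are hypothetical names and numbers chosen for illustration. Note that it does not cancel the underlying load or free its memory, it only unblocks the caller.

```typescript
// Rejects if `promise` has not settled within `ms` milliseconds.
// The wrapped operation keeps running in the background; only the caller is unblocked.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${ms} ms`)),
      ms,
    );
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Possible usage (assumes an existing `wllama` instance and `modelUrl`):
//   try {
//     await withTimeout(wllama.loadModelFromUrl(modelUrl), 120000);
//   } catch (error) {
//     // Either a real load error or the timeout above.
//   }
```

A suitable timeout depends on the model size and the connection speed, so a fixed value is only a rough safeguard.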

It's possible that this problem is related to the changes made in this pull request:

Better exception handling #29

However, as I only encountered this issue on the iOS browser, it's also possible that it's related to this change:

fix OOM on iOS #23

If anyone would like to test this problem, you can use this 10-part split-gguf of TinyLlama on a device with less than 6GB of RAM: https://huggingface.co/Felladrin/gguf-sharded-TinyLlama-1.1B-1T-OpenOrca/resolve/main/tinyllama-1.1b-1t-openorca.Q3_K_S.shard-00001-of-00010.gguf.
(If an even larger model is needed, there are also Q4_K_M and Q8_0 versions available in this repository.)

ngxson · 2024-05-14T08:50:27Z

This is probably because the out-of-memory error is now thrown internally by the C++ code (and not by the worker JS code). Can you confirm whether you see an error from `llama_new_context_with_model`? (ref. #12 (comment))

flatsiedatsie · 2024-05-14T15:53:40Z

Sounds like the same issue I came across here?

With version 1.8, Wllama doesn't seem to raise an error though? It just states the issue in the console. But my code thinks the model has loaded OK, even though it hasn't. Is there a way to get the failed state?

// Doh, you already figured that out :-)

felladrin · 2024-05-14T15:53:45Z

Thanks for the reference. There is a lot of good info in that thread!

I've just noticed a pattern regarding this issue:

The loadModelFromUrl function is only hanging when running multi-threaded. It doesn't even print the warnings on the console. When I connect the mobile to Safari DevTools, I see the following:

From the screenshot, we can see that the device was using n_threads == 2.

When I force it to use n_threads = 1 with the same model, it then prints the warnings and also triggers the error, allowing me to catch it with the try/catch.

This indicates that `loadModelFromUrl` only fails to complete when loading a too-large model with multi-threading.
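The single-threaded workaround described above can be sketched roughly as follows. The `ModelLoader` interface is a hypothetical minimal shape written for this example, not the actual Wllama typings, and passing `n_threads` through the load config is an assumption based on the setting discussed in this thread.

```typescript
// Hypothetical minimal loader shape (an assumption, not the real Wllama types).
interface ModelLoader {
  loadModelFromUrl(url: string, config: { n_threads?: number }): Promise<unknown>;
}

// Loads the model with a single thread so that an out-of-memory failure
// surfaces as a rejection instead of a hang.
// Returns null on success, or the error message on failure.
async function tryLoadSingleThreaded(
  loader: ModelLoader,
  url: string,
): Promise<string | null> {
  try {
    await loader.loadModelFromUrl(url, { n_threads: 1 });
    return null;
  } catch (error) {
    return error instanceof Error ? error.message : String(error);
  }
}
```

Single-threaded inference is slower, so this is better suited to detecting that a model does not fit than to regular use.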

PS: I haven't tested your changes from #34.

felladrin · 2024-05-19T18:28:20Z

ℹ️ This issue (loadModelFromUrl hanging when used with multi-threading and loading a too-large model) is still present in v1.9.0.
I tried adjusting the stepBytes and maxBytes from getWasmMemory() to see if any combination could resolve the issue, but unfortunately, I couldn't find a solution. I've run out of ideas. Since it's running fine with small models, I've decided not to use large models (> 1 billion parameters) on mobile anymore.

Note: iOS browsers don't clear the memory of web workers properly when reloading the page. For instance, if the page is reloaded before calling wllama.exit(), trying to use wllama.loadModelFromUrl() will run with even lower memory than usual. So this hanging was more evident after reloading the page and re-running the inference.
Found these related issues that, unfortunately, don't have a solution:

ngxson · 2024-05-21T20:01:41Z

@felladrin Sorry for the late response. Yeah seems like there are a lot of problems with Safari on iOS.

> This issue (loadModelFromUrl hanging when used with multi-threading and loading a too-large model) is still present in v1.9.0.

Do you get the same error as last time (i.e. `Aborted()`)?

> iOS browsers don't clear the memory of web workers properly when reloading the page.

We could probably make the web worker exit by itself when the page reloads. But I'm still hesitant to do this, since it should be the responsibility of the browser. I'll have a look at this when I have more time.
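Until something like that exists in the library, an application could try the same idea on its own side by calling `exit()` when the page is about to go away. The sketch below is only an illustration: `exitOnPageHide` and both interfaces are hypothetical names written for this example, and whether iOS Safari actually releases the worker memory in time is not something this thread confirms.

```typescript
// Hypothetical minimal shapes so the sketch stays self-contained.
interface Exitable {
  exit(): Promise<void>;
}
interface PageEventTarget {
  addEventListener(type: string, listener: () => void): void;
}

// Registers a "pagehide" handler that asks the instance to shut down.
function exitOnPageHide(page: PageEventTarget, instance: Exitable): void {
  page.addEventListener("pagehide", () => {
    // Fire and forget: the page is being torn down, so the result cannot be awaited.
    instance.exit().catch(() => {
      // Ignore shutdown errors; nothing useful can be done at this point.
    });
  });
}

// In a browser this might be wired up as: exitOnPageHide(window, wllama);
```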

felladrin · 2024-05-21T21:12:24Z

Ah, no worries @ngxson!
My intention was just to document it, so other devs facing this issue can get some clues. But I'm not waiting for it to be fixed, as it works fine with models under 500M parameters.

Not sure when I'll try larger models on iOS again, but if I find anything new, I'll share here!

felladrin · 2024-10-04T08:16:10Z

After the launch of iOS 18, most of those out-of-memory issues seem to be gone! 🎉

I noticed that Apple now forces Safari to hard-reload the page when it detects that memory is running too low. After the reload, with more memory available, the models usually run fine. Wllama can easily run 1B models (e.g. Llama 3.2 1B Q4_K_M) on iPhones with less than 6GB of memory.

flatsiedatsie · 2024-10-04T11:04:20Z

Even the next iPhone SE is rumored to have 8GB of memory, so Apple is quickly making 8GB the new baseline. (The latest iPhone also comes with at least 8GB).

ngxson mentioned this issue May 14, 2024

Improve error handling on abort() #34

Merged

ngxson added the bug label Jun 25, 2024

felladrin closed this as completed Oct 4, 2024
