This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Crash on runtime upgrade related to wasmtime (Raspbian only) #12538

Open
2 tasks done
nazar-pc opened this issue Oct 20, 2022 · 9 comments

@nazar-pc
Contributor

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

This was initially reported by a user at https://forum.subspace.network/t/failed-to-allocate-bytes-exception-when-running-docker-on-aarch64/606. The crash is essentially this:

Thread 'main' panicked at 'Failed to make runtime API call during last archived block search: Application(VersionInvalid("cannot create the wasmtime engine: failed to create memory pool mapping: mmap failed to allocate 0x3080000000 bytes: Cannot allocate memory (os error 12)"))', /code/crates/sc-consensus-subspace/src/archiver.rs:72

Steps to reproduce

This happens when the node tries to apply a runtime upgrade; up until that point the node worked fine.

The affected hardware is a Raspberry Pi 4 8G running 64-bit Raspbian OS.

I tried a Rock64 4G and an Orange Pi 3 2G with sufficient swap, both of which worked fine with Ubuntu 22.04.
Then, at my suggestion, the user tried Ubuntu 22.04 on the Raspberry Pi 4 and it worked fine too.

There is certainly enough RAM, not to mention there was 8-16G of swap on top of that.

We double checked that memory overcommit was allowed in the kernel on Raspbian, which was indeed the case.

So for now this seems to be Raspbian-specific, but likely reproducible on other distros.

Not sure what exactly happens there, but I think it'll affect many if not all chains, and it would be good to figure out what it is. Maybe this is an upstream wasmtime issue, not 100% sure yet.
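For scale, the size in the panic message can be converted to human-readable units. This is only an illustrative sketch in Python; the helper name `to_gib` is made up for this example and nothing here comes from the node's code.

```python
# Size copied verbatim from the panic message in this issue.
requested = 0x3080000000

GIB = 1 << 30  # bytes per GiB


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (hypothetical helper for illustration)."""
    return n_bytes / GIB


print(f"{requested} bytes = {to_gib(requested):.1f} GiB")
# prints: 208305913856 bytes = 194.0 GiB
```

A single mapping of roughly 194 GiB is far larger than the RAM and swap described above.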

@nazar-pc nazar-pc changed the title Crash on runtime upgrade related to wasmtime (Rasbian only) Crash on runtime upgrade related to wasmtime (Raspbian only) Oct 20, 2022
@bkchr
Member

bkchr commented Oct 20, 2022

CC @koute

@koute
Contributor

koute commented Oct 20, 2022

Technically it's not failing when trying to allocate memory; it's failing to allocate address space. As long as overcommit is enabled the amount of RAM shouldn't actually make a difference whether this fails or not (unless it's mmaped with PROT_WRITE in which case you can't request more space than there is RAM, but AFAIK it isn't the case here). My educated guess would be that either Raspbian or Docker under Raspbian has a limit set for how much address space one can allocate.

Can you get the user to run ulimit -Hv and ulimit -Sv and report back the output? (It'll print out the address space hard and soft limits respectively.) Ideally both on the host system and from within Docker. (It might be unlimited on the host system, but from within Docker there might be a limit.)

Anyway, if my theory is correct then this isn't really a bug per se. It is something that we could improve though, once I get through with my executor refactoring. (We currently set quite an excessive maximum number of instances when configuring the executor; that won't be necessary once we actually precisely control how many instances can be instantiated in parallel.)
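If a shell is not handy (for example inside a minimal container), the same limits can be read programmatically. The sketch below assumes a Linux system with Python available and uses the standard `resource` module; the function names `fmt` and `address_space_limits` are hypothetical. On Linux, bash's `ulimit -v` corresponds to `RLIMIT_AS` and reports KiB, while `getrlimit` returns bytes.

```python
import resource


def fmt(limit: int) -> str:
    """Render a limit the way `ulimit -v` does: 'unlimited' or a KiB count."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} KiB"


def address_space_limits():
    """Return (soft, hard) address-space limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return fmt(soft), fmt(hard)


soft, hard = address_space_limits()
print(f"soft (ulimit -Sv): {soft}")
print(f"hard (ulimit -Hv): {hard}")
```

Running it both on the host and inside the container gives the same comparison as the two `ulimit` invocations.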

@nazar-pc
Contributor Author

Host:

pi@raspberrypi:~ $ ulimit -Hv
unlimited
pi@raspberrypi:~ $ ulimit -Sv
unlimited

Container:

nobody@9c2c148a00bd:/$ ulimit -Hv
unlimited
nobody@9c2c148a00bd:/$ ulimit -Sv
unlimited

@koute
Contributor

koute commented Oct 25, 2022

That's interesting. It might be a kernel-level limit or something along those lines. It's a little hard to debug this without actually having access to the hardware. I'll see if I can compare Ubuntu and Raspbian's kernel configs and see if there are any differences that pop up related to this.

@nazar-pc
Contributor Author

I can expose a Raspberry Pi 3 1G over the Internet to you, though it is extremely weak and would probably be frustrating to work with.

@koute
Contributor

koute commented Oct 25, 2022

So that would be with the affected Raspbian installed on it?

That could help; we could try. Doesn't really matter that the hardware's weak as long as I can poke-and-prod it through SSH with root access and maybe run some test programs; that's usually the fastest way to figure out problems like these.

@nazar-pc
Contributor Author

Emailed access details to you

@koute
Contributor

koute commented Oct 31, 2022

@nazar-pc Thanks for the access. You can now shut it down.

I've finished investigating the issue. I've verified that it's only possible to reserve 512GB of address space on your Raspberry Pi, and anything more will return an out-of-memory error. This is due to Raspbian's kernel being configured with only a 3-level virtual memory translation table, which allows for pointers only up to 39 bits in size. Ubuntu's kernel is not affected by this, and that's why you don't see this issue on Ubuntu.
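The thread does not say how the limit was measured. One possible way to probe it is to try reserving progressively larger anonymous mappings without ever touching them. The sketch below is an assumption-laden illustration for Linux: the function names are hypothetical, it uses a read-only private mapping (which is not backed by RAM until accessed) rather than whatever wasmtime does internally, and it only reports the largest power-of-two reservation, so the result is a coarse lower bound on the free address space.

```python
import mmap


def can_reserve(size: int) -> bool:
    """Try to reserve `size` bytes of address space; release it right away."""
    try:
        m = mmap.mmap(
            -1,  # anonymous mapping, no backing file
            size,
            flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
            prot=mmap.PROT_READ,  # never written, so no RAM is committed
        )
    except (OSError, ValueError, OverflowError):
        return False
    m.close()
    return True


def largest_power_of_two_reservation(max_bits: int = 48) -> int:
    """Largest 2**n (20 <= n <= max_bits) that can be reserved in one mapping."""
    best = 0
    for bits in range(20, max_bits + 1):
        if can_reserve(1 << bits):
            best = 1 << bits
    return best


if __name__ == "__main__":
    size = largest_power_of_two_reservation()
    print(f"largest single reservation: {size / (1 << 30):.0f} GiB")
```

On a kernel with a 39-bit address space this should top out well below what a 48-bit kernel allows, although the exact figure depends on what the process has already mapped.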

Apparently this is a known problem: raspberrypi/linux#4375

Also more details here for those interested: https://www.kernel.org/doc/html/v5.8/arm64/memory.html
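The 39-bit figure follows from the page-table geometry described in the linked kernel documentation. As a rough sketch, assuming 4KB pages and 8-byte table entries (so each level indexes 9 bits), the arithmetic works out as follows; `va_bits` is a hypothetical helper, not a kernel API.

```python
def va_bits(levels: int, page_shift: int = 12) -> int:
    """Virtual address bits for a given number of translation levels.

    Assumes each table fills one page of 8-byte entries, so every level
    contributes (page_shift - 3) index bits on top of the page offset.
    """
    return page_shift + levels * (page_shift - 3)


for levels in (3, 4):
    bits = va_bits(levels)
    print(f"{levels} levels: {bits} bits = {(1 << bits) // (1 << 30)} GiB")
# prints:
# 3 levels: 39 bits = 512 GiB
# 4 levels: 48 bits = 262144 GiB
```

With 3 levels the user address space is 512 GiB, matching the limit observed above, while a 4-level configuration provides 256 TiB.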

So technically it's not really a bug on our side, although we are kinda wasteful with the address space we reserve and we definitely could do a better job of using less of it. This is something I can improve by way of my executor refactoring, on which I've restarted work recently. Until I'm finished, it'd be best if you advised people not to run their nodes on Raspbian.

@koute koute self-assigned this Oct 31, 2022
@nazar-pc
Contributor Author

Awesome, thanks for the update!
