-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
What should the memory layout look like for wasm modules? #81
Comments
Does LLD dynamically grow the stack at runtime? That might be why it's so small. |
AFAIK, no, it'll just run into global data and then underflow (causing weird corruption errors most likely!) cc @CryZe (from our discussion today) |
The stack-size flag is supposedly just And no, it doesn't grow the stack from what I've seen. That wouldn't really work anyway, as even if it would somehow move the stack over, it would run into dlmalloc's memory, so that would be very bad. So yeah on stack overflow it just corrupts all of the data sections. |
Would it make more sense to put the stack at the bottom? That way if it overflows it will actually error (throw an exception in javascript's case) rather than corrupt stuff? |
Other things I've noticed while working on the wasm-to-rust compiler:
|
@Diggsey yeah honestly that seems like a much better idea. I'm not sure how much of all of this is in our hands though. |
I think, we don't have to have a 1024 bytes of memory just wasted for something. I think it has Emscripten roots. Placing the stack at the bottom seems like a good idea. But when the last time I thought about it I realized that it would be a tradeoff. I think the rationale of placing the data section is to make accesses to it more cheaper code-size wise, since I think the data section could have a quite a few places in which it accessed with a literal. And using numerically larger numbers make this values take more space, since literals use LEB128. |
Ah nice catch @CryZe! I updated the OP with how to configure the stack size. cc @sunfishcode and @sbc100, y'all may be interested in this as well. The current question is whether for LLD + Rust a memory layout like this may make more sense:
Or, in other words, we'd have something like:
And then the start of the stack would be something like:
Do y'all have thoughts on a memory layout like that for wasm modules? |
Just a (hopefully useful) note from a passing-by inexperienced Rust developer. Isn't a null pointer already reserved for things like Option<Box> None value? Also, there exists Unique::empty() used by things like Vec or Box, which is basically a null pointer + alignof T, meaning that a non-null pointer that is small (like literal 4) might still have a special meaning. Unless these invariants are changed, I don't think you can allow any valid pointer to go too much down the address space. If you put the stack there, you cannot really automatically detect the cases where the pointer ends up being null (or Unique::empty()), as these will happen before an underflow - each stack allocation would need to check for this explicitly. |
@BigBigos ah an excellent point! In that sense you'd expect that I think for other ones though with |
@alexcrichton not sure what you really mean about It is implemented as null + alignof, so that this invalid pointer is at the very least aligned (not sure, but maybe it is needed for some sort of automatic pointer tagging, similar to how The users of We could change the definition of |
@BigBigos hm do we actually rely on the value after |
@alexcrichton I think (if not - I apologize) you are confusing what
Most such uses in other languages (like C) use a null pointer in this case (null pointer == the value is unitialized). Rust uses As said before, it is not really a problem for |
@BigBigos oh certainly yeah makes sense!
Does |
@alexcrichton You are right, If all the other users of |
Heh indeed! Regardless the null pointer is also an issue |
For safety purposes I'd just keep the lower 16 bytes reserved (stack is 16 byte aligned, so it would have to be a multiple of 16). With that new memory layout we'd round up the stack size to fill up the remaining lower parts of the page anyway, so having the lower 16 bytes reserved isn't that problematic. |
I think we should move the stack to the first part of memory, so that stack overflows are loud, and don't silently corrupt data. I'm imagining changing the layout to this, as mentioned up-thread:
Aside: it is unfortunate we can't ask wasm memory to effectively start at 64KiB, to simulate null guard pages...
Very good point! I wonder what kind of impact this will have... Unsigned LEB128s greater than or equal to 1024 are already at least two bytes in size. Unsigned LEB128s greater than or equal to a megabyte are at least three bytes in size. Unsigned LEB128s greater than or equal to a mebibyte are four bytes in size. So assuming we create a megabyte stack and place data after that, we have 48576 bytes before we jump from three to four bytes per LEB128 global reference. Let's use twiggy-compiled-to-wasm as a test case. (I have no idea how typical it is or not). It is a 603,705 bytes large wasm file:
And it has 21139 bytes of data:
That is small enough that all the globals' addresses would only be three bytes when LEB128 encoded. I don't have an easy was to count memory instructions that operate on this range without resorting to writing more sophisticated scripts, so I don't have an idea of how much of an impact it will ultimately. It is reassuring that a reasonably-sized program's data fits in the three byte range, though. Note: I tried to use |
@alexcrichton what do we need to do to change the memory layouts for default builds? It seems to me like this "should" be a quick PR that benefits anyone trying to debug stack overflows, or potentially needing to debug stack overflows (ie everyone, and even more so because the stack is so small right now...). |
It appears that |
I've filed an LLD bug about tweaking the layout here. |
Happy to at least an option to lld. Maybe we should consider how much flexibility is needed thought. i.e. are there other layout switch we might want? Presumably we don't want to go as far a linker scripts and such? |
@sbc100 ah this comment reminds me of one other possible tweak to the memory layout. Currently our default dynamic memory allocator assumes that all existing memory in the wasm instance is unusable and only uses In that sense it might be most optimal to round up the placement of data to end as close to a page boundary as possible, maximizing what can be used for normal operation. I see that on newer version of LLD from what we're using there's a In any case this isn't necessarily a layout switch and moreso just optimization tweaks! |
Yes, the linker should be telling the runtime exactly where to start the heap so there shouldn't be any wasted space. We export a global called |
Since opening the defualt stack size has increased and |
We don't have the new |
@fitzgen I've opened rust-lang/rust#50543 to track the work to upgrading LLVM, and I've listed this on there (and implemented it in my branch) |
I'm also planning on landing this patch to compress the reloction padding: https://reviews.llvm.org/D46416. Its unrelated, and currently requires passing -O2 to the linker, but I thought you guys might like it might help you remove the need for the wasm-gc tool? |
Nice! I'll add that to our list of "need to make sure we update when we upgrade". We already pass
Oh no worries, the support added long ago with |
Currently wasm's linker, LLD, largely decides the memory layout for wasm modules and how items interact in the module's memory address space. We should either (a) decide that LLD's layout is ok or (b) figure out a different layout for ourselves and work with that!
Currently with LLD it looks like:
That's quite a small stack! It looks like LLD has options to change the start of data
--global-base
butI don't see an option to change the stack. That probably needs to change!- turns out-z stack-size=1024
The text was updated successfully, but these errors were encountered: