What happens when we write to Memory.Buffer #1452

Closed
spirobel opened this issue Jun 19, 2022 · 8 comments


@spirobel

spirobel commented Jun 19, 2022

I saw this document on the memory.buffer ArrayBuffer object:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WebAssembly/Memory
It says this:

The WebAssembly.Memory object is a resizable ArrayBuffer or SharedArrayBuffer that holds the raw bytes of memory accessed by a WebAssembly Instance.

But I am really confused by it.
I often see code like this: https://github.com/Deecellar/bottom-zig/blob/26110e2a97a392b8c742747388b1ec46633be343/public/index.html#L149

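    getText: () => {
        // Read the text from the input box
        let text = document.getElementById("text").value;
        // View the whole wasm memory as bytes (this view starts at address 0)
        let textArray = new Uint8Array(obj.instance.exports.memory.buffer);
        // Despite its name, textLen holds the UTF-8 encoded bytes, not a length
        let textLen = new TextEncoder().encode(text);
        // Copy the bytes into wasm memory, starting at index 0
        textArray.set(textLen);

        // Return the byte offset of the view, which is always 0 here
        return textArray.byteOffset;
    },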

where data is written to the buffer object. But where is the data written, and at what index? How does it not overwrite the stack and other application state on the heap? Who is responsible for "freeing that memory", i.e. telling the wasm module that this data is not used anymore?
TL;DR: how do the wasm module and the JavaScript code / WebAssembly.Memory buffer communicate which parts of the memory (address space) are safe to write to and which parts can be written to again?

Also: can we have more than one "memory"? The note in the spec confused me: https://twitter.com/spirobel/status/1538410200343842816 because it seems like in this example: https://twitter.com/spirobel/status/1538411361733992448 we have 2 "memories", i.e. the implicit memory at index 0 and the JS memory at index 1.
So how many memories can we actually have? Is there an upper limit?

One last question:
There is also the capability to reset the memory buffer as described here: https://webassembly.github.io/spec/js-api/index.html#reset-the-memory-buffer but it is not exposed in the API here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WebAssembly/Memory. So does this happen behind the scenes whenever we do "new Uint8Array(memory.buffer);" again with some new data?

@juj

juj commented Jun 23, 2022

But where is the data written, at what index?

The data is copied onto the wasm heap on the line textArray.set(textLen); starting at index 0.
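To make that concrete, here is a minimal sketch that reproduces the write without any wasm module. It assumes a standalone one-page WebAssembly.Memory as a stand-in for obj.instance.exports.memory, and the sample string is arbitrary.

```javascript
// Stand-in for obj.instance.exports.memory (assumption: one 64 KiB page).
const memory = new WebAssembly.Memory({ initial: 1 });

// A view over the whole buffer starts at byte 0 of linear memory.
const textArray = new Uint8Array(memory.buffer);
const encoded = new TextEncoder().encode("hi");

// set() without an offset argument copies to index 0 of the view.
textArray.set(encoded);

// The view's offset into the buffer is 0, so this "pointer" is address 0.
const pointer = textArray.byteOffset;
console.log(pointer, Array.from(textArray.subarray(0, encoded.length)));
```

Passing a second argument, as in textArray.set(encoded, offset), is what would place the bytes somewhere other than address 0.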

How does it not overwrite the stack and other application state on the heap?

If, for example, that code was run on a wasm application compiled with Emscripten, it would indeed be very bad. It would overwrite address 0, where a safety check cookie lives (used to verify that no accidental writes to address 0 occur), and then continue on to erase the global data section of the application, which generally lives at memory address 8 and upwards.

When not using Emscripten (or maybe even Clang/LLVM) to produce WebAssembly, it is up to the developer to design and define whether the Wasm Memory even contains a stack or anything else. Many developers produce WebAssembly code by their own means, and in such scenarios they are in control of defining what the byte addresses of the Wasm heap memory mean and contain.

Who is responsible for "freeing that memory" aka telling the wasm module that this data is not used anymore.

There is actually no memory being allocated on the wasm heap here. The wasm heap was already allocated earlier, so its size does not change. If you are not using LLVM/Clang, Emscripten, or some other compiler (such as a Rust-to-Wasm compiler), it is up to the application developer to decide when the text data copied onto the wasm heap is no longer meaningful and should conceptually be freed.

how do the wasm module and the javascript code / Webassembly.memory.buffer communicate what part of the memory (address space) is safe to write to and what parts can be written to again?

No such communication is required by the WebAssembly and JavaScript specifications themselves. In the absence of a higher-level, system/runtime-style compiler that produces the WebAssembly, it is entirely up to the developer who writes both the WebAssembly code and the JavaScript code ("both sides of the fence") to agree on what to read and where in memory, and on which addresses to regard as "in use" or "not in use".

When using e.g. Emscripten however, there is a strictly defined C-style process memory layout imposed onto the Wasm heap, with global data, process stacks and so on. Under such a scheme, the example code you post would not be correct. Instead of encoding the string to heap address 0 like that, with Emscripten for example one should use e.g. the function stringToNewUTF8, or manually _malloc and then later _free the memory, since Emscripten provides a heap memory allocator into a dynamic heap memory area on the Wasm memory.
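As a rough sketch of the allocate, write, free pattern described above: the Module object below is a hypothetical stand-in written for this example (a bump allocator that never reuses memory), not Emscripten's runtime. Only the names _malloc, _free and HEAPU8 mirror what an Emscripten build exposes; HEAP_BASE and writeString are made up for illustration.

```javascript
// Hypothetical stand-in for an Emscripten Module, so the sketch runs on its own.
const memory = new WebAssembly.Memory({ initial: 1 });
const HEAP_BASE = 1024; // assumption: everything below is static data and stack
let nextFree = HEAP_BASE;
const Module = {
  HEAPU8: new Uint8Array(memory.buffer),
  // Bump allocator: hands out fresh 8-byte aligned addresses, never address 0.
  _malloc(size) {
    const ptr = nextFree;
    nextFree += (size + 7) & ~7;
    return ptr;
  },
  _free(ptr) { /* a real allocator would recycle the block here */ },
};

// Copy a string into memory the allocator has reserved, plus a NUL terminator.
function writeString(text) {
  const bytes = new TextEncoder().encode(text);
  const ptr = Module._malloc(bytes.length + 1);
  Module.HEAPU8.set(bytes, ptr);
  Module.HEAPU8[ptr + bytes.length] = 0;
  return ptr;
}

const ptr = writeString("hello");
// ... pass ptr to an exported wasm function here ...
Module._free(ptr); // the caller owns the allocation and releases it
```

The point of the pattern is that the address comes from the allocator the wasm side also uses, so JavaScript never has to guess which region is free.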

Also: can we have more than one "memory"

The spec has been designed to be forward compatible with future "multi-memories", but those have not yet been realized (afaik). When using a compiler like LLVM/Clang/Emscripten, there is no support for multi-memories, since the memory address model there is strictly a single heap design only. The spec proposal is here.

If you are producing WebAssembly using your own compiler/assembler, then you would be free to use multiple memories at will.

So how many memories can we actually have? is there an upper limit?

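Apparently the memory index is encoded as a u32 (an unsigned 32-bit integer), so the binary format itself would allow up to 2^32 distinct memories; engines will likely impose a much lower practical limit.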

There is also the capability to reset the memory buffer
so does this happen behind the scenes whenever we do "new Uint8Array(memory.buffer);" again with some new data?

My understanding is that vocabulary is only used as an editorial aid for the spec, kind of a subroutine that is called by other parts of the spec. In this case resetting the memory buffer occurs as step 7 when the heap memory is grown, which is a publicly provided API.

The buffer property of a memory is read-only, so it is not possible to assign a new buffer to it in order to reset the memory later.
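A small sketch of what that reset looks like from JavaScript, assuming a non-shared memory and arbitrary page counts: after grow(), the previously obtained ArrayBuffer is detached and memory.buffer hands back a new one. Creating another Uint8Array over the same buffer does not trigger any reset.

```javascript
// Non-shared memory: one page now, allowed to grow up to four pages.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
const oldBuffer = memory.buffer;
new Uint8Array(oldBuffer)[0] = 42;

memory.grow(1); // grow by one 64 KiB page

const newBuffer = memory.buffer;
console.log(oldBuffer.byteLength);         // 0, the old buffer is detached
console.log(newBuffer.byteLength);         // 131072, two pages
console.log(new Uint8Array(newBuffer)[0]); // 42, the contents carry over
```

This is why typed array views over memory.buffer should be re-created after any call that may grow the memory.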

@spirobel
Author

wow @juj thanks for this amazing answer! 😀 👍

When using e.g. Emscripten however, there is a strictly defined C-style process memory layout imposed onto the Wasm heap, with global data, process stacks and so on. Under such a scheme, the example code you post would not be correct. Instead of encoding the string to heap address 0 like that, with Emscripten for example one should use e.g. the function stringToNewUTF8, or manually _malloc and then later _free the memory, since Emscripten provides a heap memory allocator into a dynamic heap memory area on the Wasm memory.

I have since learned that the stack used while executing code is a separate data structure / concept from a "memory" in wasm (because it is based on the Harvard architecture). Does this assumption still hold when compiling C with Emscripten or Clang to wasm? The local variables are not stored in the wasm memory object, right?

@juj

juj commented Jun 23, 2022

the stack while executing code is a separate data structure / concept from a "memory" in wasm
does this assumption still hold when compiling C

Essentially yes. In wasm all control flow (function frames, function arguments, and most function local variables) will reside in a hidden/secure stack that the wasm VM implements, and the Wasm code itself will not be able to observe the existence of this stack. Indeed all the function locals will reside in this stack.

However, when compiling C programs with Emscripten and/or Clang/LLVM, there will exist a second stack, the LLVM-implemented stack, which lives in the wasm heap. As far as LLVM is concerned, this stack abstraction is the same as the stack in native programs, but in WebAssembly programs it ends up containing only data that would not work well in the wasm VM stack. This generally covers two different scenarios:

  • function local variables that need to be referenceable via a linear address, e.g. data that will be passed by address to another function, and arrays that are indexed via an address, and
  • moderate to large structs

These two scenarios are cases where it would not be feasible to utilize the hidden native Wasm stack (since one cannot reference data in that stack by address, and allocating large amounts of locals for a large struct would be very costly in terms of code size), hence this kind of a "spillover" stack is used instead.

@spirobel
Author

However when compiling C programs with Emscripten and/or Clang/LLVM, there will exist a second stack that is the LLVM implemented stack - that stack will live in the wasm heap.

The result of this is that we can't use shared memories with Emscripten / Clang compiled wasm modules, right?
If we created a shared memory like this: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WebAssembly/Memory#creating_a_shared_memory
the stacks of the separate wasm modules accessing the shared memory would interfere, right? And because we can currently only have one memory object per module instance, this is essentially a show stopper for shared memory under these circumstances, correct?
Or am I getting this wrong and the modules can still have different memory under the hood?

@Macil

Macil commented Jun 24, 2022

Emscripten uses shared memory for thread support in WebAssembly using web workers. Each worker/thread keeps the shadow stack in a different region of the shared memory. A pointer to the shadow stack is kept in a WebAssembly global which is unique per worker.
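The idea of disjoint per-thread regions inside one shared memory can be sketched as follows. The layout constants (STACKS_BASE, STACK_SIZE) and the stackRegion helper are hypothetical, and both regions are touched from a single thread here purely to show that they do not overlap; in a real program each worker would use only its own region and keep its stack pointer in its own wasm global.

```javascript
// Shared memories require an explicit maximum.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });

// Hypothetical layout: each thread gets its own fixed-size stack region.
const STACK_SIZE = 4096;
const STACKS_BASE = 16384;
function stackRegion(threadIndex) {
  const start = STACKS_BASE + threadIndex * STACK_SIZE;
  return new Uint8Array(memory.buffer, start, STACK_SIZE);
}

const stackA = stackRegion(0);
const stackB = stackRegion(1);
stackA.fill(0xaa); // writes stay inside region A
stackB.fill(0xbb); // writes stay inside region B
```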

@spirobel
Author

A pointer to the shadow stack is kept in a WebAssembly global which is unique per worker.

@Macil thanks for pointing me to this! 🙂👍 I also found this stackoverflow answer
but I am still unsure how emscripten keeps the two (or more) stacks from growing into each other.

I also thought about how I might use two wasm modules together on one memory object without using Emscripten. The very safe and naive way would be to pick memory addresses that are very far apart and link the two modules with --global-base set to those addresses. But as I understand it, wasm allocates all the memory in between, correct? Wasm memory is not like typical virtual process memory, so if I pick a high --global-base, all the memory from 0 up to it will be allocated, right?

@juj

juj commented Jun 24, 2022

but I am still unsure how emscripten keeps the two (or more) stacks from growing into each other.

Emscripten establishes a fixed size linear stack address range for each worker thread:

https://github.com/emscripten-core/emscripten/blob/831f8ddf7d93aa9f728ea60efe001d75c72df19e/system/lib/compiler-rt/stack_limits.S#L65-L71

For the main browser thread, the stack is allocated statically at startup. For worker pthreads, the stack is allocated from the dynamic memory region that is governed by a malloc implementation (either dlmalloc or emmalloc). For Wasm Workers, the decision is left up to the creator of the Wasm Worker to define whether to statically or dynamically allocate the stack (and TLS slots) for each Worker.
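The notion of a fixed-size stack range can be illustrated with a small model. This is not Emscripten's implementation: makeStack and its methods are hypothetical, and the overflow check is part of the sketch. It models a stack that grows downward from a base address and refuses to go below the end of its range, which is what keeps neighbouring stacks from running into each other.

```javascript
// Hypothetical helper: a downward-growing stack confined to [end, base).
function makeStack(base, size) {
  const end = base - size;
  let sp = base;
  return {
    // Reserve `bytes` on the stack and return the new stack pointer.
    alloc(bytes) {
      const next = (sp - bytes) & ~15; // keep 16-byte alignment
      if (next < end) throw new RangeError("stack overflow");
      sp = next;
      return sp;
    },
    // Pop back to a previously saved stack pointer.
    restore(saved) { sp = saved; },
    get pointer() { return sp; },
  };
}

const stack = makeStack(65536, 1024); // range is [64512, 65536)
const saved = stack.pointer;
const a = stack.alloc(100); // fits inside the range
stack.restore(saved);       // frees everything allocated since `saved`
```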

But as I understand wasm allocates all the memory in between, correct?

Whether all memory pages in the linear WebAssembly heap are committed and paged in to physical memory, or whether individual pages remain virtually allocated (and not committed to physical memory), is unfortunately not mandated at all by the WebAssembly specification. As a result, there is some awkwardness that ensues, see issue #1397.
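What the specification does fix is the address space: linear memory is contiguous from address 0 in 64 KiB pages, so a memory has to be large enough to contain the highest address in use. A short sketch, assuming a hypothetical base address of 1 MiB and a 4 KiB data region (pagesFor is a helper made up for this example):

```javascript
const PAGE_SIZE = 65536; // WebAssembly page size in bytes

// Pages needed so that addresses [0, base + size) are all inside the memory.
function pagesFor(base, size) {
  return Math.ceil((base + size) / PAGE_SIZE);
}

const base = 1024 * 1024; // hypothetical high --global-base of 1 MiB
const pages = pagesFor(base, 4096);
const memory = new WebAssembly.Memory({ initial: pages });

// The address space is contiguous from 0, so the gap below base is part of it.
const bytes = new Uint8Array(memory.buffer);
bytes[base] = 1;
console.log(pages, memory.buffer.byteLength);
```

Whether the untouched pages below the base actually consume physical memory depends on the engine and operating system, as noted above.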

@sunfishcode
Member

The question here seems answered; if there are any further questions, please file new issues!
