What happens when we write to Memory.Buffer #1452
The data is copied onto the wasm heap on the line
If, for example, that code were run on a wasm application compiled with Emscripten, it would indeed be very bad: it would overwrite address 0, where a safety check cookie lives (used to verify that no accidental writes to address 0 occur), and then continue on to erase the global data section of the application, which generally lives at memory address 8 and upwards. When not using Emscripten (or maybe even Clang/LLVM) to produce WebAssembly, it is up to the developer to design and define whether the Wasm Memory even contains a stack or anything else. Many developers produce WebAssembly code by their own means, and in such scenarios they are in control of defining what the byte addresses of the Wasm heap memory mean and contain.
There is actually no memory being allocated on the wasm heap here. The wasm heap has already been allocated beforehand, so its size does not change here. If not using LLVM/Clang or Emscripten, or some other compiler (like a Rust to Wasm compiler), it is then up to the application developer to decide when the text data copied onto the wasm heap is no longer meaningful and should conceptually be freed.
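To make this concrete, here is a minimal sketch in plain JavaScript with no compiler toolchain involved. The scratch region at offset 1024 and the helper names `writeString` / `readString` are hypothetical; they stand in for whatever layout the wasm side and the JS side have agreed on. It shows that writing through a typed array view leaves the size of the heap unchanged.

```javascript
// Hypothetical layout agreed on by both sides of the fence:
// bytes [1024, 2048) of linear memory are a scratch area for strings.
const SCRATCH_OFFSET = 1024;
const SCRATCH_SIZE = 1024;

// One page of linear memory (64 KiB), allocated once up front.
const memory = new WebAssembly.Memory({ initial: 1 });

function writeString(mem, text) {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length > SCRATCH_SIZE) {
    throw new RangeError("string does not fit in the scratch area");
  }
  // The typed array is only a view over the existing buffer;
  // creating it allocates nothing inside the wasm heap.
  new Uint8Array(mem.buffer, SCRATCH_OFFSET, bytes.length).set(bytes);
  return bytes.length;
}

function readString(mem, length) {
  const view = new Uint8Array(mem.buffer, SCRATCH_OFFSET, length);
  return new TextDecoder().decode(view);
}

const sizeBefore = memory.buffer.byteLength;
const written = writeString(memory, "hello wasm");
const sizeAfter = memory.buffer.byteLength;
const roundTrip = readString(memory, written);

console.log(sizeBefore === sizeAfter, roundTrip);
```

Nothing in this sketch ever "frees" the scratch area. It simply becomes reusable once both sides consider the string no longer needed, which is a convention and not something the engine tracks.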
There is no communication like that needed whatsoever by the WebAssembly and JavaScript specifications themselves. In the absence of a higher-level system/runtime compiler producing the WebAssembly, it is all up to the developer who is generating the WebAssembly code and the JavaScript code to write "both sides of the fence" and to conceptually agree on what and where in the memory to read, and what addresses to regard as being "in use" or "not in use". When using e.g. Emscripten, however, there is a strictly defined C-style process memory layout imposed onto the Wasm heap, with global data, process stacks and so on. Under such a scheme, the example code you posted would not be correct. Instead of encoding the string to heap address 0 like that, with Emscripten for example one should use e.g. the function
The spec has been designed to be forward compatible with future "multi-memories", but those have not yet been realized (afaik). When using a compiler like LLVM/Clang/Emscripten, there is no support for multi-memories, since the memory address model there is strictly a single heap design only. The spec proposal is here. If you are producing WebAssembly using your own compiler/assembler, then you would be free to use multiple memories at will.
Apparently the memory index is encoded via a
My understanding is that vocabulary is only used as an editorial aid for the spec, kind of a subroutine that is called by other parts of the spec. In this case resetting the memory buffer occurs as step 7 when the heap memory is The |
wow @juj thanks for this amazing answer! 😀 👍
I have since learned that the stack used while executing code is a separate data structure / concept from a "memory" in wasm (because it is based on the Harvard architecture). Does this assumption still hold when compiling C with Emscripten or Clang to wasm? The local variables are not stored in the wasm memory object, right?
Essentially yes. In wasm all control flow (function frames, function arguments, and most function local variables) will reside in a hidden/secure stack that the wasm VM implements, and the Wasm code itself will not be able to observe the existence of this stack. Indeed all the function However, when compiling C programs with Emscripten and/or Clang/LLVM, there will exist a second stack, implemented by LLVM, and that stack will live in the wasm heap. As far as LLVM is concerned, this stack abstraction is the same as the stack in native programs in general, but in WebAssembly programs it will end up containing only data that would not work well in the wasm VM stack. This generally includes two different scenarios:
These two scenarios are cases where it would not be feasible to utilize the hidden native Wasm stack (since one cannot reference data in that stack by address, and allocating large amounts of |
The result of this is that we can't use shared memories with Emscripten / Clang compiled wasm modules, right?
Emscripten uses shared memory for thread support in WebAssembly using web workers. Each worker/thread keeps the shadow stack in a different region of the shared memory. A pointer to the shadow stack is kept in a WebAssembly global which is unique per worker.
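The idea can be sketched in plain JavaScript without any workers. This is an illustration only, not Emscripten's actual layout: the 4 KiB stack size, the region placement, the downward growth direction, and the helpers `makeStackPointer` / `push32` are all assumptions made up for the example. Two `WebAssembly.Global` objects play the role of the per-worker stack pointers, and both point into one shared memory.

```javascript
// One shared linear memory; with real threads each worker would receive
// this same Memory object via postMessage.
const shared = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });

// Hypothetical layout: thread N owns bytes [N * STACK_SIZE, (N + 1) * STACK_SIZE).
const STACK_SIZE = 4096;

function makeStackPointer(threadIndex) {
  // Assumes the stack grows downward, so the pointer starts at the
  // top of the thread's region.
  const top = (threadIndex + 1) * STACK_SIZE;
  return new WebAssembly.Global({ value: "i32", mutable: true }, top);
}

function push32(mem, sp, value) {
  sp.value -= 4; // reserve 4 bytes in this thread's region
  new DataView(mem.buffer).setInt32(sp.value, value, true); // little-endian
}

const sp0 = makeStackPointer(0); // "worker 0"
const sp1 = makeStackPointer(1); // "worker 1"

push32(shared, sp0, 111);
push32(shared, sp1, 222);

// Each value landed in a different region of the same memory.
const heap = new DataView(shared.buffer);
const top0 = heap.getInt32(sp0.value, true);
const top1 = heap.getInt32(sp1.value, true);
console.log(sp0.value, top0, sp1.value, top1);
```

Because each stack pointer lives in its own global and not in the shared memory, the threads never step on each other's stack bookkeeping even though the stack contents share one address space.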
@Macil thanks for pointing me to this! 🙂👍 I also found this Stack Overflow answer. I also thought about how I might use two wasm modules on one memory object together without using Emscripten. The very safe and naive way would be to just pick memory addresses that are very far apart and link the two modules with --global-base set to those addresses. But as I understand it, wasm allocates all the memory in between, correct? Wasm memory is not like typical virtual process memory, so if I pick a high --global-base, all the memory from 0 up to there will be allocated, right?
Emscripten establishes a fixed size linear stack address range for each worker thread: For the main browser thread, the stack is allocated statically at startup. For worker pthreads, the stack is allocated from the dynamic memory region that is governed by a
Whether all memory pages in the linear WebAssembly heap are committed and paged in to physical memory, or whether individual pages remain virtually allocated (and not committed to physical memory), is unfortunately not mandated at all by the WebAssembly specification. As a result, some awkwardness ensues; see issue #1397.
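What the JS API does make observable is that linear memory is one contiguous range starting at address 0. A small sketch, where `HIGH_BASE` is a hypothetical base address of the kind one might pass as --global-base and `canAddress` is a made-up helper: to make a high address usable, the memory has to be grown so that every page below it is part of the memory too. Whether those pages are physically committed is the part that stays engine-specific.

```javascript
const PAGE_SIZE = 64 * 1024; // size of one WebAssembly page in bytes
const HIGH_BASE = 1024 * 1024; // hypothetical base address for a second module

const linear = new WebAssembly.Memory({ initial: 1 });

// An address is usable only if it lies inside [0, byteLength).
function canAddress(mem, address) {
  return address < mem.buffer.byteLength;
}

const reachableBefore = canAddress(linear, HIGH_BASE);

// Grow until HIGH_BASE is covered; every page in between comes along.
const pagesNeeded = Math.floor(HIGH_BASE / PAGE_SIZE) + 1;
const previousPages = linear.grow(pagesNeeded - 1); // returns the old size in pages

const reachableAfter = canAddress(linear, HIGH_BASE);
const totalBytes = linear.buffer.byteLength;

console.log(reachableBefore, reachableAfter, previousPages, totalBytes);
```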
The question here seems answered; if there are any further questions, please file new issues!
I saw this document on the memory.buffer ArrayBuffer object:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WebAssembly/Memory
It says this:
But I am really confused by it.
I often see code like this: https://github.com/Deecellar/bottom-zig/blob/26110e2a97a392b8c742747388b1ec46633be343/public/index.html#L149
where data is written to the buffer object. But where is the data written, at what index? How does it not overwrite the stack and other application state on the heap? Who is responsible for "freeing that memory", i.e. telling the wasm module that this data is not used anymore?
TLDR: how do the wasm module and the JavaScript code / WebAssembly.Memory buffer communicate which parts of the memory (address space) are safe to write to and which parts can be written to again?
Also: can we have more than one "memory"? The Note in the spec confused me: https://twitter.com/spirobel/status/1538410200343842816 because it seems like in this example: https://twitter.com/spirobel/status/1538411361733992448 we have 2 "memories", i.e. the implicit memory at index 0 and the JS memory at index 1.
So how many memories can we actually have? Is there an upper limit?
One last question:
There is also the capability to reset the memory buffer as described here: https://webassembly.github.io/spec/js-api/index.html#reset-the-memory-buffer but it is not exposed in the API here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WebAssembly/Memory so does this happen behind the scenes whenever we do "new Uint8Array(memory.buffer);" again with some new data?
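The behaviour can be probed directly from JavaScript. A minimal sketch using only the standard `WebAssembly.Memory` API, with made-up variable names: creating another `Uint8Array` over `memory.buffer` just adds a view onto the same ArrayBuffer, whereas `grow()` is one case where the previously returned ArrayBuffer becomes detached and `memory.buffer` hands out a new one.

```javascript
const mem = new WebAssembly.Memory({ initial: 1 });

// Reading .buffer repeatedly returns the same ArrayBuffer object.
const oldBuffer = mem.buffer;
const sameObject = mem.buffer === oldBuffer;

// Two views over the same buffer share storage; nothing is reset.
const viewA = new Uint8Array(mem.buffer);
const viewB = new Uint8Array(mem.buffer);
viewA[0] = 42;
const sharedStorage = viewB[0] === 42;

// Growing the memory detaches the old ArrayBuffer...
mem.grow(1);
const oldLength = oldBuffer.byteLength; // a detached buffer reports length 0
const newLength = mem.buffer.byteLength;

// ...but the contents carry over into the new buffer.
const freshView = new Uint8Array(mem.buffer);
const dataSurvived = freshView[0] === 42;

console.log(sameObject, sharedStorage, oldLength, newLength, dataSurvived);
```

The practical consequence is that views created before a `grow()` should be recreated from `memory.buffer` afterwards.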