Map WASM memory allocation APIs to IREE's HAL #5137

Open
ScottTodd opened this issue Mar 17, 2021 · 7 comments
Labels
hal/api IREE's public C hardware abstraction layer API runtime Relating to the IREE runtime library

Comments

@ScottTodd
Member

Splitting this off from #5096 and a discussion on Discord here.

TL;DR: WAMR allocates memory per module (executable). IREE wants to define an allocator up a level, shared across executables. What should we do?


WASM runtimes restrict each WASM module to a single contiguous memory address range, which the module can suballocate within. Applications can typically create this block of memory, resize it, offer it to instantiated modules, etc. See this article for a pretty good overview.

IREE follows several APIs (like Vulkan) in using a hierarchical setup going from application contexts down to executables:

driver registry (iree_hal_driver_registry_t)
  - driver (iree_hal_driver_t, VkInstance)
    - device (iree_hal_device_t, VkPhysicalDevice + VkDevice)
    - device
  - driver
    - device
    - device

executable (iree_hal_executable_t, VkShaderModule + VkPipeline[])
  - executable instances are created by devices and _may_ be cached/reused across devices by drivers

hal allocator (iree_hal_allocator_t)
  - may be independent from drivers/devices (e.g. for CPU implementations), or linked to a specific driver/device
  - each device has _one_ allocator, which it uses for all of its loaded executables

drivers and devices should be isolated from each other, except where resource sharing is explicitly used
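
As a rough illustration of the ownership model above, here is a minimal C sketch. Every `sketch_*` name is a hypothetical stand-in and not IREE's actual HAL API; the only point is that a device holds exactly one allocator and every executable it loads draws from that allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the HAL concepts; NOT the real IREE types. */
typedef struct {
  uint8_t *base;   /* backing storage provided by the caller */
  size_t capacity; /* total bytes available */
  size_t used;     /* bytes handed out so far */
} sketch_allocator_t;

typedef struct {
  sketch_allocator_t *allocator; /* exactly one allocator per device */
  int executable_count;
} sketch_device_t;

typedef struct {
  sketch_device_t *device; /* executables are created by a device */
  int id;
} sketch_executable_t;

void sketch_device_init(sketch_device_t *device,
                        sketch_allocator_t *allocator) {
  device->allocator = allocator;
  device->executable_count = 0;
}

sketch_executable_t sketch_device_load_executable(sketch_device_t *device) {
  sketch_executable_t executable = {device, device->executable_count++};
  return executable;
}

/* Allocation is routed through the owning device's allocator, no matter
 * which executable asks for the buffer. Returns NULL when out of space. */
void *sketch_executable_alloc(sketch_executable_t *executable, size_t size) {
  sketch_allocator_t *allocator = executable->device->allocator;
  assert(allocator != NULL);
  if (size > allocator->capacity - allocator->used) return NULL;
  void *ptr = allocator->base + allocator->used;
  allocator->used += size;
  return ptr;
}
```

Two executables loaded on the same device end up with adjacent slices of the same backing storage, which is the sharing behavior the rest of this issue is trying to preserve.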

While implementing a WASM HAL driver using WAMR in #5096, we found that WAMR has a different memory allocation architecture in its "iwasm" VM core:

wasm_runtime_full_init (static)
  - wasm_runtime_malloc: allocate from runtime memory environment

wasm_runtime_load(module_bytes) -> wasm_module_t

wasm_runtime_instantiate(wasm_module_t, heap_size) -> wasm_module_inst_t
  - wasm_runtime_module_malloc(wasm_module_inst_t, size): allocate from WASM module instance

wasm_runtime_create_exec_env(wasm_module_inst_t, stack_size) -> wasm_exec_env_t

So,

  • IREE has one "allocator" per "device", shared between executables
  • WAMR has one memory space per module (executable)

At face value, an "IREE WAMR device" would need to either limit itself to one executable, or it would need multiple allocators.
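
To make the mismatch concrete, the following sketch models each module instance as having its own private heap. It does not call WAMR; the `sketch_*` types and functions are hypothetical and only imitate the shape of a per-instance allocation call. An offset handed out by one instance resolves to unrelated bytes in another instance.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HEAP_SIZE 256u

/* Hypothetical model of a module instance with a private heap. The real
 * WAMR instance type is opaque and differs from this. */
typedef struct {
  uint8_t heap[SKETCH_HEAP_SIZE];
  uint32_t used;
} sketch_module_inst_t;

void sketch_module_init(sketch_module_inst_t *inst) {
  memset(inst->heap, 0, sizeof(inst->heap));
  inst->used = 0;
}

/* Returns an offset into this instance's heap, or UINT32_MAX when full. */
uint32_t sketch_module_malloc(sketch_module_inst_t *inst, uint32_t size) {
  if (size > SKETCH_HEAP_SIZE - inst->used) return UINT32_MAX;
  uint32_t offset = inst->used;
  inst->used += size;
  return offset;
}

/* An offset is only meaningful relative to the instance that produced it. */
uint8_t *sketch_module_addr(sketch_module_inst_t *inst, uint32_t offset) {
  assert(offset < SKETCH_HEAP_SIZE);
  return inst->heap + offset;
}
```

Under this model, passing a buffer from one executable to another on the same device would need a copy between heaps, which is what a shared device-level allocator is meant to avoid.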

WAMR has a WAMR_BUILD_SHARED_MEMORY CMake option / WASM_ENABLE_SHARED_MEMORY C define that could help, but we still want isolation between drivers/devices. One of the main reasons we'd be using WASM would be for the memory sandbox.

Notably, the WASM C API, which WAMR partially implements, has a different model:

wasm_engine_t
  - wasm_store_t
    - wasm_memory_t
    - wasm_module_t

that seems to map more directly to IREE's architecture (device or driver would have wasm_engine_t, device would have wasm_store_t and wasm_memory_t, executable would have wasm_module_t).


Here are a few of our options, none of which seem too favorable:

(A) Wait for WAMR to implement the latest WASM C API and use that, instead of their "iwasm" VM core

(B) Continue using WAMR's "iwasm" VM core, finding a workaround using shared memory

(C) Restrict devices in IREE's WASM HAL to one executable

(D) Externalize memory using native read/write functions loading/storing from our own heap, making each wasm environment use no local heap and contain compute-only code. This would require some further ahead-of-time compilation work on our end (turning loads/stores into calls, ensuring these loads/stores are handled in pages since they would be slow individually).

(E) Use a different WASM runtime (see this list). We picked WAMR as an initial target for its low footprint, portability, performance, and ease of integration (C/C++ and CMake with few dependencies). If we want an IREE WASM HAL to serve as a flexible deployment path, we can't really compromise on those points (e.g. by taking a Rust dependency or using a runtime that can't run on embedded systems).
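
For option (D), a minimal sketch of what "externalized memory" could look like: the heap lives on the host, and compute-only code moves data one page at a time through native read/write functions. All names here (`sketch_host_read_page`, `sketch_host_write_page`, the 64-byte page size) are assumptions chosen for illustration, not an existing IREE or WAMR interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 64u /* hypothetical transfer granularity */

/* Host-owned heap that lives outside any wasm environment. */
typedef struct {
  uint8_t *data;
  size_t size;
} sketch_host_heap_t;

/* Hypothetical native import: copy one page from the host heap into
 * module-local scratch. Returns 0 on success, -1 if out of range. */
int sketch_host_read_page(const sketch_host_heap_t *heap, size_t page_index,
                          uint8_t *scratch) {
  if (page_index >= heap->size / SKETCH_PAGE_SIZE) return -1;
  memcpy(scratch, heap->data + page_index * SKETCH_PAGE_SIZE,
         SKETCH_PAGE_SIZE);
  return 0;
}

/* Hypothetical native import: copy one page of scratch back to the host. */
int sketch_host_write_page(sketch_host_heap_t *heap, size_t page_index,
                           const uint8_t *scratch) {
  if (page_index >= heap->size / SKETCH_PAGE_SIZE) return -1;
  memcpy(heap->data + page_index * SKETCH_PAGE_SIZE, scratch,
         SKETCH_PAGE_SIZE);
  return 0;
}

/* Compute-only kernel: two host calls per page instead of one per byte. */
int sketch_add_one(sketch_host_heap_t *heap, size_t page_count) {
  uint8_t scratch[SKETCH_PAGE_SIZE];
  for (size_t p = 0; p < page_count; ++p) {
    if (sketch_host_read_page(heap, p, scratch) != 0) return -1;
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; ++i) {
      scratch[i] = (uint8_t)(scratch[i] + 1u);
    }
    if (sketch_host_write_page(heap, p, scratch) != 0) return -1;
  }
  return 0;
}
```

The paging is the important part: turning every individual load/store into a host call would be slow, so the compiler work mentioned in (D) would need to batch accesses roughly like this.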

@ScottTodd ScottTodd added runtime Relating to the IREE runtime library hal/api IREE's public C hardware abstraction layer API labels Mar 17, 2021
ScottTodd added a commit that referenced this issue Apr 5, 2021
#5312)

Reverts #5077

We won't be using this submodule while #5137 is blocking #5096.
@ScottTodd
Member Author

I spent some time evaluating wasm3 and Wasmtime's APIs from this perspective:

Both seem architecturally more flexible when it comes to memory allocation and memory space management, but I don't quite see a way to satisfy our requirements yet. More discussions on Discord about this here (wasm3) and here (wasmtime).

Generally, we want to allocate a block of memory to be managed by an IREE "device" and shared between wasm modules (IREE "executables"). I think this can be solved using memory exports and imports (it's basically how SharedArrayBuffer is used on the web?), though I'm still searching for concrete examples and documentation from these runtimes. We're also wondering about thread safety (Discord discussion): locking to allocate or load/unload a module is fine, but we shouldn't need to take a lock to safely call stateless functions, for example.

@mykmartin

Quick question: would the WebAssembly multi-memory proposal be useful for this? To avoid the external call cost, the device/HAL allocator logic could be baked into each module but operating on a single buffer shared between all modules, with other buffer(s) allocated per-module for isolated memory space as needed.

@benvanik
Collaborator

benvanik commented Jul 16, 2021

@mykmartin it may be (my hope is that it is :)

Our usage really needs there to be a way to allocate a growable block of memory independent of any wasm module that we can then provide to each wasm module as we load them. The default memory of each wasm module is where stacks would live while the bulk data we'd work with would come from the shared memory.

It's hard for me to see in the spec if this is something the spec even cares about or if it's purely something related to the engines. From what we looked into, most engines assume that they create all the memory for the loaded modules instead of allowing imports. The proposal spec looks like it would be very compatible with this approach, as we could do this with just two data segments, and we can easily assign the pointer address spaces in LLVM that end up as the data segment identifiers on instructions from our generated code. Then we just need the engines to have an "allocate a growable memory instance" and "import this memory instance during module instantiation".

We also could get by without multi-memory support if we could do the same "allocate a growable memory instance" and "use this memory instance during module instantiation" - we'd then assign the stack offsets ourselves and have all modules share the same exact memory. In browser land this would be like having a SharedArrayBuffer that all loaded wasm modules used - which would be useful (in my previous life I worked on Google Maps and wanted the same feature there for multithreaded decoding into staging buffers for GPU upload).
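
A small sketch of the single-shared-memory variant, where we assign stack offsets ourselves. The layout (a fixed number of stack slots at the front, shared heap after them), the sizes, and all `sketch_*` names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_STACK_SIZE 4096u /* hypothetical per-module stack size */
#define SKETCH_MAX_MODULES 8u   /* hypothetical cap on loaded modules */

/* One shared memory laid out as:
 *   [stack 0][stack 1]...[stack MAX-1][shared heap ...]
 * Stack slots are reserved up front so the heap base never moves. */
typedef struct {
  uint32_t total_size;
  uint32_t module_count;
} sketch_shared_layout_t;

/* Offset where the shared heap begins. */
uint32_t sketch_layout_heap_base(void) {
  return SKETCH_STACK_SIZE * SKETCH_MAX_MODULES;
}

/* Returns 0 on success, -1 if the memory cannot even hold the stacks. */
int sketch_layout_init(sketch_shared_layout_t *layout, uint32_t total_size) {
  if (total_size < sketch_layout_heap_base()) return -1;
  layout->total_size = total_size;
  layout->module_count = 0;
  return 0;
}

/* Assigns the next stack slot to a newly loaded module. Returns the base
 * offset of its stack region, or UINT32_MAX when all slots are taken. */
uint32_t sketch_layout_add_module(sketch_shared_layout_t *layout) {
  if (layout->module_count >= SKETCH_MAX_MODULES) return UINT32_MAX;
  uint32_t stack_base = layout->module_count * SKETCH_STACK_SIZE;
  layout->module_count += 1;
  return stack_base;
}
```

With multi-memory the stack slots would instead live in each module's default memory, and only the heap portion would need to be shared.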

(posted some more details about what we are doing here: #2863 (comment), which shows where multi-memory may help)

@benvanik
Collaborator

The other thing multi-memory may allow in the future is importing large read-only constant buffers. In ML inference this would be things like your model weights (which can be 10-100MB, or much larger, e.g. 2GB). Today we would need to copy those into the wasm-accessible memory - which is similar to what we need to do for GPUs with discrete memory - but it'd be nice not to have to, given that the bytes already exist. If we could create a wasm_memory_t with existing read-only contents then we could import that without the full alloc+copy.
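
The difference between the two paths can be sketched in plain C. This does not use the WASM C API; `sketch_const_memory_t` and its functions are hypothetical and only contrast "allocate and copy" with "wrap the bytes that already exist".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical read-only memory handle; not a real wasm_memory_t. */
typedef struct {
  const uint8_t *data; /* what a module would read from */
  size_t size;
  uint8_t *owned;      /* non-NULL only when we had to allocate and copy */
} sketch_const_memory_t;

/* Today's path: allocate wasm-accessible storage and copy the weights in.
 * Returns 0 on success, -1 on allocation failure. */
int sketch_const_memory_copy(sketch_const_memory_t *memory,
                             const uint8_t *src, size_t size) {
  assert(size > 0);
  memory->owned = (uint8_t *)malloc(size);
  if (memory->owned == NULL) return -1;
  memcpy(memory->owned, src, size);
  memory->data = memory->owned;
  memory->size = size;
  return 0;
}

/* Hoped-for path: borrow the existing bytes without alloc+copy. The caller
 * must keep `src` alive for as long as the handle is in use. */
void sketch_const_memory_wrap(sketch_const_memory_t *memory,
                              const uint8_t *src, size_t size) {
  memory->owned = NULL;
  memory->data = src;
  memory->size = size;
}

/* Frees the copy if there is one; a wrapped handle frees nothing. */
void sketch_const_memory_release(sketch_const_memory_t *memory) {
  free(memory->owned);
  memory->owned = NULL;
  memory->data = NULL;
  memory->size = 0;
}
```

For multi-gigabyte weights the wrapped path avoids both the extra allocation and the copy time, at the cost of a lifetime requirement on the original buffer.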

@mykmartin

mykmartin commented Jul 20, 2021

fyi the wasmtime team have just added the multi-memory option in the C API: bytecodealliance/wasmtime#3066 - would that have an impact on the analysis in Scott's comment?

@benvanik
Collaborator

> fyi the wasmtime team have just added the multi-memory option in the C API

Nice! That would require a small bit of compiler work to enable (just tagging LLVM pointers with the right address space, I believe) but nothing major - then we could evaluate an allocator that sliced from a wasm memory block and import that into each executable in the prototype wasmtime implementation Scott has.
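
A rough sketch of "an allocator that slices from a wasm memory block", assuming a simple bump strategy and using `malloc`/`realloc` to stand in for a growable memory instance. It does not use wasmtime's API, and the `sketch_*` names are hypothetical. The one fixed fact is that wasm linear memory grows in 64 KiB pages.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_WASM_PAGE 65536u /* wasm linear memory grows in 64 KiB pages */

/* Stand-in for a growable memory instance owned by the device. */
typedef struct {
  uint8_t *base;      /* may move when the block grows */
  uint32_t pages;     /* current size in pages */
  uint32_t max_pages; /* growth limit in pages */
  uint32_t used;      /* bump pointer, in bytes */
} sketch_block_t;

/* Returns 0 on success, -1 on allocation failure. */
int sketch_block_init(sketch_block_t *block, uint32_t initial_pages,
                      uint32_t max_pages) {
  assert(initial_pages > 0 && initial_pages <= max_pages);
  assert(max_pages < 65536u); /* keeps byte offsets within 32 bits */
  block->base = (uint8_t *)malloc((size_t)initial_pages * SKETCH_WASM_PAGE);
  if (block->base == NULL) return -1;
  block->pages = initial_pages;
  block->max_pages = max_pages;
  block->used = 0;
  return 0;
}

/* Hands out a slice as a byte offset, growing the block when needed.
 * Returns UINT32_MAX on failure. Offsets stay valid across growth, while
 * raw pointers into the block may not, so executables should hold offsets. */
uint32_t sketch_block_slice(sketch_block_t *block, uint32_t size) {
  uint64_t end = (uint64_t)block->used + size;
  uint64_t needed_pages = (end + SKETCH_WASM_PAGE - 1u) / SKETCH_WASM_PAGE;
  if (needed_pages > block->max_pages) return UINT32_MAX;
  if (needed_pages > block->pages) {
    uint8_t *grown = (uint8_t *)realloc(
        block->base, (size_t)needed_pages * SKETCH_WASM_PAGE);
    if (grown == NULL) return UINT32_MAX;
    block->base = grown;
    block->pages = (uint32_t)needed_pages;
  }
  uint32_t offset = block->used;
  block->used = (uint32_t)end;
  return offset;
}

void sketch_block_release(sketch_block_t *block) {
  free(block->base);
  block->base = NULL;
  block->pages = 0;
  block->used = 0;
}
```

In a real prototype the block would be a memory instance created through the engine and imported into each executable at instantiation; the slicing logic on the host side would look broadly similar.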

@aaron-schneider

Ho there! This bug hasn't been updated in a long time. Good intentions and all, but we're moving this to the backlog. Feel free to bring it back if you think there's a reasonable chance it'll get worked on in the next 6mo!
