Add MVP WASM HAL driver and local module loader. #5096

ScottTodd · 2021-03-12T22:45:23Z

Progress on #2863.

This adds a functional WebAssembly HAL driver capable of running artifacts produced by IREE's compiler with the flags -iree-hal-target-backends=wasm-llvm-aot and -iree-llvm-target-triple=wasm32-unknown-unknown.

The new driver uses the WebAssembly Micro Runtime (WAMR) to load and call into WASM modules. For now this uses WAMR's "iwasm VM core", but we can later look into other runtimes or the standardized wasm-c-api (see #4024, "Investigate wasm-c-api to allow other WASM backends").
WASM modules are embedded within our existing DyLibExecutableDef flatbuffer schema. We can switch from that flatbuffer schema to the system library format described in #3580 ("Define executable C API and rework LLVM target to emit it") when that is ready.
CMake only until I have a Bazel build target set up for WAMR.
Not yet wired up to any e2e tests, but I have run some simple models, MNIST, and BERT through this successfully on my Windows machine.
Still has quite a few TODOs, particularly around memory allocation.


iree/hal/wasm/registration/driver_module.c

benvanik · 2021-03-12T22:56:03Z

iree/hal/local/loaders/wasm_module_loader.c

+  memset(&init_args, 0, sizeof(RuntimeInitArgs));
+
+  // TODO(scotttodd): use Alloc_With_Allocator and forward the host_allocator
+  init_args.mem_alloc_type = Alloc_With_System_Allocator;


yeah! when we do that we can also look at the new(ish) tracy feature for memory tagging; it lets you group allocations so we'd be able to see all the ones coming from wamr separate from other things

(this was just fyi, no changes required)


benvanik · 2021-03-12T23:04:00Z

iree/hal/local/loaders/wasm_module_loader.c

+  iree_hal_executable_dispatch_state_v0_t* wasm_dispatch_state =
+      (iree_hal_executable_dispatch_state_v0_t*)native_buffer_dispatch_state;
+  // Workground count/size.
+  wasm_dispatch_state->workgroup_count.x = dispatch_state->workgroup_count.x;


hmm I think we can make this slightly better now and easier to incrementally improve by adding a helper fn that takes a dispatch state and an iree_allocator_t and clones it - that way it can do a smarter single malloc vs. these individual ones, and the iree_allocator_t could be defined to pull from the wasm stack vs the heap. Something like iree_hal_executable_dispatch_state_clone(src, allocator, &dst);

I think "copy the state into this sandbox stack" will be a common thing (it'll be needed with enclaves, shared memory sandbox processes, etc.), so it would be nice to get that together now, even if it starts as a function local to this file that we migrate out later.

It's worth getting right now so that even if we leave it as a malloc here we are only doing one and the change to using the stack/reusing stack slots would be just modifying the iree_allocator_t vs. all this code.

If you haven't seen it, the pattern used elsewhere is to compute the total_size of the struct (sizeof(struct) + sizeof(push_constants[0]) * push_constant_count + sizeof(bindings[0]) * binding_count), allocate that once, and then slice the pointers out of it. That ensures it's all contiguous.
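The single-allocation pattern described above could look roughly like the sketch below. Every name here (example_dispatch_state_t, example_allocator_t, example_dispatch_state_clone) is a hypothetical stand-in, not IREE's actual struct layout or allocator API. The sketch places the pointer array directly after the struct and the 32-bit constants last so that each slice stays naturally aligned without padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical, simplified stand-in for the dispatch state struct.
typedef struct {
  uint32_t workgroup_count[3];
  uint32_t workgroup_size[3];
  size_t push_constant_count;
  uint32_t* push_constants;
  size_t binding_count;
  void** binding_ptrs;
} example_dispatch_state_t;

// Hypothetical allocator interface; a sandbox-aware implementation could
// hand out stack slots inside the sandbox instead of calling malloc.
typedef struct {
  void* self;
  void* (*alloc)(void* self, size_t size);
} example_allocator_t;

static void* example_system_alloc(void* self, size_t size) {
  (void)self;
  return malloc(size);
}

// Clones |src| and its trailing arrays into one contiguous allocation.
// Returns 0 on success and -1 if the allocation fails.
static int example_dispatch_state_clone(const example_dispatch_state_t* src,
                                        example_allocator_t allocator,
                                        example_dispatch_state_t** out_dst) {
  size_t bindings_size = sizeof(src->binding_ptrs[0]) * src->binding_count;
  size_t constants_size =
      sizeof(src->push_constants[0]) * src->push_constant_count;
  size_t total_size = sizeof(*src) + bindings_size + constants_size;

  uint8_t* storage = (uint8_t*)allocator.alloc(allocator.self, total_size);
  if (!storage) return -1;

  // Copy the fixed-size header, then slice the arrays out of the tail.
  example_dispatch_state_t* dst = (example_dispatch_state_t*)storage;
  memcpy(dst, src, sizeof(*src));
  dst->binding_ptrs = (void**)(storage + sizeof(*src));
  dst->push_constants = (uint32_t*)(storage + sizeof(*src) + bindings_size);
  if (bindings_size > 0) {
    memcpy(dst->binding_ptrs, src->binding_ptrs, bindings_size);
  }
  if (constants_size > 0) {
    memcpy(dst->push_constants, src->push_constants, constants_size);
  }

  *out_dst = dst;
  return 0;
}
```

With this shape, moving from the heap to a sandbox stack would only mean passing a different example_allocator_t; the clone function itself would not change, and the whole clone is released with a single free.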

benvanik · 2021-03-12T23:06:43Z

iree/hal/local/loaders/wasm_module_loader.c

+  wasm_dispatch_state->imports = NULL;
+
+  // Clone workgroup_id.
+  void* native_buffer_workgroup_id = NULL;


these too can be part of the single allocation for the dispatch state (following it in memory), with the pointers sliced out the same way.

benvanik · 2021-03-12T23:08:51Z

iree/hal/local/loaders/wasm_module_loader.c

+    return iree_make_status(IREE_STATUS_RESOURCE_EXHAUSTED, "malloc failed");
+  }
+  int32_t* wasm_binding_ptrs = (int32_t*)native_buffer_binding_ptrs;
+  // HACK: remember the last native buffer (usually the single output buffer)


This is the scary one :P
Every single fiber of my being is telling me to say this should be fixed now; the stack stuff can be fixed later (along with threading), but this is a biggie. It's worth implementing the HAL allocator to malloc/free from the wasm heap so all of this can be avoided. MVPs should at least approximate the final state.

If there are some bigger blockers preventing that I could be convinced, but I suspect there aren't (given that cuda/vulkan/etc work with their weird allocators). Worth at least copy/pasting the file and giving it a shot now.

The other big thing this will force is how to share memory across the wasm modules - which may require some reworking of the ownership here as the loader (or HAL allocator) may need to own some wamr things, vs each executable owning the entire world. We don't want to get into the state where you can only ever have a single executable loaded per HAL device, and currently that would be the case with a naive extension (the HAL allocator, if created independent of the loader, won't be able to share memory across executables).
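To make the "allocate from the wasm heap" idea concrete, here is a minimal sketch over a simulated linear memory. All names (example_wasm_heap_t and its functions) are hypothetical, and the bump allocator is only an illustration; a real implementation would presumably go through the wasm runtime's own module heap APIs and support freeing. The point is that a buffer allocated this way has both a host pointer and a 32-bit module offset, so nothing needs to be copied per dispatch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical view of one shared wasm linear memory, owned by the loader
// (or HAL allocator) rather than by an individual executable.
typedef struct {
  uint8_t* base;       // Host address of the start of linear memory.
  uint32_t size;       // Total size of linear memory in bytes.
  uint32_t next_free;  // Bump pointer, as an offset from |base|.
} example_wasm_heap_t;

static void example_wasm_heap_init(example_wasm_heap_t* heap, uint8_t* base,
                                   uint32_t size) {
  heap->base = base;
  heap->size = size;
  heap->next_free = 16;  // Reserve offset 0 so it can mean "null".
}

// Allocates |length| bytes with 16-byte alignment.
// Returns the offset within linear memory, or 0 on failure.
static uint32_t example_wasm_heap_alloc(example_wasm_heap_t* heap,
                                        uint32_t length) {
  uint64_t aligned = ((uint64_t)heap->next_free + 15u) & ~(uint64_t)15u;
  if (aligned > heap->size || length > heap->size - aligned) return 0;
  heap->next_free = (uint32_t)(aligned + length);
  return (uint32_t)aligned;
}

// Translates a module-visible offset into a host pointer to the same bytes.
static void* example_wasm_heap_to_host(const example_wasm_heap_t* heap,
                                       uint32_t offset) {
  return heap->base + offset;
}
```

Because the heap object is separate from any one executable, several executables created against the same linear memory could receive the same offsets as binding pointers, which is the sharing concern raised above.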

Ok, looking into this. First step is figuring out how to set a different device_allocator on task_device. That isn't configurable right now, and is set here: https://github.com/google/iree/blob/51b6453e5772ecaa92e65ff6a85ce7f32a9c7869/iree/hal/local/task_device.c#L141-L144

(maybe move discussion to Discord?)

How about adding a (optional?) factory function for creating a device_allocator to iree_hal_task_device_params_t, which is passed to iree_hal_task_driver_create? I can give that a try, at least to get started on writing a WASM allocator.
https://github.com/google/iree/blob/51b6453e5772ecaa92e65ff6a85ce7f32a9c7869/iree/hal/local/task_device.h#L27-L39
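A rough sketch of what such an optional factory hook might look like follows. The types and functions are hypothetical stand-ins (prefixed example_) and do not reflect the actual fields of iree_hal_task_device_params_t; the only idea taken from the proposal is that device creation calls a factory when one is provided and otherwise falls back to the default allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical allocator handle; only a name, for illustration.
typedef struct {
  const char* name;
} example_allocator_t;

// Hypothetical factory signature. Returns 0 on success.
typedef int (*example_allocator_factory_fn_t)(
    void* user_data, example_allocator_t** out_allocator);

// Hypothetical device params with an optional allocator factory.
typedef struct {
  example_allocator_factory_fn_t allocator_factory;  // May be NULL.
  void* allocator_factory_user_data;
} example_device_params_t;

typedef struct {
  example_allocator_t* device_allocator;
} example_device_t;

static example_allocator_t example_default_allocator = {"heap"};
static example_allocator_t example_wasm_allocator = {"wasm"};

// Example factory a WASM driver could pass in through the params.
static int example_wasm_allocator_factory(
    void* user_data, example_allocator_t** out_allocator) {
  (void)user_data;
  *out_allocator = &example_wasm_allocator;
  return 0;
}

// Uses the factory when one is set; otherwise keeps the default allocator.
static int example_device_create(const example_device_params_t* params,
                                 example_device_t* out_device) {
  if (params->allocator_factory) {
    return params->allocator_factory(params->allocator_factory_user_data,
                                     &out_device->device_allocator);
  }
  out_device->device_allocator = &example_default_allocator;
  return 0;
}
```

Leaving the factory NULL keeps existing callers unchanged, which is why the hook is modeled as optional here.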

Some discussion moved to Discord here.

Still figuring out an implementation path.


ScottTodd · 2021-03-12T23:43:08Z

iree/hal/local/loaders/BUILD

+    iree::hal::api
+    iree::hal::local
+    iree::schemas::dylib_executable_def_c_fbs
+    wasm_micro_runtime_vmlib


A few build errors on Android with this: logs

../third_party/wasm-micro-runtime/core/iwasm/common/arch/invokeNative_em64.s:19:10: error: unknown token in expression
    push %rbp
         ^
../third_party/wasm-micro-runtime/core/iwasm/common/arch/invokeNative_em64.s:19:10: error: invalid operand
    push %rbp
         ^

../third_party/wasm-micro-runtime/core/shared/platform/common/posix/posix_memmap.c:40:22: error: use of undeclared identifier 'MAP_32BIT'
    map_flags |= MAP_32BIT;
                 ^


benvanik · 2022-07-01T01:29:53Z

Closing as obsolete.

Add MVP WASM HAL driver and local module loader.

49f25e6


ScottTodd added the runtime (Relating to the IREE runtime library) and hal/cpu (Runtime Host/CPU-based HAL backend) labels Mar 12, 2021

ScottTodd requested a review from benvanik March 12, 2021 22:45

google-cla bot added the cla: yes label Mar 12, 2021

ScottTodd force-pushed the wamr-hal branch from edf4bea to 51234cd Compare March 12, 2021 23:14

benvanik requested changes Mar 12, 2021


ScottTodd commented Mar 12, 2021


ScottTodd added 4 commits March 15, 2021 09:55

fixup! Add MVP WASM HAL driver and local module loader.

c5117c3

Remove WASM driver registration from Bazel until it builds there.

fixup! Add MVP WASM HAL driver and local module loader.

76ed4ed

Use named initializers for structs.

Trim stack size and omit default value.

ba1d80c

Add issue number for function import TODOs.

fd857eb

ScottTodd force-pushed the wamr-hal branch from 51234cd to fd857eb Compare March 16, 2021 00:35

This was referenced Mar 17, 2021

Map WASM memory allocation APIs to IREE's HAL #5137

Open

Investigate WASM as a HAL executable format #2863

Open

ScottTodd mentioned this pull request Apr 5, 2021

Revert "Add wasm-micro-runtime submodule and get building with CMake." #5312

Merged

ScottTodd added a commit that referenced this pull request Apr 5, 2021

Revert "Add wasm-micro-runtime submodule and get building with CMake." (#5312)

fd64070

Reverts #5077. We won't be using this submodule while #5137 is blocking #5096.

ScottTodd mentioned this pull request Apr 19, 2021

Adding experimental synchronous executor using inline command buffers. #5509

Merged

GMNGeoffrey removed the cla: yes label Feb 11, 2022

benvanik closed this Jul 1, 2022
