Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MVP WASM HAL driver and local module loader. #5096

Closed
wants to merge 5 commits into from

Conversation

ScottTodd
Copy link
Member

Progress on #2863.

This adds a functional WebAssembly HAL driver capable of running artifacts produced by IREE's compiler using -iree-hal-target-backends=wasm-llvm-aot -iree-llvm-target-triple=wasm32-unknown-unknown.

  • The new driver uses the WebAssembly Micro Runtime (WAMR) to load and call into WASM modules. For now this is using WAMR's "iwasm VM core", but we can later look into other runtimes or the standardized wasm-c-api (Investigate wasm-c-api to allow other WASM backends #4024).
  • WASM modules are embedded within our existing DyLibExecutableDef flatbuffer schema. We can switch from that flatbuffer schema to the system library format described on Define executable C API and rework LLVM target to emit it #3580 when that is ready.
  • CMake only until I have a Bazel build target set up for WAMR.
  • Not yet wired up to any e2e tests, but I have run some simple models, MNIST, and BERT through this successfully on my Windows machine.
  • Still has quite a few TODOs, particularly around memory allocation.

CMake only until I have a Bazel build target set up for WAMR.

Still has quite a few TODOs, particularly around memory allocation.
@ScottTodd ScottTodd added runtime Relating to the IREE runtime library hal/cpu Runtime Host/CPU-based HAL backend labels Mar 12, 2021
@ScottTodd ScottTodd requested a review from benvanik March 12, 2021 22:45
@google-cla google-cla bot added the cla: yes label Mar 12, 2021
iree/hal/wasm/registration/driver_module.c Outdated Show resolved Hide resolved
memset(&init_args, 0, sizeof(RuntimeInitArgs));

// TODO(scotttodd): use Alloc_With_Allocator and forward the host_allocator
init_args.mem_alloc_type = Alloc_With_System_Allocator;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah! when we do that we can also look at the new(ish) tracy feature for memory tagging; it lets you group allocations so we'd be able to see all the ones coming from wamr separate from other things

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this was just fyi, no changes required)

iree/hal/local/loaders/wasm_module_loader.c Outdated Show resolved Hide resolved
iree/hal/local/loaders/wasm_module_loader.c Show resolved Hide resolved
iree_hal_executable_dispatch_state_v0_t* wasm_dispatch_state =
(iree_hal_executable_dispatch_state_v0_t*)native_buffer_dispatch_state;
// Workground count/size.
wasm_dispatch_state->workgroup_count.x = dispatch_state->workgroup_count.x;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I think we can make this slightly better now and easier to incrementally improve by adding a helper fn that takes a dispatch state and an iree_allocator_t and clones it - that way it can do a smarter single malloc vs. these individual ones, and the iree_allocator_t could be defined to pull from the wasm stack vs the heap. Something like iree_hal_executable_dispatch_state_clone(src, allocator, &dst);

I think "copy the state into this sandbox stack" will be a common thing (it'll be needed with enclaves, and shared memory sandbox processes, etc) so would be nice to get that together now even if just as a function local in here we migrate out later.

It's worth getting right now so that even if we leave it as a malloc here we are only doing one and the change to using the stack/reusing stack slots would be just modifying the iree_allocator_t vs. all this code.

If you haven't seen it, the pattern used elsewhere is to compute the total_size of the struct (sizeof(struct) + sizeof(push_constants[0]) * push_constant_count + sizeof(bindings[0]) * binding_count) and then slicing the pointers out of it. That ensures it's all contiguous.

wasm_dispatch_state->imports = NULL;

// Clone workgroup_id.
void* native_buffer_workgroup_id = NULL;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these too can be part of the single allocation for the dispatch state (following it in memory), with the pointers sliced out the same way.

return iree_make_status(IREE_STATUS_RESOURCE_EXHAUSTED, "malloc failed");
}
int32_t* wasm_binding_ptrs = (int32_t*)native_buffer_binding_ptrs;
// HACK: remember the last native buffer (usually the single output buffer)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the scary one :P
Every single fiber of my being is telling me to say this should be fixed now; the stack stuff can be fixed later (along with threading), but this is a biggie. It's worth implementing the HAL allocator to malloc/free from the wasm heap so all of this can be avoided. MVPs should at least approximate the final state.

If there's some bigger blockers preventing that I could be convinced, but I suspect there aren't (given that cuda/vulkan/etc work with their weird allocators). Worth at least copy/pasting the file and giving it a shot now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other big thing this will force is how to share memory across the wasm modules - which may require some reworking of the ownership here as the loader (or HAL allocator) may need to own some wamr things, vs each executable owning the entire world. We don't want to get into the state where you can only ever have a single executable loaded per HAL device, and currently that would be the case with a naive extension (the HAL allocator, if created independent of the loader, won't be able to share memory across executables).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, looking into this. First step is figuring out how to set a different device_allocator on task_device. That isn't configurable right now, and is set here: https://github.com/google/iree/blob/51b6453e5772ecaa92e65ff6a85ce7f32a9c7869/iree/hal/local/task_device.c#L141-L144

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(maybe move discussion to Discord?)

How about adding a (optional?) factory function for creating a device_allocator to iree_hal_task_device_params_t, which is passed to iree_hal_task_driver_create? I can give that a try, at least to get started on writing a WASM allocator.
https://github.com/google/iree/blob/51b6453e5772ecaa92e65ff6a85ce7f32a9c7869/iree/hal/local/task_device.h#L27-L39

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some discussion moved to Discord here.

Still figuring out an implementation path.

iree/hal/local/loaders/wasm_module_loader.c Show resolved Hide resolved
iree::hal::api
iree::hal::local
iree::schemas::dylib_executable_def_c_fbs
wasm_micro_runtime_vmlib
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few build errors on Android with this: logs

../third_party/wasm-micro-runtime/core/iwasm/common/arch/invokeNative_em64.s:19:10: error: unknown token in expression
    push %rbp
         ^
../third_party/wasm-micro-runtime/core/iwasm/common/arch/invokeNative_em64.s:19:10: error: invalid operand
    push %rbp
         ^
../third_party/wasm-micro-runtime/core/shared/platform/common/posix/posix_memmap.c:40:22: error: use of undeclared identifier 'MAP_32BIT'
        map_flags |= MAP_32BIT;
                     ^

@benvanik
Copy link
Collaborator

benvanik commented Jul 1, 2022

Closing as obsolete.

@benvanik benvanik closed this Jul 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hal/cpu Runtime Host/CPU-based HAL backend runtime Relating to the IREE runtime library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants