-
Notifications
You must be signed in to change notification settings - Fork 603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MVP WASM HAL driver and local module loader. #5096
Conversation
CMake only until I have a Bazel build target set up for WAMR. Still has quite a few TODOs, particularly around memory allocation.
memset(&init_args, 0, sizeof(RuntimeInitArgs)); | ||
|
||
// TODO(scotttodd): use Alloc_With_Allocator and forward the host_allocator | ||
init_args.mem_alloc_type = Alloc_With_System_Allocator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah! when we do that we can also look at the new(ish) tracy feature for memory tagging; it lets you group allocations so we'd be able to see all the ones coming from wamr separate from other things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(this was just fyi, no changes required)
iree_hal_executable_dispatch_state_v0_t* wasm_dispatch_state = | ||
(iree_hal_executable_dispatch_state_v0_t*)native_buffer_dispatch_state; | ||
// Workground count/size. | ||
wasm_dispatch_state->workgroup_count.x = dispatch_state->workgroup_count.x; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I think we can make this slightly better now and easier to incrementally improve by adding a helper fn that takes a dispatch state and an iree_allocator_t and clones it - that way it can do a smarter single malloc vs. these individual ones, and the iree_allocator_t could be defined to pull from the wasm stack vs the heap. Something like iree_hal_executable_dispatch_state_clone(src, allocator, &dst);
I think "copy the state into this sandbox stack" will be a common thing (it'll be needed with enclaves, and shared memory sandbox processes, etc) so would be nice to get that together now even if just as a function local in here we migrate out later.
It's worth getting right now so that even if we leave it as a malloc here we are only doing one and the change to using the stack/reusing stack slots would be just modifying the iree_allocator_t vs. all this code.
If you haven't seen it, the pattern used elsewhere is to compute the total_size of the struct (sizeof(struct) + sizeof(push_constants[0]) * push_constant_count + sizeof(bindings[0]) * binding_count
) and then slicing the pointers out of it. That ensures it's all contiguous.
wasm_dispatch_state->imports = NULL; | ||
|
||
// Clone workgroup_id. | ||
void* native_buffer_workgroup_id = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these too can be part of the single allocation for the dispatch state (following it in memory), with the pointers sliced out the same way.
return iree_make_status(IREE_STATUS_RESOURCE_EXHAUSTED, "malloc failed"); | ||
} | ||
int32_t* wasm_binding_ptrs = (int32_t*)native_buffer_binding_ptrs; | ||
// HACK: remember the last native buffer (usually the single output buffer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the scary one :P
Every single fiber of my being is telling me to say this should be fixed now; the stack stuff can be fixed later (along with threading), but this is a biggie. It's worth implementing the HAL allocator to malloc/free from the wasm heap so all of this can be avoided. MVPs should at least approximate the final state.
If there's some bigger blockers preventing that I could be convinced, but I suspect there aren't (given that cuda/vulkan/etc work with their weird allocators). Worth at least copy/pasting the file and giving it a shot now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other big thing this will force is how to share memory across the wasm modules - which may require some reworking of the ownership here as the loader (or HAL allocator) may need to own some wamr things, vs each executable owning the entire world. We don't want to get into the state where you can only ever have a single executable loaded per HAL device, and currently that would be the case with a naive extension (the HAL allocator, if created independent of the loader, won't be able to share memory across executables).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, looking into this. First step is figuring out how to set a different device_allocator
on task_device
. That isn't configurable right now, and is set here: https://github.com/google/iree/blob/51b6453e5772ecaa92e65ff6a85ce7f32a9c7869/iree/hal/local/task_device.c#L141-L144
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(maybe move discussion to Discord?)
How about adding a (optional?) factory function for creating a device_allocator
to iree_hal_task_device_params_t
, which is passed to iree_hal_task_driver_create
? I can give that a try, at least to get started on writing a WASM allocator.
https://github.com/google/iree/blob/51b6453e5772ecaa92e65ff6a85ce7f32a9c7869/iree/hal/local/task_device.h#L27-L39
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some discussion moved to Discord here.
Still figuring out an implementation path.
iree::hal::api | ||
iree::hal::local | ||
iree::schemas::dylib_executable_def_c_fbs | ||
wasm_micro_runtime_vmlib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few build errors on Android with this: logs
../third_party/wasm-micro-runtime/core/iwasm/common/arch/invokeNative_em64.s:19:10: error: unknown token in expression
push %rbp
^
../third_party/wasm-micro-runtime/core/iwasm/common/arch/invokeNative_em64.s:19:10: error: invalid operand
push %rbp
^
../third_party/wasm-micro-runtime/core/shared/platform/common/posix/posix_memmap.c:40:22: error: use of undeclared identifier 'MAP_32BIT'
map_flags |= MAP_32BIT;
^
Remove WASM driver registration from Bazel until it builds there.
Use named initializers for structs.
Closing as obsolete. |
Progress on #2863.
This adds a functional WebAssembly HAL driver capable of running artifacts produced by IREE's compiler using
-iree-hal-target-backends=wasm-llvm-aot -iree-llvm-target-triple=wasm32-unknown-unknown
.DyLibExecutableDef
flatbuffer schema. We can switch from that flatbuffer schema to the system library format described on Define executable C API and rework LLVM target to emit it #3580 when that is ready.