Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding experimental synchronous executor using inline command buffers. #5509

Merged
merged 1 commit into from
Apr 20, 2021

Conversation

benvanik
Copy link
Collaborator

@benvanik benvanik commented Apr 19, 2021

This is mostly just the plumbing to enable #4680 by defining a new dylib-sync runtime driver (pending #4298 to cleanup drivers).

This driver does not support recording/replaying command buffers and has some gotchas with semaphores (no deadline waits, probably corner cases for things not covered in the CTS, etc). It should get us something that builds easier on tiny systems, though.

This uses iree/base/synchronization.h for the semaphore implementation. The intent is that for bare-metal systems we can implement a version of iree/base/synchronization.h observing the specific requirements of the platform while still preserving the semaphore semantics - it's still useful to use this driver with systems with multiple threads (lots of dual-core MCUs/MPUs) or to enable pipelining. Systems that have no threads can no-op the mutex/notification (or disable interrupts/etc).

Trace of bert with -driver=dylib-sync looks pretty good:
image
(the large __init block at startup and free at shutdown will be solved by #5504 and #5472)
Note that due to the design here there are no changes needed in the compiler/runtime besides the single flag added in #5508 so the same exact module runs async as well:
image
That's pretty cool :)

Progress on #4680.

@benvanik benvanik added performance ⚡ Performance/optimization related work across the compiler and runtime hal/cpu Runtime Host/CPU-based HAL backend platform/generic 🔩 Bare metal/generic target build, execution, benchmarking, and deployment labels Apr 19, 2021
@google-cla google-cla bot added the cla: yes label Apr 19, 2021
@benvanik benvanik force-pushed the benvnaik-sync-hal branch 4 times, most recently from 45ab4b3 to 66ed8b7 Compare April 19, 2021 02:26
@benvanik benvanik marked this pull request as ready for review April 19, 2021 02:53
@benvanik benvanik closed this Apr 19, 2021
@benvanik benvanik deleted the benvnaik-sync-hal branch April 19, 2021 03:34
@benvanik benvanik restored the benvnaik-sync-hal branch April 19, 2021 03:34
@benvanik
Copy link
Collaborator Author

(typo'ed my own name, but github so it stays :)

Comment on lines +37 to +38
// on threads issuing submissions. |loaders| is the set of executable
// loaders that are available for loading in the device context.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see... we could even pass a wasm module loader in here :O

So with #4298, #5096 (still blocked on the choice of wasm runtime / memory allocation architecture) also wouldn't even add its own driver module. We'd just conditionally include the executable loader. So each host executable loader will be compatible with the synchronous and asynchronous local HAL devices.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep! that's the goal - we can still have a registry of them/compile-time include everything linked in for the iree-run-module cases but tiny IREE cases can conditionally include what they want. it's also how the static library loader works: it's just a list of pointers that you provide for the HAL to lookup in the future: https://github.com/google/iree/blob/f02a32f88242479c37a249d1c6bbe7637545ca36/iree/hal/local/loaders/static_library_loader.h#L44-L48
(this lets you support both dynamic and static models, if you wanted to have some builtin and some deployable, etc)

// iree_hal_sync_semaphore_t
//===----------------------------------------------------------------------===//

// Creates a semaphore that allows for ordering of operations on the local host.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate a bit more on this (what a "sync semaphore" is)? This is backed by a 400LOC .c, which is only a bit less than task_semaphore.c. Seems worth a comparison.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a note -- semaphores are mechanisms to propagate errors, query for completion, etc and are useful whether synchronously executing work or not, and having them in both cases lets us have a consistent API to the application (and make use of them in the compiled output) without having to special case things. Even on bare-metal systems using the synchronous driver they may still have multiple cores and want to query the status of work from one core to another - semaphores are a way to do that in a thread-safe fashion (even if your threads are not traditional threads but independent execution units). On super bare-metal systems with single cores/single threaded usage the iree_slim_mutex_t/iree_notification_t types can be no-oped, in which case this becomes just a way to carry around an integer payload and a status field.

// batching up signals will reduce synchronization overhead.
iree_status_t iree_hal_sync_semaphore_multi_signal(
iree_hal_sync_semaphore_state_t* shared_state,
const iree_hal_semaphore_list_t* semaphore_list);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these semaphores assumed to be created by iree_hal_sync_semaphore_create? looks like the iree_hal_semaphore_t type is reused for each implementation. Can implementations be mixed in a list?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah not legal to mix HAL types from other devices (we'd need explicit import/export APIs for that), so safe to assume this (some places have some type guards but more could be added whenever we allow multiple simultaneous HAL usage - today there's only ever one HAL device at a time so not something that can be hit)

This is mostly just the plumbing to enable #4680 by defining a new
`dylib-sync` runtime driver (pending #4298 to cleanup drivers).

This driver does not support recording/replaying command buffers and has
some gotchas with semaphores (no deadline waits, probably bugs, etc).
Base automatically changed from benvanik-inline-cmd-buf to main April 20, 2021 02:52
@benvanik benvanik merged commit 1b158de into main Apr 20, 2021
@benvanik benvanik deleted the benvnaik-sync-hal branch April 20, 2021 02:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hal/cpu Runtime Host/CPU-based HAL backend performance ⚡ Performance/optimization related work across the compiler and runtime platform/generic 🔩 Bare metal/generic target build, execution, benchmarking, and deployment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants