Simplify bindings in HAL command buffers. #18154
Labels
hal/api
IREE's public C hardware abstraction layer API
Comments
benvanik
added a commit
that referenced
this issue
Aug 10, 2024
These combine push constants and push descriptor sets into the dispatch calls as in practice we have a near 1:1 relationship anyway. Pipeline layouts are still used in HAL interfaces to allow the compiler to map the information but are otherwise not used by the new ops. The `--iree-hal-experimental-dispatch2` flag enables emitting the new ops though no targets currently implement them. Since executables no longer require pipeline layouts in this simplified model the `--iree-hal-experimental-executable-create2` flag can be used to stop passing them. Progress on #18154.
benvanik
added a commit
that referenced
this issue
Aug 12, 2024
These combine push constants and push descriptor sets into the dispatch calls as in practice we have a near 1:1 relationship anyway. Pipeline layouts are still used in HAL interfaces to allow the compiler to map the information but are otherwise not used by the new ops. The `--iree-hal-experimental-dispatch2` flag enables emitting the new ops. Since executables no longer require pipeline layouts in this simplified model the `--iree-hal-experimental-executable-create2` flag can be used to stop passing them; targets that support dispatch2 will ignore them if provided. Future changes will start to add support on targets for the simplified bindings and then remove the existing pipeline layout-based binding model as a breaking ABI change.
Current target status:
* [x] Local/CPU: executable-create2 and executable-dispatch2 supported (backward compat)
* [x] CUDA: executable-dispatch2 supported (backward compat)
* [x] HIP: executable-dispatch2 supported (backward compat)
* [x] Metal: executable-dispatch2 supported (backward compat)
* [x] Vulkan: executable-dispatch2 supported (backward compat)
* [x] WebGPU: executable-dispatch2 supported (backward compat)
Reworking the CUDA/HIP/Metal/Vulkan/WebGPU flatbuffers to support executable-create2 will be done in a follow-up.
Progress on #18154.
benvanik
added a commit
that referenced
this issue
Aug 13, 2024
benvanik
added a commit
that referenced
this issue
Aug 14, 2024
benvanik
added a commit
that referenced
this issue
Aug 20, 2024
benvanik
added a commit
that referenced
this issue
Aug 21, 2024
This does not yet rename the methods and is just stripping all of the legacy ops and methods. Progress on #18154.
benvanik
added a commit
that referenced
this issue
Aug 26, 2024
* Renamed `push_constants` to `constants` (as there is no longer a `push_constants` API)
* Dropped `#hal.descriptor_set.layout`
* Removed ordinal from `#hal.descriptor_set.binding` (as ordinals are now implicit)
* Renamed `#hal.descriptor_set.binding` to `#hal.pipeline.binding`
* Removed `set` from `hal.interface.binding.subspan`
* Removed `#hal.interface.binding` and the spooky action at a distance `hal.interface.binding` attr now that ordinals are implicit
Progress on #18154.
benvanik
added a commit
that referenced
this issue
Aug 26, 2024
The pass is running on HAL IR and now that descriptor sets are being removed as part of #18154 needs to be rewritten. A new version would operate on SPIR-V IR in order to replace push constants with special binding loads instead.
banach-space
added a commit
to banach-space/iree
that referenced
this issue
Aug 30, 2024
I've just landed an update for the affected test (see iree-org#18369), but unfortunately forgot to re-base after the recent changes to HAL by Ben, see iree-org#18366 and iree-org#18154. This simply updates the test to align with the recent changes to HAL.
In light of reusable/indirect command buffers and increasingly broad support for device addresses I'm thinking about reworking descriptor sets and pipeline layouts in the HAL. The current design achieves 2 major goals: functioning on targets that do not support device addresses (only buffer handles) and compressing command buffer recording (by reusing descriptor sets across dispatches). We still need to support buffer handles on the API and not expose device addresses upwards (into the VM/etc) and still want to efficiently record command buffers. Reusable command buffers obviate the compression benefits on recording performance, as we only record them once, though it's still important to remain small on disk.
Problem
Using reusable command buffers in real models has revealed two new facts: the same dispatches are used with both direct and indirect bindings for the same arguments, and in practice we are poorly reusing descriptor sets when parameters are not packed. I spent some time trying to figure out how to increase reuse and couldn't without boxing us into a corner with larger programs where we rely on deduplication of executable functions. Packed parameters (where we gather into discrete buffers while loading or pack ahead of time in the compiler) do tend to get good reuse and that should be the common case; however, the actual savings are dependent on binding ordering across different dispatches.
The original intent was that we'd have a descriptor set for frequently used buffers (constants, parameters, and transients) and a set for infrequently used buffers (usually function I/O): we'd define the frequently used set once and reuse it for all dispatches and then define the infrequently used ones on-demand. In practice this works well when things line up: a single descriptor update can be reused for 1000+ dispatches, reducing the resource tracking, validation, and call overheads 1000x. For this to work, all of those dispatches must use the same descriptor set layout (set 0 binding 3 is constants, binding 7 is the first chunk of parameters, etc) in all dispatch sites in the whole program. The same was the intention with push constants: using the same constants (usually shape dimensions or buffer offsets) for multiple dispatches requires that all dispatches involved share a layout.
Doing this global analysis isn't difficult and it's mostly what happens in MaterializeInterfaces today. One issue that has shown up in practice is that individual dispatches once deduplicated will end up with both direct and indirect bindings and since these are handled differently throughout the stack we would have to duplicate entry points just to vary the ABI. Another issue is that the cost of padding the push constants and descriptor sets is very high: if a particular dispatch uses 1 binding and 1 push constant it may still need to have a layout using 29 declared bindings and 18 declared push constants and deal with shipping those to the device during execution even when unused. Our HIP/CUDA/CPU implementations compress bindings to avoid sending sparsely used tables over to the device as a way to avoid the bloat but they use the descriptor set layout in order to do this expecting it to be authoritative. If we started using sparse descriptor sets with partially used slots we'd need a secondary target-specific layout indicating which bindings and constants were used.
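To make the padding cost concrete, here is a rough sketch of the arithmetic and of the kind of compaction a target might do. The 8-byte binding and 4-byte constant sizes, the `example_*` names, and the bitmask form of the "which slots are used" layout are all assumptions for illustration and are not taken from the HAL.

```c
#include <stddef.h>
#include <stdint.h>

// Assumed sizes for illustration only: one binding is shipped as an 8-byte
// pointer-sized value and one constant as a 4-byte value.
enum { EXAMPLE_BINDING_BYTES = 8, EXAMPLE_CONSTANT_BYTES = 4 };

// Bytes shipped to the device for one dispatch with the given counts.
size_t example_args_bytes(size_t binding_count, size_t constant_count) {
  return binding_count * EXAMPLE_BINDING_BYTES +
         constant_count * EXAMPLE_CONSTANT_BYTES;
}

// Compacts a padded binding table into a dense one. `used_mask` plays the
// role of the secondary target-specific layout: bit i is set when slot i is
// actually read by the dispatch. Returns the number of dense entries written.
size_t example_compact_bindings(const uint64_t* padded, size_t padded_count,
                                uint64_t used_mask, uint64_t* out_dense) {
  size_t dense_count = 0;
  for (size_t i = 0; i < padded_count && i < 64; ++i) {
    if (used_mask & (UINT64_C(1) << i)) {
      out_dense[dense_count++] = padded[i];
    }
  }
  return dense_count;
}
```

Under these assumed sizes the shared layout from the example above costs 29 * 8 + 18 * 4 = 304 bytes per dispatch, while the dispatch's real needs (1 binding, 1 constant) are 12 bytes.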
Having two layout mechanisms (one generic one for recording and one target-specific one for execution) is not great. Neither is duplicating entry points to support different types of bindings used during recording and execution. And neither is globally assigning descriptor set+binding/push constant ordinals when entry points may be used in multiple locations in the program.
The critical aspect of the current mechanism to preserve is the separation of push constants from bindings, unlike CUDA-style void* memory that may contain device addresses and parameters interleaved. All of the descriptor set stuff is just for recording efficiency. Since we aren't reaping the benefits, and likely won't be able to without making larger and worse tradeoffs, a simplification would be to pass push constants and bindings per dispatch.
Proposed Changes
Descriptor set and pipeline layouts as a concept would be entirely removed from the HAL and as part of their executable definition targets would ship a full layout as today (Vulkan/WebGPU/etc) or minimal layout of just sizes/counts (CUDA/HIP/CPU) based on what was required. Command buffer recording - which now with reuse is less concerned with recording performance - would just take the hit of resource tracking and marshaling arguments as needed. API-wise, the command buffer would change to take the constants and bindings as arguments directly on dispatches:
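As a minimal sketch of that shape, with hypothetical `example_*` names standing in for the real HAL types and signature (none of these are the actual IREE declarations, and the 4-byte constant granularity is an assumption):

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-ins for HAL types; not the real IREE API.
typedef struct {
  const void* data;    // packed constant values
  size_t data_length;  // size in bytes
} example_constants_t;

typedef struct {
  void* buffer;   // opaque buffer handle, never a raw device address
  size_t offset;  // byte offset into the buffer
  size_t length;  // byte length of the bound range
} example_binding_t;

typedef struct {
  size_t count;
  const example_binding_t* values;
} example_binding_list_t;

// What a recorded dispatch command might retain in this sketch.
typedef struct {
  int entry_point;
  uint32_t workgroup_count[3];
  size_t constant_bytes;
  size_t binding_count;
} example_recorded_dispatch_t;

// Records one dispatch. Constants and bindings arrive with the call itself
// instead of through earlier push-constant / push-descriptor-set commands.
// Returns 0 on success and -1 on invalid arguments.
int example_command_buffer_dispatch(example_recorded_dispatch_t* out_cmd,
                                    int entry_point,
                                    const uint32_t workgroup_count[3],
                                    example_constants_t constants,
                                    example_binding_list_t bindings) {
  if (!out_cmd) return -1;
  // Assumption: constants are a whole number of 32-bit values.
  if (constants.data_length % sizeof(uint32_t) != 0) return -1;
  if (constants.data_length > 0 && !constants.data) return -1;
  if (bindings.count > 0 && !bindings.values) return -1;
  out_cmd->entry_point = entry_point;
  for (int i = 0; i < 3; ++i) {
    out_cmd->workgroup_count[i] = workgroup_count[i];
  }
  out_cmd->constant_bytes = constants.data_length;
  out_cmd->binding_count = bindings.count;
  return 0;
}
```

The point of the sketch is only that a single call carries everything a dispatch needs, so no layout object has to be shared between separate recording commands.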
Each target would need to define the metadata it requires to translate the constants and bindings into its own ABI and perform the required validation during execution. The Vulkan/SPIR-V flatbuffer would gain an encoding of pipeline layouts for each entry point (a set per executable and then each entry point referencing an entry in that set), CPU would add fields to `iree_hal_executable_dispatch_attrs_v0_t` for constant and binding counts, etc.
The compiler would simplify the HAL interface ABI to only include the push constant count and binding count (no longer sets) and assign ordinals to each. All of the information is available for producing the combined dispatch calls on `stream.cmd.dispatch` during lowering, and conversion will be simplified by not having to issue the extra ops. Target backends would need to produce the updated metadata when serializing executables.
CPU
`iree_hal_executable_dispatch_attrs_v0_t` would get a field for the number of push constants (or maybe just size) and the number of bindings.
CUDA/HIP
This would be a good chance to completely rewrite the `ExecutableDef` flatbuffers to support multiple PTX/HSACO blobs, a proper export table (instead of standalone tables), and now the additional information to verify constants and bindings.
Vulkan/WebGPU
The flatbuffers would gain their appropriate pipeline layout definitions and upon executable creation would create all of the runtime resources. Since we are supposed to be linking most (if not all) dispatch entry points into the same executable this amortizes the cost of creating/retaining these runtime resources to the same level as it is today. The only case where we'd regress is when there are multiple executables that share layouts in the same compiled module, but the runtime implementation can cache and reuse them if desired.
Metal
We should evaluate argument buffers as part of the #17875 work - whatever is done there will decide if we want to use the Vulkan/WebGPU approach of bindings or CUDA/HIP approach of parameter blobs.
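For targets that only ship sizes/counts (the CPU case above), validation could be as small as the following sketch. The struct is a hypothetical stand-in and does not reproduce the real `iree_hal_executable_dispatch_attrs_v0_t` layout; the field names, widths, and status codes are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical per-entry-point metadata carrying only counts.
typedef struct {
  uint16_t constant_count;  // number of 32-bit constants expected
  uint16_t binding_count;   // number of bindings expected
} example_dispatch_attrs_t;

typedef enum {
  EXAMPLE_OK = 0,
  EXAMPLE_CONSTANT_MISMATCH,
  EXAMPLE_BINDING_MISMATCH,
} example_status_t;

// Checks the arguments provided on a dispatch against what the entry point
// declared when the executable was serialized.
example_status_t example_validate_dispatch(
    const example_dispatch_attrs_t* attrs, size_t constants_length_bytes,
    size_t binding_count) {
  size_t expected_bytes = (size_t)attrs->constant_count * sizeof(uint32_t);
  if (constants_length_bytes != expected_bytes) {
    return EXAMPLE_CONSTANT_MISMATCH;
  }
  if (binding_count != attrs->binding_count) {
    return EXAMPLE_BINDING_MISMATCH;
  }
  return EXAMPLE_OK;
}
```

A Vulkan/WebGPU-style target would validate against its full pipeline layout instead; the counts-only form is the minimal variant.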
Interactions with Indirect Bindings
#17875 mentions some approaches to supporting indirect bindings in reusable command buffers on various targets as each will have its own way of doing so. Some targets may want to partition direct from indirect bindings while others will keep them interleaved. Since dispatches to the same entry point may have different sets of direct and indirect buffers targets may want to always treat everything as if it could be indirect, use the metadata provided on the HAL interface ops to opt in only a subset of bindings to indirect usage, or do nothing at all and rely on emulation forever.
I suspect we'll end up with constants/direct bindings in native mechanisms (push constants/descriptor sets/binding groups/kernel args/kernel param buffers) and indirect bindings in their own device memory. Basically: what's known at recording time goes into native mechanisms and what's known at submission time goes in our own data structures referenced from those. This is marginally more expensive (one extra pointer indirection at execution time) but given that it's uniform, caching should take care of it. Targets could also do things like push constants to specify binding table slots and then dereference a binding table or slice off and suballocate a parameter buffer with offsets into it. We'd probably have to evaluate per-target what's cheapest to submit and execute.
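The recording-time versus submission-time split can be sketched as a resolve step that runs at execution. The types and the `is_indirect`/`slot` encoding here are hypothetical and only illustrate the one extra indirection; real targets would pick their own representation.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical host-side view of a buffer.
typedef struct {
  uint8_t* base;
  size_t length;
} example_buffer_t;

// A binding as captured at recording time: either a direct buffer that was
// known then, or a slot in a binding table that is supplied at submission.
typedef struct {
  int is_indirect;
  uint32_t slot;            // valid when is_indirect != 0
  example_buffer_t direct;  // valid when is_indirect == 0
  size_t offset;            // byte offset applied after resolution
} example_recorded_binding_t;

// Resolves a recorded binding to an address. Indirect bindings cost one
// extra lookup into the submission-time table. Returns NULL if the slot or
// offset is out of range.
uint8_t* example_resolve_binding(const example_recorded_binding_t* binding,
                                 const example_buffer_t* table,
                                 size_t table_count) {
  const example_buffer_t* buffer = &binding->direct;
  if (binding->is_indirect) {
    if (binding->slot >= table_count) return NULL;
    buffer = &table[binding->slot];
  }
  if (!buffer->base || binding->offset > buffer->length) return NULL;
  return buffer->base + binding->offset;
}
```

The same recorded command can then be replayed with a different binding table on each submission without re-recording.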
Since the compiled binaries change behavior we'd still be in a regime where the compiler would need to annotate the HAL interface ABI with which bindings may be indirect and possibly then have the runtime always pass direct bindings as indirect. This is no different than today and the only change is that now there aren't also descriptor sets to reason about. The best solution here is likely to export multiple entry points that handle the different modes and select appropriately during recording time.
Implementation
I've thought about ways to land things incrementally but it'd take quite a bit of work. Since we're still ok breaking things this is likely to happen in a branch that maintains compatibility at head until each target is converted. Once all targets are converted the branch can be merged as one larger breaking change. The upside is that there's a single breaking change and a single clean merge instead of piecemeal breakages or effectively copy/pasting entire HAL drivers to update the current dependencies on pipeline layouts and descriptor sets.
Happen on main with no breakages:
* `--iree-hal-experimental-dispatch2` to emit the new ops
* `iree_hal_command_buffer_t` and a new executable create2 that doesn't take pipelines
* `set` from HAL interface ops (it's always 0 today) and adapt to the old ABI in HAL-to-VM
Happen on branch, merge is a breakage:
* `executable_library` (compiler/runtime), unfortunately breaking
* `2` (compiler/runtime)
* `2` (compiler/runtime)
* `2` -> default during merge, bump versions
Any other changes to support indirect bindings?
NO: Now would be a good time to add the right metadata for indirect bindings as needed by targets but until implementation starts it'll likely be difficult to know exactly what should be added. Part of the cleanup to the flatbuffers here will be adding the appropriate placeholders to make adding such metadata non-breaking changes in the future. Metadata added for binding layout should include flags bits to indicate indirect usage or not per binding.
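One possible shape for such per-binding flag bits, purely as an illustration: the names, the 32-bit width, and the single defined bit are assumptions, with the remaining bits left as room for later non-breaking additions.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical per-binding flags; undefined bits are reserved so that new
// metadata can be added later without changing the struct size.
typedef uint32_t example_binding_flags_t;
enum {
  EXAMPLE_BINDING_FLAG_NONE = 0u,
  EXAMPLE_BINDING_FLAG_INDIRECT = 1u << 0,  // binding may be indirect
};

typedef struct {
  example_binding_flags_t flags;
} example_binding_metadata_t;

// Counts how many bindings of an entry point may be used indirectly, which a
// target could use to size its own binding table.
size_t example_count_indirect(const example_binding_metadata_t* bindings,
                              size_t count) {
  size_t indirect_count = 0;
  for (size_t i = 0; i < count; ++i) {
    if (bindings[i].flags & EXAMPLE_BINDING_FLAG_INDIRECT) ++indirect_count;
  }
  return indirect_count;
}
```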