[Vulkan] Support uniform buffer object for passing many scalar arguments #7717

Merged (22 commits) on Apr 11, 2021

Conversation

@masahi (Member) commented Mar 22, 2021

We are currently using push constants to pass scalar arguments to SPIR-V kernels. The Vulkan spec only guarantees that push constant storage is at least 128 bytes. Since we pass each scalar via a 64-bit union, that means we can pass at most 16 parameters via push constants. Unfortunately, for fused kernels with dynamic input shapes, the number of any_dim and other params can easily exceed that limit. See for example this crazy kernel that needs 20 scalars to be passed in: https://gist.github.com/masahi/ce51c0d51c6115109203b3732f185aab
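For reference, a minimal sketch of the arithmetic behind that limit (the union and constant names below are illustrative, not the exact TVM types):

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    // Each scalar argument is passed through a 64-bit union, regardless of its actual type.
    union ScalarArg64 {
      int64_t v_int64;
      uint64_t v_uint64;
      double v_float64;
    };

    int main() {
      // Vulkan only guarantees maxPushConstantsSize >= 128 bytes.
      constexpr std::size_t kMinGuaranteedPushConstantBytes = 128;
      constexpr std::size_t kMaxScalars =
          kMinGuaranteedPushConstantBytes / sizeof(ScalarArg64);  // 128 / 8 = 16
      std::printf("max scalars via push constants: %zu\n", kMaxScalars);
      return 0;
    }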

This PR enables passing parameters beyond the push constant limit by using a uniform buffer object (UBO). In particular, with this PR I was able to run PyTorch MaskRCNN end to end on Vulkan!

The NMS tests in onnx/test_forward.py serve as a test case.

There are some minor TODOs left, but I'd like to get some feedback. Supporting UBO requires non-trivial changes to both codegen and runtime, but hopefully they are straightforward.

@tqchen @tmoreau89 @jwfromm @ajtulloch

TODO

  • Query the platform-specific push constant size limit rather than using a hard-coded constant
  • Fix the segfault on deleting the UBO host buffer (it seems I shouldn't allocate/deallocate the host buffer explicitly, but rather use vkMapMemory/vkUnmapMemory)
  • Add doc comments in ir_builder.h

@masahi masahi marked this pull request as draft March 22, 2021 08:38
@masahi masahi force-pushed the vk-ubo branch 2 times, most recently from 54a534c to 62d19eb on April 8, 2021 08:16
@masahi masahi marked this pull request as ready for review April 8, 2021 08:34
@masahi masahi requested a review from tqchen April 8, 2021 08:37
@masahi (Member, Author) commented Apr 8, 2021

@tqchen This is ready for review.

On the codegen side, the UBO is declared in a similar way to push constants, so I added a DeclareStorageVariable function to declare both storage types.

On the runtime side, I introduced a CreateBuffer function to create both device storage buffers and uniform buffers. The change should be straightforward.

One potentially controversial point is the use of the runtime function MaxPushConstantsSize() during codegen, to query the HW limit and decide whether we should use push constants or a UBO.
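Roughly, the decision looks like the sketch below (the function and constant names here are illustrative, not the exact ones in the PR):

    #include <cstddef>

    // Stand-in for the runtime query (in the PR this comes from the Vulkan device limits);
    // returning the spec-guaranteed minimum keeps the sketch self-contained.
    std::size_t MaxPushConstantsSize() { return 128; }

    enum class ScalarArgStorage { kPushConstant, kUBO };

    // Decide how the scalar arguments of a kernel should be passed.
    ScalarArgStorage ChooseScalarArgStorage(std::size_t num_scalar_args) {
      constexpr std::size_t kBytesPerScalar = 8;  // each scalar goes through a 64-bit union
      if (num_scalar_args * kBytesPerScalar <= MaxPushConstantsSize()) {
        return ScalarArgStorage::kPushConstant;
      }
      return ScalarArgStorage::kUBO;  // too many scalars: fall back to a uniform buffer object
    }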

@tqchen (Member) commented Apr 8, 2021

cc @ajtulloch @antinucleon, please take a look if you can.

@tmoreau89 (Contributor) left a review comment


Thank you @masahi for these changes; I left a comment on cosmetics.

src/target/spirv/ir_builder.h (outdated review comment, resolved)
@tmoreau89 (Contributor) commented

Thank you @masahi, your PR has been merged.

@tqchen (Member) commented Apr 11, 2021

Sorry that I didn't get a chance to review. Here are some items that need to be fixed in follow-up PRs:

  • Let us not use GetMaxPushConstantSize, as it is dynamic across devices and the code won't behave correctly if we are doing cross compilation (where we compile on a machine that does not have Vulkan); we should use a minimum TVM default (e.g. 128 bytes).
  • I see a few places where delete is used; always prefer std::unique_ptr over a raw delete when possible (see the sketch after this list).
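A minimal sketch of both suggestions; the buffer struct and constant names are illustrative, not the PR's actual ones:

    #include <cstddef>
    #include <memory>
    #include <vulkan/vulkan.h>

    // Suggestion 1: a fixed compile-time default instead of querying the local device,
    // matching the minimum the Vulkan spec guarantees.
    constexpr std::size_t kTVMDefaultMaxPushConstantsSize = 128;

    // Illustrative stand-in for the host-side buffer object used in the PR.
    struct UBOBuffer {
      VkDevice device{VK_NULL_HANDLE};
      VkBuffer buffer{VK_NULL_HANDLE};
      VkDeviceMemory memory{VK_NULL_HANDLE};
    };

    // Suggestion 2: manage the buffer with std::unique_ptr and a custom deleter so the
    // Vulkan resources and the host object are released together, without a bare delete
    // at the call sites.
    struct UBOBufferDeleter {
      void operator()(UBOBuffer* buf) const {
        if (buf == nullptr) return;
        if (buf->buffer != VK_NULL_HANDLE) vkDestroyBuffer(buf->device, buf->buffer, nullptr);
        if (buf->memory != VK_NULL_HANDLE) vkFreeMemory(buf->device, buf->memory, nullptr);
        delete buf;
      }
    };

    using UBOBufferPtr = std::unique_ptr<UBOBuffer, UBOBufferDeleter>;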

@masahi (Member, Author) commented Apr 11, 2021

@tqchen

  • I added one delete, but I just followed the existing use of delete in the runtime:

    // existing cleanup: destroy the Vulkan buffer and its memory, then free the host-side wrapper
    vkDestroyBuffer(vctx.device, pbuf->buffer, nullptr);
    vkFreeMemory(vctx.device, pbuf->memory, nullptr);
    delete pbuf;

    Since we need to manually delete or free the associated Vulkan resources anyway, I don't mind this delete too much.

  • Is there a way to make GetMaxPushConstantSize work for cross-compilation use cases? For Vulkan in particular, we definitely need to query HW-related information like this at compile time: for example, the max thread block sizes, the warp size, and which extensions are available (the subgroup op extensions correspond to warp shuffle ops in CUDA, but not all Vulkan devices support them). The situation with warp size is the worst, because warp_size is not specified for the Vulkan target (https://github.com/apache/tvm/blob/main/src/target/target_kind.cc#L273-L275) but some topi ops use thread_warp_size:

    warp_size = tvm.target.Target.current(False).thread_warp_size

    so right now we cannot compile these ops on Vulkan.

I'm ok if we decide to use a fixed max push constant size, but for other HW params we need to have a solution for runtime querying.

@tqchen (Member) commented Apr 11, 2021

Thanks @masahi. I agree that the use of delete is minor.

In the case of push constants, we can try to obtain these constants and put them into the target. The main problem is that the code as it stands queries the local machine, while we actually need the parameters of the remote device. It would be useful to consider de-coupling to avoid such inconsistency between the information obtained by the compiler and the information in the runtime. For example, the kernel should declare in its function metadata whether a UBO or push constants are being used, so that we do not need to depend on runtime parameters: the compiler decides whether to use a UBO based on the argument size and writes that into optional metadata of the shader, e.g. by adding a bitmask flag to VulkanShader::flag to indicate the difference.

The runtime then simply reads that information and dispatches accordingly, which avoids any inconsistency between compile time and runtime. We can then enhance the target registration to include the max push constant size per target, defaulting to the minimum value.
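A rough sketch of the bitmask idea; the flag and struct names below are hypothetical, only VulkanShader::flag itself is referenced above:

    #include <cstdint>
    #include <vector>

    // Hypothetical flag bit written by the compiler into the shader metadata.
    enum VulkanShaderFlag : uint32_t {
      kUseUBOForScalarArgs = 1u << 0,  // scalar args live in a UBO, not in push constants
    };

    // Illustrative stand-in for the shader metadata the runtime loads.
    struct VulkanShaderMeta {
      std::vector<uint32_t> spirv;  // compiled SPIR-V words
      uint32_t flag{0};             // bitmask written at compile time
    };

    // The runtime reads the flag instead of re-deriving the decision from device limits,
    // so the compiler and the runtime cannot disagree.
    bool UsesUBO(const VulkanShaderMeta& shader) {
      return (shader.flag & kUseUBOForScalarArgs) != 0;
    }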

After taking a closer look, there are a few more potential things that need to be fixed:

  • The Vulkan runtime has a deferred mode and an immediate mode that need to be handled separately (see the two places using push constants); we will need to do the same for the UBO logic. Right now it seems the code works for immediate mode but not for deferred mode.
  • Right now the UBO argument buffer is simply re-allocated without deleting the original buffer, which could leak memory. We need logic that re-allocates only when necessary (and should also think about how this relates to synchronization in the deferred case) and reuses the existing buffer if it is large enough; a sketch of that grow-only reuse pattern follows this list.
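A sketch of that grow-only reuse pattern; all names are illustrative, and malloc/free stand in for the Vulkan buffer/memory calls (vkCreateBuffer, vkAllocateMemory, vkMapMemory and their counterparts) to keep the sketch self-contained:

    #include <cstddef>
    #include <cstdlib>

    struct ArgUniformBuffer {
      void* host_ptr{nullptr};  // host-visible pointer used to write scalar args
      std::size_t capacity{0};  // currently allocated size in bytes
    };

    void FreeArgUniformBuffer(ArgUniformBuffer* buf) {
      std::free(buf->host_ptr);
      buf->host_ptr = nullptr;
      buf->capacity = 0;
    }

    void AllocateArgUniformBuffer(ArgUniformBuffer* buf, std::size_t nbytes) {
      buf->host_ptr = std::malloc(nbytes);
      buf->capacity = nbytes;
    }

    // Reuse the existing UBO when it is big enough; otherwise free it and grow.
    void EnsureArgUniformBuffer(ArgUniformBuffer* buf, std::size_t nbytes) {
      if (buf->capacity >= nbytes) return;    // large enough: reuse, no leak
      FreeArgUniformBuffer(buf);              // release the old buffer first
      AllocateArgUniformBuffer(buf, nbytes);  // then allocate the larger one
    }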

@masahi (Member, Author) commented Apr 11, 2021

I confirmed that it also works for deferred mode (by always returning false from UseImmediate()). The relevant changes are

I need to look at the reallocation problem.

@tqchen (Member) commented Apr 11, 2021

Sorry, I was wrong about the re-allocation; it seems you are allocating a single UBO per kernel, so it might be fine. My original thought was along the lines of a single UBO per runtime.

@masahi (Member, Author) commented Apr 11, 2021

@tqchen (Member) commented Apr 11, 2021

To follow up on the deferred execution case:

The memcpy happens here: https://github.com/masahi/tvm/blob/1a3dbee99c9a2c362373707678d5657e59ea6827/src/runtime/vulkan/vulkan.cc#L1150

Imagine two consecutive, deferred launches of the same kernel A (which uses a UBO). The memcpy of the second kernel launch would then overwrite the value written for the first launch, while neither has actually been launched yet. So we want to move the memcpy into the lambdas of both the immediate and deferred paths (alongside the push constant calls).
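A sketch of that fix with illustrative names (not the runtime's real API): capture a copy of the argument bytes at enqueue time and do the memcpy inside the deferred lambda, right before the kernel actually runs:

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <functional>
    #include <vector>

    struct DeferredLaunch {
      std::function<void(void* ubo_host_ptr)> run;  // executed when the deferred stream flushes
    };

    DeferredLaunch MakeDeferredLaunch(const void* args, std::size_t nbytes) {
      // Copy the scalar arguments now, so every deferred launch owns its own data.
      std::vector<uint8_t> arg_copy(static_cast<const uint8_t*>(args),
                                    static_cast<const uint8_t*>(args) + nbytes);
      DeferredLaunch launch;
      launch.run = [arg_copy](void* ubo_host_ptr) {
        // The copy into the UBO happens at launch time, inside the lambda, so a second
        // deferred enqueue of the same kernel cannot clobber the first one's arguments.
        std::memcpy(ubo_host_ptr, arg_copy.data(), arg_copy.size());
        // ... bind descriptor sets and dispatch the kernel here ...
      };
      return launch;
    }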

@tqchen (Member) commented Apr 11, 2021

To follow up on the per-pipeline UBO, after thinking a bit more about this design: a per-pipeline UBO is fine for the single-threaded case, but it can be problematic in a multi-threaded setting when multiple threads launch the same kernel A.

To make the runtime thread-safe, we normally need to divide the data structures into constants (e.g. the pipeline) and runtime state (e.g. staging buffers, streams). The runtime state part belongs to VulkanThreadEntry, which keeps a thread-local copy to avoid threading issues. So we would want to do the same here, and use logic similar to StagingBuffer to create a staging buffer for the UBO; a rough sketch of that split follows.
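A rough sketch of that split, with illustrative names (the real code would hang this state off VulkanThreadEntry):

    #include <cstddef>
    #include <unordered_map>

    struct UBOStagingBuffer {
      void* host_ptr{nullptr};
      std::size_t capacity{0};
    };

    struct VulkanWorkerState {
      // One UBO staging buffer per pipeline (keyed here by an opaque pipeline pointer),
      // owned by this thread only, so concurrent launches of the same kernel do not race
      // on a shared buffer.
      std::unordered_map<const void*, UBOStagingBuffer> ubo_per_pipeline;
    };

    // Each thread gets its own copy; the shared, immutable pipeline objects stay global.
    VulkanWorkerState* ThreadLocalWorkerState() {
      static thread_local VulkanWorkerState state;
      return &state;
    }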

Finally, right now it seems we rely on having unified (host-mapped) memory for the UBO. Will all GPUs support that kind of memory? Should we allow an explicit data copy into the UBO buffer when such memory is not supported?

@tqchen (Member) commented Apr 11, 2021

Thanks @masahi for the great discussion. It seems to me that we would have benefited from more reviews and discussion :) It is great that we caught these points for improvement in the post-merge discussion. Given it is already merged, we could consider:

  • A0: Revert the PR, bring up a new one, and do another round of thorough review there.
  • A1: Bring a fix PR as soon as possible.

To summarize, the items that need to be addressed:

  • P0: Add the use of UBO as metadata in the shader, so we won't have a compiler/runtime inconsistency problem.
  • P1: Solve the deferred execution case where a UBO buffer can be overwritten by follow-up kernel calls (the copy needs to happen in the deferred stream as well).
  • P2: Solve the multi-thread safety issue when multiple threads use the UBO at the same time.

@masahi (Member, Author) commented Apr 11, 2021

OK, I sent the revert: #7821

@tmoreau89 (Contributor) commented

@masahi @tqchen Thanks for the comments; it appears the merge might have been too hasty. I'm good with the decision to go with A0; thanks @masahi for issuing the revert.

tqchen pushed a commit that referenced this pull request Apr 11, 2021
tmoreau89 pushed a commit to tmoreau89/tvm that referenced this pull request Apr 11, 2021
[Vulkan] Support uniform buffer object for passing many scalar arguments (apache#7717)

* ubo codegen first cut

* begin runtime change for UBO

* allocate and bind ubo

* query memory type for uniform

* refactor

* do not use float64

* trying an approach similar to push constant

* add more log

* do not delete ubo when not using it

* cumsum and nms test working with ubo

* remove log

* cleaning up

* formatting

* revert BufferArgument change

* refactored codegen

* minor fix

* introduce value kind for ubo

* fix cpplint and revert float64 change

* query push constant size using runtime API

* let vkmap/unmap allocate and delete host_buf

* doc update

* fix typo

Co-authored-by: Masahiro Masuda <masahi@[email protected]>
trevor-m pushed commits to trevor-m/tvm (May 6, 2021) and neo-ai/tvm (May 11, 2021) that referenced this pull request