Adjust Vulkan queue selection and creation logic #6662
Conversation
src/runtime/vulkan/vulkan.cc
Outdated
queue_family_index = i;
}
float priority = 0.0f;
auto compute_dedicated = std::find_if(queue_props.begin(), queue_props.end(), [](auto prop) {
Please add a comment indicating that this is to find the compute-dedicated queue.
Perhaps it is better to put it into a separate member function, e.g. FindComputeQueue.
It is still better to find all queues, perhaps putting the compute-only queue first.
I refactored the compute queue finding code out. What did you mean in the last comment? This implementation attempts to find a compute-only queue first, and if no such queue exists, it will fall back to any queue that supports compute.
I actually meant to still return a std::vector<> of all queues available, instead of a single queue.
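For concreteness, here is a minimal sketch of what returning every compute-capable queue family could look like, with compute-only families ordered first. The helper name GetComputeQueueFamilies and the two-pass ordering are illustrative assumptions, not the exact code in this PR.

```cpp
#include <vulkan/vulkan.h>

#include <cstdint>
#include <vector>

// Return every queue family that supports compute, with dedicated
// compute-only families (no graphics bit) ordered before general ones.
std::vector<uint32_t> GetComputeQueueFamilies(VkPhysicalDevice phy_dev) {
  uint32_t count = 0;
  vkGetPhysicalDeviceQueueFamilyProperties(phy_dev, &count, nullptr);
  std::vector<VkQueueFamilyProperties> props(count);
  vkGetPhysicalDeviceQueueFamilyProperties(phy_dev, &count, props.data());

  std::vector<uint32_t> result;
  // Pass 1: compute-only families first (common on AMD hardware).
  for (uint32_t i = 0; i < count; ++i) {
    if ((props[i].queueFlags & VK_QUEUE_COMPUTE_BIT) &&
        !(props[i].queueFlags & VK_QUEUE_GRAPHICS_BIT)) {
      result.push_back(i);
    }
  }
  // Pass 2: any remaining family that also supports compute.
  for (uint32_t i = 0; i < count; ++i) {
    if ((props[i].queueFlags & VK_QUEUE_COMPUTE_BIT) &&
        (props[i].queueFlags & VK_QUEUE_GRAPHICS_BIT)) {
      result.push_back(i);
    }
  }
  return result;
}
```

Callers that only need one queue can then take the first element, while the full list remains available for future multi-queue use.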
Thanks @ishitatsuyuki. I think it would be good to flesh out the queue priority further, since on certain desktops (e.g. Nvidia) having a high queue priority works fine. It would be great to confirm the impact of different queue priorities on performance, and whether we should set something in between; I would recommend doing some experiments across a range of values.
Digging a bit further, I could not find much useful information about queue priority assignment. Here is one guideline for Adreno. My reading of that statement is that 1.0 should be encouraged up to the point where graphics is affected. I would encourage us to do a bit more experiments.
Also https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkQueueGlobalPriorityEXT.html might be more relevant, since our main goal is to set the queue priority to be the same as the system default.
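For reference, the priority under discussion is the normalized float in [0.0, 1.0] passed through VkDeviceQueueCreateInfo::pQueuePriorities. A minimal sketch, with MakeQueueCreateInfo as a hypothetical helper rather than anything in the patch:

```cpp
#include <vulkan/vulkan.h>

#include <cstdint>

// Hypothetical helper: build a one-queue create info with an explicit
// priority. The float pointed to by `priority` must stay alive until
// vkCreateDevice has been called.
VkDeviceQueueCreateInfo MakeQueueCreateInfo(uint32_t queue_family_index,
                                            const float* priority) {
  VkDeviceQueueCreateInfo info{};
  info.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
  info.queueFamilyIndex = queue_family_index;
  info.queueCount = 1;
  info.pQueuePriorities = priority;  // normalized value in [0.0, 1.0]
  return info;
}
```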
Force-pushed from 73e9769 to b068422 (compare)
- The queue selection logic rewrite addresses a bug in the old implementation where the code would pass an invalid queue index when selecting any queue other than number 0.
- The new implementation attempts to use compute-only queues, which are common on AMD GPUs. It's not clear how much difference this will make, but hopefully it leads to better scheduling.

The queue changes were made because with the old configuration autotvm caused my system to hang, stutter, or otherwise become unstable and crash. With the change I'm able to run autotvm tuning inside a desktop environment.
Force-pushed from b068422 to 873b961 (compare)
After a bit of research it seems that no implementation in Mesa actually respects pQueuePriorities; therefore I have reverted that part of the change. We might want to revisit this in the future. As for VkQueueGlobalPriorityEXT, it's mostly meant for real-time use cases that require GPU preemption, e.g. compositors, and it requires process privileges for any higher-than-normal priority setting. Since most TVM workloads are non-interactive, it's better to just not use it, or even if we use it, we would set the priority to LOW. Also see the reply to the code comment.
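For completeness, a sketch of how VK_EXT_global_priority would be wired up if it were ever adopted with the LOW setting mentioned above. This is illustrative only, assumes a Vulkan SDK recent enough to expose the extension, and is not part of this PR.

```cpp
#include <vulkan/vulkan.h>

#include <cstdint>

// Illustrative only: chain a LOW global priority onto a queue create info.
// Both out-pointers must outlive the vkCreateDevice call.
VkDeviceQueueCreateInfo MakeLowGlobalPriorityQueueInfo(
    uint32_t family_index, const float* priority,
    VkDeviceQueueGlobalPriorityCreateInfoEXT* global_priority) {
  *global_priority = {};
  global_priority->sType =
      VK_STRUCTURE_TYPE_DEVICE_QUEUE_GLOBAL_PRIORITY_CREATE_INFO_EXT;
  global_priority->globalPriority = VK_QUEUE_GLOBAL_PRIORITY_LOW_EXT;

  VkDeviceQueueCreateInfo info{};
  info.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
  info.pNext = global_priority;  // only valid if VK_EXT_global_priority is enabled
  info.queueFamilyIndex = family_index;
  info.queueCount = 1;
  info.pQueuePriorities = priority;
  return info;
}
```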
@tqchen I think I've addressed all comments for now, can you remove the "need update" label?
src/runtime/vulkan/vulkan.cc
Outdated
uint32_t queue_family_index = 0;
std::vector<VkDeviceQueueCreateInfo> queue_create_info;
uint32_t queue_family_index = FindComputeQueue(phy_dev);
if (queue_family_index == -1U) continue;
Comparing a uint to a negative number is undefined behavior, which I think we should avoid.
Change the signature to
// return found or not
bool FindComputeQueue(device, uint32_t* family_index);
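A sketch of the suggested signature, under the simplifying assumption that any compute-capable family is acceptable (the compute-only preference discussed earlier is omitted for brevity):

```cpp
#include <vulkan/vulkan.h>

#include <cstdint>
#include <vector>

// Return whether a compute-capable queue family exists instead of
// overloading the returned index with -1U.
bool FindComputeQueue(VkPhysicalDevice phy_dev, uint32_t* family_index) {
  uint32_t count = 0;
  vkGetPhysicalDeviceQueueFamilyProperties(phy_dev, &count, nullptr);
  std::vector<VkQueueFamilyProperties> props(count);
  vkGetPhysicalDeviceQueueFamilyProperties(phy_dev, &count, props.data());
  for (uint32_t i = 0; i < count; ++i) {
    if (props[i].queueFlags & VK_QUEUE_COMPUTE_BIT) {
      *family_index = i;
      return true;
    }
  }
  return false;
}
```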
Per the other comment, maybe it is still helpful to return a vector of queue_create_info.
Thanks @ishitatsuyuki, I made some follow-up comments.
Force-pushed from 5ad7e23 to d60f573 (compare)
Force-pushed from d60f573 to b9c4ff2 (compare)
I have restored the previous behavior where the device was created with all queue families we could discover (I don't think this behavior is necessary or correct, though). Also I've opted for
Thanks, some final comments
src/runtime/vulkan/vulkan.cc
Outdated
std::vector<VkDeviceQueueCreateInfo> queue_create_info;
for (const auto& index : queue_family_indexes) {
struct VkDeviceQueueCreateInfo info {};
In this case, I think we can use your previous logic by using queue_family_index[0] and not creating a vector. GetComputeQueueFamilies is mainly reserved for future purposes.
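A sketch of that simplification, as a fragment inside the physical-device enumeration loop; it assumes the GetComputeQueueFamilies helper sketched earlier and is not the literal patch:

```cpp
// Inside the physical-device enumeration loop; GetComputeQueueFamilies is
// the hypothetical helper sketched earlier in this thread.
std::vector<uint32_t> families = GetComputeQueueFamilies(phy_dev);
if (families.empty()) continue;  // no compute support on this device

float priority = 1.0f;
VkDeviceQueueCreateInfo queue_info{};
queue_info.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
queue_info.queueFamilyIndex = families[0];  // compute-only family if one exists
queue_info.queueCount = 1;
queue_info.pQueuePriorities = &priority;
// queue_info then goes into VkDeviceCreateInfo::pQueueCreateInfos with
// queueCreateInfoCount = 1.
```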
Thanks @ishitatsuyuki!
* Adjust Vulkan queue selection and creation logic
  - The queue selection logic rewrite addresses a bug in the old implementation where the code would pass an invalid queue index when selecting any queue other than number 0.
  - The new implementation attempts to use compute-only queues, which are common on AMD GPUs. It's not clear how much difference this will make, but hopefully it leads to better scheduling.
  The queue changes were made because with the old configuration autotvm caused my system to hang, stutter, or otherwise become unstable and crash. With the change I'm able to run autotvm tuning inside a desktop environment.
* Return multiple queue family indexes from GetComputeQueues
* Only create one queue
r? @tqchen