Adjust Vulkan queue selection and creation logic #6662
Conversation
src/runtime/vulkan/vulkan.cc
Outdated
queue_family_index = i;
}
float priority = 0.0f;
auto compute_dedicated = std::find_if(queue_props.begin(), queue_props.end(), [](auto prop) {
Please add a comment indicating that this is to find the compute-dedicated queue.
Perhaps it is better to put it into a separate member function, e.g. FindComputeQueue.
It is still better to find all queues, perhaps putting the compute-only queue first.
I refactored the compute queue finding code out. What did you mean in the last comment? This implementation attempts to find a compute-only queue first, and if no such queue exists, it will fall back to any queue that supports compute.
I actually meant to still return a std::vector<> of all queues available, instead of a single queue.
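For concreteness, here is a minimal sketch of what returning every compute-capable queue family could look like, with compute-only families ordered first. The helper name GetComputeQueueFamilies and the two-pass ordering are illustrative assumptions, not the exact code in this PR.

```cpp
#include <vulkan/vulkan.h>

#include <cstdint>
#include <vector>

// Return every queue family that supports compute, with dedicated
// compute-only families (no graphics bit) ordered before general ones.
std::vector<uint32_t> GetComputeQueueFamilies(VkPhysicalDevice phy_dev) {
  uint32_t count = 0;
  vkGetPhysicalDeviceQueueFamilyProperties(phy_dev, &count, nullptr);
  std::vector<VkQueueFamilyProperties> props(count);
  vkGetPhysicalDeviceQueueFamilyProperties(phy_dev, &count, props.data());

  std::vector<uint32_t> result;
  // Pass 1: compute-only families first (common on AMD hardware).
  for (uint32_t i = 0; i < count; ++i) {
    if ((props[i].queueFlags & VK_QUEUE_COMPUTE_BIT) &&
        !(props[i].queueFlags & VK_QUEUE_GRAPHICS_BIT)) {
      result.push_back(i);
    }
  }
  // Pass 2: any remaining family that also supports compute.
  for (uint32_t i = 0; i < count; ++i) {
    if ((props[i].queueFlags & VK_QUEUE_COMPUTE_BIT) &&
        (props[i].queueFlags & VK_QUEUE_GRAPHICS_BIT)) {
      result.push_back(i);
    }
  }
  return result;
}
```

Callers that only need one queue can then take the first element, while the full list remains available for future multi-queue use.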
Thanks @ishitatsuyuki. I think it would be good to flesh out the queue priority further, since on certain desktops (e.g. Nvidia) having a high queue priority works fine. It would be great to confirm the impact of different queue priorities on performance, and whether we should set something in between; I would recommend doing some experiments across a range of values.
Digging a bit further, I could not find much useful information about queue priority assignment. Here is one guideline for Adreno. My reading of that statement is that 1.0 should be encouraged up to the point where graphics is affected. I would encourage us to do a bit more experiments.
Also https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkQueueGlobalPriorityEXT.html might be more relevant, since our main goal is to set the queue priority to be the same as the system default.
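For reference, the priority under discussion is the normalized float in [0.0, 1.0] passed through VkDeviceQueueCreateInfo::pQueuePriorities. A minimal sketch, with MakeQueueCreateInfo as a hypothetical helper rather than anything in the patch:

```cpp
#include <vulkan/vulkan.h>

#include <cstdint>

// Hypothetical helper: build a one-queue create info with an explicit
// priority. The float pointed to by `priority` must stay alive until
// vkCreateDevice has been called.
VkDeviceQueueCreateInfo MakeQueueCreateInfo(uint32_t queue_family_index,
                                            const float* priority) {
  VkDeviceQueueCreateInfo info{};
  info.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
  info.queueFamilyIndex = queue_family_index;
  info.queueCount = 1;
  info.pQueuePriorities = priority;  // normalized value in [0.0, 1.0]
  return info;
}
```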
Force-pushed from 73e9769 to b068422 (compare)
- The queue selection logic rewrite addresses a bug in the old implementation where the code would pass an invalid queue index when selecting any queue other than number 0.
- The new implementation attempts to use compute-only queues, which are common on AMD GPUs. It's not clear how much difference this will make, but hopefully it leads to better scheduling.

The queue changes were made because with the old configuration autotvm caused my system to hang, stutter, or otherwise become unstable and crash. With the change I'm able to run autotvm tuning inside a desktop environment.
Force-pushed from b068422 to 873b961 (compare)
After a bit of research it seems that no implementation in Mesa actually respects pQueuePriorities; therefore I have reverted that part of the change. We might want to revisit this in the future. As for VkQueueGlobalPriorityEXT, it's mostly meant for real-time use cases that require GPU preemption, e.g. compositors, and it requires process privileges for any higher-than-normal priority setting. Since most TVM workloads are non-interactive, it's better to just not use it, or even if we use it, we would set the priority to LOW. Also see the reply to the code comment.
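For completeness, a sketch of how VK_EXT_global_priority would be wired up if it were ever adopted with the LOW setting mentioned above. This is illustrative only, assumes a Vulkan SDK recent enough to expose the extension, and is not part of this PR.

```cpp
#include <vulkan/vulkan.h>

#include <cstdint>

// Illustrative only: chain a LOW global priority onto a queue create info.
// Both out-pointers must outlive the vkCreateDevice call.
VkDeviceQueueCreateInfo MakeLowGlobalPriorityQueueInfo(
    uint32_t family_index, const float* priority,
    VkDeviceQueueGlobalPriorityCreateInfoEXT* global_priority) {
  *global_priority = {};
  global_priority->sType =
      VK_STRUCTURE_TYPE_DEVICE_QUEUE_GLOBAL_PRIORITY_CREATE_INFO_EXT;
  global_priority->globalPriority = VK_QUEUE_GLOBAL_PRIORITY_LOW_EXT;

  VkDeviceQueueCreateInfo info{};
  info.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
  info.pNext = global_priority;  // only valid if VK_EXT_global_priority is enabled
  info.queueFamilyIndex = family_index;
  info.queueCount = 1;
  info.pQueuePriorities = priority;
  return info;
}
```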
@tqchen I think I've addressed all comments for now, can you remove the "need update" label?
src/runtime/vulkan/vulkan.cc
Outdated
uint32_t queue_family_index = 0;
std::vector<VkDeviceQueueCreateInfo> queue_create_info;
uint32_t queue_family_index = FindComputeQueue(phy_dev);
if (queue_family_index == -1U) continue;
Comparing a uint to a negative number is undefined behavior, which I think we should avoid.
Change the signature to
// return found or not
bool FindComputeQueue(device, uint32_t* family_index);
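A sketch of the suggested signature, under the simplifying assumption that any compute-capable family is acceptable (the compute-only preference discussed earlier is omitted for brevity):

```cpp
#include <vulkan/vulkan.h>

#include <cstdint>
#include <vector>

// Return whether a compute-capable queue family exists instead of
// overloading the returned index with -1U.
bool FindComputeQueue(VkPhysicalDevice phy_dev, uint32_t* family_index) {
  uint32_t count = 0;
  vkGetPhysicalDeviceQueueFamilyProperties(phy_dev, &count, nullptr);
  std::vector<VkQueueFamilyProperties> props(count);
  vkGetPhysicalDeviceQueueFamilyProperties(phy_dev, &count, props.data());
  for (uint32_t i = 0; i < count; ++i) {
    if (props[i].queueFlags & VK_QUEUE_COMPUTE_BIT) {
      *family_index = i;
      return true;
    }
  }
  return false;
}
```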
Per the other comment, maybe it is still helpful to return a vector of queue_create_info.
Thanks @ishitatsuyuki, I made some follow-up comments.
Force-pushed from 5ad7e23 to d60f573 (compare)
Force-pushed from d60f573 to b9c4ff2 (compare)
I have restored the previous behavior where the device was created with all queue families we could discover (I don't think this behavior is necessary or correct, though). Also I've opted for
Thanks, some final comments
src/runtime/vulkan/vulkan.cc
Outdated
std::vector<VkDeviceQueueCreateInfo> queue_create_info;
for (const auto& index : queue_family_indexes) {
struct VkDeviceQueueCreateInfo info {};
In this case, I think we can use your previous logic by using queue_family_index[0] and not creating a vector. GetComputeQueueFamilies is mainly reserved for future purposes.
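A sketch of that simplification, as a fragment inside the physical-device enumeration loop; it assumes the GetComputeQueueFamilies helper sketched earlier and is not the literal patch:

```cpp
// Inside the physical-device enumeration loop; GetComputeQueueFamilies is
// the hypothetical helper sketched earlier in this thread.
std::vector<uint32_t> families = GetComputeQueueFamilies(phy_dev);
if (families.empty()) continue;  // no compute support on this device

float priority = 1.0f;
VkDeviceQueueCreateInfo queue_info{};
queue_info.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
queue_info.queueFamilyIndex = families[0];  // compute-only family if one exists
queue_info.queueCount = 1;
queue_info.pQueuePriorities = &priority;
// queue_info then goes into VkDeviceCreateInfo::pQueueCreateInfos with
// queueCreateInfoCount = 1.
```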
Thanks @ishitatsuyuki!
* Adjust Vulkan queue selection and creation logic
  - The queue selection logic rewrite addresses a bug in the old implementation where the code would pass an invalid queue index when selecting any queue other than number 0.
  - The new implementation attempts to use compute-only queues, which are common on AMD GPUs. It's not clear how much difference this will make, but hopefully it leads to better scheduling.
  The queue changes were made because with the old configuration autotvm caused my system to hang, stutter, or otherwise become unstable and crash. With the change I'm able to run autotvm tuning inside a desktop environment.
* Return multiple queue family indexes from GetComputeQueues
* Only create one queue
r? @tqchen