Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vulkan: Implement basic integrated GPU profiling. #12262

Merged
merged 5 commits into from
Aug 21, 2019
Merged

Conversation

hrydgard
Copy link
Owner

Currently simply measures the total GPU time of the frame, split into init and main command buffers. Will later be extended to get the execution time of individual render passes.

Enabled using a checkbox in the "Dev menu" (or permanently through ini file).

The goal is to be able to quantify improvements from various attempts at GPU optimization. Framerate isn't good enough for comparisons if you're already at 60 on mobile.

Note: If the GPU is mostly idle it'll downclock. On PC you can unthrottle to get it to run at its proper performance. I'll also later investigate locking the GPU frequency, there are some methods.

profiler

Currently simply measures the total GPU time of the frame. Will later be
extended to get the execution time of individual render passes.
@hrydgard hrydgard added this to the v1.9.0 milestone Aug 20, 2019
Copy link
Collaborator

@unknownbrackets unknownbrackets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat, LGTM. Just don't see anything flushing old values from timestampDescriptions.

-[Unknown]

VK_QUERY_RESULT_64_BIT);
if (res == VK_SUCCESS) {
double timestampConversionFactor = (double)vulkan_->GetPhysicalDeviceProperties().properties.limits.timestampPeriod * (1.0 / 1000000.0);
uint64_t timestampDiffMask = 0xFFFFFFFFFFFFFFFFULL; // TODO: Get from queue family
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like Adreno typically has 48, and PowerVR typically has 0 (unfortunate.)

Technically The command pool’s queue family must support a non-zero timestampValidBits is required, so we ought to check for PowerVR, but it seems like we're allowed to call vkCreateQueryPool (I think) even if that's zero. So shame on someone if they enable the ini on PowerVR...

-[Unknown]

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, not like we can do something about it anyway other than hide the option. Not gonna do the extra work for such crappy implementations...

@@ -895,10 +950,20 @@ void VulkanRenderManager::BeginSubmitFrame(int frame) {
void VulkanRenderManager::Submit(int frame, bool triggerFence) {
FrameData &frameData = frameData_[frame];
if (frameData.hasInitCommands) {
if (gpuProfilingEnabled_ && triggerFence) {
// Pre-allocated query ID 1.
vkCmdWriteTimestamp(frameData.initCmd, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, frameData.timestampQueryPool_, 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, ought we check the initCmd timing for subsequent submits in the frame (triggerFence = false)?

-[Unknown]

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah probably, and we should subtract the time spent between the submits and show that separately. Thought I'd leave that for later.

vkCmdResetQueryPool(initCmd, frameData.timestampQueryPool_, 0, MAX_TIMESTAMP_QUERIES);
vkCmdWriteTimestamp(initCmd, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, frameData.timestampQueryPool_, 0);
} else {
frameData.numQueries = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, should we frameData.timestampDescriptions.clear(); here? And really before the other path too?

Actually, is there are scenario where we want frameData.timestampDescriptions.size() != frameData.numQueries? It seems like we could just use the vector only.

-[Unknown]

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah definitely, embarrassing mistake.

As for the separate variable, I had originally intended a description-less mode for a smaller performance hit but I don't think it's worth the trouble any more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants