
Implement Vulkan pipeline caching #76348

Merged
1 commit merged into godotengine:master from the pipeline-cache branch on May 31, 2023

Conversation

warriormaster12
Contributor

@warriormaster12 warriormaster12 commented Apr 22, 2023

An implementation of Vulkan pipeline caching with cache validation when reading the file. Looking for feedback before merging this PR :)

@warriormaster12 warriormaster12 requested a review from a team as a code owner April 22, 2023 16:10
@warriormaster12 warriormaster12 changed the title from "Vulkan, implemented pipeline-caching" to "Vulkan, implemented pipeline caching" Apr 22, 2023
@clayjohn clayjohn added this to the 4.1 milestone Apr 22, 2023
@warriormaster12 warriormaster12 force-pushed the pipeline-cache branch 13 times, most recently from e70f83c to 195bbe8 on April 22, 2023 19:14
@warriormaster12
Contributor Author

@clayjohn @RandomShaper I asked around about driverABI's purpose and it seems that it is used to check whether the OS is 32-bit or 64-bit. There aren't any plans to support 32-bit, right? If so, I can remove the variable from PipelineCacheHeader.

@warriormaster12
Contributor Author

I've also added a pipeline cache save interval to the project settings, based on one of @reduz's suggestions.
[screenshot of the new project setting]

@clayjohn
Member

Could you test this PR with a larger project like the TPS-demo and check what the size of the pipeline cache ends up being? I know reduz has stated earlier that he only wants to cache certain pipelines, to keep the file size and load time of the cache down. I think he specifically wanted to only cache the main specialization constant variants, as ideally all the other variants should be compiled on a background thread.

@warriormaster12
Contributor Author

warriormaster12 commented Apr 24, 2023

Could you test this PR with a larger project like the TPS-demo and check what the size of the pipeline cache ends up being? I know reduz has stated earlier that he only wants to cache certain pipelines, to keep the file size and load time of the cache down. I think he specifically wanted to only cache the main specialization constant variants, as ideally all the other variants should be compiled on a background thread.

I wish I could, but at the moment at least I'm having a hard time getting the project to open, since it gets stuck on importing level.exr.

edit: deleting the file fixed the issue.

@clayjohn
Member

I wish I could, but at the moment at least I'm having a hard time getting the project to open, since it gets stuck on importing level.exr.

Ah, yeah, compressing EXR files is super slow in debug builds. If you run it with a debug build, you should open up level.exr.import and change compress/mode to 3. That way it will import as VRAM Uncompressed and will be much faster.
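For reference, the relevant part of a level.exr.import file would look roughly like this (a sketch; the other keys in the file are omitted, and 3 maps to the VRAM Uncompressed mode):

[params]
compress/mode=3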

@warriormaster12
Contributor Author

warriormaster12 commented Apr 24, 2023

@clayjohn I did a quick test with the TPS-demo and GDQuest's third-person platforming demo. The results are:
TPS Demo: 6.3 MB
GDQuest TP: 6.6 MB

And I assume that these graphical artifacts are due to me using VRAM Uncompressed with level.exr?
[screenshot showing the graphical artifacts]

@YuriSizov YuriSizov changed the title from "Vulkan, implemented pipeline caching" to "Implement Vulkan pipeline caching" May 27, 2023
@warriormaster12
Contributor Author

One moment

@warriormaster12
Contributor Author

Done

@Ansraer
Contributor

Ansraer commented May 27, 2023

So I took some time and built this PR (rebased on the current master) locally. The code looks good to me and some quick debugging would imply that Vulkan is correctly loading stuff from the cache (instead of compiling it).

Creating a pipeline without the cache takes ~30,000 µs on average. This PR cuts that down to ~260 µs, while my driver's built-in cache averages ~80 µs.
If we are really concerned about the cache size, it might make more sense to disable this on modern desktop GPUs (they all have a built-in cache) and only enable it on mobile devices out of the box.

@warriormaster12
Contributor Author

@Ansraer I disagree, there are cases on desktop as well where the pipeline cache would help. In the TPS Demo, shooting a bullet causes a stutter across multiple boots of the game, until the driver builds up its own cache. With the pipeline cache, it only happens on the first run.

Member

@bitsawer bitsawer left a comment

I'm a bit late to the party, but I made a few comments about the code. Feel free to ignore them if there's a reason behind the issues I pointed out.

float save_interval = GLOBAL_GET("rendering/rendering_device/pipeline_cache/save_interval_mb");
VkResult vr = vkGetPipelineCacheData(device, pipelines_cache.cache_object, &pso_blob_size, nullptr);
ERR_FAIL_COND(vr);
size_t difference = (pso_blob_size - pipelines_cache.current_size) / (1024.0f * 1024.0f);
Member

Should probably use integer division here.
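For illustration, the suggested tweak could look something like this (a sketch based on the snippet above, not the merged code):

// Integer division keeps the math in size_t and avoids the float round-trip.
size_t difference = (pso_blob_size - pipelines_cache.current_size) / (1024 * 1024);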

break;
}
}
if (header.data_hash != hash_murmur3_one_64(pipelines_cache.buffer.size()) || header.data_size != (uint32_t)pipelines_cache.buffer.size() || header.vendor_id != props.vendorID || header.device_id != props.deviceID || header.driver_abi != sizeof(void *) || invalid_uuid) {
Member

@bitsawer bitsawer May 28, 2023

Should this use hash_murmur3_buffer() instead to hash the full contents of pipelines_cache.buffer? Now we are just hashing the size integer which doesn't seem all that useful as we could just store the integer directly.

Of course, hashing the full buffer and megabytes of data might cause a small slowdown, but hopefully it shouldn't be too bad.
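A sketch of what that might look like, assuming hash_murmur3_buffer() takes a pointer and a length as declared in core/templates/hashfuncs.h:

// Hash the full cache contents instead of just the size integer.
header.data_hash = hash_murmur3_buffer(pipelines_cache.buffer.ptr(), pipelines_cache.buffer.size());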

Contributor Author

I might have interpreted the "robust pipeline cache" article incorrectly, so this could indeed be the correct way to hash.

}

if (FileAccess::exists("user://vulkan/pipelines.cache")) {
Vector<uint8_t> file_data = FileAccess::get_file_as_bytes("user://vulkan/pipelines.cache");
Member

This part could use some basic error and data bounds checking, just in case; for example, if the file exists but is corrupted, truncated, or could not be read, this will crash soon afterwards.

PipelineCacheHeader header = {};
header.magic = 868 + VK_PIPELINE_CACHE_HEADER_VERSION_ONE;
header.data_size = pipelines_cache.buffer.size();
header.data_hash = hash_murmur3_one_64(pipelines_cache.buffer.size());
Member

Same as above, should this use hash_murmur3_buffer()?

@warriormaster12
Contributor Author

@bitsawer thanks in general for the good feedback :)

I'll look at it tomorrow ASAP.

@@ -8957,6 +8966,102 @@ void RenderingDeviceVulkan::initialize(VulkanContext *p_context, bool p_local_de
draw_list_split = false;

compute_list = nullptr;
_load_pipeline_cache();
print_line(vformat("Startup PSO cache (%.1f MiB)", pipelines_cache.buffer.size() / (1024.0f * 1024.0f)));
Member

@bitsawer bitsawer May 28, 2023

Might be better to use print_verbose instead of print_line for all logging like this (including the one in _update_pipeline_cache).
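For example, the startup message above routed through print_verbose would look like this (same output, only shown when verbose logging is enabled):

print_verbose(vformat("Startup PSO cache (%.1f MiB)", pipelines_cache.buffer.size() / (1024.0f * 1024.0f)));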

@warriormaster12
Contributor Author

@bitsawer @clayjohn @RandomShaper file size has shrunk a bit after changing to hash_buffer, so that's positive.

Let me know if the file error checks that I have added are good enough.

@bitsawer
Member

Changes look mostly good. However, I would write the file check something like this (not tested or compiled):

Error file_error;
Vector<uint8_t> file_data = FileAccess::get_file_as_bytes("user://vulkan/pipelines.cache", &file_error);
if (file_error != OK || file_data.size() <= sizeof(PipelineCacheHeader)) {
    WARN_PRINT("Invalid/corrupt pipelines cache.");
    return;
}
PipelineCacheHeader header = {};
...

Instead of checking multiple possible error codes, just check if the read was not OK. This also checks if we actually read enough data so that the following memcpy() and header check doesn't blow up. For example, reading an empty or partially written file would currently crash because reading would be successful but it would not read enough data.

As a super minor nitpick I didn't notice earlier, you could also tweak the comment formatting like this (space after // and capitalized):

// This is mostly for the editor to check if after playing the game, game's pipeline cache size still matches with editor's cache.

After changing those, looks good to me.

@warriormaster12
Contributor Author

@bitsawer Done, hopefully this is good to go for merging

Member

@bitsawer bitsawer left a comment

Looks good to me after the changes.

@YuriSizov
Contributor

@clayjohn @RandomShaper Please make sure you're okay with the changes, when you have time :)

@RandomShaper
Member

I'm finally reviewing this! Sorry for not doing it earlier.

Looks very good!

Now, I'm wondering if it wouldn't be nice to save the cache from another thread; namely, a low-priority task in the WorkerThreadPool.

@@ -2305,6 +2305,9 @@
<member name="rendering/rendering_device/driver.windows" type="String" setter="" getter="" default="&quot;vulkan&quot;">
Windows override for [member rendering/rendering_device/driver].
</member>
<member name="rendering/rendering_device/pipeline_cache/save_interval_mb" type="float" setter="" getter="" default="3.0">
Member

This wording makes it sound as if this was about time instead of size. Let me suggest save_increment_mb, for instance.

Contributor Author

One of the meanings of interval is a gap between two points. The setting describes the gap between the last time we saved and the next time. It could maybe be save_gap_mb or save_delta_mb, if not interval?

Member

@akien-mga akien-mga May 31, 2023

Could be save_chunk_size_mb? Or even just chunk_size_mb. (I didn't check implementation details to see what this is used for exactly so TIWAGOS.)

@warriormaster12
Contributor Author

I'm finally reviewing this! Sorry for not doing it earlier.

Looks very good!

Now, I'm wondering if it wouldn't be nice to save the cache from another thread; namely, a low-priority task in the WorkerThreadPool.

I could implement one quickly. I'm not experienced with multi-threading though, so I'd like to know: should I call the wait_for_ function right after creating the task/group ID?

@RandomShaper
Member

I could implement one quickly. I'm not experienced with multi-threading though, so I'd like to know: should I call the wait_for_ function right after creating the task/group ID?

Nope, because by doing so you would be making the operation synchronous (locking, stalling) regardless of the use of threads. You should make the call to add the task and just remember the task ID. Then, when it's time to save the cache again, if the task is still in flight, skip. Upon closing the engine, you would wait for any in-flight cache save task to complete and then call it normally to ensure the latest state is saved. Does that make sense to you?
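A rough sketch of that flow, using WorkerThreadPool's add_native_task / is_task_completed / wait_for_task_completion; the member and helper names here are hypothetical and not the merged code:

// Hypothetical member holding the ID of the last background save task.
WorkerThreadPool::TaskID pipelines_cache_save_task = WorkerThreadPool::INVALID_TASK_ID;

// Hypothetical static helper that serializes the cache to user://vulkan/pipelines.cache.
static void _save_pipeline_cache_threaded(void *p_user) {
	// ... write header + vkGetPipelineCacheData() blob to disk ...
}

void _request_pipeline_cache_save() {
	WorkerThreadPool *pool = WorkerThreadPool::get_singleton();
	// If a previous save is still in flight, skip this round and try again later.
	if (pipelines_cache_save_task != WorkerThreadPool::INVALID_TASK_ID && !pool->is_task_completed(pipelines_cache_save_task)) {
		return;
	}
	// Low-priority background task.
	pipelines_cache_save_task = pool->add_native_task(&_save_pipeline_cache_threaded, nullptr, false, "Save pipeline cache");
}

void _finalize_pipeline_cache() {
	// On shutdown, wait for any in-flight save, then do one last synchronous save.
	if (pipelines_cache_save_task != WorkerThreadPool::INVALID_TASK_ID) {
		WorkerThreadPool::get_singleton()->wait_for_task_completion(pipelines_cache_save_task);
	}
	_save_pipeline_cache_threaded(nullptr);
}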

@warriormaster12
Contributor Author

It does, yeah.

@warriormaster12 warriormaster12 force-pushed the pipeline-cache branch 2 times, most recently from 08d73a4 to 21bfd7d on May 31, 2023 19:21
@warriormaster12
Contributor Author

@RandomShaper added worker threading and updated the project setting name.

@akien-mga akien-mga merged commit 3dd0307 into godotengine:master May 31, 2023
@akien-mga
Member

Thanks!
