Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation layer crash due to mishandle of vkDestroySwapchainKHR #8584

Open
Mr-222 opened this issue Sep 24, 2024 · 5 comments
Open

Validation layer crash due to mishandle of vkDestroySwapchainKHR #8584

Mr-222 opened this issue Sep 24, 2024 · 5 comments
Labels
Bug Something isn't working WSI Window System Integration related issues

Comments

@Mr-222
Copy link

Mr-222 commented Sep 24, 2024

Environment:

  • OS: Windows11
  • GPU and driver version: NVIDIA RTX3500 Ada Generation Laptop GPU / NVIDIA RTX Driver Release 560
  • SDK or header version if building from repo: 1.3.290
  • Options enabled (synchronization, best practices, etc.): N/A

Describe the Issue

This issue heavily depends on driver implementation, I only observe this issue in NVIDIA GPU.

When VK_VALIDATION_FEATURE_DISABLE_UNIQUE_HANDLES_EXT is enabled, which means I manually disable the handle wrapping. At this time, when recreating a swapchain by calling vkCreateSwapchainKHR with oldswapchain parameter, the validation layer will ignore situations like VKImage from different swapchains uses same handles. So we may get same image handles used by old swapchains when calling vkGetSwapchainImagesKHR, which the validation layer is not properly handled. Here's a simple example:

  1. Create swapchain1
  2. Create swapchain2
  3. Destroy swapchain1

When destroy the swapchain1, the validation layer would also delete the swapchain's images in the StateMap and ObjectTable without checking if newly created swapchain is using those image handles. However, swapchain2 is still using those images. As a result, the program would crash afterwards because it can't find corresponding image handle when creating image view and framebuffer for the new swapchain.

A simple reproduce

You can use Vulkan sample to reproduce this bug crash easily, just add the VK_VALIDATION_FEATURE_DISABLE_UNIQUE_HANDLES_EXT extension. What I did is change the Instance.cpp in Vulkan-Samples/framework/core, change the constructor of Instance, add the extension:

Instance::Instance(const std::string                            &application_name,
                   const std::unordered_map<const char *, bool> &required_extensions,
                   const std::vector<const char *>              &required_validation_layers,
                   const std::vector<VkLayerSettingEXT>         &required_layer_settings,
                   bool                                          headless,
                   uint32_t                                      api_version)
{
  ...

	// Some of the specialized layers need to be enabled explicitly
#ifdef USE_VALIDATION_LAYER_FEATURES
	VkValidationFeaturesEXT                   validation_features_info = {VK_STRUCTURE_TYPE_VALIDATION_FEATURES_EXT};
	std::vector<VkValidationFeatureEnableEXT> enable_features{};
	if (validation_features)
	{
#	if defined(VKB_VALIDATION_LAYERS_GPU_ASSISTED)
		enable_features.push_back(VK_VALIDATION_FEATURE_ENABLE_GPU_ASSISTED_RESERVE_BINDING_SLOT_EXT);
		enable_features.push_back(VK_VALIDATION_FEATURE_ENABLE_GPU_ASSISTED_EXT);
#	endif
#	if defined(VKB_VALIDATION_LAYERS_BEST_PRACTICES)
		enable_features.push_back(VK_VALIDATION_FEATURE_ENABLE_BEST_PRACTICES_EXT);
#	endif
#	if defined(VKB_VALIDATION_LAYERS_SYNCHRONIZATION)
		enable_features.push_back(VK_VALIDATION_FEATURE_ENABLE_SYNCHRONIZATION_VALIDATION_EXT);
#	endif

		validation_features_info.enabledValidationFeatureCount = static_cast<uint32_t>(enable_features.size());
		validation_features_info.pEnabledValidationFeatures    = enable_features.data();
		validation_features_info.pNext                         = instance_info.pNext;
		instance_info.pNext                                    = &validation_features_info;
	}
#endif

	// Enable VK_VALIDATION_FEATURE_DISABLE_UNIQUE_HANDLES_EXT
	VkValidationFeaturesEXT validation_features_info = {VK_STRUCTURE_TYPE_VALIDATION_FEATURES_EXT};
	std::vector<VkValidationFeatureDisableEXT> disable_features{};
	disable_features.push_back(VK_VALIDATION_FEATURE_DISABLE_UNIQUE_HANDLES_EXT);
	validation_features_info.disabledValidationFeatureCount = static_cast<uint32_t>(disable_features.size());
	validation_features_info.pDisabledValidationFeatures    = disable_features.data();
	instance_info.pNext                                     = &validation_features_info;

	VkLayerSettingsCreateInfoEXT layerSettingsCreateInfo{VK_STRUCTURE_TYPE_LAYER_SETTINGS_CREATE_INFO_EXT};

	// If layer settings extension enabled by sample, then activate layer settings during instance creation
	if (std::find(enabled_extensions.begin(), enabled_extensions.end(), VK_EXT_LAYER_SETTINGS_EXTENSION_NAME) != enabled_extensions.end())
	{
		layerSettingsCreateInfo.settingCount = static_cast<uint32_t>(required_layer_settings.size());
		layerSettingsCreateInfo.pSettings    = required_layer_settings.data();
		layerSettingsCreateInfo.pNext        = instance_info.pNext;
		instance_info.pNext                  = &layerSettingsCreateInfo;
	}

	// Create the Vulkan instance
	VkResult result = vkCreateInstance(&instance_info, nullptr, &handle);

  ...
}

Also add the instance extension in ray_tracing_basic.cpp:

    RaytracingBasic::RaytracingBasic()
    {
    	...
    
    	add_instance_extension(VK_EXT_VALIDATION_FEATURES_EXTENSION_NAME, true);
    }

After that, run ray_tracing_basic sample, it may crash. It may also crashes when you resize the window. Note that there's a probability to this thing, because the GPU drive doesn't always choose to reuse image handles when recreating a swapchain. According to my experiments, this bug happens four times out of ten, and it only happens when using NVIDIA GPU. So if you don't see this bug crash at the first time, try resize the window multiple times or relaunch the sample several times.

Possible cause

There are 3 places in the code that contributes this bug, the validation layer doesn't check if the image handle is being used by newly created swapchain.

First is in image_state.cpp, line 605, this would cause validation layer can't create its image view object.

void Swapchain::Destroy() {
    for (auto &swapchain_image : images) {
        ...
        dev_data.Destroy<vvl::Image>(swapchain_image.image_state->VkHandle());
        // NOTE: We don't have access to dev_data.fake_memory.Free() here, but it is currently a no-op
    }
    ...
}

The second one lies in object_tracker_utils, line 820:

void ObjectLifetimes::PreCallRecordDestroySwapchainKHR(VkDevice device, VkSwapchainKHR swapchain,
                                                       const VkAllocationCallbacks *pAllocator, const RecordObject &record_obj) {
    ...

    auto snapshot = swapchain_image_map.snapshot(
        [swapchain](const std::shared_ptr<ObjTrackState> &pNode) { return pNode->parent_object == HandleToUint64(swapchain); });
    for (const auto &itr : snapshot) {
        swapchain_image_map.erase(itr.first);
    }
}

The last one is in thread_safety_validation.cpp, line 476:

void ThreadSafety::PostCallRecordDestroySwapchainKHR(VkDevice device, VkSwapchainKHR swapchain,
                                                     const VkAllocationCallbacks* pAllocator, const RecordObject& record_obj) {
    ...
    // Host access to swapchain must be externally synchronized
    auto lock = WriteLockGuard(thread_safety_lock);
    for (auto& image_handle : swapchain_wrapped_image_handle_map[swapchain]) {
        FinishWriteObject(image_handle, record_obj.location);
        DestroyObject(image_handle);
    }
    ...
}
@spencer-lunarg
Copy link
Contributor

Thanks for reporting this!

I guess my first question is why you needed VK_VALIDATION_FEATURE_DISABLE_UNIQUE_HANDLES_EXT (this is me grasping how people use the VVL settings)

We are currently getting things ready for the upcoming SDK, so will be a while until I can look at this, but thanks for deep investigation you have done so far

@spencer-lunarg spencer-lunarg added Bug Something isn't working WSI Window System Integration related issues labels Sep 24, 2024
@Mr-222
Copy link
Author

Mr-222 commented Sep 25, 2024

Thank you spencer, take your time! For the question, I need the extension to disable the handle wrapping feature so that I can access the “real” object allocated by the driver for some backdoor APIs. Also, I'm a little curious about the driver behavior, do you have any clue about what decides if the driver would reuse old swapchain's image handle or not?

@Mr-222
Copy link
Author

Mr-222 commented Sep 25, 2024

Also reformatted the codeblock in my description to make it clear, sorry for the inexperience of issuing if you think the codeblock is messy.

@spencer-lunarg
Copy link
Contributor

do you have any clue about what decides if the driver would reuse old swapchain's image handle or not?

It took me years and I still feel I don't understand the Linux Windowing system... Windows is worse as I have no idea how drivers interact with WSI on Windows

@Mr-222
Copy link
Author

Mr-222 commented Sep 25, 2024

lol, got it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working WSI Window System Integration related issues
Projects
None yet
Development

No branches or pull requests

2 participants