Syncval SSBO/UBO regression #8084

Closed
ShabbyX opened this issue Jun 3, 2024 · 10 comments
Labels: Bug (Something isn't working), Synchronization (Synchronization Validation Object Issue)

Comments

@ShabbyX
Contributor

ShabbyX commented Jun 3, 2024

Since dc5d542 (@spencer-lunarg), we (ANGLE) see a new syncval error in a few traces like this:

[ SYNC-HAZARD-WRITE-AFTER-READ ] Validation Error: [ SYNC-HAZARD-WRITE-AFTER-READ ] Object 0: handle = 0x1dd00000001dd, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0x376bc9df | vkCmdDispatch():  Hazard WRITE_AFTER_READ for VkBuffer 0x1dd00000001dd[] in VkCommandBuffer 0x5649ae338100[], VkPipeline 0xa3e0000000a3e[], and VkDescriptorSet 0xb7c0000000b7c[], type: VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, binding #4 index 0. Access info (usage: SYNC_COMPUTE_SHADER_SHADER_STORAGE_WRITE, prior_usage: SYNC_COMPUTE_SHADER_UNIFORM_READ, read_barriers: VkPipelineStageFlags2(0), command: vkCmdDispatch, seq_no: 25, reset_no: 9)

Looking at one of the traces (warcraft_rumble, if you have access), this looks like a tracking bug. For example, one of the buffers for which this error is generated is used as follows:

image

EID 83:
image

EID 158:
image

EID 183:
image

As you can see, there is no overlap in the ranges of the buffer used in these draw calls, but EID 183 produces the following error in RenderDoc:

[ SYNC-HAZARD-READ-AFTER-WRITE ] Object 0: handle = Buffer 1836, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xe4d96472 | vkCmdDispatch():  Hazard READ_AFTER_WRITE for VkBuffer Buffer 1836 in VkCommandBuffer Command Buffer 171, VkPipeline Compute Pipeline 2960, and VkDescriptorSet Descriptor Set 3497, type: VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC, binding #0 index 0. Access info (usage: SYNC_COMPUTE_SHADER_UNIFORM_READ, prior_usage: SYNC_COMPUTE_SHADER_SHADER_STORAGE_WRITE, write_barriers: SYNC_COMPUTE_SHADER_SHADER_BINDING_TABLE_READ|SYNC_COMPUTE_SHADER_SHADER_SAMPLED_READ|SYNC_COMPUTE_SHADER_SHADER_STORAGE_READ|SYNC_COMPUTE_SHADER_SHADER_STORAGE_WRITE, command: vkCmdDispatch, seq_no: 6, reset_no: 7).
@artem-lunarg
Contributor

@spencer-lunarg I wonder if it can be related to this part:
image

Previously it also checked whether the variable is writeonly. Does the new code cover this?

@spencer-lunarg spencer-lunarg added Bug Something isn't working ShaderVal Shader Validation (SPIR-V related) labels Jun 3, 2024
@spencer-lunarg spencer-lunarg self-assigned this Jun 3, 2024
@lunarpapillo
Contributor

Probably related: we see the same error in our nightly runs of workloads against VVL. It shows up many times, but only in one particular workload:

+ SYNC-HAZARD-READ-AFTER-WRITE: 62152

We've reset baselines for this workload for now. When this issue is resolved, we'll see how the workload behavior changes.

@artem-lunarg
Contributor

artem-lunarg commented Jun 9, 2024

It's confirmed that this is not a SPIR-V bug. dc5d542 actually exposed an existing issue: syncval does not handle dynamic offsets properly. The two memory regions do not overlap, but a write to one region is reported as conflicting with a read from the other, because syncval thinks the regions overlap. It worked before because there was a barrier that synchronized the two non-overlapping regions. Obviously a barrier is not needed in this scenario (the barrier may be there for another purpose and just happened to be active here). The SPIR-V change modified the barrier, which uncovered the fact that syncval thinks the regions overlap.
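To make the overlap reasoning concrete, here is a minimal sketch of a range-based hazard check. It is not syncval's code: `ByteRange`, `Overlaps`, and `IsWriteAfterReadHazard` are hypothetical names, and the byte values are made up for illustration. It only shows why a write and a read on disjoint regions should not be reported, and how tracking the wrong offset turns the same accesses into a false positive.

```cpp
#include <cstdint>

// Half-open byte range [begin, end) inside a buffer (hypothetical helper type).
struct ByteRange {
    std::uint64_t begin;
    std::uint64_t end;
};

// Two half-open ranges overlap only if each one starts before the other ends.
constexpr bool Overlaps(ByteRange a, ByteRange b) {
    return a.begin < b.end && b.begin < a.end;
}

// Simplified model: a write after a read is a hazard only when no barrier
// separates the two accesses AND they touch at least one common byte.
constexpr bool IsWriteAfterReadHazard(ByteRange prior_read, ByteRange write,
                                      bool barrier_between) {
    return !barrier_between && Overlaps(prior_read, write);
}

// Disjoint regions: no hazard, even without a barrier.
static_assert(!IsWriteAfterReadHazard({0, 256}, {256, 512}, false),
              "disjoint regions must not conflict");

// If the write's offset is lost and it is tracked as starting at byte 0,
// the same pair of accesses is (wrongly) reported as a hazard.
static_assert(IsWriteAfterReadHazard({0, 256}, {0, 256}, false),
              "mis-tracked offset produces a false positive");

// A barrier between the accesses hides the mis-tracked offset.
static_assert(!IsWriteAfterReadHazard({0, 256}, {0, 256}, true),
              "a barrier masks the tracking error");
```

The last assertion mirrors the situation described above: as long as a barrier happened to cover both accesses, the incorrect range never surfaced as an error.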

Two tests that show the problem are in artem-lunarg@29a18b4. Both write to and read from non-overlapping regions, but in different ways:

WriteAndReadNonOverlappedUniformBufferRegions - this test PASSES. It specifies offset using VkDescriptorBufferInfo::offset.

WriteAndReadNonOverlappedDynamicUniformBufferRegions - this test FAILS. It specifies the offset as a dynamic offset through vkCmdBindDescriptorSets. This test is identical to the first one except for how the offset is specified.
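The difference between the two tests can be sketched as follows. In Vulkan, the dynamic offset passed to vkCmdBindDescriptorSets is added to VkDescriptorBufferInfo::offset, so both ways of specifying the offset should resolve to the same bytes. The types and the `Resolve` function below are hypothetical stand-ins (no Vulkan headers), the offsets are illustrative rather than taken from the actual tests, and VK_WHOLE_SIZE is ignored for simplicity.

```cpp
#include <cstdint>

// Hypothetical mirror of the offset/range fields of VkDescriptorBufferInfo.
struct DescriptorRange {
    std::uint64_t offset;
    std::uint64_t range;
};

// Resolved half-open byte range [begin, end) that the shader can access.
struct AccessRange {
    std::uint64_t begin;
    std::uint64_t end;
};

// For non-dynamic descriptors the dynamic offset is simply 0.
// For *_DYNAMIC descriptors it is the value supplied at bind time.
constexpr AccessRange Resolve(DescriptorRange d, std::uint64_t dynamic_offset = 0) {
    return AccessRange{d.offset + dynamic_offset,
                       d.offset + dynamic_offset + d.range};
}

constexpr bool SameRange(AccessRange a, AccessRange b) {
    return a.begin == b.begin && a.end == b.end;
}

// Static variant: the offset lives in the descriptor itself.
constexpr AccessRange kStaticRead = Resolve({256, 256});
// Dynamic variant: descriptor offset 0, with 256 supplied at bind time.
constexpr AccessRange kDynamicRead = Resolve({0, 256}, 256);

static_assert(SameRange(kStaticRead, kDynamicRead),
              "both variants must resolve to bytes [256, 512)");
```

A validator that resolves the dynamic variant to anything other than [256, 512) would treat the two otherwise identical tests differently, which matches the PASS/FAIL split above.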

The above explains the ANGLE behavior. The issue reported by @lunarpapillo might be the same one, but it could also be related to the SPIR-V change.

@spencer-lunarg spencer-lunarg added Synchronization Synchronization Validation Object Issue and removed ShaderVal Shader Validation (SPIR-V related) labels Jun 9, 2024
@spencer-lunarg
Contributor

@ShabbyX I confirmed #8117 solved the VVL errors I saw on my machine, if you can confirm on ANGLE CI then happy to close this issue

@ShabbyX
Contributor Author

ShabbyX commented Jun 11, 2024

Looks like there are still lingering issues. Trying to remove the suppression here: https://chromium-review.googlesource.com/c/angle/angle/+/5621970 and I still see the same messages: https://ci.chromium.org/ui/p/angle/builders/try/linux-test/21042/overview

@artem-lunarg
Contributor

artem-lunarg commented Jun 11, 2024

@ShabbyX Did it help for at least some tests? We fixed the problem where non-overlapping regions were computed as overlapping due to an incorrect dynamic offset calculation. If there's an opportunity to check whether the remaining failing tests use non-overlapping regions or a different setup, that would be helpful.

@spencer-lunarg
Contributor

@ShabbyX I re-confirmed I can still see it too, I realized I didn't have Sync Val on when re-testing 😞

@artem-lunarg
Contributor

It took me some time to realize that the issue provides two types of error messages:

A. No dynamic descriptors

[ SYNC-HAZARD-WRITE-AFTER-READ ] Validation Error: [ SYNC-HAZARD-WRITE-AFTER-READ ] Object 0: handle = 0x1dd00000001dd, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0x376bc9df | vkCmdDispatch(): Hazard WRITE_AFTER_READ for VkBuffer 0x1dd00000001dd[] in VkCommandBuffer 0x5649ae338100[], VkPipeline 0xa3e0000000a3e[], and VkDescriptorSet 0xb7c0000000b7c[], type: VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, binding #4 index 0. Access info (usage: SYNC_COMPUTE_SHADER_SHADER_STORAGE_WRITE, prior_usage: SYNC_COMPUTE_SHADER_UNIFORM_READ, read_barriers: VkPipelineStageFlags2(0), command: vkCmdDispatch, seq_no: 25, reset_no: 9)

B. With dynamic descriptors

[ SYNC-HAZARD-READ-AFTER-WRITE ] Object 0: handle = Buffer 1836, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xe4d96472 | vkCmdDispatch(): Hazard READ_AFTER_WRITE for VkBuffer Buffer 1836 in VkCommandBuffer Command Buffer 171, VkPipeline Compute Pipeline 2960, and VkDescriptorSet Descriptor Set 3497, type: VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC, binding #0 index 0. Access info (usage: SYNC_COMPUTE_SHADER_UNIFORM_READ, prior_usage: SYNC_COMPUTE_SHADER_SHADER_STORAGE_WRITE, write_barriers: SYNC_COMPUTE_SHADER_SHADER_BINDING_TABLE_READ|SYNC_COMPUTE_SHADER_SHADER_SAMPLED_READ|SYNC_COMPUTE_SHADER_SHADER_STORAGE_READ|SYNC_COMPUTE_SHADER_SHADER_STORAGE_WRITE, command: vkCmdDispatch, seq_no: 6, reset_no: 7).

The fix applies to B, but we still need to understand what's going on with A. I can reproduce A with warcraft_rumble.
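When triaging a large log, it can help to split the messages into the two groups above automatically. The following is a small sketch that does so by looking at the descriptor type printed after `type: ` in the message text. The function and enum names are hypothetical, and the parsing assumes the message layout shown in this thread, which may differ between VVL versions.

```cpp
#include <cstddef>
#include <string_view>

enum class HazardVariant { kNoDynamicDescriptor, kDynamicDescriptor, kUnknown };

// Extracts the descriptor type, i.e. the text between "type: " and the next
// comma. The object type earlier in the message is written as "type = ...",
// so it is not matched by this key.
constexpr std::string_view DescriptorType(std::string_view message) {
    constexpr std::string_view key = "type: ";
    const std::size_t pos = message.find(key);
    if (pos == std::string_view::npos) return {};
    const std::size_t start = pos + key.size();
    const std::size_t end = message.find(',', start);
    return message.substr(
        start, end == std::string_view::npos ? std::string_view::npos : end - start);
}

constexpr bool EndsWith(std::string_view text, std::string_view suffix) {
    return text.size() >= suffix.size() &&
           text.substr(text.size() - suffix.size()) == suffix;
}

// Case B uses a *_DYNAMIC descriptor type; anything else falls under case A.
constexpr HazardVariant Classify(std::string_view message) {
    const std::string_view type = DescriptorType(message);
    if (type.empty()) return HazardVariant::kUnknown;
    return EndsWith(type, "_DYNAMIC") ? HazardVariant::kDynamicDescriptor
                                      : HazardVariant::kNoDynamicDescriptor;
}

static_assert(Classify("type = VK_OBJECT_TYPE_BUFFER; | type: "
                       "VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, binding #4 index 0.") ==
                  HazardVariant::kNoDynamicDescriptor,
              "message A");
static_assert(Classify("type = VK_OBJECT_TYPE_BUFFER; | type: "
                       "VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC, binding #0 index 0.") ==
                  HazardVariant::kDynamicDescriptor,
              "message B");
```

This requires C++17 for `std::string_view`. Messages without a descriptor type are reported as `kUnknown` rather than guessed into either group.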

@artem-lunarg
Contributor

@ShabbyX the latest code should fix warcraft_rumble. If there are still regressions out there please provide details.

@ShabbyX
Contributor Author

ShabbyX commented Jun 13, 2024

Yes, thank you, the VVL error is no longer produced :)

@ShabbyX ShabbyX closed this as completed Jun 13, 2024

4 participants