Allow configuring whether workgroup memory is zero initialised #5508

Merged
merged 19 commits into gfx-rs:trunk from zero-initialise-workgroup on Apr 17, 2024

Conversation

@DJMcNab (Contributor) commented Apr 8, 2024

Connections

This is related to #5319: a key reason initialisation takes so long is the 5× increase in pipeline creation time added by this feature.
See also #4592, which this does not fix.
I'm planning on racing this PR against a fix for #4592, in the hope that I can get one of them into 0.20.
(Both PRs should be useful, however.)

Description
Allows skipping zero initialisation of pipeline workgroup variables.

Backends:

  • Vulkan
  • DirectX
  • GLES
  • Metal

The underlying pipeline creation time is a significant obstacle to adoption of Vello and Xilem, as it makes our projects take an extremely long time to start up.
This change decreases pipeline creation time by a factor of (nearly) 5 on Android, which makes it a worthwhile tradeoff for us.

Renderer creation time in different modes (all values in milliseconds), measured on a Pixel 6. "None" corresponds to the new ZeroInitializeWorkgroupMemory::never().

| Mode     | Single-threaded | Multi-threaded |
|----------|-----------------|----------------|
| Native   | 4228            | 1829           |
| Polyfill | 6350            | 2772           |
| None     | 790             | 370            |
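
For concreteness, here is a minimal sketch of how skipping zero initialisation looks with the grouped compilation options (assuming the wgpu 0.20 shape of `PipelineCompilationOptions` and `ComputePipelineDescriptor`; the exact fields may differ in other versions):

```rust
use std::collections::HashMap;

// A minimal sketch, assuming the wgpu 0.20 API shape: the flag lives in
// `PipelineCompilationOptions` (introduced together with #5500), next to
// the relocated `constants` map.
fn create_uninit_pipeline(
    device: &wgpu::Device,
    module: &wgpu::ShaderModule,
) -> wgpu::ComputePipeline {
    let constants = HashMap::new();
    device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
        label: Some("compute pipeline without workgroup zero init"),
        layout: None,
        module,
        entry_point: "main",
        compilation_options: wgpu::PipelineCompilationOptions {
            constants: &constants,
            // Skip zero initialisation of `var<workgroup>` memory. The
            // shader must then not assume workgroup variables start at 0.
            zero_initialize_workgroup_memory: false,
        },
        cache: None, // pipeline cache handle from #5319, if available
    })
}
```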

This is a breaking change, but it is shared with #5500.

Testing

I tested an improperly-wired-up version of this in Vello to get the numbers above.

I have also tested this PR itself in Vello, which showed the expected decrease in startup time.

Checklist

  • Run `cargo fmt`.
  • Run `cargo clippy`. If applicable, add:
    • `--target wasm32-unknown-unknown`
    • `--target wasm32-unknown-emscripten`
  • Run `cargo xtask test` to run tests; same story as Pipeline cache API and implementation for Vulkan (#5319): no new failures.
  • Add change to `CHANGELOG.md`. See simple instructions inside file.

@DJMcNab requested a review from a team as a code owner, April 8, 2024 14:02
@jimblandy (Member) left a comment

It's not great that we're adding this field that needs to be set in every pipeline descriptor anyone will ever write. But I understand that Vello wants to be a good neighbor when sharing Devices with other code, so we can't make it a device-wide flag.

Could we see an alternative version of this PR where the flag is on wgpu_core::pipeline::ShaderModuleDescriptor, rather than ComputePipelineDescriptor?

@DJMcNab (Contributor, Author) commented Apr 8, 2024

> Could we see an alternative version of this PR where the flag is on wgpu_core::pipeline::ShaderModuleDescriptor, rather than ComputePipelineDescriptor?

Do you mean as a separate PR, or reusing this same one?

@teoxoy (Member) commented Apr 8, 2024

It's worth noting we don't have the `max_compute_workgroup_storage_size` limit fully implemented (it's not doing anything).
Nor the 16k "maximum byte-size of an array type instantiated in the workgroup address space" from the WGSL spec.

Are the shaders in question exceeding those limits?
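
For reference, the advertised limit is queryable at runtime even where it isn't enforced; a minimal sketch using only `Device::limits` (the helper function and its parameter are invented for illustration):

```rust
// Check a shader's workgroup storage usage against the device's advertised
// limit. The WebGPU default for `maxComputeWorkgroupStorageSize` is 16384
// bytes; as noted above, wgpu does not currently enforce it.
fn check_workgroup_storage(device: &wgpu::Device, shader_usage_bytes: u32) {
    let limit = device.limits().max_compute_workgroup_storage_size;
    if shader_usage_bytes > limit {
        eprintln!(
            "shader uses {shader_usage_bytes} bytes of workgroup storage, \
             over the advertised limit of {limit}"
        );
    }
}
```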

@teoxoy (Member) commented Apr 8, 2024

Using a loop to do array initialization might also side-step the issue (if it doesn't get unrolled, that is).
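
A hypothetical WGSL sketch of that suggestion (array name and sizes invented for illustration), embedded the way wgpu consumes shader source:

```rust
// Each of the 256 invocations clears a strided slice of the array, so
// initialisation is a short loop rather than thousands of unrolled stores.
const LOOP_INIT_SHADER: &str = r#"
var<workgroup> scratch: array<u32, 4096>;

@compute @workgroup_size(256)
fn main(@builtin(local_invocation_index) local_index: u32) {
    for (var i = local_index; i < 4096u; i += 256u) {
        scratch[i] = 0u;
    }
    workgroupBarrier();
    // ... the real work of the shader goes here ...
}
"#;
```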

@DJMcNab (Contributor, Author) commented Apr 8, 2024

As it happens, we are exactly at the limit for maxComputeWorkgroupStorageSize, based on a quick check (taking the values in https://xi.zulipchat.com/#narrow/stream/197075-gpu/topic/DirectX12.20extremely.20slow.20startup.20time/near/431594219 as my source of truth for that claim): we have 16 lots of 256 lots of 4 bytes, i.e. 16 × 256 × 4 = 16384 bytes. So maybe, if the 16384 is an exclusive maximum.

@DJMcNab (Contributor, Author) commented Apr 8, 2024

Yes, as I say, fixing #4592 would also likely resolve this issue, as it would cause us to output probably ~24 instructions rather than 16384.

But this is much easier to implement, and the actual zero initialisation is superfluous work anyway.

Besides which, wgpu would still use the native zero initialisation method on Vulkan on Android, which would mean we would still need to disable it. The native method is at least faster than the polyfill, but it would be slower than the polyfill if #4592 reaches the same result.

@DJMcNab (Contributor, Author) commented Apr 9, 2024

Moving into draft whilst I reconfigure as requested

@DJMcNab marked this pull request as draft, April 9, 2024 08:39
@DJMcNab marked this pull request as ready for review, April 10, 2024 08:02
CHANGELOG.md (review thread resolved)
@DJMcNab requested a review from jimblandy, April 10, 2024 08:36
@DJMcNab (Contributor, Author) commented Apr 10, 2024

This now shares the breaking change with #5500, using the mechanism discussed on Matrix. So it is a breaking change, but the "same" breaking change was already needed there. CC @teoxoy, as you brought in the `constants` field, which has been moved.

I'm now quite happy with how this API looks: it also removes the need for everyone to set `constants`, which was a bit weird as `&Default::default()`.
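
For comparison, the common case after this change is a single `Default` for the whole options struct; again a sketch assuming the wgpu 0.20 shape:

```rust
// The common case: default compilation options (zero initialisation on,
// no overridable constants) via one `Default` call.
fn create_default_pipeline(
    device: &wgpu::Device,
    module: &wgpu::ShaderModule,
) -> wgpu::ComputePipeline {
    device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
        label: None,
        layout: None,
        module,
        entry_point: "main",
        compilation_options: wgpu::PipelineCompilationOptions::default(),
        cache: None, // pipeline cache handle from #5319, if available
    })
}
```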

@cwfitzgerald self-requested a review, April 10, 2024 23:52
@cwfitzgerald (Member) left a comment

Seems pretty uncontroversial!

@cwfitzgerald merged commit 965b00c into gfx-rs:trunk, Apr 17, 2024
25 checks passed
@DJMcNab deleted the zero-initialise-workgroup branch, April 17, 2024 20:17
cwfitzgerald added a commit that referenced this pull request Apr 17, 2024
waywardmonkeys added a commit to waywardmonkeys/vello that referenced this pull request May 13, 2024
This is a performance improvement for shader compilation.

See gfx-rs/wgpu#5508
github-merge-queue bot pushed a commit to linebender/vello that referenced this pull request May 13, 2024
This is a performance improvement for shader compilation.

See gfx-rs/wgpu#5508


Co-authored-by: Lixou <[email protected]>