Reuse sampler when creating cached bind groups #10610

Kanabenki · 2023-11-17T11:06:15Z

Objective

Some passes recreate a sampler when creating a bind group to be cached, even if the sampler is always the same.

Solution

Store the sampler in the corresponding pipeline resource.
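
As a rough illustration of the idea (not the actual diff), the sketch below contrasts recreating a sampler for every bind group with storing it once in the pipeline resource. `Device`, `Sampler`, `BindGroup`, and `Pipeline` are hypothetical stand-ins rather than Bevy or wgpu types; the mock device only counts how many samplers get created.

```rust
use std::cell::Cell;

// Hypothetical stand-in for a GPU device; it only counts sampler creations.
struct Device {
    samplers_created: Cell<u32>,
}

// Hypothetical stand-ins for GPU objects (not the real Bevy/wgpu types).
struct Sampler {
    id: u32,
}

struct BindGroup {
    sampler_id: u32,
}

impl Device {
    fn new() -> Self {
        Device { samplers_created: Cell::new(0) }
    }

    // Each call allocates a "new" sampler and bumps the counter.
    fn create_sampler(&self) -> Sampler {
        let id = self.samplers_created.get() + 1;
        self.samplers_created.set(id);
        Sampler { id }
    }

    fn create_bind_group(&self, sampler: &Sampler) -> BindGroup {
        BindGroup { sampler_id: sampler.id }
    }
}

// Assumed shape of a pipeline resource that owns its sampler.
struct Pipeline {
    sampler: Sampler,
}

impl Pipeline {
    // The sampler is created once, when the pipeline resource is built.
    fn new(device: &Device) -> Self {
        Pipeline { sampler: device.create_sampler() }
    }
}

// Before: a fresh sampler is created every time a bind group is built.
fn bind_group_recreating_sampler(device: &Device) -> BindGroup {
    let sampler = device.create_sampler();
    device.create_bind_group(&sampler)
}

// After: the bind group borrows the sampler stored in the pipeline resource.
fn bind_group_reusing_sampler(device: &Device, pipeline: &Pipeline) -> BindGroup {
    device.create_bind_group(&pipeline.sampler)
}

// Builds `n` bind groups with each approach and returns
// (samplers created before, samplers created after).
fn count_sampler_creations(n: u32) -> (u32, u32) {
    let before = Device::new();
    for _ in 0..n {
        let _ = bind_group_recreating_sampler(&before);
    }

    let after = Device::new();
    let pipeline = Pipeline::new(&after);
    for _ in 0..n {
        let _ = bind_group_reusing_sampler(&after, &pipeline);
    }

    (before.samplers_created.get(), after.samplers_created.get())
}
```

Calling `count_sampler_creations(5)` in this mock creates five samplers with the first approach and one with the second. This says nothing about real GPU cost; it only shows where the allocation moves.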

atlv24 · 2024-01-22T05:47:50Z

I rebased this onto main and did some benchmarking on the tonemapping example; the results are interesting.

Tracy shows a 0.05ms reduction in mean time (1.56ms before, 1.51ms after) and a 0.02ms reduction in median time (1.42ms before, 1.40ms after).

Nsight, however, often shows a 10x increase in frame time. It reports 11-15ms per frame with this PR versus 1.4ms without, but every run is different: some runs stay at a solid 3.4ms, most stay at 11ms, and some stay at 15ms.

I also got the message "This application appears to be using a significant amount of coherently mapped memory. This may increase the time necessary to capture." when trying to get a frame capture.

Kanabenki · 2024-01-22T21:13:37Z

So I tried looking at the frame times by attaching RenderDoc and Nsight, but in both cases I couldn't reproduce the performance impact.

Whether on main (1c76a09) or on this PR I got around 2.2ms across runs with RenderDoc and 2.7ms with Nsight, with some slight variations but consistently close to those values for both.

The worst case I got was some runs running around 0.3-4ms slower than those averages, and I reproduced that on both main and this PR.

Edit: Tested on Arch with a 3060 mobile, latest drivers.

atlv24 · 2024-01-23T01:55:28Z

I tested on Windows 11 with an RTX 2070. I'll try a few other computers with the latest drivers and see if I can repro, but it was consistent for me.

atlv24 · 2024-01-23T19:43:48Z

This is consistent across machines with the latest drivers. It happens on Windows 10 and Windows 11, with both an RTX 2070 and an RTX 2080 Ti. Not sure how to debug this further.

It always looks like it's making gargantuan amounts of buffer memory barriers interleaved with coherent buffer copies, but the exact number of them changes per run. Sometimes it's way more than others, but it is always regular per frame for any given run.

When it happens to get fewer of them, it at least outperforms main, but that's rare. Usually more than half of the frame is spent doing these pointless barriers and copies.

atlv24 · 2024-01-24T22:17:14Z

This should be good to merge once #11515 merges. I have confirmed pretty substantial CPU perf wins with this PR plus that fix: around 0.2-0.3ms savings.

alice-i-cecile · 2024-01-24T22:39:46Z

As @DGriffin91 pointed out, we should verify that this doesn't cause large performance regressions on more realistic scenes involving animated materials. We don't currently have a good example to test that though: lemme make an issue.

atlv24 · 2024-01-25T09:20:10Z

#11524 shows that this is good for merge as-is

# Objective

- Fixes #11516

## Solution

- Add Animated Material example (colors are hue-cycling smoothly per-mesh)

![image](https://github.com/bevyengine/bevy/assets/11307157/c75b9e66-0019-41b8-85ec-647559c6ba01)

Note: this example reproduces the perf issue found in #10610 pretty consistently, with and without the changes from that PR included. Frame time is sometimes around 4.3ms, other times around 12-14ms. It's pretty random per run. I think this clears #10610 for merge.


alice-i-cecile added the A-Rendering (Drawing game state to the screen), C-Performance (A change motivated by improving speed, memory usage or compile times), and C-Code-Quality (A section of code that is hard to understand or change) labels Nov 17, 2023

JMS55 approved these changes Nov 23, 2023

JMS55 added this to the 0.13 milestone Nov 23, 2023

Kanabenki force-pushed the reuse-sampler branch from 0989e03 to f960576 Compare November 30, 2023 21:01

alice-i-cecile added the X-Controversial (There is active debate or serious implications around merging this PR) label Jan 22, 2024

Reuse sampler when creating cached bind group

1c76a09

Kanabenki force-pushed the reuse-sampler branch from f960576 to 1c76a09 Compare January 22, 2024 21:02

alice-i-cecile removed this from the 0.13 milestone Jan 24, 2024

atlv24 mentioned this pull request Jan 24, 2024

Early out when there's no image to replace in tonemapping example update_image_viewer #11515

Closed

alice-i-cecile added the S-Needs-Benchmarking (This set of changes needs performance benchmarking to double-check that they help) label and removed the X-Controversial (There is active debate or serious implications around merging this PR) label Jan 24, 2024

alice-i-cecile mentioned this pull request Jan 24, 2024

Add an example or stress test that focuses on animating materials #11516

Closed

atlv24 mentioned this pull request Jan 25, 2024

Add Animated Material example #11524

Merged

alice-i-cecile removed the S-Needs-Benchmarking (This set of changes needs performance benchmarking to double-check that they help) label Jan 25, 2024

atlv24 approved these changes Jan 26, 2024

alice-i-cecile added the S-Ready-For-Final-Review (This PR has been approved by the community. It's ready for a maintainer to consider merging it) label Jan 26, 2024

IceSentry approved these changes Jan 26, 2024

alice-i-cecile added this pull request to the merge queue Jan 26, 2024

Merged via the queue into bevyengine:main with commit 86e91f4 Jan 26, 2024
24 checks passed
