Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse sampler when creating cached bind groups #10610

Merged
merged 1 commit into from
Jan 26, 2024

Conversation

Kanabenki
Copy link
Contributor

Objective

  • Some passes recreate a sampler when creating a bind group to be cached, even if the sampler is always the same.

Solution

  • Store the sampler in the corresponding pipeline resource.

@alice-i-cecile alice-i-cecile added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times C-Code-Quality A section of code that is hard to understand or change labels Nov 17, 2023
@JMS55 JMS55 added this to the 0.13 milestone Nov 23, 2023
@atlv24
Copy link
Contributor

atlv24 commented Jan 22, 2024

I rebased this onto main and did some benchmarking on the tonemapping example, the results are interesting.

tracy shows 0.05ms reduction in mean time (1.56 before, 1.51 after), 0.02ms reduction in median time (1.42 before, 1.4 after)

nsight however, often shows a 10x increase in frametime. it reports 11-15ms per frame with this PR, 1.4ms without, but every run is different. some runs stay at a solid 3.4ms, most stay at 11, some stay at 15ms.

I also got the message "This application appears to be using a significant amount of coherently mapped memory. This may increase the time necessary to capture." when trying to get a frame capture

image
image
image

@alice-i-cecile alice-i-cecile added the X-Controversial There is active debate or serious implications around merging this PR label Jan 22, 2024
@Kanabenki
Copy link
Contributor Author

Kanabenki commented Jan 22, 2024

So I tried looking at the frame times by attaching RenderDoc and Nsight, but in both cases I couldn't reproduce the performance impact.

Whether on main (1c76a09) or on this PR I got around 2.2ms across runs with RenderDoc and 2.7ms with Nsight, with some slight variations but consistently close to those values for both.

Worst case I got was some runs running around 0.3-4ms slower than those averages, and I reproduced that both for main and this PR.

Edit: Tested on Arch with a 3060 mobile, latest drivers.

@atlv24
Copy link
Contributor

atlv24 commented Jan 23, 2024

I tested on windows 11 on a 2070 RTX, i'll try a few other computers and with latest drivers and see if i can repro but it was consistent for me.

@atlv24
Copy link
Contributor

atlv24 commented Jan 23, 2024

This is consistent across machines, with latest drivers. Happens on windows 10, windows 11, and with 2070 RTX and 2080 Ti RTX. Not sure how to further debug this.

It always just looks like its making gargantuan amounts of buffer memory barriers interleaved with coherent buffer copies, but the exact number of them changes per run. Sometimes its way more than others, but always regular per-frame for any given run.

When it happens to get fewer of them it outperforms main at least, but its rare. Usually more than half of the frame is spent doing these pointless barriers and copies.

@atlv24
Copy link
Contributor

atlv24 commented Jan 24, 2024

This should be good to merge once #11515 merges. I have confirmed pretty substantial CPU perf wins with this PR plus that fix, around 0.2-0.3ms savings

@alice-i-cecile alice-i-cecile added S-Needs-Benchmarking This set of changes needs performance benchmarking to double-check that they help and removed X-Controversial There is active debate or serious implications around merging this PR labels Jan 24, 2024
@alice-i-cecile
Copy link
Member

As @DGriffin91 pointed out, we should verify that this doesn't cause large performance regressions on more realistic scenes involving animated materials. We don't currently have a good example to test that though: lemme make an issue.

@atlv24
Copy link
Contributor

atlv24 commented Jan 25, 2024

#11524 shows that this is good for merge as-is

@alice-i-cecile alice-i-cecile removed the S-Needs-Benchmarking This set of changes needs performance benchmarking to double-check that they help label Jan 25, 2024
@alice-i-cecile alice-i-cecile added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Jan 26, 2024
@alice-i-cecile alice-i-cecile added this pull request to the merge queue Jan 26, 2024
Merged via the queue into bevyengine:main with commit 86e91f4 Jan 26, 2024
24 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Jan 26, 2024
# Objective

- Fixes #11516

## Solution

- Add Animated Material example (colors are hue-cycling smoothly
per-mesh)


![image](https://github.com/bevyengine/bevy/assets/11307157/c75b9e66-0019-41b8-85ec-647559c6ba01)

Note: this example reproduces the perf issue found in #10610 pretty
consistently, with and without the changes from that PR included. Frame
time is sometimes around 4.3ms, other times around 12-14ms. Its pretty
random per run. I think this clears #10610 for merge.
tjamaan pushed a commit to tjamaan/bevy that referenced this pull request Feb 6, 2024
# Objective

- Some passes recreate a sampler when creating a bind group to be
cached, even if the sampler is always the same.

## Solution

- Store the sampler in the corresponding pipeline resource.
tjamaan pushed a commit to tjamaan/bevy that referenced this pull request Feb 6, 2024
# Objective

- Fixes bevyengine#11516

## Solution

- Add Animated Material example (colors are hue-cycling smoothly
per-mesh)


![image](https://github.com/bevyengine/bevy/assets/11307157/c75b9e66-0019-41b8-85ec-647559c6ba01)

Note: this example reproduces the perf issue found in bevyengine#10610 pretty
consistently, with and without the changes from that PR included. Frame
time is sometimes around 4.3ms, other times around 12-14ms. Its pretty
random per run. I think this clears bevyengine#10610 for merge.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Code-Quality A section of code that is hard to understand or change C-Performance A change motivated by improving speed, memory usage or compile times S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants