Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move adding DynamicUniformIndex to Extract #5037

Conversation

james7132
Copy link
Member

@james7132 james7132 commented Jun 17, 2022

Objective

prepare_uniform_components's commands must be run with exclusive access to the render world and can take quite a bit of time for components on lots of entities, particularly with archetypes with many big components. This is doing redundant work that is already being done in Extract.

Solution

  • Implement Default on DynamicUniformIndex.
  • Move adding DynamicUniformIndex into Extract instead of Prepare.
  • Change prepare_uniform_components to query for &mut DynamicUniformIndex instead of using commands.

Performance

This was tested against the many_cubes stress test. Here are the respective timing changes:

stage/system main this PR
Full Frame 20.52ms 18.79ms
Extract (Stage) 3.7ms 3.83ms
Prepare (Stage) 2.64ms 1.13ms
extract_meshes (system) 1.45ms 1.69us
extract_meshes (commands) 881.75us 1.48ms
prepare_uniform_components (system) 1.12ms 929.56us
prepare_uniform_components (commands) 1.26ms 253ns

Changelog

TODO

Migration Guide

TODO

@james7132 james7132 added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times labels Jun 17, 2022
@james7132 james7132 requested a review from superdump June 17, 2022 20:03
Copy link
Contributor

@superdump superdump left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this. One question - instead of having a default value of 0, wouldn’t it be better to make it an Option and skip drawing the thing if its index was never initialised?

@james7132
Copy link
Member Author

james7132 commented Jun 18, 2022

I like this. One question - instead of having a default value of 0, wouldn’t it be better to make it an Option and skip drawing the thing if its index was never initialised?

This adds a branch in the middle of the render stage, which I'm hesitant to bloat even more given how heavy it already is, and it's assured to written to during prepare too. It also makes the component bigger, which deflates the performance gains we see here. Perhaps under a #[cfg(debug_assertions)]?

Comment on lines +40 to +47
impl<C: Component> Clone for DynamicUniformIndex<C> {
fn clone(&self) -> Self {
Self {
index: self.index,
marker: PhantomData,
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the manual Clone necessary here?
It doesn't relax the C: Component bound and Copy is still derived.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PhantomData<T> only implements Clone iff T: Clone, which also transitively holds for the derived impl. This implements Clone and Default regardless of what T is.

Comment on lines +31 to +38
impl<C: Component> Default for DynamicUniformIndex<C> {
fn default() -> Self {
Self {
index: 0,
marker: PhantomData,
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This default impl also could be replaced by a derive.

@james7132
Copy link
Member Author

james7132 commented Jun 19, 2022

Wanted to just note that if we merge #4902, we can avoid the secondary copy inside the extraction commands by adding a ExtractedComponentUniforms<C> resource we write to instead of inserting via Commands. This does make it harder to query for the the uniforms after the fact, but this minimizes the number of heavy copies being done. This effectively forces the extract stage systems to do what UniformComponentPlugin already did before, but it definitely avoids heavier copies, both into the render world and within it.

EDIT: Tried this, the queue_material_meshes system is dependent on MeshUniform as a component.

I also tested to see if we could defer the inverse().transpose() computation in extract_meshes, and it does seem like it shuffles about 100us of time in many_cubes to the prepare stage. Unlike before, this can be done in-place, without any copying, only a temporary on the stack is used.

@superdump
Copy link
Contributor

I like this. One question - instead of having a default value of 0, wouldn’t it be better to make it an Option and skip drawing the thing if its index was never initialised?

Could you test and profile it to see if it does make a practical performance difference? It would be a win for correctness in case something doesn’t actually ever get set.

@superdump
Copy link
Contributor

Wanted to just note that if we merge #4902, we can avoid the secondary copy inside the extraction commands by adding a ExtractedComponentUniforms<C> resource we write to instead of inserting via Commands. This does make it harder to query for the the uniforms after the fact, but this minimizes the number of heavy copies being done. This effectively forces the extract stage systems to do what UniformComponentPlugin already did before, but it definitely avoids heavier copies, both into the render world and within it.

EDIT: Tried this, the queue_material_meshes system is dependent on MeshUniform as a component.

I also tested to see if we could defer the inverse().transpose() computation in extract_meshes, and it does seem like it shuffles about 100us of time in many_cubes to the prepare stage. Unlike before, this can be done in-place, without any copying, only a temporary on the stack is used.

This again becomes a question of whether the model matrix is used/desired to be used elsewhere in the render schedule such that not calculating it upfront incurs calculation of it multiple times. If it just moves time from one place to another with no other overall performance benefits then the only pro is that it makes the extract stage shorter. That would be enough but only if we think no one ever needs the model matrix. I wonder if TAA would need it for motion vectors or how that works…

@james7132
Copy link
Member Author

Could you test and profile it to see if it does make a practical performance difference? It would be a win for correctness in case something doesn’t actually ever get set.

Tried this with a if let Some(index) = dynamic.index(). The Render phase perf doesn't tangibly change (7.49ms -> 7.50ms), though I think this may cause incorrect bindings if we don't panic on not finding an index. Changing it to panic via unwrap negatively affects render perf on my machine (7.49ms ->7.63ms).

@superdump
Copy link
Contributor

Could you test and profile it to see if it does make a practical performance difference? It would be a win for correctness in case something doesn’t actually ever get set.

Tried this with a if let Some(index) = dynamic.index(). The Render phase perf doesn't tangibly change (7.49ms -> 7.50ms), though I think this may cause incorrect bindings if we don't panic on not finding an index. Changing it to panic via unwrap negatively affects render perf on my machine (7.49ms ->7.63ms).

Good to know that panic incurs that performance hit. I was thinking that a missing index would somehow cause that entity not to be drawn by propagating up an error or something. But if we don’t already have error returns from draw functions then maybe it’s not worth it. I’m just kind of expecting it to be easy enough to make code where some entities never have their dynamic index updated and then they will be drawn using whatever the transform is for index 0. I suppose another way to handle it would be to make that model matrix produce vertices containing nans in the clip position and then it will be dropped, but that feels like a hack where it would be better to just not draw the thing.

@james7132
Copy link
Member Author

james7132 commented Jun 20, 2022

Good to know that panic incurs that performance hit. I was thinking that a missing index would somehow cause that entity not to be drawn by propagating up an error or something. But if we don’t already have error returns from draw functions then maybe it’s not worth it. I’m just kind of expecting it to be easy enough to make code where some entities never have their dynamic index updated and then they will be drawn using whatever the transform is for index 0. I suppose another way to handle it would be to make that model matrix produce vertices containing nans in the clip position and then it will be dropped, but that feels like a hack where it would be better to just not draw the thing.

Having seen the perf hit, I tried the opposite and changed RenderCommand to return a Result<(), RenderCommandError> instead and changed the existing unwraps into ok_or(...)?s, and saw a slight improvement in perf (7.49ms -> 7.37ms in the same stress test). This is likely because we're removing all machinery for panicking from the functions, allowing for fewer instructions/smaller jumps. I can either add it to this PR, though I'd rather not due to how large/controversial it would be relative to this change.

@superdump
Copy link
Contributor

superdump commented Jun 21, 2022

Having seen the perf hit, I tried the opposite and changed RenderCommand to return a Result<(), RenderCommandError> instead and changed the existing unwraps into ok_or(...)?s, and saw a slight improvement in perf (7.49ms -> 7.37ms in the same stress test). This is likely because we're removing all machinery for panicking from the functions, allowing for fewer instructions/smaller jumps. I can either add it to this PR, though I'd rather not due to how large/controversial it would be relative to this change.

Sounds reasonable to do it in a separate PR. If you don't intend to do that straight away, could you add a TODO comment?

@superdump superdump requested a review from cart June 26, 2022 00:08
render_device: Res<RenderDevice>,
render_queue: Res<RenderQueue>,
mut component_uniforms: ResMut<ComponentUniforms<C>>,
components: Query<(Entity, &C)>,
mut components: Query<(&C, &mut DynamicUniformIndex<C>)>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this break the UniformComponentPlugin in the general case?
This now assumes that DynamicUniformIndex is added in the extract step, but that isn't the case for something using, say, ExtractComponentPlugin.

We aren't currently using this anywhere else, but given that this is intended to be a generalized (and user facing) abstraction, I think we should discuss ways to make this "fool proof".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've dropped the ball on this PR, but thinking on this a bit more. I think it makes a lot of sense to take an approach where we keep indices as components while directly writing extracted components to their target staging buffers.

Indices are small. 4-8 bytes typically. Compare this with the equivalent MeshUniform, which is 132 bytes currently. If we are going to heavily leverage commands for rendering, we should be minimizing the number of large copies that are being performed. I'd much rather us copy heavy components once and then just shuffle the indices around.

If we still need the intermediate data during Prepare or Queue, we can always refer back to the buffer in memory. It's less ergonomic, but alleviates the heaviest parts of running the Render World right now.

@Weibye Weibye added the S-Adopt-Me The original PR author has no intent to complete this work. Pick me up! label Aug 10, 2022
@james7132
Copy link
Member Author

Closing this as the renderer is already moving in a non-direct ECS storage direction, and the introduction of the instancing and batching changes makes this difficult to merge.

@james7132 james7132 closed this Sep 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times S-Adopt-Me The original PR author has no intent to complete this work. Pick me up!
Projects
Status: Responded
Development

Successfully merging this pull request may close these issues.

5 participants