Spawning or modifying many different 2d or 3d materials hangs for minutes or crashes #15893

Closed
DGriffin91 opened this issue Oct 14, 2024 · 7 comments · Fixed by #15988
Labels
A-Rendering Drawing game state to the screen C-Bug An unexpected or incorrect behavior C-Examples An addition or correction to our examples D-Straightforward Simple bug fixes and API improvements, docs, test and examples P-Regression Functionality that used to work but no longer does. Add a test for this! S-Ready-For-Implementation This issue is ready for an implementation PR. Go for it!
@DGriffin91
Contributor

DGriffin91 commented Oct 14, 2024

Bevy version 89e19aa

Running the many_cubes example with cargo run --example many_cubes --release -- --vary-material-data-per-instance appears to hang indefinitely (Update: I tried just letting this run, and after a little over 2 minutes the example started working).

This regression also affects modifying materials at run time. See example: #15893 (comment)

Windows 10 / RTX3060 / Vulkan

The issue was introduced at 7b81ae7, "Update WGPU to version 22".

Apple M1 / Metal: Hangs for 4 minutes
Win10 / GTX1060 / Vulkan / i7 6700k: Hangs for 12 minutes
Win10 / RTX3060 / Vulkan / 7950x: Hangs for 2 minutes
Win10 / RTX3060 / Dx12 / 7950x: Crashes (Note Dx12 also crashes in 0.14)

2024-10-14T19:44:43.028155Z ERROR wgpu_hal::dx12::descriptor: Unable to allocate descriptors: RangeAllocationError { fragmented_free_length: 1 }
2024-10-14T19:44:43.028302Z ERROR wgpu::backend::wgpu_core: Handling wgpu errors as fatal by default
thread 'main' panicked at \.cargo\registry\src\index.crates.io-6f17d22bba15001f\wgpu-0.20.1\src\backend\wgpu_core.rs:2996:5:
wgpu error: Validation Error
Caused by:
    In Device::create_bind_group
      note: label = `StandardMaterial`
    Not enough memory left.

Minimal-ish 3d example:

use bevy::{diagnostic::*, prelude::*};
fn main() {
    App::new()
        .add_plugins((
            DefaultPlugins,
            FrameTimeDiagnosticsPlugin,
            LogDiagnosticsPlugin::default(),
        ))
        .add_systems(Startup, setup)
        .run();
}
fn setup(
    mut commands: Commands,
    mut meshes: ResMut<Assets<Mesh>>,
    mut materials: ResMut<Assets<StandardMaterial>>,
) {
    let mesh = Mesh3d(meshes.add(Cuboid::new(1.0, 1.0, 1.0)));
    for i in 0..50000 {
        commands.spawn((
            mesh.clone(),
            MeshMaterial3d(materials.add(Color::WHITE)),
            Transform::from_xyz(4.0, 0.0, -i as f32 * 2.0),
        ));
    }
    commands.spawn(Camera3d::default());
}
  • In the 3d example 50k hangs for about 30s on 3060/Vulkan and crashes on DX12 with --release (worse without release)

Minimal-ish 2d example:

use bevy::{diagnostic::*, prelude::*};
fn main() {
    App::new()
        .add_plugins((
            DefaultPlugins,
            FrameTimeDiagnosticsPlugin,
            LogDiagnosticsPlugin::default(),
        ))
        .add_systems(Startup, setup)
        .run();
}
fn setup(
    mut commands: Commands,
    mut meshes: ResMut<Assets<Mesh>>,
    mut materials: ResMut<Assets<ColorMaterial>>,
) {
    let mesh = Mesh2d(meshes.add(Rectangle::default()));
    for i in 0..200000 {
        commands.spawn((
            mesh.clone(),
            MeshMaterial2d(materials.add(Color::WHITE)),
            Transform::from_xyz(i as f32, 0.0, 0.0),
        ));
    }
    commands.spawn(Camera2d);
}
  • 200k on Apple M1 hangs for about 70s
  • 200k on 3060/Vulkan hangs for about 20s and crashes on Dx12
  • Dx12 crash seems to happen with only 3k, but works with 2k (Note Dx12 also crashes in 0.14)
  • (all with --release, worse without release)

Here's the VTune profile, filtered to just the portion of time where the minimal 3d example is hanging:
Image

https://github.com/gfx-rs/wgpu/blob/c746c90ac0f34e19d975668e022b5e8c367201c3/wgpu-core/src/device/resource.rs#L2299
Image

vtune tested using release with debug symbols: --profile release-with-debug

[profile.release-with-debug]
inherits = "release"
debug = true
@DGriffin91 DGriffin91 added C-Bug An unexpected or incorrect behavior S-Needs-Triage This issue needs to be labelled labels Oct 14, 2024
@BenjaminBrienen BenjaminBrienen added C-Examples An addition or correction to our examples P-Regression Functionality that used to work but no longer does. Add a test for this! S-Ready-For-Implementation This issue is ready for an implementation PR. Go for it! D-Straightforward Simple bug fixes and API improvements, docs, test and examples and removed S-Needs-Triage This issue needs to be labelled labels Oct 14, 2024
@alice-i-cecile alice-i-cecile added this to the 0.15 milestone Oct 14, 2024
@DGriffin91 DGriffin91 changed the title many_cubes example with vary-material-data-per-instance hangs indefinitely Spawning many different 2d or 3d materials hangs for minutes or crashes Oct 14, 2024
@teoxoy
Contributor

teoxoy commented Oct 16, 2024

That call to retain is not great as it will move all elements if the bind groups have been dropped in the same order that they have been created in. Since v22 our ownership model is closer to what it should be; I think we were previously dropping bind groups later in bulk.

Edit: The cause of this is gfx-rs/wgpu#5874 which fixed a leak so we are now behaving properly but should find a way to minimize the scanning of those weak refs (opened: gfx-rs/wgpu#6419).
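The cost described above can be illustrated with a minimal standalone sketch. It is not wgpu's code: `BindGroup`, `SharedTexture`, and `entries_scanned_when_dropping_in_order` are hypothetical names, and the sketch assumes that one shared resource keeps a `Vec` of weak references to its bind groups and prunes dead entries with `retain` after every drop. When the bind groups are dropped in creation order, each `retain` call removes the first entry and has to visit (and shift) everything behind it.

```rust
use std::sync::{Arc, Weak};

// Hypothetical stand-in for a GPU bind group; not a wgpu type.
struct BindGroup {
    _id: usize,
}

// Hypothetical stand-in for a texture that tracks the bind groups using it.
struct SharedTexture {
    bind_groups: Vec<Weak<BindGroup>>,
}

/// Creates `n` bind groups that all reference one texture, then drops them in
/// creation order, pruning dead weak refs with `retain` after every drop.
/// Returns the total number of list entries visited by all `retain` calls.
fn entries_scanned_when_dropping_in_order(n: usize) -> usize {
    let mut texture = SharedTexture { bind_groups: Vec::new() };
    let mut owners: Vec<Arc<BindGroup>> = Vec::with_capacity(n);

    for id in 0..n {
        let bind_group = Arc::new(BindGroup { _id: id });
        texture.bind_groups.push(Arc::downgrade(&bind_group));
        owners.push(bind_group);
    }

    let mut scanned = 0usize;
    // Consuming the vector yields the strong refs oldest first.
    for bind_group in owners {
        drop(bind_group);
        // Assumed pruning step: every surviving entry is visited again.
        texture.bind_groups.retain(|weak| {
            scanned += 1;
            weak.strong_count() > 0
        });
    }
    scanned
}

fn main() {
    for n in [1_000, 10_000] {
        let scanned = entries_scanned_when_dropping_in_order(n);
        println!("{n} bind groups -> {scanned} entries scanned");
    }
}
```

Under these assumptions the number of visited entries grows with the square of the bind group count, whereas dropping everything first and pruning once would visit each entry a single time.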

@tychedelia
Contributor

The issue is that we're using the same texture for every material, which means the retain loop is effectively exponential as we continue to try to retain 1..N materials that all share the same texture handle. While the behavior is obviously not great, I think this is somewhat of an edge case that we're hitting because of our stress test, and it is unlikely to affect users as long as they don't try to spawn huge numbers of materials with shared resource bindings in a similar manner.
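As a rough worked estimate, assume that the k-th material's bind group triggers a scan over k entries tracked for the shared texture (an assumption about the scan pattern, not a measured figure). The total number of entries visited for N materials is then a triangular number, which grows quadratically in N; the two values of N below are the 50k cubes of the minimal 3d example and the 160k cubes mentioned for many_cubes.

```latex
\[
\sum_{k=1}^{N} k = \frac{N(N+1)}{2}, \qquad
N = 50{,}000 \;\Rightarrow\; 1{,}250{,}025{,}000, \qquad
N = 160{,}000 \;\Rightarrow\; 12{,}800{,}080{,}000 .
\]
```

These are counts of visited list entries under the stated assumption, not timings; they only indicate why the cost climbs so steeply between a few thousand and a few hundred thousand materials.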

@DGriffin91
Contributor Author

DGriffin91 commented Oct 18, 2024

I think it's very common to share resources like textures across a significant number of different materials. For example, I've seen lots of actual games in production use the same grunge texture across a ton of different materials by mixing different channels of that same texture at different scales, tinting, blending, etc.

@tychedelia
Contributor

tychedelia commented Oct 18, 2024

I think it's very common to share resources like textures across a significant number of different materials

Totally! Testing on my MBP, I can spawn ~20k materials using --material-texture-count before I start to get a beachball. Correct me if I'm wrong relative to production use, but that still seems like a ton of unique materials to all share the same texture. It's definitely still a major performance regression.

@DGriffin91
Contributor Author

Totally! Testing on my MBP, I can spawn ~20k materials using --material-texture-count before I start to get a beachball. Correct me if I'm wrong relative to production use, but that still seems like a ton of unique materials to all share the same texture.

I don't think 20k materials sharing the same texture is at all out of the question. That texture might be something related to the environment, a LUT of some kind, or something else that is widely shared. The many_cubes example spawns 160k cubes with varying materials. Loading an actual large scene with a count like that would take 12 minutes on the Core i7 6700k / GTX1060 system for just this portion. If this regression were half the performance of the previous version of wgpu, that would be one thing. But it appears to be around 500x slower than it was in bevy 0.14 on the Core i7 6700k.

This also affects updating materials. The example below runs at 43ms/frame on bevy 0.14 with the 7950x and 3060. This is already very slow (idk if the performance issue with it in 0.14 is because of bevy, wgpu or both). In 0.15 it runs at 950ms/frame.

use bevy::{diagnostic::*, prelude::*};
fn main() {
    App::new()
        .add_plugins((
            DefaultPlugins,
            FrameTimeDiagnosticsPlugin,
            LogDiagnosticsPlugin::default(),
        ))
        .add_systems(Startup, setup)
        .add_systems(Update, update_materials)
        .run();
}
fn setup(
    mut commands: Commands,
    mut meshes: ResMut<Assets<Mesh>>,
    mut materials: ResMut<Assets<StandardMaterial>>,
) {
    let mesh = Mesh3d(meshes.add(Cuboid::new(1.0, 1.0, 1.0)));
    for i in 0..5000 {
        commands.spawn((
            mesh.clone(),
            MeshMaterial3d(materials.add(StandardMaterial {
                base_color: Color::linear_rgb(1.0, 0.0, 0.0),
                unlit: true,
                ..default()
            })),
            Transform::from_xyz(4.0, 0.0, -i as f32 * 2.0),
        ));
    }
    commands.spawn(Camera3d::default());
}
fn update_materials(mut materials: ResMut<Assets<StandardMaterial>>, time: Res<Time>) {
    for (i, (_, m)) in materials.iter_mut().enumerate() {
        m.base_color = Color::hsv(
            (time.elapsed_secs() * 100.0 + i as f32).rem_euclid(360.0),
            1.0,
            1.0,
        );
    }
}

@DGriffin91 DGriffin91 changed the title Spawning many different 2d or 3d materials hangs for minutes or crashes Spawning or modifying many different 2d or 3d materials hangs for minutes or crashes Oct 18, 2024
@DGriffin91
Contributor Author

DGriffin91 commented Oct 18, 2024

@tychedelia One ubiquitous example of a shared resource would, at least in bevy, be the placeholder texture. That might be what makes these minimal examples so slow if your guess is correct about the issue being related to sharing the same texture.

@tychedelia
Contributor

tychedelia commented Oct 18, 2024

This also affects updating materials.

Okay, this actually feels like a much bigger deal since it's not possible to hide behind loading. You've fully convinced me! Thanks.

That might be what makes these minimal examples so slow if your guess is correct about the issue being related to sharing the same texture.

Parking on a breakpoint

Image

@BenjaminBrienen BenjaminBrienen added the A-Rendering Drawing game state to the screen label Oct 29, 2024
github-merge-queue bot pushed a commit that referenced this issue Nov 5, 2024
Fixes #15893

---------

Co-authored-by: François Mockers <[email protected]>
mockersf added a commit that referenced this issue Nov 5, 2024
Fixes #15893

---------

Co-authored-by: François Mockers <[email protected]>