Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error instead of panic in check bind #6012

Merged
merged 6 commits into from
Jul 24, 2024
Merged

Error instead of panic in check bind #6012

merged 6 commits into from
Jul 24, 2024

Conversation

sagudev
Copy link
Contributor

@sagudev sagudev commented Jul 22, 2024

Connections
Fixes #6011

Description
Removed zipping of binding entries introduced in 4a19ac2 (to make sure binding numbers actually match) and add unknown error for fallback.

Testing
Fixes repro and servo CTS run does not crash anymore.

Checklist

  • Run cargo fmt.
  • Run cargo clippy. If applicable, add:
    • --target wasm32-unknown-unknown
    • --target wasm32-unknown-emscripten
  • Run cargo xtask test to run tests.
  • Add change to CHANGELOG.md. See simple instructions inside file.

@sagudev
Copy link
Contributor Author

sagudev commented Jul 22, 2024

there is still one failing test

• CRASH [expected OK] /_webgpu/webgpu/cts.https.html?q=webgpu:api,validation,encoding,programmable,pipeline_bind_group_compat:default_bind_group_layouts_never_match,render_pass:*
2024-07-22T17:06:01.3143733Z 
2024-07-22T17:06:01.3143816Z       
2024-07-22T17:06:01.3154960Z       called Option::unwrap() on a None value (thread WGPU, at /home/runner/.cargo/git/checkouts/wgpu-4f4c57509d80a789/8a89e85/wgpu-core/src/command/bind.rs:182)
2024-07-22T17:06:01.3155117Z          0: servoshell::backtrace::print
2024-07-22T17:06:01.3155256Z          1: servoshell::panic_hook::panic_hook
2024-07-22T17:06:01.3155391Z          2: std::panicking::rust_panic_with_hook
2024-07-22T17:06:01.3155561Z          3: std::panicking::begin_panic_handler::{{closure}}
2024-07-22T17:06:01.3155749Z          4: std::sys_common::backtrace::__rust_end_short_backtrace
2024-07-22T17:06:01.3155845Z          5: rust_begin_unwind
2024-07-22T17:06:01.3155964Z          6: core::panicking::panic_fmt
2024-07-22T17:06:01.3156072Z          7: core::panicking::panic
2024-07-22T17:06:01.3156178Z          8: core::option::unwrap_failed
2024-07-22T17:06:01.3156458Z          9: wgpu_core::command::bind::compat::BoundBindGroupLayouts<A>::get_invalid
2024-07-22T17:06:01.3156620Z         10: wgpu_core::command::render::State<A>::is_ready
2024-07-22T17:06:01.3156747Z         11: wgpu_core::command::render::draw
2024-07-22T17:06:01.3157126Z         12: <wgpu_core::command::render::RenderPass<A> as wgpu_core::command::dyn_render_pass::DynRenderPass>::end
2024-07-22T17:06:01.3157247Z         13: webgpu::wgpu_thread::WGPU::run
2024-07-22T17:06:01.3157443Z         14: std::sys_common::backtrace::__rust_begin_short_backtrace
2024-07-22T17:06:01.3157622Z         15: core::ops::function::FnOnce::call_once{{vtable.shim}}
2024-07-22T17:06:01.3158008Z         16: std::sys::pal::unix::thread::Thread::new::thread_start
2024-07-22T17:06:01.3158108Z         17: <unknown>
2024-07-22T17:06:01.3158190Z         18: <unknown>

@sagudev
Copy link
Contributor Author

sagudev commented Jul 22, 2024

Why exactly we do Ok only when is_equal (which uses Arc::ptr_eq, so it must be same object) instead of doing content comparison (we already do it, but only for error printing => no error should be ok too I think).

@teoxoy
Copy link
Member

teoxoy commented Jul 23, 2024

We use is_equal in is_valid as well, this is an invariant currently because we deduplicate BGLs at creation.

let bgl_result = device.bgl_pool.get_or_init(entry_map, |entry_map| {
let bgl =
device.create_bind_group_layout(&desc.label, entry_map, bgl::Origin::Pool)?;
bgl.exclusive_pipeline
.set(binding_model::ExclusivePipeline::None)
.unwrap();
let bgl = Arc::new(bgl);
Ok(bgl)
});

@teoxoy
Copy link
Member

teoxoy commented Jul 23, 2024

I looked at the default_bind_group_layouts_never_match test and it's not clear to me what we are missing. Is the .unwrap() failing on all the subtests?

@sagudev
Copy link
Contributor Author

sagudev commented Jul 23, 2024

I looked at the default_bind_group_layouts_never_match test and it's not clear to me what we are missing. Is the .unwrap() failing on all the subtests?

No, some works, I am currently writing repro in rust using wgpu (also causes crash in firefox: https://gpuweb.github.io/cts/standalone/?q=webgpu:api,validation,encoding,programmable,pipeline_bind_group_compat:default_bind_group_layouts_never_match,render_pass:pipelineType=%22auto0%22;bindingType=%22auto0%22;swap=true;empty=false;renderCommand=%22draw%22) I think invariant might be broken when using get_bind_group_layout, because I debug printed both bgls and they have same content except for ptrs (raw field and tracker index), diff:

@@ -1,7 +1,7 @@
- expected_bgl BindGroupLayout {
+ actual_bgl BindGroupLayout {
   │     raw: Some(
   │         BindGroupLayout {
-  │             raw: 0x7f07d052a780,
+  │             raw: 0x7f07d052a830,
   │             desc_count: DescriptorTotalCount {
   │                 sampler: 0,
   │                 combined_image_sampler: 0,
@@ -133,7 +133,7 @@ expected_bgl BindGroupLayout {
   │     label: "",
   │     tracking_data: TrackingData {
   │         tracker_index: TrackerIndex(
-  │             68,
+  │             69,
   │         ),
   │         tracker_indices: SharedTrackerIndexAllocator {
   │             inner: Mutex {

@teoxoy
Copy link
Member

teoxoy commented Jul 23, 2024

That can happen if the BGLs differ in exclusive_pipeline, if they don't then it's a bug somewhere.

@sagudev
Copy link
Contributor Author

sagudev commented Jul 23, 2024

That can happen if the BGLs differ in exclusive_pipeline

This should already be catched above, but it's worth rechecking. EDIT: It's okay.

@sagudev
Copy link
Contributor Author

sagudev commented Jul 23, 2024

@sagudev
Copy link
Contributor Author

sagudev commented Jul 23, 2024

Same commit is bad one, expected error instead of crash:

ERROR: Validation {
    source: ContextError {
        string: "RenderPass::end",
        cause: RenderPassError {
            scope: Draw {
                kind: Draw,
                indexed: false,
            },
            inner: Draw(
                IncompatibleBindGroup(
                    IncompatibleBindGroupError {
                        index: 2,
                        pipeline: ResourceErrorIdent {
                            type: "RenderPipeline",
                            label: "",
                        },
                        diff: [
                            "Should be compatible an with an implicit BindGroupLayout with '' label",
                            "Assigned implicit BindGroupLayout with '' label",
                        ],
                    },
                ),
            ),
        },
        label_key: "label",
        label: "",
    },
    description: "Validation Error\n\nCaused by:\n    In RenderPass::end\n    In a draw command, kind: Draw\n    In a draw command, kind: Draw\n    Bind group at index 2 is incompatible with the current set RenderPipeline with '' label\n",
}

@sagudev
Copy link
Contributor Author

sagudev commented Jul 23, 2024

Before 4a19ac2 there was always something in the diff (even if we didn't know exactly what didn't match), so I think adding fallback using unwrap_or would me the most appropriate solution (for now, as the root cause is for separate issue).

@sagudev sagudev changed the title check that bindings number match Do not panic in check bind (compatibility) Jul 23, 2024
@sagudev sagudev marked this pull request as ready for review July 23, 2024 17:48
@sagudev sagudev requested a review from a team as a code owner July 23, 2024 17:48
@sagudev sagudev requested a review from teoxoy July 23, 2024 17:49
@Wumpf Wumpf added the PR: needs back-porting PR with a fix that needs to land on crates label Jul 24, 2024
@sagudev sagudev changed the title Do not panic in check bind (compatibility) Error instead of panic in check bind Jul 24, 2024
@sagudev
Copy link
Contributor Author

sagudev commented Jul 24, 2024

rebased and added changelog entry

@teoxoy
Copy link
Member

teoxoy commented Jul 24, 2024

repro: https://github.com/sagudev/wgpu-problem/tree/another-empty-bind-panic-2 (this took way to much time, because sagudev/wgpu-problem@2da4ad2)

Thanks for looking into it and putting up a minimal reproducible example!

I'll look at it and try to come up with a fix this week. Part of the reason I added the .unwrap() is to catch cases like this where something we didn't expect happened. This would have gone unnoticed for a lot longer otherwise.

Let's land this as is for now to get it in the patch release.

@teoxoy teoxoy merged commit 2897fb5 into gfx-rs:trunk Jul 24, 2024
25 checks passed
@sagudev
Copy link
Contributor Author

sagudev commented Jul 24, 2024

repro: https://github.com/sagudev/wgpu-problem/tree/another-empty-bind-panic-2 (this took way to much time, because sagudev/wgpu-problem@2da4ad2)

Thanks for looking into it and putting up a minimal reproducible example!

I'll look at it and try to come up with a fix this week. Part of the reason I added the .unwrap() is to catch cases like this where something we didn't expect happened. This would have gone unnoticed for a lot longer otherwise.

I will open new issue for this, with all known info in one place.

cwfitzgerald pushed a commit to cwfitzgerald/wgpu that referenced this pull request Jul 31, 2024
Removed zipping of binding entries introduced in 4a19ac2 (to make sure binding numbers actually match) and add unknown error for fallback.
# Conflicts:
#	CHANGELOG.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: needs back-porting PR with a fix that needs to land on crates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

regression: Panic instead of validation error when checking binding compatibility
3 participants