Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DebugPrintf prevents panics' infinite loops from being (unsoundly) optimized away. #1048

Closed
charles-r-earp opened this issue Apr 24, 2023 · 4 comments
Labels
t: bug Something isn't working

Comments

@charles-r-earp
Copy link
Contributor

#768

See https://github.com/charles-r-earp/rust-gpu/tree/debug_printf_panic.

A minimal example that prints and panics.

#![no_std]
#![feature(asm_experimental_arch)]

use spirv_std::macros::spirv;

#[spirv(compute(threads(1)))]
pub fn main() {
    unsafe {
        spirv_std::macros::debug_printfln!();
    }
    // panic becomes infinite loop, normally optimized away
    // but with debug_printf it is not.
    panic!();
}
cargo run -p debug-printf-panic
// output
; SPIR-V
; Version: 1.5
; Generator: Google rspirv; 0
; Bound: 22
; Schema: 0
               OpCapability Shader
               OpCapability VulkanMemoryModel
               OpExtension "SPV_KHR_non_semantic_info"
          %1 = OpExtInstImport "NonSemantic.DebugPrintf"
               OpMemoryModel Logical Vulkan
               OpEntryPoint GLCompute %2 "main"
               OpExecutionMode %2 LocalSize 1 1 1

               ; Debug Information
          %3 = OpString "
"

               ; Types, variables and constants
       %void = OpTypeVoid
          %7 = OpTypeFunction %void

               ; Function 2
          %2 = OpFunction %void None %7
          %8 = OpLabel
         %17 = OpExtInst %void %1 1 %3
               OpBranch %18
         %18 = OpLabel
               OpLoopMerge %21 %18 None
               OpBranch %18
         %21 = OpLabel
               OpUnreachable
               OpFunctionEnd

#version 450
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;

void main()
{
    // unimplemented ext op 12
    for (;;)
    {
    }
}

This causes a crash or hang when running the shader.

@charles-r-earp charles-r-earp added the t: bug Something isn't working label Apr 24, 2023
@eddyb
Copy link
Contributor

eddyb commented Apr 24, 2023

    // panic becomes infinite loop, normally optimized away
    // but with debug_printf it is not.

This causes a crash or hang when running the shader.

Yes, that's the intended behavior from the infinite loop, that a panic hangs.
It's actually a bug that the infinite loop gets optimized away sometimes (maybe it should get a control barrier or something, to force it to stick around).

If you want a more useful panic behavior (which, for the record, I fully agree we need to improve this), maybe open an issue about making the abort intrinsic (which panic! invokes) exit the shader instead of hanging?

We've been blocked on an OpAbort for so long, that I kept hearing over the years that it might be happening, but at this point I'm pretty confident we can just have a custom "emulated unwinding" that relies on wrapping every call in if unwinding { return; }, then letting heavy inlining and CFG structurization deal with it.

(the hardest part is forcing the user to provide some way to report the panic, if their code can ever panic)

@charles-r-earp
Copy link
Contributor Author

Yes, that's the intended behavior from the infinite loop, that a panic hangs. It's actually a bug that the infinite loop gets optimized away sometimes (maybe it should get a control barrier or something, to force it to stick around).

Hmmm, this just returns if out of bounds:

#[spirv(compute(threads(1)))]
pub fn main(#[spirv(storage_buffer, descriptor_set = 0, binding = 0)] y: &mut [u32]) {
    y[1] = 1;
}
; SPIR-V
; Version: 1.5
; Generator: Google rspirv; 0
; Bound: 23
; Schema: 0
               OpCapability Shader
               OpCapability VulkanMemoryModel
               OpExtension "SPV_KHR_non_semantic_info"
               OpMemoryModel Logical Vulkan
               OpEntryPoint GLCompute %1 "main" %2
               OpExecutionMode %1 LocalSize 1 1 1

               ; Annotations
               OpDecorate %_runtimearr_uint ArrayStride 4
               OpDecorate %_struct_5 Block
               OpMemberDecorate %_struct_5 0 Offset 0
               OpDecorate %2 Binding 0
               OpDecorate %2 DescriptorSet 0

               ; Types, variables and constants
       %uint = OpTypeInt 32 0
%_runtimearr_uint = OpTypeRuntimeArray %uint
  %_struct_5 = OpTypeStruct %_runtimearr_uint
%_ptr_StorageBuffer__struct_5 = OpTypePointer StorageBuffer %_struct_5
       %void = OpTypeVoid
          %9 = OpTypeFunction %void
          %2 = OpVariable %_ptr_StorageBuffer__struct_5 StorageBuffer
     %uint_0 = OpConstant %uint 0
       %bool = OpTypeBool
     %uint_1 = OpConstant %uint 1
%_ptr_StorageBuffer_uint = OpTypePointer StorageBuffer %uint

               ; Function 1
          %1 = OpFunction %void None %9
         %15 = OpLabel
         %17 = OpArrayLength %uint %2 0
         %18 = OpULessThan %bool %uint_1 %17
               OpSelectionMerge %20 None
               OpBranchConditional %18 %20 %21
         %21 = OpLabel
               OpReturn
         %20 = OpLabel
         %22 = OpAccessChain %_ptr_StorageBuffer_uint %2 %uint_0 %uint_1
               OpStore %22 %uint_1
               OpReturn
               OpFunctionEnd

#version 450
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;

layout(binding = 0, std430) buffer _5_2
{
    uint _m0[];
} _2;

void main()
{
    if (!(1u < uint(_2._m0.length())))
    {
        return;
    }
    _2._m0[1u] = 1u;
}

But adding back the print keeps the loop:

#[spirv(compute(threads(1)))]
pub fn main(#[spirv(storage_buffer, descriptor_set = 0, binding = 0)] y: &mut [u32]) {
    y[1] = 1;
    unsafe {
        spirv_std::macros::debug_printfln!();
    }
}
; SPIR-V
; Version: 1.5
; Generator: Google rspirv; 0
; Bound: 28
; Schema: 0
               OpCapability Shader
               OpCapability VulkanMemoryModel
               OpExtension "SPV_KHR_non_semantic_info"
          %1 = OpExtInstImport "NonSemantic.DebugPrintf"
               OpMemoryModel Logical Vulkan
               OpEntryPoint GLCompute %2 "main" %3
               OpExecutionMode %2 LocalSize 1 1 1

               ; Debug Information
          %4 = OpString "
"

               ; Annotations
               OpDecorate %_runtimearr_uint ArrayStride 4
               OpDecorate %_struct_7 Block
               OpMemberDecorate %_struct_7 0 Offset 0
               OpDecorate %3 Binding 0
               OpDecorate %3 DescriptorSet 0

               ; Types, variables and constants
       %uint = OpTypeInt 32 0
%_runtimearr_uint = OpTypeRuntimeArray %uint
  %_struct_7 = OpTypeStruct %_runtimearr_uint
%_ptr_StorageBuffer__struct_7 = OpTypePointer StorageBuffer %_struct_7
       %void = OpTypeVoid
         %11 = OpTypeFunction %void
%_ptr_StorageBuffer__runtimearr_uint = OpTypePointer StorageBuffer %_runtimearr_uint
          %3 = OpVariable %_ptr_StorageBuffer__struct_7 StorageBuffer
     %uint_0 = OpConstant %uint 0
       %bool = OpTypeBool
     %uint_1 = OpConstant %uint 1
%_ptr_StorageBuffer_uint = OpTypePointer StorageBuffer %uint

               ; Function 2
          %2 = OpFunction %void None %11
         %17 = OpLabel
         %18 = OpAccessChain %_ptr_StorageBuffer__runtimearr_uint %3 %uint_0
         %19 = OpArrayLength %uint %3 0
         %20 = OpULessThan %bool %uint_1 %19
               OpSelectionMerge %22 None
               OpBranchConditional %20 %22 %23
         %23 = OpLabel
               OpBranch %26
         %26 = OpLabel
               OpLoopMerge %27 %26 None
               OpBranch %26
         %27 = OpLabel
               OpUnreachable
         %22 = OpLabel
         %24 = OpAccessChain %_ptr_StorageBuffer_uint %3 %uint_0 %uint_1
               OpStore %24 %uint_1
         %25 = OpExtInst %void %1 1 %4
               OpReturn
               OpFunctionEnd

#version 450
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;

layout(binding = 0, std430) buffer _7_3
{
    uint _m0[];
} _3;

void main()
{
    if (!(1u < uint(_3._m0.length())))
    {
        for (;;)
        {
        }
    }
    _3._m0[1u] = 1u;
    // unimplemented ext op 12
}

Silently continuing is problematic since then you can end up with uninitialized / uncomputed outputs. Hanging is preferable but it would be nice to exit execution with an error to the runtime. In particular, it would be useful to identify which shader failed, at least for dev because otherwise it can be hard to track down. If the runtime could catch the panic it could report a helpful error, as well as a backtrace.

@eddyb
Copy link
Contributor

eddyb commented Apr 25, 2023

You can try RUSTGPU_CODEGEN_ARGS=--no-spirv-opt, very likely that spirv-opt is optimizing out the loop {} in some cases.

This is a well-known problem with low-level IRs, LLVM had this Rust unsoundness until relatively recently, and ofc SPIR-V did the worst possible thing and copied the bad LLVM behavior that LLVM had already been trying to resolve.

(EDIT: are you on Discord or Zulip? it's much easier to dig into something like this in real-time)

@eddyb eddyb changed the title DebugPrintf leads to infinite loop with panics. DebugPrintf prevents panics' infinite loops from being (unsoundly) optimized away. Apr 25, 2023
@eddyb
Copy link
Contributor

eddyb commented Jul 21, 2023

Rust-GPU panic has significantly changed in these PRs:

By now there should not be anything remotely like infinite loops left around (and while there is a way to get OpUnreachable instead, which is guaranteed UB, it's opt-in and not really meant to be used outside of testing).

@eddyb eddyb closed this as completed Jul 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants