App crashes with an output "Trace/Breakpoint Trap" on Linux when a P/Invoke callback is called from a native library if the dotnet debugger is attached. #104459

walterlv · 2024-07-05T03:51:50Z

Description

Write a .NET 8 application that calls a native library through P/Invoke and passes it a callback.
Run the app, then attach the dotnet debugger before the callback is invoked, using either:
- Visual Studio Managed (.NET Core for Unix): https://learn.microsoft.com/en-us/visualstudio/debugger/remote-debugging-dotnet-core-linux-with-ssh?view=vs-2022#attach-the-debugger
- JetBrains Rider: https://www.jetbrains.com/help/rider/SSH_Remote_Debugging.html#debug-application-on-remote-machine
The app prints "Trace/breakpoint trap" and crashes.

Note: not every native callback triggers this issue, so I've written a minimal reproducible example below.

Reproduction Steps

Minimal reproducible example 1:

Clone this repo: https://github.com/walterlv/Walterlv.Issues.TraceBreakpointTrap
Build the demo for a Linux machine:

dotnet publish -c debug -r linux-x64 --self-contained

Run the app, then attach the dotnet debugger:

$ ./TraceBreakpointTrapDemo
### Trace/Breakpoint Trap issue on .NET debugger ###
Please attach a dotnet debugger and use 'Set next statement'.
Trace/breakpoint trap

Reproducible example 2:

https://github.com/Haltroy/CefGlue

Expected behavior

The app should not crash when the dotnet debugger is attached.

Actual behavior

The app crashes with an output "Trace/Breakpoint Trap".

Regression?

I've only tested this on .NET 8.0.302.

Known Workarounds

I've found two workarounds:

- Detect whether the debugger is attached and, if so, don't call the callback.
- Use the "Native (GDB)" or "Native (LLDB)" debugger instead of the "Managed (.NET Core for Unix)" debugger.

Note:

The Debugger.IsAttached property cannot detect a native debugger, so I added the alternative options --sleep <seconds> and --skip-attach to the minimal reproducible example above.
Native debuggers are much harder to use, so I hope this issue can be fixed.

Configuration

.NET: 8.0.302
OS:
- Ubuntu 22.04 LTS
- Debian 12
- UnionTech OS GNU/Linux 20
- Kylin V10 SP1
Architecture:
- x64
- ARM64

I didn't find any environment that doesn't have this issue.

Other information

To capture and inspect a core dump:

dotnet tool install -g dotnet-sos
dotnet sos install
ulimit -c unlimited (in the shell that will start the app)
Once the process has started and its PID is known, run: echo "0x3F" > /proc/<pid>/coredump_filter
Attach the debugger and wait for the output "Trace/breakpoint trap (core dumped)".
lldb --core core TraceBreakpointTrapDemo

$ lldb --core core TraceBreakpointTrapDemo
SOS_HOSTING: Failed to find runtime directory
Unrecognized command 'setsymbolserver' because managed hosting failed or was disabled. See sethostruntime command for details.
(lldb) target create "TraceBreakpointTrapDemo" --core "core"
Core file '/home/uos/lvyi/Walterlv.Issue.TraceBreakpointTrap/core' (x86_64) was loaded.
(lldb) clrstack
OS Thread Id: 0x7ef9 (1)
        Child SP               IP Call Site
00007F4AF37DBA38 00007F4AF45F3B41 Walterlv.Issues.TraceBreakpointTrap.VolumeManager.ContextStateCallback(IntPtr, IntPtr)
(lldb) bt
* thread #1, name = 'TraceBreakpoint', stop reason = signal SIGTRAP
  * frame #0: 0x00007f4af45f3b41
    frame #1: 0x00007f4b6ba904f9 libpulse.so.0`___lldb_unnamed_symbol12$$libpulse.so.0 + 73
    frame #2: 0x00007f4b6ba93002 libpulse.so.0`___lldb_unnamed_symbol28$$libpulse.so.0 + 514
    frame #3: 0x00007f4b6ba931d2 libpulse.so.0`___lldb_unnamed_symbol29$$libpulse.so.0 + 98
    frame #4: 0x00007f4b6ba459b2 libpulsecommon-14.2.so`___lldb_unnamed_symbol101$$libpulsecommon-14.2.so + 258
    frame #5: 0x00007f4b6baa63c0 libpulse.so.0`pa_mainloop_dispatch + 672
    frame #6: 0x00007f4b6baa65cc libpulse.so.0`pa_mainloop_iterate + 60
    frame #7: 0x00007f4b6baa6670 libpulse.so.0`pa_mainloop_run + 32
    frame #8: 0x00007f4b6bab43f9 libpulse.so.0`___lldb_unnamed_symbol111$$libpulse.so.0 + 105
    frame #9: 0x00007f4b6ba51628 libpulsecommon-14.2.so`___lldb_unnamed_symbol119$$libpulsecommon-14.2.so + 88
    frame #10: 0x00007f4b73452fa3 libpthread.so.0`start_thread(arg=<unavailable>) at pthread_create.c:486
    frame #11: 0x00007f4b7305d60f libc.so.6`__GI___clone at clone.S:95
(lldb) dis
->  0x7f4af45f3b41: subq   $0x20, %rsp
    0x7f4af45f3b45: leaq   0x20(%rsp), %rbp
    0x7f4af45f3b4a: movq   %rdi, -0x8(%rbp)
    0x7f4af45f3b4e: movq   %rsi, -0x10(%rbp)
    0x7f4af45f3b52: movq   %rdx, -0x18(%rbp)
    0x7f4af45f3b56: cmpl   $0x0, 0x897d3(%rip)
    0x7f4af45f3b5d: je     0x7f4af45f3b64
(lldb)


dotnet-policy-service · 2024-07-05T03:52:12Z

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

tommcdon · 2024-07-08T18:55:45Z

Hi @walterlv! Thanks for reporting this bug!

> I didn't find any environment that doesn't have this issue.

Do you know if this issue reproduces on Windows?

tommcdon · 2024-07-08T19:00:58Z

> Do you know if this issue reproduces on Windows?

Never mind this question, as the repro is very specific to Linux.

Do you know if the callback/debugging issue is specific to the libpulse API (e.g. does a standalone repro that uses a callback from C++ to C# on Linux reproduce the issue)? I am curious whether something specific to libpulse is causing the problem, for example a difference in calling convention.

lindexi · 2024-07-09T01:11:29Z

@tommcdon I can reproduce this issue with @walterlv's repo on my Linux system. And I'm sure it's not a libpulse bug, because I can also reproduce it with https://github.com/Haltroy/CefGlue

I couldn't test on Windows because I can't get libpulse running there, so I don't know whether it reproduces on Windows.

tommcdon · 2024-07-09T20:23:42Z

Possible duplicate of #102767. @hoyosjs

walterlv · 2024-07-10T03:59:58Z

Thanks to my friend @kkwpsv, who helped me find more information about this issue.

@tommcdon This issue is quite different from #102767:

This issue is related to the dotnet debugger on Linux (and only on Linux).
This issue might not be related to the callback, but I can't tell for sure.

Here are more details.

Debug the app with a dotnet debugger (I was using the Linux version of JetBrains Rider) and let the app stop at a breakpoint.
Attach lldb to the running process.
Continue the app in the dotnet debugger.
Continue the app in lldb.

Then:

List all threads in lldb with "thread backtrace all": thread 3 (.NET EventPipe) is stopped with signal SIGTRAP.
Resume the app, and thread 3 then receives signal SIGSEGV: address not mapped to object (fault address: 0xbafa13a0).

The stack traces point into the EventPipeEventProvider callback:

https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/EventPipeEventProvider.cs

[UnmanagedCallersOnly]
private static unsafe void Callback(byte* sourceId, int isEnabled, byte level,
    long matchAnyKeywords, long matchAllKeywords, Interop.Advapi32.EVENT_FILTER_DESCRIPTOR* filterData, void* callbackContext)
{
    EventPipeEventProvider _this = (EventPipeEventProvider)GCHandle.FromIntPtr((IntPtr)callbackContext).Target!;
    if (_this._eventProvider.TryGetTarget(out EventProvider? target))
    {
        _this.ProviderCallback(target, sourceId, isEnabled, level, matchAnyKeywords, matchAllKeywords, filterData);
    }
}

tommcdon · 2024-07-23T20:21:41Z

@hoyosjs

mdh1418 · 2024-08-09T19:24:01Z

Hi @walterlv and @lindexi,

We haven't been able to reproduce the exact issue from your repros yet, but the SIGSEGV in the EventPipeEventProvider callback looks eerily similar to #80666 (comment), where the _gchandle used in the callback had been freed before the callback completed.

If the dotnet debugger is hitting the same EventPipeEventProvider callback issue, then a partial fix has already been merged in #106040, and a second PR, #106156, is open.

lindexi · 2024-08-10T03:48:13Z

@mdh1418 Thank you. Which Visual Studio and .NET versions did you use? And were you debugging the application running on Linux?

Can I test a daily .NET build that includes #106040?

tommcdon · 2024-08-10T14:42:25Z

> Which Visual Studio and .NET versions did you use? And were you debugging the application running on Linux?

We used the latest version of the C# extension in VS Code.

> Can I test a daily .NET build that includes #106040?

Yes - the daily builds from https://github.com/dotnet/sdk/blob/main/documentation/package-table.md contain the fix.

kkwpsv · 2024-08-12T09:21:01Z

@tommcdon I tested again with https://aka.ms/dotnet/9.0.1xx/daily/dotnet-sdk-linux-x64.tar.gz.
There is no SIGSEGV now, but the process still exits with SIGTRAP.

I debugged it with lldb. Here's the output:

jwilliamsonveeam · 2024-09-27T21:15:36Z

This seems like the same problem I'm seeing here: microsoft/DockerTools#444

lindexi · 2024-09-28T02:24:50Z

@jwilliamsonveeam Sorry, microsoft/DockerTools#444 is very long, and I'm afraid I might miss important information in it.

jwilliamsonveeam · 2024-09-30T22:01:58Z

@lindexi I updated my last comment with a small self-contained example of a program that fails with a SIGTRAP in the native C code callback:
microsoft/DockerTools#444 (comment)
A zip of the whole solution is also in this thread, if you have access:
https://developercommunity.visualstudio.com/t/dotnet-process-silently-crashes-when-deb/10740222?

Alxe · 2024-10-01T11:13:33Z

I've run @walterlv's reproducer (Walterlv.Issues.TraceBreakpointTrap) and reproduced the issue as well.

I've been debugging a similar issue where the scenario is as follows:

- A C# callback (annotated with UnmanagedFunctionPointer) is passed to a C function through P/Invoke (annotated with DllImport).
- The C code runs on a thread other than the one that installed the C# callback.
- If the debugger is attached when the C# callback is executed for the first time, the application crashes with a SIGTRAP.
- If the debugger is attached after the C# callback has been executed once, the application works correctly.

Using @walterlv's reproducer as a base, I've modified it with these changes and managed to avoid the crash. The output from my execution is as follows:

$ ./artifacts/bin/Walterlv.Issues.TraceBreakpointTrap/debug/TraceBreakpointTrapDemo --skip-attach
### Trace/Breakpoint Trap issue on .NET debugger ###

Context state changed: 1
If you want to debug this demo using other debuggers (e.g. GDB, LLDB), you can use the following options:

  --sleep <seconds>  Sleep for a while before attaching debugger.
  --skip-attach      Skip attaching debugger and run directly.

Please attach a dotnet debugger and use 'Set next statement'.
Context state changed: 2
Context state changed: 3
Context state changed: 4
Context state changed: 5
Issue may not be reproduced. Exit.

In the output, changes 1 to 4 are printed before the debugger is attached. Once the debugger is attached, change 5 is printed and there's no crash.

Additionally, in my own (non-shareable) projects, I've been able to use a C debugger (lldb or gdb) to manually call the callback (through a function pointer) directly from the debugger. This led to the C# application throwing the following error:

Fatal error. Invalid Program: attempted to call a UnmanagedCallersOnly method from managed code.

This error is seemingly thrown here, but I don't have a deep understanding of the .NET runtime.
However, it leads me to believe that the key is that there are two distinct threads.

janvorli · 2024-10-01T12:57:33Z

> If the debugger is attached when the C# callback is executed for the first time, the application crashes with a SIGTRAP.

> If the debugger is attached after the C# callback has been executed once, the application works correctly.

I think this may have revealed the culprit. The .NET runtime only handles signals that occur on threads known to the runtime, meaning threads that were either created by the runtime or have called into it. If the debugger sets a breakpoint on the UnmanagedCallersOnly method and the thread hits it before it calls into the runtime and is registered as a thread that runs managed code, the SIGTRAP would not be handled by the runtime's handler; the default signal handler would run instead and terminate the process.

> This error is seemingly thrown here

That code is for NativeAOT; in CoreCLR, the error comes from here:

runtime/src/coreclr/vm/dllimportcallback.cpp

Lines 187 to 196 in 008ee9f

extern "C" VOID STDCALL ReversePInvokeBadTransition()
{
    STATIC_CONTRACT_THROWS;
    STATIC_CONTRACT_GC_TRIGGERS;
    // Fail
    EEPOLICY_HANDLE_FATAL_ERROR_WITH_MESSAGE(
                                             COR_E_EXECUTIONENGINE,
                                             W("Invalid Program: attempted to call a UnmanagedCallersOnly method from managed code.")
                                            );
}

Alxe · 2024-10-01T13:35:49Z

@janvorli Hello and thanks for your input!

I'll be reviewing the ReversePInvokeBadTransition function, as I think I already added a native breakpoint there (it's an extern "C" function) and was able to hit it once.

However, I'd like to point out that the not-yet-registered thread receives a SIGTRAP whether or not I have set a .NET breakpoint. Is there anything relevant that the debugger could be doing on thread registration? Could you share some links to code?

jwilliamsonveeam · 2024-10-01T18:58:39Z

https://github.com/jwilliamsonveeam/TimerCallBackDemo
I created a repo with my failing case. It also fails with a SIGTRAP when the debugger is attached, without my setting any breakpoints.

janvorli · 2024-10-01T20:46:46Z

The debugger can set some breakpoints on its own for internal purposes. @tommcdon would most likely know whether that is the case here.

Alxe · 2024-10-07T07:34:22Z

@janvorli If the debugger is setting its own breakpoint (e.g. on managed-to-unmanaged transitions) and then reaching it before the thread is properly registered with the .NET runtime (e.g. on the first .NET interaction of a thread), then the SIGTRAP and subsequent crash would make sense.

@tommcdon Could you please confirm if my assumption is correct?

dotnet-issue-labeler bot added the area-Diagnostics-coreclr label Jul 5, 2024

dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jul 5, 2024

tommcdon added this to the 9.0.0 milestone Jul 20, 2024

tommcdon removed the untriaged New issue has not been triaged by the area owner label Jul 20, 2024

tommcdon modified the milestones: 9.0.0, 10.0.0 Aug 9, 2024

jwilliamsonveeam mentioned this issue Sep 27, 2024

dotnet process silently crashes when debugger is attached microsoft/DockerTools#444
