Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NativeAOT ComWrappers: fix race condition in when creating native wrapper #85235

Merged
merged 2 commits into from
Apr 25, 2023

Conversation

AustinWise
Copy link
Contributor

If a ComWrapper for a native object becomes unrooted, the GCHandle in _rcwCache can be nulled out by the GC. If a new wrapper is created using GetOrCreateObjectForComInstance before the NativeObjectWrapper finializer has a chance to run, a null value could be returned from GetOrCreateObjectForComInstance.

This PR adds test that passes on the main branch using CoreCLR but fails under NativeAOT. After this change, the test passes for NativeAOT as well.

…nstance

If a ComWrapper for a native object becomes unrooted, the GCHandle in
_rcwCache can be nulled out by a GC. If a new wrapper is created using
GetOrCreateObjectForComInstance before the NativeObjectWrapper
finializer has a chance to run, a null value could be return from
GetOrCreateObjectForComInstance.
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Apr 24, 2023
jkoritzinsky
jkoritzinsky previously approved these changes Apr 24, 2023
@AustinWise
Copy link
Contributor Author

Thinking about this a bit more, the finializer will remove the new object from the cache. So this is not a correct solution.

@jkoritzinsky
Copy link
Member

Once I've merged in #85000, I'll re-trigger CI and the outerloop with this change (as that PR adds a lot of test coverage for ComWrappers on NativeAOT).

@AustinWise
Copy link
Contributor Author

AustinWise commented Apr 24, 2023

I think this should work correctly now. The finalizer for NativeObjectWrapper will now only remove its own GCHandle from the cache.

@jkoritzinsky
Copy link
Member

/azp run runtime-extra-platforms

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jkoritzinsky
Copy link
Member

/azp run runtime

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@jkoritzinsky jkoritzinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The legs I wanted to pass have passed and the code looks good!

@jkoritzinsky jkoritzinsky merged commit 32c600f into dotnet:main Apr 25, 2023
@AustinWise AustinWise deleted the austin/ComWrappersGc branch April 25, 2023 16:33
@MichalStrehovsky
Copy link
Member

The legs I wanted to pass have passed and the code looks good!

There is a failure in System.Runtime.InteropServices.Tests in the Build windows-x64 Release NativeAOT_Checked_Libs_SizeOpt leg that looks a little bit concerning:

[SKIP] System.Runtime.InteropServices.Tests.Int64Tests.WriteInt64_NotReadable_ThrowsArgumentException
[SKIP] System.Runtime.InteropServices.Tests.IsTypeVisibleFromComTests.IsTypeVisibleFromCom_Windows_ReturnsExpected
--------------------------------------------------
Debug Assertion Violation

Expression: '!IsGCSpecial()'

File: D:\a\_work\1\s\src\coreclr\nativeaot\Runtime\thread.cpp, Line: 1151

We have a dump that can be pulled down with runfo get-helix-payload -j bc38ca60-eb63-44e0-83f6-87f1bc93b3d2 -w System.Runtime.InteropServices.Tests -o c:\hel.

00 000000ef`68bfde60 00007ff6`ae80de43     KERNELBASE!RaiseFailFastException+0x152
01 (Inline Function) --------`--------     System_Runtime_InteropServices_Tests!PalRaiseFailFastException+0xd [D:\a\_work\1\s\src\coreclr\nativeaot\Runtime\PalRedhawkFunctions.h @ 162] 
02 000000ef`68bfe440 00007ff6`ae8064cb     System_Runtime_InteropServices_Tests!Assert+0xe3 [D:\a\_work\1\s\src\coreclr\nativeaot\Runtime\rhassert.cpp @ 83] 
03 (Inline Function) --------`--------     System_Runtime_InteropServices_Tests!Thread::InlineTryFastReversePInvoke+0x52 [D:\a\_work\1\s\src\coreclr\nativeaot\Runtime\thread.cpp @ 1151] 
04 000000ef`68bff4e0 00007ff6`aeafc15c     System_Runtime_InteropServices_Tests!RhpReversePInvoke+0x7b [D:\a\_work\1\s\src\coreclr\nativeaot\Runtime\thread.cpp @ 1309] 
05 000000ef`68bff510 00007ff6`ae80f293     System_Runtime_InteropServices_Tests!S_P_CoreLib_System_Runtime_InteropServices_ComWrappers_ManagedObjectWrapperHolder__IsRootedCallback+0x14 [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/InteropServices/ComWrappers.NativeAot.cs @ 407] 
06 000000ef`68bff550 00007ff6`ae811878     System_Runtime_InteropServices_Tests!RestrictedCallouts::InvokeRefCountedHandleCallbacks+0xa3 [D:\a\_work\1\s\src\coreclr\nativeaot\Runtime\RestrictedCallouts.cpp @ 230] 
07 000000ef`68bff580 00007ff6`ae8576ce     System_Runtime_InteropServices_Tests!PromoteRefCounted+0x98 [D:\a\_work\1\s\src\coreclr\gc\objecthandle.cpp @ 94] 
08 000000ef`68bff5c0 00007ff6`ae8571b4     System_Runtime_InteropServices_Tests!ScanConsecutiveHandlesWithoutUserData+0x5e [D:\a\_work\1\s\src\coreclr\gc\handletablescan.cpp @ 446] 
09 000000ef`68bff600 00007ff6`ae857944     System_Runtime_InteropServices_Tests!BlockScanBlocksWithoutUserData+0x44 [D:\a\_work\1\s\src\coreclr\gc\handletablescan.cpp @ 557] 
0a (Inline Function) --------`--------     System_Runtime_InteropServices_Tests!SegmentScanByTypeChain+0x8b [D:\a\_work\1\s\src\coreclr\gc\handletablescan.cpp @ 1599] 
0b 000000ef`68bff630 00007ff6`ae81bb4c     System_Runtime_InteropServices_Tests!TableScanHandles+0x214 [D:\a\_work\1\s\src\coreclr\gc\handletablescan.cpp @ 1716] 
0c 000000ef`68bff710 00007ff6`ae8127f3     System_Runtime_InteropServices_Tests!HndScanHandlesForGC+0x1ac [D:\a\_work\1\s\src\coreclr\gc\handletable.cpp @ 790] 
0d 000000ef`68bff7d0 00007ff6`ae8582fb     System_Runtime_InteropServices_Tests!Ref_TraceNormalRoots+0x223 [D:\a\_work\1\s\src\coreclr\gc\objecthandle.cpp @ 1086] 
0e 000000ef`68bff890 00007ff6`ae829a4f     System_Runtime_InteropServices_Tests!GCScan::GcScanHandles+0x6b [D:\a\_work\1\s\src\coreclr\gc\gcscan.cpp @ 168] 
0f 000000ef`68bff8c0 00007ff6`ae837f8c     System_Runtime_InteropServices_Tests!WKS::gc_heap::background_mark_phase+0x94f [D:\a\_work\1\s\src\coreclr\gc\gc.cpp @ 35718] 
10 000000ef`68bffa40 00007ff6`ae82c8e1     System_Runtime_InteropServices_Tests!WKS::gc_heap::gc1+0x1cc [D:\a\_work\1\s\src\coreclr\gc\gc.cpp @ 21343] 
11 000000ef`68bffb00 00007ff6`ae80bcc0     System_Runtime_InteropServices_Tests!WKS::gc_heap::bgc_thread_function+0xd1 [D:\a\_work\1\s\src\coreclr\gc\gc.cpp @ 36737] 
12 000000ef`68bffb30 00007ffd`49624de0     System_Runtime_InteropServices_Tests!<lambda_50746ab14019c8a71cb250fc2876ed26>::operator()+0xb0 [D:\a\_work\1\s\src\coreclr\nativeaot\Runtime\gcrhenv.cpp @ 1284] 
13 000000ef`68bffb60 00007ffd`4a85e40b     kernel32!BaseThreadInitThunk+0x10
14 000000ef`68bffb90 00000000`00000000     ntdll!RtlUserThreadStart+0x2b

Does it look familiar to anyone? Cc @dotnet/ilc-contrib

@AustinWise
Copy link
Contributor Author

This is a code path that was recently added in #85087 . Before this change the GC restricted callout code for ref counted handles was not being used (Obective-C interop has its own way of calling managed code from the GC).

@jkoritzinsky
Copy link
Member

@MichalStrehovsky is there a special attribute we need to put on a method other than/in addition to UnmanagedCallersOnly for GC callouts?

@MichalStrehovsky
Copy link
Member

@MichalStrehovsky is there a special attribute we need to put on a method other than/in addition to UnmanagedCallersOnly for GC callouts?

Do you want to suppress the reverse p/invoke transition? Putting the SuppressGCTransition callconv on it might do it if RyuJIT respects that.

@jkotas
Copy link
Member

jkotas commented Apr 26, 2023

It is a bug in the ReversePInvoke transition. The ReversePInvoke transition should be no-op in DoNotTriggerGc region. The whole purpose of DoNotTriggerGc regions is to turn the ReversePInvoke transition into no-op that is not quite happening here.

@jkoritzinsky
Copy link
Member

Opened an issue to track the problem at #85377

@VSadov
Copy link
Member

VSadov commented Apr 26, 2023

It is a bug in the ReversePInvoke transition. The ReversePInvoke transition should be no-op in DoNotTriggerGc region.

In other places we assume that DoNotTriggerGc is not allowed to reverse pinvoke

// If the thread is currently in the "do not trigger GC" mode, we must not allocate, we must not reverse pinvoke, or

An interesting issue when a GC callout needs to call native code.

@jkotas
Copy link
Member

jkotas commented Apr 26, 2023

This comment is wrong. Reverse PInvoke was always allowed in "do not trigger GC" regions.

@VSadov
Copy link
Member

VSadov commented Apr 26, 2023

DoNotTriggerGc running in preemptive mode is a bit of a strange thing. It could only work when this is a thread doing GC.

We use DoNotTriggerGc for two purposes:

  • to prevent GC when a thread temporarily can't report its stack roots. It happens in exceptions handling.
    This case must not leave cooperative mode.
  • to not trigger another GC. It happens in GC callouts.
    This can call native code. For this case transition to/from native code is essentially meaningless. We are running GC code and will not report current frames. Setting/clearing transition frames would be redundant, but it is easier to just do that, for consistency.

The end result is the same - we should fix reverse prinvoke to allow GC threads to return without waiting for GC, instead of asserting.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Interop-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants