Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC: Fix CheckPromoted for frozen objects #76251

Merged
merged 18 commits into from
Oct 4, 2022
Merged

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Sep 27, 2022

Fixes #76219

The problem is - when GC scans SyncBlocks in GCWeakPtrScanElement and meets a SyncBlock holding a weak pointer to a frozen object (RuntimeType in my case) it invokes this handler which is effectively CheckPromoted where that weak ref is recognized as unreachable because it doesn't pass if (!g_theGCHeap->IsPromoted(*pRef)) so the syncblock is cleared/removed/reused.

cc @jkotas @cshung @Maoni0

Easily reproduces on this small snippet on Windows-x86 Checked - #76219 (comment)
(not on the latest Main since frozen type objects were just reverted in #76235 because of this issue)

@ghost
Copy link

ghost commented Sep 27, 2022

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #76219

The problem is - when GC scans SyncBlocks in GCWeakPtrScanElement and meets a SyncBlock holding a weak pointer to a frozen object (RuntimeType in my case) it invokes this handler which is effectively CheckPromoted where that weak ref is recognized as unreachable because it doesn't pass if (!g_theGCHeap->IsPromoted(*pRef)) so the syncblock is cleared/removed/reused.

cc @jkotas @cshung @Maoni0

Author: EgorBo
Assignees: -
Labels:

area-GC-coreclr

Milestone: -

@jkotas jkotas requested a review from Maoni0 September 27, 2022 15:46
@EgorBo
Copy link
Member Author

EgorBo commented Sep 27, 2022

Ah, I assume it either has to be moved to gc.cpp to bool GCHeap::IsPromoted(Object* object) or we can see if simply adding | 1 bit to a ref in SyncBlock for frozen objects on creating is enough (basically making it a strong ref if I understand it correctly)? I feel like the first option is safer?

@jkotas
Copy link
Member

jkotas commented Sep 27, 2022

I am not sure about the best place to add this check. I would like to hear @Maoni0's opinion.

I suspect that we may have similar problem for weak handles and dependent handles that will also lead to bad failure modes.

@EgorBo
Copy link
Member Author

EgorBo commented Sep 27, 2022

Looks similar to my previous change in gc.cpp - https://github.com/dotnet/runtime/pull/73110/files

@EgorBo
Copy link
Member Author

EgorBo commented Sep 27, 2022

@jkotas, had a chat with @Maoni0 and I have an impression that the optimization that we decided to use - avoid managed pinned handles (for both RuntimeType and string literals) is the reason. Previously, all RuntimeType objects were handled by that pinned table so RuntimeType objects were always marked while now e.g. RuntimeType can be completely unused for some period of time and then used again (with header containing out-of-date syncblock index)

@EgorBo
Copy link
Member Author

EgorBo commented Sep 27, 2022

Repro with strings:

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System;
using System.Runtime.CompilerServices;

public class Class1
{
    public static void Main()
    {
        Test();
        // "hello" is unused now, SyncBlock is cleared
        GC.Collect();

        // Now we need "hello" again!
        Test();
    }

    static void Test()
    {
        Say("hello");
    }
    static void Say(string str)
    {
        lock (str)
        {
            Console.WriteLine(str);
        }
    }
}

Didn't test it with Regions yet, I am still on x86 build with old gc.

@EgorBo
Copy link
Member Author

EgorBo commented Sep 27, 2022

Possible solutions (after some chatting with Maoni):

  1. Clear ObjectHeader once we collect SyncBlocks for frozen objects - it might work but it likely won't help with other potential problems like WeakReference to a RuntimeType object that goes unused for some period of time (and then used again)
  2. Bring back pinned handles for FOH objects - it will also slightly simplify code, e.g. for string literals we had to do some additional changes to have unmanaged handles.
  3. Same as 2) but having a pinned table (or registry or whatever name) in FOH itself and it will be rooted by m_pPinnedHeapHandleTable - more work than 2) the good news is that we won't have to care about slot reusing, only adding new ones since FOH objects live forever.

@EgorBo EgorBo marked this pull request as draft September 27, 2022 21:33
@EgorBo
Copy link
Member Author

EgorBo commented Sep 28, 2022

Implemented 3rd option

@EgorBo EgorBo marked this pull request as ready for review September 28, 2022 00:03
@EgorBo
Copy link
Member Author

EgorBo commented Sep 28, 2022

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@cshung
Copy link
Member

cshung commented Sep 28, 2022

I am not sure I understand, but are we saying we are going to mark/unmark all frozen objects for every GC as if they are roots?

@jkotas
Copy link
Member

jkotas commented Sep 28, 2022

The fix for this problem should also cover objects from file-mapped frozen heaps. File-mapped frozen heaps do not have GC handles or an extra array pointing to the frozen heap objects.

It looks like the whole point of registry that you have implemented is to mark every single object on FOH. It is cheaper to just walk FOH and mark everything on it directly. We do not need an extra array to do that. What prevents us from doing that if we believe that marking all objects on FOH during the GC is the right tradeoff?

@EgorBo
Copy link
Member Author

EgorBo commented Sep 28, 2022

I am not sure I understand, but are we saying we are going to mark/unmark all frozen objects for every GC as if they are roots?

Is there a better solution? We used to mark/unmark the same objects before FOH too so the main savings here are jit that is able to bake direct references to various objects in codegen + omit write barriers barriers when we assign them to fields/arrays. Also, we allocate less of pinned handles in VM.

My initial fix was IsFrozen check in gc.cpp's bool GCHeap::IsPromoted(Object* object) but after a chat with @Maoni0 I changed it to a safer option - root all FOH objects.

Basically, I want to fix at least two cases:

  1. SyncBlocks for FOH objects should not be ever cleared (or FOH Objects' Headers have to be reset too)
  2. WeakReference pointing to a FOH object should never report it as dead even if nobody uses given FOH object at a given time.
    Same for dependent handlers I assume.

@EgorBo
Copy link
Member Author

EgorBo commented Sep 28, 2022

It looks like the whole point of registry that you have implemented is to mark every single object on FOH. It is cheaper to just walk FOH and mark everything on it directly. We do not need an extra array to do that. What prevents us from doing that if we believe that marking all objects on FOH during the GC is the right tradeoff?

Yes, that's the point. What you suggest makes sense to me but I am not sure I understand how to do it, I assume somewhere in gc.cpp at mark phase GC should just walk all objects in FOH and mark (and then unmark) them, right?

@Maoni0
Copy link
Member

Maoni0 commented Sep 28, 2022

no, we do not mark every object on FOH. they are only marked (if anyone refers to them) on the FOH segments that are in range. this naturally happens with GC's mark process.

@EgorBo
Copy link
Member Author

EgorBo commented Sep 28, 2022

no, we do not mark every object on FOH. they are only marked (if anyone refers to them) on the FOH segments that are in range. this naturally happens with GC's mark process.

Right, but previously all the objects we moved now to FOH were always marked because they were referenced by some special array objects in VM. Now these objects are not handled by any special internal arrays/handlers, and instead are saved to some unmanaged data structures, e.g. string literal hash table or size_t ExposedManagedObject field in MethodTable so they're not marked any more in some cases while technically still alive and can be used again

@cshung
Copy link
Member

cshung commented Sep 28, 2022

Oh, this is a segment-only problem? Frozen segments are always out of range for regions?

@EgorBo
Copy link
Member Author

EgorBo commented Sep 28, 2022

Oh, this is a segment-only problem? Frozen segments are always out of range for regions?

I might be wrong but the problem only reproduces with segments yes, the gcstress crash we had + winforms issue are both x86 where regions are not enabled

@cshung
Copy link
Member

cshung commented Sep 28, 2022

Just in case it is a segment-only issue, we can potentially optimize away all these in case we are running under regions, which is going to be the default anyway. Setting the mark bit or not, having the mark_phase to spend time proportional to the number of frozen objects (just because of sync block and weak references) is unfortunate.

@EgorBo
Copy link
Member Author

EgorBo commented Sep 28, 2022

@cshung @Maoni0 so when you have time could you please review my gc.cpp changes? thanks in advance

@Maoni0
Copy link
Member

Maoni0 commented Sep 28, 2022

let's have a quick meeting to discuss this and get to a final solution instead of trying various things.

@cshung
Copy link
Member

cshung commented Sep 28, 2022

I think the original intent and promise for frozen segments is that these objects are NOT managed by the GC at all, we just did minimal work to make the runtime happy with that (e.g. the Write Barrier won't crash, we don't accidentally leave mark bits, ...). It would be unfortunate if we break that intent and spent time proportional to the number of frozen objects per GC.

In the abstract, the bug is that the GC is inadvertently freeing the "associated resources" of a frozen object if it is "in range". A natural way of fixing this should be to avoid that from happening.

The "associated resource" in this particular case is the sync block, but it could also be weak references or dependent handle (or others).

The problem with fixing this directly is that it might be expensive to check if the "associated resource" belongs to a frozen object or not. This issue could be side-stepped if the "associated resource" were labeled as such.

With some thought, it is not hard to see the labeling must be done by the VM. (It would be meaningless if the GC needs to run the expensive check during GC to determine the label).

If we were changing the VM anyway, we also have the opportunity to seggregate those resources so that they are not presented to the GC at all. Suppose when the sync-block were created, we knew they belong to frozen objects, and so we store them somewhere else, then the GC wouldn't be looping through them, truly achieving frozen objects has zero GC cost.

This is assuming we don't create those "associated resources" too often, which may or may not be true.

Weak Reference and Dependent Handle are particularly easy to handle if we were to change the VM. We can simply replace that with a strong handle underneath when we detected it is a frozen object, and we are done. A dependent handle depending on a frozen object is just really dumb and costly.

@jkotas
Copy link
Member

jkotas commented Sep 28, 2022

It would be unfortunate if we break that intent and spent time proportional to the number of frozen objects per GC.

That intent is broken already (see the existing seg_clear_mark_bits method). You are spending time proportional to the number of (in-range) frozen objects per full GC already. Fortunately, this happens rarely with segments or never with regions. I do not think that it should be a goal for frozen objects to have zero GC cost.

Suppose when the sync-block were created, we knew they belong to frozen objects, and so we store them somewhere else, then the GC wouldn't be looping through them, truly achieving frozen objects has zero GC cost.

It is still valuable to free the non-precious syncblocks, even when get attached to the frozen object. I do not see a problem here.

A dependent handle depending on a frozen object is just really dumb and costly.

Well, we would need to introduce new types of handles to implement your idea. The new type of handle is somewhat expensive too. It would be worth it only if the dependent handles pointing to frozen objects are very common. I am pretty sure that it is not the case. I believe that having a few dependent handles pointing to frozen objects is the best among the available options.

@EgorBo
Copy link
Member Author

EgorBo commented Sep 28, 2022

So presumably the question is where to fix the problem: on GC side or VM (two VMs) side

It is still valuable to free the non-precious syncblocks, even when get attached to the frozen object. I do not see a problem here.

Do I understand you correctly that you want to leave everything as is (as of Main) and fix the problem on VM side e.g.:

  1. Clear object's header when we recycle a syncblock for FOH objects
  2. Somehow make all WeakReferences to frozen objects strong ones
  3. Same for dependent handles?
  4. If we introduce a new entity in VM - will we catch potential bugs if we forget to handle it for FOH?
  5. Do all of the above for NativeAOT (besides SyncBlocks which don't exist there)

At least this is how I understand the desire to collect unused SyncBlocks for frozen objects.

PS: SyncBlocks should be rare beasts for frozen objects, right? I mean the general pattern we promote is to have a static readonly object SyncObject instead of locking on typeof/string literal. Not sure how popular MethodImplOptions.Synchronized is

@jkotas
Copy link
Member

jkotas commented Sep 28, 2022

The change you have right should allow the non-precious syncblocks to be collected.

I like the change you have in the PR right now. I think it is the best path to fix this issue.

@jkotas
Copy link
Member

jkotas commented Sep 28, 2022

SyncBlocks should be rare beasts for frozen objects, right?

Right. SyncBlocks are rare in general.

The non-precious syncblocks handle the case where you have a little bit of contention on a long-lived instance, syncblock gets created and then never used again. We do not want to be stuck with this syncblock forever. We want to be able to free it while the long-lived instance is still alive.

Maoni0 and others added 3 commits October 2, 2022 15:30
+ moved where we are calling mark_ro_segments to the right places -
  for BGC this needs to be in the 2nd non concurrent phase)
  for blocking GCs I just moved it to a more appropriate place
+ for full blocking GCs it's not enough to do mark_ro_segments when gen start seg is not ephemeral.
when we are doing a gen2 GC, even if we have acquired a new seg for that heap, we still need
to mark all in range ro segs because our logic in IsPromoted expects when there's in range ro
segs we need the mark bit to tell us these ro objects are marked.
+ got rid of some unnecessary checks

Note that I do make the assumption that ro segs are always threaded at the beginning of gen2. We
actually always thread these into heap 0's gen2 seg list but I can see a chance of that changing,
if we want a better balancing of the mark_ro_segments work.
@EgorBo
Copy link
Member Author

EgorBo commented Oct 3, 2022

Thanks for the pushed changes! Looks like there are two test failures I am able to reproduce locally - a frozen string is not marked when IsPromoted is called, here is the trace:

 	coreclr.dll!DbgAssertDialog(const char * szFile, int iLine, const char * szExpr) Line 461	C++
>	coreclr.dll!WKS::GCHeap::IsPromoted(Object * object) Line 45284	C++
 	coreclr.dll!CheckPromoted(Object * * pObjRef, unsigned int * pExtraInfo, unsigned int lp1, unsigned int lp2) Line 325	C++
 	coreclr.dll!ScanConsecutiveHandlesWithoutUserData(Object * * pValue, Object * * pLast, ScanCallbackInfo * pInfo, unsigned int * __formal) Line 446	C++
 	coreclr.dll!BlockScanBlocksWithoutUserData(TableSegment * pSegment, unsigned int uBlock, unsigned int uCount, ScanCallbackInfo * pInfo) Line 557	C++
 	coreclr.dll!SegmentScanByTypeMap(TableSegment * pSegment, const int * rgTypeInclusion, void(__stdcall*)(TableSegment *, unsigned int, unsigned int, ScanCallbackInfo *) pfnBlockHandler, ScanCallbackInfo * pInfo) Line 1666	C++
 	coreclr.dll!TableScanHandles(HandleTable * pTable, const unsigned int * puType, unsigned int uTypeCount, TableSegment *(__stdcall*)(HandleTable *, TableSegment *, Holder<CrstBase *,&CrstBase::AcquireLock,&CrstBase::ReleaseLock,0,&CompareDefault<CrstBase *>,1> *) pfnSegmentIterator, void(__stdcall*)(TableSegment *, unsigned int, unsigned int, ScanCallbackInfo *) pfnBlockHandler, ScanCallbackInfo * pInfo, Holder<CrstBase *,&CrstBase::AcquireLock,&CrstBase::ReleaseLock,0,&CompareDefault<CrstBase *>,1> * pCrstHolder) Line 1725	C++
 	coreclr.dll!HndScanHandlesForGC(HandleTable * hTable, void(__stdcall*)(Object * *, unsigned int *, unsigned int, unsigned int) scanProc, unsigned int param1, unsigned int param2, const unsigned int * types, unsigned int typeCount, unsigned int condemned, unsigned int maxgen, unsigned int flags) Line 790	C++
 	coreclr.dll!Ref_CheckAlive(unsigned int condemned, unsigned int maxgen, unsigned int lp1) Line 1408	C++
 	coreclr.dll!GCScan::GcShortWeakPtrScan(int condemned, int max_gen, ScanContext * sc) Line 143	C++
 	coreclr.dll!WKS::gc_heap::background_mark_phase() Line 35053	C++
 	coreclr.dll!WKS::gc_heap::gc1() Line 20981	C++
 	coreclr.dll!WKS::gc_heap::bgc_thread_function() Line 36016	C++
 	coreclr.dll!WKS::gc_heap::bgc_thread_stub(void * arg) Line 33961	C++
 	coreclr.dll!`anonymous-namespace'::CreateSuspendableThread::__l2::<lambda>(void * argument) Line 1415	C++
 	coreclr.dll!unsigned long <lambda>(void *)::<lambda_invoker_stdcall>(void * argument) Line 1419	C++
 	[External Code]	
 	[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]	

ro_segments_in_range is TRUE.

What is weird that this code is executed just right after "mark all ro segments" logic - seg_set_mark_array_bits_soh but maybe it's not correct, I implemented it without knowing how to do it correctly for background gcю

@@ -11009,7 +11009,7 @@ void gc_heap::seg_set_mark_array_bits_soh (heap_segment* seg)
if (bgc_mark_array_range (seg, FALSE, &range_beg, &range_end))
{
size_t beg_word = mark_word_of (align_on_mark_word (range_beg));
size_t end_word = mark_word_of (align_on_mark_word (range_beg));
size_t end_word = mark_word_of (align_on_mark_word (range_end));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You pasted that commit message from another PR, right? :-)

@EgorBo
Copy link
Member Author

EgorBo commented Oct 3, 2022

@Maoni0 @cshung the tests seem to be passing now (except CoreCLR Product Build FreeBSD x64 release that doesn't look to be related) so please approve if it looks good (modulo the "disable gc regions" change that, as I understand it, was to trigger testing of segments on x64 -- will revert it once it's green.

@EgorBo
Copy link
Member Author

EgorBo commented Oct 3, 2022

/azp run runtime-coreclr outerloop, runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@EgorBo
Copy link
Member Author

EgorBo commented Oct 4, 2022

Outerloop failures are #76511, gcstress are pretty clean except one build error/timeout.

Reverted the test change to disable gc regions.

Copy link
Member

@Maoni0 Maoni0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed what I wrote in the commit msgs I pushed slightly to reflect the final state. please use this when you merge -

+ we need to proactively go mark all the in range ro segs because these objects' life time isn't accurately
expressed. The expectation is all objects on ro segs are reported as marked. 

+ fixed an existing bug - for full blocking GCs it's not enough to handle ro segs when gen start seg is 
not ephemeral, it needs to be checking for condemned gen because you can still have ro segs in range 
even with one segment per heap.

+ got rid of some unnecessary checks. 

+ I do make the assumption that ro segs are always threaded at 
the beginning of gen2 so we can exit as soon as we see an rw seg. we actually always thread these into 
heap 0's gen2 seg list but I can see a chance of that changing, if we want a better balancing of work 
done for ro segs.

+ a bit of code refactoring since now we have more code related to ro segs.

@EgorBo EgorBo merged commit cfabadb into dotnet:main Oct 4, 2022
@EgorBo EgorBo deleted the fix-checkpromoted branch October 4, 2022 18:17
EgorBo added a commit that referenced this pull request Oct 6, 2022
…en objects" (#76649)

This PR reverts #76235 with a few manual modifications:
The main issue why the initial PRs (#75573 and #76135) were reverted has just been resolved via #76251:

GC could collect some associated (with frozen objects) objects as unreachable, e.g. it could collect a SyncBlock, WeakReferences and Dependent handles associated with frozen objects which could (e.g. for a short period of time) be indeed unreachable but return back to life after.

Co-authored-by: Jan Kotas <[email protected]>
Co-authored-by: Jakob Botsch Nielsen <[email protected]>
@ghost ghost locked as resolved and limited conversation to collaborators Nov 3, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tests fail in x86 after the latest update with "Internal CLR error. (0x80131506)"
6 participants