Completely lock-free ClassLoader::LookupTypeKey #61346
Conversation
@davidwrighton @jkotas - please take a look. Thanks!
@hoyosjs - could you take a look at DAC casts in the new code?
     CrstAvailableParamTypes,
-    (CrstFlags)(CRST_UNSAFE_ANYMODE | CRST_DEBUGGER_THREAD));
+    CRST_DEBUGGER_THREAD);
@hoyosjs - If the debugger only does lookups, then it will no longer take this lock, so CRST_DEBUGGER_THREAD is not needed.
Can the debugger load/publish types?
I know we can't load assemblies, but I believe it's possible that a FuncEval can load a type.
Is funceval running on the debugger thread?
I assumed that CRST_DEBUGGER_THREAD does not cover funceval, since funceval could JIT and the JIT does all kinds of things (including loading assemblies). If that is attributed to the debugger thread, then the debugger thread is not different from anything else.
I am not very familiar with how that all works though.
CRST_DEBUGGER_THREAD is not a big nuisance here. I was just not sure it is still necessary.
I've tried removing CRST_DEBUGGER_THREAD and running both regular and diagnostics tests (with Chk bits) - everything runs as before. Not sure if that is enough proof that CRST_DEBUGGER_THREAD is unnecessary.
I just noticed that CRST_DEBUGGER_THREAD is always paired with CRST_UNSAFE_ANYMODE (or sometimes CRST_GC_NOTRIGGER_WHEN_TAKEN), so since we are removing the other one, maybe we do not need CRST_DEBUGGER_THREAD either.
But I do not think CRST_DEBUGGER_THREAD is a big nuisance either way - it basically increments/decrements a counter that is checked in a couple of asserts.
src/coreclr/vm/dacenumerablehash.inl
    // slot [1] will contain the next version of the table if it resizes
    S_SIZE_T cbNewBuckets = S_SIZE_T(cNewBuckets + 2) * S_SIZE_T(sizeof(PTR_VolatileEntry));

    // REVIEW: I need a temp array here, relatively small (under 1K elements typically), is this the right heap for that?
You should use regular C/C++ heap. The loader heaps are not designed for allocating temporary arrays like this.
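For illustration, a minimal standalone sketch of what "use the regular C/C++ heap" for a temporary, best-effort buffer could look like; the element type, count, and function name are placeholders, not the actual hashtable code:

    #include <memory>
    #include <new>
    #include <cstddef>

    // Standalone illustration only: a temporary buffer for a resize pass taken
    // from the regular C++ heap instead of a loader heap.
    bool CopyChainTails(std::size_t count)
    {
        // new(std::nothrow) mirrors the non-throwing AllocMem_NoThrow style used
        // in the surrounding code; the caller can bail out of the resize on failure.
        std::unique_ptr<void*[]> tails(new (std::nothrow) void*[count]);
        if (!tails)
            return false;           // resize is best-effort, so just give up

        // ... fill and use the temporary array here ...

        return true;                // unique_ptr frees the buffer automatically
    }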
Are the tails long enough to make it worth it? Would it be better to just walk them as necessary?
I was thinking about the old table, which would be at a 2x load factor at this point, and that we are holding up somebody's progress (most likely the JIT), so we should use a tail array as a guarantee against degenerate cases.
However, these chains would be in the new table, and the load factor while resizing will be 0 - 0.5, so buckets should be short. There is always a possibility of a case where a lot of items form a chain, but after tweaking the hash function in the previous change, it will be statistically rare.
Also, a doubling-up resize can really happen only a few times in the life of the module, so I think the risk of causing an observable pause is low.
I will switch this to just walking.
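As a rough illustration of the "just walk the chain" approach, a standalone sketch with simplified, hypothetical types rather than the real VolatileEntry/DPTR machinery:

    #include <cstdint>

    // Entry is a placeholder for the real hashtable entry type.
    struct Entry
    {
        Entry*   m_pNextEntry;
        uint32_t m_iHashValue;
    };

    // Append pEntry to the end of the bucket's chain instead of caching tail
    // pointers in a temporary array. With the new table at a 0 - 0.5 load factor
    // during resize, the walk is expected to be short.
    void AppendToBucket(Entry** ppBucket, Entry* pEntry)
    {
        pEntry->m_pNextEntry = nullptr;

        Entry** ppTail = ppBucket;
        while (*ppTail != nullptr)          // walk to the end of the chain
            ppTail = &(*ppTail)->m_pNextEntry;

        *ppTail = pEntry;                   // link the entry at the tail
    }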
In System.Linq.Expressions.Tests, which uses generics liberally, the max walk during individual resizes is generally 1 - 5, but every run has some outliers with max walk > 50. I think it is acceptable.
It makes me wonder, though, whether there is a cheap way to bring the clustering further down by improving the hash.
Hmm, the outliers are all from a different hashtable - EEClassHashTable, which shares the base implementation with EETypeHashTable but has different elements and a different hash function.
Its resize factor is 4x, triggered at the same 2x load factor. In theory it should have even shorter buckets at resize.
I will have to look closer.
We are hashing types by namespace/name and see a lot of types with an empty namespace and names like "<>c".
EEClassHashTable seems to be intentionally placing same-named nested classes in the same bucket.
I think walking the chain is still acceptable - if these chains are acceptable for lookups, then they are ok for very rare 4x resizes.
EEClassHashTable could use some clean-up in a separate change.
I have a change that fixes the nested type collision issue for EEClassHashTable. It will be a separate PR.
src/coreclr/vm/ceeload.inl
@@ -35,6 +35,7 @@ void LookupMap<TYPE>::SetValueAt(PTR_TADDR pValue, TYPE value, TADDR flags)

    value = dac_cast<TYPE>((dac_cast<TADDR>(value) | flags));

    // REVIEW: why is this not VolatileStore? What guarantees that we do not have concurrent readers?
I think LookupMap<TYPE> can be read concurrently while the writer has not left the write lock yet (so on ARM the reader may occasionally not see all the writes). I will look at this separately. Maybe it is ok for some subtle reason.
It would be a separate issue anyway.
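A standalone sketch of the publication concern being described, using std::atomic in place of the runtime's VolatileStore/VolatileLoad; the names and layout are illustrative only, not the actual LookupMap:

    #include <atomic>

    struct Slot
    {
        std::atomic<void*> m_value{nullptr};
    };

    // Writer: publish with release semantics so that everything written to the
    // object before this store is visible to a reader that observes the pointer.
    void Publish(Slot& slot, void* pObject)
    {
        slot.m_value.store(pObject, std::memory_order_release);
    }

    // Reader: the acquire load pairs with the release store; on ARM a plain
    // load/store pair would allow the reader to see the pointer but stale
    // object contents.
    void* Lookup(const Slot& slot)
    {
        return slot.m_value.load(std::memory_order_acquire);
    }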
PR review suggestion Co-authored-by: Jan Kotas <[email protected]>
src/coreclr/vm/dacenumerablehash.inl
    DWORD cNewBuckets = NextLargestPrime(cBuckets * SCALE_FACTOR);
    // two extra slots - slot [0] contains the length of the table,
    // slot [1] will contain the next version of the table if it resizes
    S_SIZE_T cbNewBuckets = S_SIZE_T(cNewBuckets + 2) * S_SIZE_T(sizeof(PTR_VolatileEntry));
Suggested change:
-    S_SIZE_T cbNewBuckets = S_SIZE_T(cNewBuckets + 2) * S_SIZE_T(sizeof(PTR_VolatileEntry));
+    S_SIZE_T cbNewBuckets = (S_SIZE_T(cNewBuckets) + S_SIZE_T(2)) * S_SIZE_T(sizeof(PTR_VolatileEntry));
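For context, a plain C++ sketch of the overflow the suggested S_SIZE_T arithmetic guards against: the addition and multiplication must not wrap before the size reaches the allocator. The overflow-checked type tracks this internally; the explicit checks and names below are only illustrative:

    #include <cstddef>
    #include <limits>

    // Compute (cNewBuckets + 2) * cbElement, failing instead of wrapping around.
    bool ComputeBucketArraySize(std::size_t cNewBuckets, std::size_t cbElement, std::size_t* pcbTotal)
    {
        const std::size_t maxSize = std::numeric_limits<std::size_t>::max();

        if (cNewBuckets > maxSize - 2)                  // cNewBuckets + 2 would overflow
            return false;

        const std::size_t cSlots = cNewBuckets + 2;     // +2 for the length/next slots
        if (cbElement != 0 && cSlots > maxSize / cbElement)
            return false;                               // multiplication would overflow

        *pcbTotal = cSlots * cbElement;
        return true;
    }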
src/coreclr/vm/dacenumerablehash.inl
    PTR_VolatileEntry *pNewBuckets = (PTR_VolatileEntry*)(void*)GetHeap()->AllocMem_NoThrow(cbNewBuckets);
    if (!pNewBuckets)
        return;

    // element 0 stores the length of the table
    ((size_t*)pNewBuckets)[0] = cNewBuckets;
Would it make sense to define constants or something for the fake 0 and 1 bucket indices to make this easier to understand?
I think making "Length" and "Next" helpers that take an array could make this more clear.
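A possible shape for such helpers, sketched with simplified placeholder types and hypothetical constant names (the real code works on PTR_VolatileEntry/DPTR types):

    #include <cstddef>

    struct Entry;
    typedef Entry* BucketSlot;

    const std::size_t SLOT_LENGTH = 0;        // slot [0] holds the bucket count
    const std::size_t SLOT_NEXT   = 1;        // slot [1] points to the next table during resize
    const std::size_t SKIP_SPECIAL_SLOTS = 2; // real buckets start after the two special slots

    inline std::size_t GetLength(BucketSlot* pBuckets)
    {
        return reinterpret_cast<std::size_t>(pBuckets[SLOT_LENGTH]);
    }

    inline BucketSlot* GetNext(BucketSlot* pBuckets)
    {
        return reinterpret_cast<BucketSlot*>(pBuckets[SLOT_NEXT]);
    }

    // Indexing a real bucket would then look like:
    //   BucketSlot head = pBuckets[SKIP_SPECIAL_SLOTS + (iHash % GetLength(pBuckets))];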
src/coreclr/vm/dacenumerablehash.inl
@@ -129,21 +130,24 @@ void DacEnumerableHashTable<DAC_ENUM_HASH_ARGS>::BaseInsertEntry(DacEnumerableHa
    // Remember the entry hash code.
    pVolatileEntry->m_iHashValue = iHash;

    // Compute which bucket the entry belongs in based on the hash.
    DWORD dwBucket = iHash % m_cBuckets;
    auto curBuckets = GetBuckets();
I think that uses of auto like this do not help with code readability.
We have a rule against it in the repo C# coding conventions. We do not have explicit repo C++ coding conventions, but I think it is still best to use explicit types when the type is not obvious from the right-hand side.
src/coreclr/vm/dacenumerablehash.inl
    // Compute which bucket the entry belongs in based on the hash.
    DWORD dwBucket = iHash % VolatileLoad(&m_cBuckets);
    PTR_VolatileEntry* curBuckets = GetBuckets();
I do not think that this works with DAC. The cast from DPTR(PTR_VolatileEntry) to PTR_VolatileEntry* will only marshal the first array entry. This needs to pass DPTR(PTR_VolatileEntry) all the way through to the places where we index into it.
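A standalone analogue of the marshaling issue being described (made-up names, not the daccess.h machinery): a typed "remote" handle reads one element from the target per index, which is exactly what a cast to a raw host pointer would lose:

    #include <cstddef>
    #include <cstring>
    #include <vector>

    // Stands in for the debuggee address space.
    struct TargetMemory
    {
        std::vector<unsigned char> bytes;

        void Read(std::size_t offset, void* dst, std::size_t size) const
        {
            std::memcpy(dst, bytes.data() + offset, size);
        }
    };

    template <typename T>
    struct RemotePtr
    {
        const TargetMemory* target;
        std::size_t         offset;

        // Indexing marshals exactly one element from the target on each access.
        T operator[](std::size_t i) const
        {
            T value;
            target->Read(offset + i * sizeof(T), &value, sizeof(T));
            return value;
        }
    };

    // Usage sketch: RemotePtr<int>{&mem, 0}[i] reads element i from the target;
    // converting the handle to a host int* would bypass Read() for elements past
    // whatever was already copied over.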
We should probably run this through debugger tests.
src/coreclr/vm/dacenumerablehash.inl
    VolatileLoadBarrier();

    // in a case if resize is in progress, look in the new table as well.
    auto nextBuckets = ((DPTR(PTR_VolatileEntry)*)pContext->m_curTable)[1];
@hoyosjs - I assume this is correct? (modulo auto). In this case I want to index on the remote side. Is this the right way to do this?
Suggested change:
-    auto nextBuckets = ((DPTR(PTR_VolatileEntry)*)pContext->m_curTable)[1];
+    auto nextBuckets = dac_cast<DPTR(PTR_VolatileEntry)>(pContext->m_curTable)[1];
I made GetNext a common helper. With that it is more convenient if m_curTable is DPTR(PTR_VolatileEntry)… added a comment about a cast.
src/coreclr/vm/dacenumerablehash.h
    static DPTR(PTR_VolatileEntry) GetNext(DPTR(PTR_VolatileEntry) buckets)
    {
        return (DPTR(PTR_VolatileEntry))dac_cast<TADDR>(buckets[SLOT_NEXT]);
@hoyosjs - is this correct, or does it need another dac_cast from TADDR?
Like:
    static DPTR(PTR_VolatileEntry) GetNext(DPTR(PTR_VolatileEntry) buckets)
    {
        return dac_cast<DPTR(PTR_VolatileEntry)>(dac_cast<TADDR>(buckets[SLOT_NEXT]));
    }
If I understand this right, buckets[SLOT_NEXT] is pointing to a new hash table that's an array of linked lists? Then it's
    return dac_cast<DPTR(PTR_VolatileEntry)>(dac_cast<TADDR>(buckets[SLOT_NEXT]));
which can be simplified to
    return dac_cast<DPTR(PTR_VolatileEntry)>(buckets[SLOT_NEXT]);
Yes. buckets[SLOT_NEXT] will be pointing to the next, larger table if a resize happened while you are doing a lookup. It will be rare. It is possible, even though extremely rare, to see more than one new version.
Every new one is 2x larger than the previous, so there can't be too many even in the ultimate worst case.
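A simplified sketch of the lookup shape being described, with plain types and no DAC or volatile handling, just to show how the search follows the chain of "next" tables; all names are illustrative:

    #include <cstdint>
    #include <cstddef>

    struct Entry { Entry* next; uint32_t hash; };

    struct Table
    {
        std::size_t cBuckets;
        Table*      pNextTable;   // non-null only once a resize has published a newer table
        Entry**     pBuckets;     // cBuckets chains
    };

    // Search the current table's bucket; if a resize published a newer table,
    // continue there. Tables only grow, so the chain of next pointers stays short.
    Entry* Find(Table* pTable, uint32_t hash)
    {
        for (; pTable != nullptr; pTable = pTable->pNextTable)
        {
            for (Entry* e = pTable->pBuckets[hash % pTable->cBuckets]; e != nullptr; e = e->next)
            {
                if (e->hash == hash)
                    return e;     // caller still compares full keys on a hash match
            }
        }
        return nullptr;
    }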
I have run diagnostics tests with bits right before my change and after (multiple times). I do not see any regressions due to the change.
The new test failure on Win arm64 is caused by #61548 - it is not because of these changes.
LGTM. Thanks!
DAC changes lgtm
Thanks!!
The goal is to simplify use of m_AvailableTypesLock.
Currently the m_AvailableTypesLock lock may be taken in lookups, and lookups may happen in a GC_NOTRIGGER scope. Thus the lock is CRST_UNSAFE_ANYMODE, with an additional requirement that callers, with the exception of GC threads, must switch to COOP mode before using this lock - to avoid deadlocks during GC.
The requirement is fragile. Besides, all other uses of m_AvailableTypesLock are in preemptive mode, and it is better when preemptive mode can stay preemptive.
Once we do not need to take locks in Lookup, we can make m_AvailableTypesLock just an ordinary lock.
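A minimal standalone sketch of that overall pattern - lock-free lookups against published data, with inserts serialized by an ordinary lock - using std::atomic and std::mutex in place of the runtime's Crst and volatile helpers; the types and names are illustrative, not the actual code:

    #include <atomic>
    #include <mutex>
    #include <cstdint>

    struct Node
    {
        uint32_t key;
        Node*    next;
    };

    struct LockFreeReadTable
    {
        std::atomic<Node*> head{nullptr};
        std::mutex         writerLock;   // ordinary lock; no special GC/mode flags

        // Lookup: lock-free, so it can be called from restricted scopes without
        // any mode switching.
        Node* Lookup(uint32_t key) const
        {
            for (Node* n = head.load(std::memory_order_acquire); n != nullptr; n = n->next)
            {
                if (n->key == key)
                    return n;
            }
            return nullptr;
        }

        // Insert: serialized by the writer lock; the release store publishes the
        // fully-initialized node to concurrent lock-free readers.
        void Insert(Node* n)
        {
            std::lock_guard<std::mutex> guard(writerLock);
            n->next = head.load(std::memory_order_relaxed);
            head.store(n, std::memory_order_release);
        }
    };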