Type key hash tweaks. #61234

Merged
merged 7 commits into from
Nov 8, 2021


@VSadov (Member) commented Nov 5, 2021

TL;DR:
This change considerably reduces how often we need to compare types to resolve hash collisions in PendingTypeLoadTable and EETypeHashTable dictionaries.

More details:

While working on a hackathon project, I noticed that HashKey::ComputeHash is not a very good hash. It mixes the number of generic arguments into the hash, but that number is the same for every instantiation of a given type, so it adds no uniqueness. Meanwhile it does not mix in the arguments themselves, which means all instantiations of a given generic type get the same hash.
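To make the problem concrete, here is a minimal sketch under simplified assumptions. `SketchTypeKey`, `HashByArity`, and `HashByArguments` are hypothetical names for illustration, not the runtime's actual types or hash functions, and type arguments are modeled as unique pointer-sized handle values. The first function mixes in only the argument count; the second mixes in each argument handle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified key for a generic instantiation such as List<int>.
// Type arguments are modeled as unique pointer-sized handle values.
struct SketchTypeKey {
    std::uintptr_t module;                // identity of the defining module
    std::uint32_t typeDefToken;           // token of the generic definition
    std::vector<std::uintptr_t> typeArgs; // handles of the type arguments
};

// Mixes in the *number* of type arguments but not the arguments themselves,
// so every instantiation of one generic definition gets the same hash.
std::size_t HashByArity(const SketchTypeKey& key) {
    std::size_t h = static_cast<std::size_t>(key.module);
    h = h * 31 + key.typeDefToken;
    h = h * 31 + key.typeArgs.size();
    return h;
}

// Mixes in each argument handle instead, so instantiations that differ
// in any argument generally get different hashes.
std::size_t HashByArguments(const SketchTypeKey& key) {
    std::size_t h = static_cast<std::size_t>(key.module);
    h = h * 31 + key.typeDefToken;
    for (std::uintptr_t arg : key.typeArgs) {
        h = h * 31 + static_cast<std::size_t>(arg);
    }
    return h;
}
```

For example, two keys standing in for List<int> and List<string>, identical except for their single argument handle, collide under `HashByArity` but not under `HashByArguments`.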

While looking at how to improve HashKey::ComputeHash, I realized that only PendingTypeLoadTable uses it; everywhere else we use HashTypeKey(), which is a better hash function. So I switched PendingTypeLoadTable to use HashTypeKey as well.

This reduced the number of collisions in PendingTypeLoadTable:
- HelloWorld: 190 -> 154 (a ~20% reduction, in an app that does not use generics much)
- System.Linq.Expressions.Tests: ~5000 -> ~2900 (the number varies between runs, but I see about a 40% reduction)
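The rounded percentages above follow from the reported counts (the System.Linq.Expressions.Tests figures are themselves approximate):

```math
\frac{190 - 154}{190} \approx 18.9\%, \qquad \frac{5000 - 2900}{5000} = 42\%
```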


I also noticed that HashTypeKey() computes the hash of an instantiated type by recursively hashing two levels of typedefs, which can still easily cause collisions when instantiations differ only one level lower. Since the type handles of type arguments are unique, simply hashing the type argument pointers would produce a much better hash.
It would be cheaper too: hashing a linearly laid out sequence of pointers touches much less memory than a recursive walk through their parts.
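The difference can be illustrated with a simplified model; this is a sketch, not the actual HashTypeKey implementation. It assumes each distinct instantiation exists exactly once, so its address is a unique identity, and it uses the hypothetical names `SketchType`, `HashRecursive`, and `HashByHandles`. With a depth limit of 2, `A<B<C<int>>>` and `A<B<C<long>>>` collide because the difference sits below the levels the recursive hash looks at, while hashing the argument handles directly tells them apart without dereferencing anything.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a loaded type: a definition token plus handles to its
// (already loaded) type arguments. Each distinct instantiation is assumed to
// exist exactly once, so its address uniquely identifies it.
struct SketchType {
    std::uint32_t defToken;
    std::vector<const SketchType*> args;
};

// Depth-limited recursive hash: once `depth` reaches 0 only the definition
// token is mixed in, so differences nested deeper are invisible.
std::size_t HashRecursive(const SketchType& t, int depth) {
    std::size_t h = t.defToken;
    if (depth > 0) {
        for (const SketchType* arg : t.args) {
            h = h * 31 + HashRecursive(*arg, depth - 1);
        }
    }
    return h;
}

// Hashes the argument handles themselves: one linear pass over pointer values,
// with no dereferencing of the arguments' own parts.
std::size_t HashByHandles(const SketchType& t) {
    std::size_t h = t.defToken;
    for (const SketchType* arg : t.args) {
        h = h * 31 + static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(arg));
    }
    return h;
}
```

The handle-based version also reads only the contiguous argument array, which matches the cost argument above.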

With this change, the collision counts in PendingTypeLoadTable are:
- HelloWorld: 0 collisions (190 -> 0 overall)
- System.Linq.Expressions.Tests: ~300 collisions (~5000 -> ~300, about a 95% reduction overall)
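Relative to the original baseline, the overall reductions work out as follows (again using the approximate counts):

```math
\frac{190 - 0}{190} = 100\%, \qquad \frac{5000 - 300}{5000} = 94\%
```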


Out of curiosity, I instrumented EETypeHashTable::FindItem to see how the change affects the collision rate on the read path.

On System.Linq.Expressions.Tests I see:
- before the change: ~500000 - 700000 collisions
- after the change: 0 - 5 collisions
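The PR does not show the instrumentation itself; the following is a hypothetical sketch of the kind of counter described, on a toy chained table (`InstrumentedTable`, `PairKey`, `DefOnlyHash`, and `DefAndArgHash` are illustrative names, not runtime code). A "collision" here is an entry whose stored hash equals the lookup hash but whose key turns out to be different.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical chained hash table that stores each entry's full hash and
// counts "upper level" collisions during lookup: entries whose stored hash
// equals the lookup hash but whose key turns out to be different.
template <typename Key, typename Hasher>
class InstrumentedTable {
public:
    explicit InstrumentedTable(std::size_t bucketCount) : buckets_(bucketCount) {}

    void Insert(const Key& key) {
        std::size_t h = Hasher{}(key);
        buckets_[h % buckets_.size()].push_back(Entry{h, key});
    }

    const Key* FindItem(const Key& key) {
        std::size_t h = Hasher{}(key);
        for (const Entry& e : buckets_[h % buckets_.size()]) {
            if (e.hash != h) continue;        // different hash: skipped cheaply
            if (e.key == key) return &e.key;  // full key comparison succeeded
            ++upperLevelCollisions;           // same hash, different key
        }
        return nullptr;
    }

    std::size_t upperLevelCollisions = 0;

private:
    struct Entry {
        std::size_t hash;
        Key key;
    };
    std::vector<std::vector<Entry>> buckets_;
};

// Example key: a generic definition id plus a single type-argument id.
struct PairKey {
    int def;
    int arg;
    bool operator==(const PairKey& o) const { return def == o.def && arg == o.arg; }
};

// Weak hash that ignores the type argument.
struct DefOnlyHash {
    std::size_t operator()(const PairKey& k) const { return static_cast<std::size_t>(k.def); }
};

// Hash that also mixes in the type argument.
struct DefAndArgHash {
    std::size_t operator()(const PairKey& k) const {
        return static_cast<std::size_t>(k.def) * 31 + static_cast<std::size_t>(k.arg);
    }
};
```

Inserting ten instantiations of one definition and looking up the last one costs nine collisions with `DefOnlyHash` and none with `DefAndArgHash`.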


NOTE: EETypeHashTable has two levels of collision resolution.
The upper level (in FindItem) deals with poor hashing, when different TypeKeys get the same hash code. These collisions are relatively expensive, since we resolve them by comparing the types' constituent parts: is it the same module, the same number of type arguments, the same definition, actually the same type arguments, etc. This is the kind of resolution that this change reduced by up to 100000x.
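A hypothetical sketch of that upper-level comparison shows why it costs more than a hash check: it walks the key's parts in the order listed above and may touch every type argument. `SketchTypeKey` and `KeysEqual` are illustrative names, and the real comparison order and fields may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified TypeKey; the field names are illustrative only.
struct SketchTypeKey {
    std::uintptr_t module;
    std::uint32_t typeDefToken;
    std::vector<std::uintptr_t> typeArgs;
};

// Resolves an upper-level collision by comparing constituent parts in turn:
// module, number of type arguments, definition, then each type argument.
// `comparisons` records how many part comparisons were needed.
bool KeysEqual(const SketchTypeKey& a, const SketchTypeKey& b, int& comparisons) {
    ++comparisons;
    if (a.module != b.module) return false;
    ++comparisons;
    if (a.typeArgs.size() != b.typeArgs.size()) return false;
    ++comparisons;
    if (a.typeDefToken != b.typeDefToken) return false;
    for (std::size_t i = 0; i < a.typeArgs.size(); ++i) {
        ++comparisons;
        if (a.typeArgs[i] != b.typeArgs[i]) return false;
    }
    return true;
}
```

Two instantiations that differ only in their last argument need every comparison before the mismatch is found, which is the worst case for this level.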

The lower level (BaseFindFirstEntryByHash, BaseFindNextEntryByHash) deals with bucketing collisions, which are unavoidable when an int32 hash is mapped onto a smaller number of buckets (the table uses a typical "mod prime" hash reducer). I see roughly the same number of hash comparisons before and after this change, as expected, since the change does not affect the bucketing strategy.
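The lower level can be sketched as a "mod prime" reducer plus a scan of one bucket in which each entry costs a single integer comparison of hashes; only exact hash matches would proceed to the expensive key comparison. `BucketFor`, `BucketScan`, and `ScanBucket` are hypothetical names, and the prime 7 is chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// "Mod prime" reducer: maps a 32-bit hash onto one of `prime` buckets.
// Distinct hashes that are congruent modulo `prime` share a bucket.
inline std::uint32_t BucketFor(std::uint32_t hash, std::uint32_t prime) {
    return hash % prime;
}

// Tallies the lower-level work for one bucket.
struct BucketScan {
    std::size_t hashCompares = 0; // one cheap integer compare per entry
    std::size_t fullMatches = 0;  // entries that would need a full key compare
};

// Compares the lookup hash against every stored hash in a bucket.
BucketScan ScanBucket(const std::vector<std::uint32_t>& storedHashes, std::uint32_t lookupHash) {
    BucketScan scan;
    for (std::uint32_t stored : storedHashes) {
        ++scan.hashCompares;
        if (stored == lookupHash) ++scan.fullMatches;
    }
    return scan;
}
```

With 7 buckets, hashes 3, 10, and 17 all land in bucket 3; looking up 10 costs three hash comparisons but only one candidate for a full key comparison.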

@VSadov marked this pull request as ready for review November 5, 2021 18:09

@VSadov (Member Author) commented Nov 5, 2021

CC: @davidwrighton

@VSadov (Member Author) commented Nov 8, 2021

@jkotas, maybe you could take a look at this PR too? (It is also related to dictionaries in the loader.)

@jkotas (Member) left a comment

Thanks!

@@ -166,7 +166,7 @@ class PendingTypeLoadEntry
}
#endif //DACCESS_COMPILE

-TypeKey GetTypeKey()
+TypeKey& GetTypeKey()

@VSadov (Member Author) commented Nov 8, 2021

GCC wanted & here, since we pass the result by reference. Not sure how it worked with MSVC.
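A likely explanation, assuming the call site binds the result to a non-const lvalue reference: a by-value return produces a temporary, and standard C++ forbids binding a temporary to a non-const lvalue reference, so GCC (and Clang) reject it. MSVC has historically accepted this as a non-standard extension (warning C4239) unless conformance mode (/permissive-) is on. The sketch below uses hypothetical stand-ins (`SketchTypeKey`, `ConsumeKey`, `SketchPendingEntry`), not the actual PendingTypeLoadEntry code.

```cpp
#include <cassert>

// Hypothetical stand-in for TypeKey; only the reference semantics matter here.
struct SketchTypeKey {
    int token;
};

// Hypothetical consumer that takes the key by non-const reference.
int ConsumeKey(SketchTypeKey& key) { return key.token; }

class SketchPendingEntry {
public:
    explicit SketchPendingEntry(int token) : m_key{token} {}

    // Returns a copy (a temporary at the call site). Standard C++ does not
    // allow ConsumeKey(entry.GetTypeKeyByValue()), because a temporary cannot
    // bind to a non-const lvalue reference; GCC and Clang reject that call.
    SketchTypeKey GetTypeKeyByValue() { return m_key; }

    // Returns an lvalue reference to the stored key, which binds to
    // SketchTypeKey& and also avoids copying the key.
    SketchTypeKey& GetTypeKey() { return m_key; }

private:
    SketchTypeKey m_key;
};
```

Returning by reference also means callers observe and modify the entry's own key rather than a copy.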
