
Speed up KeyAnalyzer for substring based frozen collections #89863

Closed · wants to merge 8 commits

Conversation

IDisposable (Contributor) commented Aug 2, 2023

Building on PR #89689, I tried making everything explicitly inline-able and sealed to get the JIT to eliminate any conditional jumps in the hot loop, by teasing out the various comparer options into trait-like classes that mix in to produce sealed classes where all the logic is conditional-free (down to the internals of the String.AsSpan() methods themselves).

This required some complex extractions. The essence is that we now construct an equality comparer and hash set for the from-the-left attempt, and then lazily construct another comparer and matching HashSet<string> only if we need one for the from-the-right attempt. Since in the vast majority of cases we're going to spend the bulk of the execution time iterating through the uniqueStrings we're presented with, the overhead of having another comparer/collection is comparatively low.

To ensure we don't bloat memory, we now .Clear() the set just before we return false from HasSufficientUniquenessFactor, since we know we're going to get called again for a different index/count... this means that both HashSet<string>s spend their lifetimes empty except mid-check (we could clear them on success as well, but the old code didn't bother).
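A minimal sketch of that clear-on-failure pattern (the method name matches the PR's HasSufficientUniquenessFactor; the exact signature and loop shape here are my assumptions, not the PR's code):

```csharp
using System;
using System.Collections.Generic;

static class UniquenessCheck
{
    // Returns true when the current slicing strategy keeps collisions within
    // acceptableNonUniqueCount. On failure the set is cleared before returning,
    // so between attempts it stays empty and never pins old entries in memory.
    internal static bool HasSufficientUniquenessFactor(
        HashSet<string> set, string[] uniqueStrings, int acceptableNonUniqueCount)
    {
        foreach (string s in uniqueStrings)
        {
            if (!set.Add(s) && --acceptableNonUniqueCount < 0)
            {
                set.Clear(); // we'll be called again with a different index/count
                return false;
            }
        }
        return true;
    }
}
```

Because acceptableNonUniqueCount is passed by value, decrementing it inside the method leaves the caller's hoisted copy untouched.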

The resulting SubstringEqualityComparer classes could be useful elsewhere, but are currently internal to the System.Collections.Immutable assembly. We could easily extend them to also support the InvariantCulture and CurrentCulture options, but I'm not sure when we would want either (it wouldn't make much sense for a FrozenDictionary/FrozenSet, but there could be other use cases).

Hoisted the calculation of the maximum offset per pass up to the top of the count (length) loop, because that calculation applies to both left-justified and right-justified slicing and doesn't change during the inner loops.

The comparers expose an interface that internally manipulates the starting index and count in such a way that it gets inlined completely. Basically, you tell it where to Start() when you change the count (width) of the slice: when doing a pass of count width from the left, you start with an index of zero; when doing a from-the-right pass, you start with the negative count offset (i.e. as far right as will accommodate the slice width). Once started, you call GoLeft() or GoRight() to bump the index appropriately (without field/property calls).
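As a hedged illustration of that contract (Start/GoLeft/GoRight are the PR's method names; the bodies below are my reading of the arithmetic described above, not the PR's code):

```csharp
using System;

// Index < 0 means right-justified: the slice starts |Index| chars from the end.
internal sealed class SubstringPosition
{
    public int Index;
    public int Count;

    // Left pass: start at index 0. Right pass: start at -count,
    // i.e. as far right as will still accommodate the slice width.
    public void Start(int count, bool fromLeft)
    {
        Count = count;
        Index = fromLeft ? 0 : -count;
    }

    public void GoRight() => Index++;
    public void GoLeft() => Index--;

    // A negative Index is resolved against the string length when slicing.
    public ReadOnlySpan<char> Slice(string s) =>
        s.AsSpan(Index >= 0 ? Index : s.Length + Index, Count);
}
```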

The implementation uses:

  • Static "mix-in" Slicer methods for LeftSlice and RightSlice to extract the slice's ReadOnlySpan<char>
  • Static "mix-in" OrdinalEquality for Equals and GetHashCode operations to use when implementing IEqualityComparer<string> in an ordinal case-sensitive way
  • Static "mix-in" OrdinalInsensitiveEquality for Equals and GetHashCode operations to use when implementing IEqualityComparer<string> in an ordinal case-insensitive way
  • Sealed concrete encapsulations of the mix-ins for LeftSubstringOrdinalComparer, RightSubstringOrdinalComparer, LeftSubstringCaseInsensitiveComparer, and RightSubstringCaseInsensitiveComparer
  • Sealed concrete FullStringComparer for use when we decide that we aren't going to use substrings at all.
  • Use of the IGenericSpecializedWrapper / GSW pattern to "pass in" the specialization, eliminating all virtcalls so everything can be completely inlined
  • Aggressive inlining of the slicers/comparers via liberal application of [MethodImpl(MethodImplOptions.AggressiveInlining)]
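A compressed sketch of how a static mix-in composes into a sealed comparer (the type names match the PR's list above; the bodies are illustrative assumptions, and null handling is elided for brevity):

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Static "mix-in": ordinal equality over spans, shared by the sealed comparers.
internal static class OrdinalEquality
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool SpansEqual(ReadOnlySpan<char> x, ReadOnlySpan<char> y) =>
        x.SequenceEqual(y);

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int SpanHashCode(ReadOnlySpan<char> s) =>
        string.GetHashCode(s, StringComparison.Ordinal);
}

// Sealed concrete comparer: because it is sealed and only composes static
// mix-in calls, the JIT can devirtualize and inline the whole call chain.
internal sealed class LeftSubstringOrdinalComparer : IEqualityComparer<string>
{
    public int Index;
    public int Count;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public bool Equals(string x, string y) =>
        OrdinalEquality.SpansEqual(x.AsSpan(Index, Count), y.AsSpan(Index, Count));

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public int GetHashCode(string s) =>
        OrdinalEquality.SpanHashCode(s.AsSpan(Index, Count));
}
```

Handing such a comparer to a HashSet<string> means the set hashes and compares only the chosen slice of each key, which is the whole point of the analysis.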

I suspect we could pass the chosen slicer out in the AnalysisResults, as it encapsulates the offset, length, and ignore/honor-case settings for use in the resulting frozen collection constructions, but that would be another pass.

I also added some more tests to the KeyAnalyserTests for ContainsAnyLetters and HasSufficientUniquenessFactor to verify newly exposed behavior, and also added some testing based on the known instances of freezable data used in the FrozenFromKnownValuesTests.

The other changes imported from PR #88516 here are:

  • Eliminated the inner TryUseSubstring because we can just early-return the calculated results as we build them
  • Hoisted the calculation of acceptableNonUniqueCount out to the top level since it never changes; we pass it into the HasSufficientUniquenessFactor method for it to "use up" internally (passed by value, so it's unchanged at the call site)
  • Eliminated the delegate ReadOnlySpan<char> GetSpan and its use, which helps reduce dynamic dispatch overhead in the CreateAnalysisResults method
  • Eliminated the IsLeft field of the SubstringComparer, since a negative Index tells us we're doing right-justified slicing (and documented that on the class)
  • Changed the logic managing the Index and Count on the comparer for right-justified substrings
  • Added [MethodImpl(MethodImplOptions.AggressiveInlining)] to the Equals and GetHashCode overrides
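The hoist in the second bullet can be sketched as follows (the 5% collision budget here is an illustrative assumption, not the PR's exact ratio):

```csharp
using System;

static class HoistSketch
{
    // Hoisted to the top level: the collision budget depends only on the input
    // size, so compute it once rather than inside the per-count, per-direction
    // loops. Each attempt then receives its own by-value copy to decrement.
    internal static int AcceptableNonUniqueCount(int uniqueStringCount) =>
        uniqueStringCount / 20; // illustrative 5% budget
}
```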

The slicing is really only changed once per Count, so move the
IsLeft-dependent logic into the `Slicer` method and eliminate the `GetSpan` delegate.

Changed to also pass the already-computed `set` of unique substrings to the `CreateAnalysisResults` method, so we don't compute the slices twice. So that either the set or the original `uniqueStrings` can be passed, changed the `Analyze` method to take `uniqueStrings` as a `string[]` (which it already is at all call sites).
Fixed a subtle bug where the entire string was being placed in the set, not the span.
Since we are working with the same set of input strings in each strategy there's no reason to take the length every time we make an attempt (per count, both left and right justified).
Hoist the calculation of the acceptable number of collisions out to the top, do it once, and pass that number into the `HasSufficientUniquenessFactor` method for it to (locally) use up.
Benchmarks ever so slightly better.
Looks like the overhead of IEnumerable<string> is not worth the savings for the benchmark test data. Perhaps it would matter less if we were freezing more strings, but that's not likely.
This will cost one extra `HashSet<string>` and one extra `SubstringComparer` to be allocated, but might make the code run faster.

Use the GSW strategy for virtual flattening
Replace ISubstringComparer with ISubstringEqualityComparer as that's more indicative of the interface it's exposing, and rename everything derived from that.
Fixed the implementation of FullStringEqualityComparer to not actually use the slice ReadOnlySpan(s) in the Equals and GetHashCode specializations.
Added comments for the SubstringEquality classes
Renamed the files for the specialization of the comparers.
Fixed the unit tests under debug builds.
Extended the test suite data for the KeyAnalyzer tests to exercise using the data in FrozenFromKnownValuesTests.
This allows ALL the virtualization to be discarded and inlined, by restoring the AggressiveInlining hint to the JIT and using mix-ins.
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Aug 2, 2023
ghost commented Aug 2, 2023

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Author: IDisposable
Assignees: -
Labels: area-System.Collections, community-contribution

Milestone: -

IDisposable (Contributor, Author) commented:
This should supersede both PR #89689 and PR #88516

@IDisposable IDisposable changed the title Speed up KeyAnalyzer Speed up KeyAnalyzer for substring based frozen collections Aug 2, 2023
IDisposable (Contributor, Author) commented:
SharpLab with all the classes combined so you can see the resulting IL/JIT here

IDisposable (Contributor, Author) commented Aug 8, 2023

@adamsitnik, just to level-set... this is likely barely worth all the changes if we're only trying to speed up KeyAnalyzer itself... but if this is a gateway toward eventually exposing the higher-performance slicing comparers (based on ReadOnlySpan<char>), then it might be worth landing. It would also be useful to eventually expose the Slicer in the AnalysisResults for the frozen collections to use, as that would eliminate the delegates there, but that's not in the scope of this PR yet.

It does seem to net a 1-3% improvement against baseline for non-trivial sets, and the JITted code is very compact despite all the inlining; it is branchless beyond the null checks and array-bounds checks that are part of the inlined String.AsSpan() code.

If desired, I can tease out the unit tests that I added (which are higher value) and the more subtle changes I made along the way to make a simpler version of this.

Additionally, I think it might be worth discussing in a distinct issue the idea of making String.AsSpan(start, length) understand negative offsets internally to eliminate a ton of this code, but that's a public API change, as right now passing negative values will throw.

stephentoub (Member) commented:
Thanks, @IDisposable. Regarding "but if this is a gateway toward eventually exposing the higher-performance slicing comparers", we don't have any plans or intention to do so.

IDisposable (Contributor, Author) commented Aug 9, 2023

Thanks @stephentoub!

Regarding the other points:

If desired, I can tease out the unit tests that I added (which are higher value) and the more subtle changes I made along the way to make a simpler version of this.

This would be limited to the smaller changes like hoisting the constant calculations, exposing the internals for testing, and the tests, and possibly going back to my original attempt.
UPDATE: created a PR with the unit tests and trivial hoistings: #90301

Additionally, I think it might be worth discussing in a distinct issue the idea of making the String.AsSpan(start, length) understand negative offsets internally to eliminate a ton of this code, but that's a public API change as right now passing negative values will throw.

Any idea if this would be something folks would like to see done? It would be nice to not have to put in the conditionals in the first place and instead leverage the internal knowledge. The current AsSpan simply throws when confronted with a negative start index, so it would be a "light breaking" change. The concept is that "foo".AsSpan(-2, 2) would return a ReadOnlySpan<char> pointing at the last two characters of the string (i.e. ['o','o']), and "zot".AsSpan(-3, 1) would return a ReadOnlySpan<char> pointing at ['z']. This would mean tweaking code in MemoryExtensions and the ReadOnlySpan<T> constructor.
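The proposed semantics can be sketched today as a separate helper rather than a change to MemoryExtensions itself (the extension name below is hypothetical, purely for illustration):

```csharp
using System;

internal static class NegativeStartSpanExtensions
{
    // Hypothetical helper (not the proposed API change itself): a negative
    // start counts back from the end of the string, so -2 means "the last
    // two characters". The built-in AsSpan throws for negative starts today.
    public static ReadOnlySpan<char> AsSpanAllowingNegative(
        this string s, int start, int length) =>
        s.AsSpan(start >= 0 ? start : s.Length + start, length);
}
```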

@eiriktsarpalis eiriktsarpalis added this to the 9.0.0 milestone Aug 9, 2023
eiriktsarpalis (Member) commented:
Let's revisit this in .NET 9. We're very close to concluding .NET 8 development and making performance sensitive changes can be risky at this stage.

IDisposable (Contributor, Author) commented Aug 9, 2023

Makes perfect sense! I will put up a PR for just the additional unit tests as that would be nice :)

@IDisposable IDisposable mentioned this pull request Aug 10, 2023
IDisposable (Contributor, Author) commented:
Extracted just the unit test changes and a couple calculation hoists into PR #90301

eiriktsarpalis (Member) commented:
Given #90301 could you close this PR? You can always rebase and reopen it if and when the other one gets merged.

IDisposable (Contributor, Author) commented:
Sure!

@ghost ghost locked as resolved and limited conversation to collaborators Sep 9, 2023