Add AVX2 support to IndexOfAnyValues #78863

MihaZupan · 2022-11-26T01:52:37Z

Easier to review while ignoring whitespace

This change makes IndexOfAnyValues noticeably faster than a regular IndexOf(char) without #78861 :)

Method	Toolchain	Length	Mean	Error	Ratio
IndexOfAnyAsciiChar	main	1	2.270 ns	0.0145 ns	1.00
IndexOfAnyAsciiChar	pr	1	2.279 ns	0.0204 ns	1.00

IndexOfAnyAsciiByte	main	1	1.787 ns	0.0053 ns	1.00
IndexOfAnyAsciiByte	pr	1	1.788 ns	0.0083 ns	1.00

IndexOfAnyByte	main	1	1.789 ns	0.0050 ns	1.00
IndexOfAnyByte	pr	1	1.790 ns	0.0051 ns	1.00

IndexOfAnyAsciiChar	main	16	2.784 ns	0.0247 ns	1.00
IndexOfAnyAsciiChar	pr	16	2.782 ns	0.0252 ns	1.00

IndexOfAnyAsciiByte	main	16	2.696 ns	0.0117 ns	1.00
IndexOfAnyAsciiByte	pr	16	2.364 ns	0.0084 ns	0.88

IndexOfAnyByte	main	16	3.887 ns	0.0196 ns	1.00
IndexOfAnyByte	pr	16	3.488 ns	0.0214 ns	0.90

IndexOfAnyAsciiChar	main	17	3.348 ns	0.0125 ns	1.00
IndexOfAnyAsciiChar	pr	17	3.103 ns	0.0639 ns	0.94

IndexOfAnyAsciiByte	main	17	3.122 ns	0.0109 ns	1.00
IndexOfAnyAsciiByte	pr	17	3.061 ns	0.0124 ns	0.98

IndexOfAnyByte	main	17	5.465 ns	0.0762 ns	1.00
IndexOfAnyByte	pr	17	4.102 ns	0.0258 ns	0.75

IndexOfAnyAsciiChar	main	32	3.381 ns	0.0098 ns	1.00
IndexOfAnyAsciiChar	pr	32	3.042 ns	0.0095 ns	0.90

IndexOfAnyAsciiByte	main	32	3.261 ns	0.0063 ns	1.00
IndexOfAnyAsciiByte	pr	32	3.074 ns	0.0138 ns	0.94

IndexOfAnyByte	main	32	5.538 ns	0.0131 ns	1.00
IndexOfAnyByte	pr	32	4.149 ns	0.0197 ns	0.75

IndexOfAnyAsciiChar	main	1000	60.429 ns	0.2208 ns	1.00
IndexOfAnyAsciiChar	pr	1000	34.636 ns	0.1453 ns	0.57

IndexOfAnyAsciiByte	main	1000	49.191 ns	0.1576 ns	1.00
IndexOfAnyAsciiByte	pr	1000	34.826 ns	0.4630 ns	0.71

IndexOfAnyByte	main	1000	83.976 ns	0.2974 ns	1.00
IndexOfAnyByte	pr	1000	45.647 ns	0.1477 ns	0.54

IndexOfAnyAsciiChar	main	100000	5,865.958 ns	23.1353 ns	1.00
IndexOfAnyAsciiChar	pr	100000	3,880.814 ns	35.5076 ns	0.66

IndexOfAnyAsciiByte	main	100000	4,753.754 ns	15.7070 ns	1.00
IndexOfAnyAsciiByte	pr	100000	3,215.513 ns	10.5250 ns	0.68

IndexOfAnyByte	main	100000	7,691.448 ns	38.5081 ns	1.00
IndexOfAnyByte	pr	100000	4,488.097 ns	41.8751 ns	0.58

Chars

Method	Length	Mean	Error	StdDev
IndexOfAnyAsciiChar	100000	3.936 us	0.0525 us	0.0491 us
IndexOf	100000	4.925 us	0.0186 us	0.0174 us
IndexOfAny2Values	100000	5.754 us	0.0196 us	0.0174 us
IndexOfAny3Values	100000	6.161 us	0.0257 us	0.0228 us
IndexOfAny4Values	100000	7.011 us	0.0162 us	0.0135 us
IndexOfAny5Values	100000	7.743 us	0.0363 us	0.0303 us
IndexOfAnyInRange_Char	100000	5.196 us	0.0143 us	0.0120 us

Bytes

Method	Length	Mean	Error	StdDev
IndexOfAnyAsciiByte	100000	3.189 us	0.0171 us	0.0151 us
IndexOfAnyByte	100000	4.484 us	0.0259 us	0.0229 us
IndexOf	100000	1.971 us	0.0186 us	0.0174 us
IndexOfAny2Values	100000	2.233 us	0.0093 us	0.0078 us
IndexOfAny3Values	100000	2.628 us	0.0081 us	0.0076 us
IndexOfAny4Values	100000	2.929 us	0.0155 us	0.0145 us
IndexOfAny5Values	100000	3.270 us	0.0102 us	0.0090 us
IndexOfAnyInRange_Byte	100000	2.187 us	0.0090 us	0.0084 us

ghost · 2022-11-26T01:52:48Z

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

Easier to review while ignoring whitespace

This change makes IndexOfAnyValues noticeably faster than a regular IndexOf(char) without #78861 :)

Method	Toolchain	Length	Mean	Error	Ratio
IndexOfAnyAsciiChar	main	1	2.270 ns	0.0145 ns	1.00
IndexOfAnyAsciiChar	pr	1	2.279 ns	0.0204 ns	1.00

IndexOfAnyAsciiByte	main	1	1.787 ns	0.0053 ns	1.00
IndexOfAnyAsciiByte	pr	1	1.788 ns	0.0083 ns	1.00

IndexOfAnyByte	main	1	1.789 ns	0.0050 ns	1.00
IndexOfAnyByte	pr	1	1.790 ns	0.0051 ns	1.00

IndexOfAnyAsciiChar	main	16	2.784 ns	0.0247 ns	1.00
IndexOfAnyAsciiChar	pr	16	2.782 ns	0.0252 ns	1.00

IndexOfAnyAsciiByte	main	16	2.696 ns	0.0117 ns	1.00
IndexOfAnyAsciiByte	pr	16	2.364 ns	0.0084 ns	0.88

IndexOfAnyByte	main	16	3.887 ns	0.0196 ns	1.00
IndexOfAnyByte	pr	16	3.488 ns	0.0214 ns	0.90

IndexOfAnyAsciiChar	main	17	3.348 ns	0.0125 ns	1.00
IndexOfAnyAsciiChar	pr	17	3.103 ns	0.0639 ns	0.94

IndexOfAnyAsciiByte	main	17	3.122 ns	0.0109 ns	1.00
IndexOfAnyAsciiByte	pr	17	3.061 ns	0.0124 ns	0.98

IndexOfAnyByte	main	17	5.465 ns	0.0762 ns	1.00
IndexOfAnyByte	pr	17	4.102 ns	0.0258 ns	0.75

IndexOfAnyAsciiChar	main	32	3.381 ns	0.0098 ns	1.00
IndexOfAnyAsciiChar	pr	32	3.042 ns	0.0095 ns	0.90

IndexOfAnyAsciiByte	main	32	3.261 ns	0.0063 ns	1.00
IndexOfAnyAsciiByte	pr	32	3.074 ns	0.0138 ns	0.94

IndexOfAnyByte	main	32	5.538 ns	0.0131 ns	1.00
IndexOfAnyByte	pr	32	4.149 ns	0.0197 ns	0.75

IndexOfAnyAsciiChar	main	1000	60.429 ns	0.2208 ns	1.00
IndexOfAnyAsciiChar	pr	1000	34.636 ns	0.1453 ns	0.57

IndexOfAnyAsciiByte	main	1000	49.191 ns	0.1576 ns	1.00
IndexOfAnyAsciiByte	pr	1000	34.826 ns	0.4630 ns	0.71

IndexOfAnyByte	main	1000	83.976 ns	0.2974 ns	1.00
IndexOfAnyByte	pr	1000	45.647 ns	0.1477 ns	0.54

IndexOfAnyAsciiChar	main	100000	5,865.958 ns	23.1353 ns	1.00
IndexOfAnyAsciiChar	pr	100000	3,880.814 ns	35.5076 ns	0.66

IndexOfAnyAsciiByte	main	100000	4,753.754 ns	15.7070 ns	1.00
IndexOfAnyAsciiByte	pr	100000	3,215.513 ns	10.5250 ns	0.68

IndexOfAnyByte	main	100000	7,691.448 ns	38.5081 ns	1.00
IndexOfAnyByte	pr	100000	4,488.097 ns	41.8751 ns	0.58

Chars

Method	Length	Mean	Error	StdDev
IndexOfAnyAsciiChar	100000	3.936 us	0.0525 us	0.0491 us
IndexOf	100000	4.925 us	0.0186 us	0.0174 us
IndexOfAny2Values	100000	5.754 us	0.0196 us	0.0174 us
IndexOfAny3Values	100000	6.161 us	0.0257 us	0.0228 us
IndexOfAny4Values	100000	7.011 us	0.0162 us	0.0135 us
IndexOfAny5Values	100000	7.743 us	0.0363 us	0.0303 us
IndexOfAnyInRange_Char	100000	5.196 us	0.0143 us	0.0120 us

Bytes

Method	Length	Mean	Error	StdDev
IndexOfAnyAsciiByte	100000	3.189 us	0.0171 us	0.0151 us
IndexOfAnyByte	100000	4.484 us	0.0259 us	0.0229 us
IndexOf	100000	1.971 us	0.0186 us	0.0174 us
IndexOfAny2Values	100000	2.233 us	0.0093 us	0.0078 us
IndexOfAny3Values	100000	2.628 us	0.0081 us	0.0076 us
IndexOfAny4Values	100000	2.929 us	0.0155 us	0.0145 us
IndexOfAny5Values	100000	3.270 us	0.0102 us	0.0090 us
IndexOfAnyInRange_Byte	100000	2.187 us	0.0090 us	0.0084 us

Author:	MihaZupan
Assignees:	MihaZupan
Labels:	`area-System.Memory`, `tenet-performance`
Milestone:	8.0.0

dakersnar · 2023-01-05T16:18:41Z

This change makes IndexOfAnyValues noticeably faster than a regular IndexOf(char) without #78861 :)

For context, does this PR have any relation to the changes made in #78861, or is this a completely independent optimization?

MihaZupan · 2023-01-05T16:20:34Z

Completely independent, I just happened to have worked on both at a similar point in time.

MihaZupan · 2023-01-17T16:03:44Z

Any thoughts on this PR @dotnet/area-system-memory?

dakersnar · 2023-01-18T16:25:28Z

Sorry, this has been on my TODO list. Can you provide a similar explanation to what you did here: #78861 (comment)?

MihaZupan · 2023-01-18T16:47:40Z

This file (IndexOfAnyAsciiSearcher.cs) already contains a bunch of vectorization code for Ssse3 and Arm64 through Vector128.

This PR effectively copy-pastes the existing Vector128 logic to add Vector256 paths. Everything is pretty much the same, just processing 2x chars/bytes per iteration.

Taking IndexOfAnyVectorized as an example. Currently it's written as

if (searchSpaceLength > 16)
{
    // Process the input with 2 Vector128s (16 chars) at a time.
}

// Process the last 1-16 chars with 2 overlapped Vector128s.

return NotFound;

and after this PR, it's written as

if (searchSpaceLength > 16)
{
    if (Avx2.IsSupported)
    {
        if (searchSpaceLength > 32)
        {
            // Process the input with 2 Vector256s (32 chars) at a time.
        }

        // Process the last 1-32 chars with 2 overlapped Vector256s.
        
        return NotFound;
    }
    else
    {
        // Process the input with 2 Vector128s (16 chars) at a time.
    }
}

// Process the last 1-16 chars with 2 overlapped Vector128s.

return NotFound;

dakersnar

Looks good and thank you for the explanation. I assume all the helper methods are just copy pasted from the Vector128 implementations?

dakersnar · 2023-01-18T17:43:42Z

src/libraries/System.Memory/tests/Span/IndexOfAnyValues.cs

@@ -360,7 +360,7 @@ static int LastIndexOfAnyExceptReferenceImpl(ReadOnlySpan<char> searchSpace, Rea
        private static class IndexOfAnyValuesTestHelper
        {
            private const int MaxNeedleLength = 10;
-            private const int MaxHaystackLength = 40;
+            private const int MaxHaystackLength = 100;


Where does this number come from?

These are used by the test when running a few million random inputs through the implementation to check random edge cases.
This number should be large enough so that we test all sorts of code paths - e.g. 100 is enough for

// Process the input with 2 Vector256s (32 chars) at a time. + // Process the input with 2 Vector256s (32 chars) at a time. + // Process the last 1-32 chars with 2 overlapped Vector256s.

so if the "2 Vector256s at a time" step left something broken, we'd catch it.

On the other hand it doesn't make sense to increase it too much as we'd just end up testing fewer variants of short inputs - and there's nothing in the code that would behave fundamentally differently for inputs of 1k vs 10k characters.

MihaZupan · 2023-01-18T18:35:58Z

I assume all the helper methods are just copy pasted from the Vector128 implementations?

Yes*, with the exception that we call this FixUpPackedVector256Mask helper method that accounts for how AVX2 does packing (it's more like 2x 128bit instead of 1x 256bit). We do the same thing in PackedSpanHelpers.

MihaZupan · 2023-01-18T18:42:59Z

The build failure is known according to Build Analysis

MihaZupan added area-System.Memory tenet-performance Performance related issue labels Nov 26, 2022

MihaZupan added this to the 8.0.0 milestone Nov 26, 2022

MihaZupan requested a review from stephentoub November 26, 2022 01:52

MihaZupan self-assigned this Nov 26, 2022

This was referenced Nov 27, 2022

Vectorized HttpCharacters (and used IndexOfAnyValues in other places found) dotnet/aspnetcore#45300

Merged

Use IndexOfAnyValues in Uri.EscapeString helper #79024

Merged

Add AVX2 support to IndexOfAnyValues

56059b6

MihaZupan force-pushed the indexofanyvalues-avx2 branch from 7522873 to 56059b6 Compare November 30, 2022 04:43

lewing requested a review from radekdoulik December 20, 2022 19:15

MihaZupan requested a review from a team January 4, 2023 23:35

dakersnar approved these changes Jan 18, 2023

View reviewed changes

MihaZupan merged commit 4e0195e into dotnet:main Jan 18, 2023

mdh1418 pushed a commit to mdh1418/runtime that referenced this pull request Jan 24, 2023

Add AVX2 support to IndexOfAnyValues (dotnet#78863)

8d85077

EgorBo mentioned this pull request Jan 24, 2023

Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock #81105

Closed

ghost locked as resolved and limited conversation to collaborators Feb 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add AVX2 support to IndexOfAnyValues #78863

Add AVX2 support to IndexOfAnyValues #78863

MihaZupan commented Nov 26, 2022 •

edited

Loading

ghost commented Nov 26, 2022

Easier to review while ignoring whitespace

dakersnar commented Jan 5, 2023

MihaZupan commented Jan 5, 2023

MihaZupan commented Jan 17, 2023

dakersnar commented Jan 18, 2023

MihaZupan commented Jan 18, 2023

dakersnar left a comment

dakersnar Jan 18, 2023

MihaZupan Jan 18, 2023

MihaZupan commented Jan 18, 2023 •

edited

Loading

MihaZupan commented Jan 18, 2023

Add AVX2 support to IndexOfAnyValues #78863

Add AVX2 support to IndexOfAnyValues #78863

Conversation

MihaZupan commented Nov 26, 2022 • edited Loading

Easier to review while ignoring whitespace

ghost commented Nov 26, 2022

Easier to review while ignoring whitespace

dakersnar commented Jan 5, 2023

MihaZupan commented Jan 5, 2023

MihaZupan commented Jan 17, 2023

dakersnar commented Jan 18, 2023

MihaZupan commented Jan 18, 2023

dakersnar left a comment

Choose a reason for hiding this comment

dakersnar Jan 18, 2023

Choose a reason for hiding this comment

MihaZupan Jan 18, 2023

Choose a reason for hiding this comment

MihaZupan commented Jan 18, 2023 • edited Loading

MihaZupan commented Jan 18, 2023

MihaZupan commented Nov 26, 2022 •

edited

Loading

MihaZupan commented Jan 18, 2023 •

edited

Loading