Move SearchValues scalar loops into IndexOfAnyAsciiSearcher #91937

Merged: 1 commit, Sep 19, 2023

@MihaZupan (Member) commented Sep 12, 2023

This PR moves the scalar loops into the core worker methods. This reduces the amount of code at each call site and makes further changes, such as adding Avx512 support, easier.

The code inlined into IndexOfAny callers is now just a call to the worker method, instead of length < 8 ? call scalar : call vectorized.
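
To make the shape of the change concrete, here is a minimal sketch of the two dispatch styles. It is not the actual runtime code: the type and method names are hypothetical, and the "vectorized" worker is a scalar placeholder so that the example stays self-contained. Only the length < 8 check comes from the description above.

```csharp
using System;

// Hypothetical stand-in for the real searcher; names and bodies are illustrative only.
static class DispatchSketch
{
    // Threshold taken from the PR description (length < 8).
    const int ScalarThreshold = 8;

    // Before: the length check lives in the small method that is inlined into every caller,
    // so each call site carries a compare, a branch, and two calls.
    public static int IndexOfAnyBefore(ReadOnlySpan<char> span, ReadOnlySpan<char> values) =>
        span.Length < ScalarThreshold
            ? ScalarLoop(span, values)
            : VectorizedWorker(span, values);

    // After: the inlined part is a single call; the worker handles short inputs itself.
    public static int IndexOfAnyAfter(ReadOnlySpan<char> span, ReadOnlySpan<char> values) =>
        Worker(span, values);

    static int Worker(ReadOnlySpan<char> span, ReadOnlySpan<char> values)
    {
        if (span.Length < ScalarThreshold)
        {
            return ScalarLoop(span, values);
        }

        return VectorizedWorker(span, values);
    }

    // Returns the index of the first element of span that appears in values, or -1.
    static int ScalarLoop(ReadOnlySpan<char> span, ReadOnlySpan<char> values)
    {
        for (int i = 0; i < span.Length; i++)
        {
            if (values.IndexOf(span[i]) >= 0)
            {
                return i;
            }
        }

        return -1;
    }

    // Placeholder: the real worker uses SIMD; this sketch reuses the scalar loop.
    static int VectorizedWorker(ReadOnlySpan<char> span, ReadOnlySpan<char> values) =>
        ScalarLoop(span, values);
}
```

Both entry points return the same result for any input. The difference is only where the length check is emitted: once per call site in the first form, once inside the worker in the second.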

Example call site diff
 ; Assembly listing for method System.Buffers.Text.Base64+Base64CharValidatable:IndexOfAnyExcept(System.ReadOnlySpan`1[ushort]):int (FullOpts)
 ; Emitting BLENDED_CODE for X64 with AVX512 - Unix
 ; FullOpts code
 
 G_M48088_IG01:
        push     rbp
-       sub      rsp, 16
-       lea      rbp, [rsp+0x10]
-						;; size=10 bbWeight=1 PerfScore 1.75
+       mov      rbp, rsp
+						;; size=4 bbWeight=1 PerfScore 1.25
 G_M48088_IG02:
        mov      rdx, 0xD1FFAB1E      ; const ptr
-       mov      rax, gword ptr [rdx]
-       mov      rdx, rax
-       mov      bword ptr [rbp-0x10], rdi
-       mov      dword ptr [rbp-0x04], esi
-       cmp      esi, 8
-       jge      SHORT G_M48088_IG04
-						;; size=28 bbWeight=1 PerfScore 5.75
-G_M48088_IG03:
-       mov      rdi, rdx
-       mov      rsi, bword ptr [rbp-0x10]
-       mov      edx, dword ptr [rbp-0x04]
-       mov      rax, 0xD1FFAB1E      ; code for System.Buffers.AsciiCharSearchValues`1[System.Buffers.IndexOfAnyAsciiSearcher+Default]:IndexOfAnyScalar[System.Buffers.IndexOfAnyAsciiSearcher+Negate](byref,int):int:this
-       call     [rax]System.Buffers.AsciiCharSearchValues`1[System.Buffers.IndexOfAnyAsciiSearcher+Default]:IndexOfAnyScalar[System.Buffers.IndexOfAnyAsciiSearcher+Negate](byref,int):int:this
-       jmp      SHORT G_M48088_IG05
-						;; size=24 bbWeight=0.50 PerfScore 3.75
-G_M48088_IG04:
+       mov      rdx, gword ptr [rdx]
        add      rdx, 8
-       mov      rdi, bword ptr [rbp-0x10]
-       mov      esi, dword ptr [rbp-0x04]
        mov      rax, 0xD1FFAB1E      ; code for System.Buffers.IndexOfAnyAsciiSearcher:IndexOfAnyVectorized[System.Buffers.IndexOfAnyAsciiSearcher+Negate,System.Buffers.IndexOfAnyAsciiSearcher+Default](byref,int,byref):int
        call     [rax]System.Buffers.IndexOfAnyAsciiSearcher:IndexOfAnyVectorized[System.Buffers.IndexOfAnyAsciiSearcher+Negate,System.Buffers.IndexOfAnyAsciiSearcher+Default](byref,int,byref):int
-						;; size=23 bbWeight=0.50 PerfScore 2.75
-G_M48088_IG05:
        nop      
-						;; size=1 bbWeight=1 PerfScore 0.25
-G_M48088_IG06:
-       add      rsp, 16
+						;; size=30 bbWeight=1 PerfScore 6.00
+G_M48088_IG03:
        pop      rbp
        ret      
-						;; size=6 bbWeight=1 PerfScore 1.75
+						;; size=2 bbWeight=1 PerfScore 1.50
 
-; Total bytes of code 92, prolog size 10, PerfScore 25.20, instruction count 25, allocated bytes for code 92 (MethodHash=586b4427) for method System.Buffers.Text.Base64+Base64CharValidatable:IndexOfAnyExcept(System.ReadOnlySpan`1[ushort]):int (FullOpts)
+; Total bytes of code 36, prolog size 4, PerfScore 12.35, instruction count 10, allocated bytes for code 36 (MethodHash=586b4427) for method System.Buffers.Text.Base64+Base64CharValidatable:IndexOfAnyExcept(System.ReadOnlySpan`1[ushort]):int (FullOpts)

Overall, this seems to be a slight improvement for Regex.

Regex benchmark results

Perf_Regex_Industry_SliceSlice

Toolchain Options Mean Error Ratio
main Compiled 473.1 ms 6.91 ms 1.00
pr Compiled 455.7 ms 1.64 ms 0.96
main IgnoreCase, Compiled 709.4 ms 1.97 ms 1.00
pr IgnoreCase, Compiled 709.2 ms 2.12 ms 1.00

Perf_Regex_Industry_RustLang_Sherlock

Toolchain Pattern Options Mean Error Ratio
main .* Compiled 798,549.12 ns 5,683.298 ns 1.00
pr .* Compiled 823,663.43 ns 9,133.668 ns 1.03
main (?i)Holmes Compiled 85,117.48 ns 697.361 ns 1.00
pr (?i)Holmes Compiled 85,117.21 ns 713.527 ns 1.00
main (?i)Sher[a-z]+|Hol[a-z]+ Compiled 703,678.45 ns 4,730.359 ns 1.00
pr (?i)Sher[a-z]+|Hol[a-z]+ Compiled 701,771.58 ns 5,629.408 ns 1.00
main (?i)Sherlock Compiled 56,515.81 ns 547.958 ns 1.00
pr (?i)Sherlock Compiled 56,315.56 ns 268.100 ns 1.00
main (?i)Sherlock Holmes Compiled 56,270.02 ns 473.305 ns 1.00
pr (?i)Sherlock Holmes Compiled 55,911.93 ns 406.218 ns 0.99
main (?i)Sherlock|Holmes|Watson Compiled 826,406.58 ns 8,737.846 ns 1.00
pr (?i)Sherlock|Holmes|Watson Compiled 819,068.84 ns 9,151.341 ns 0.99
main (?i)Sherlock|(...)er|John|Baker [49] Compiled 1,736,080.19 ns 11,954.994 ns 1.00
pr (?i)Sherlock|(...)er|John|Baker [49] Compiled 1,693,613.21 ns 12,089.156 ns 0.98
main (?i)the Compiled 408,989.03 ns 2,985.069 ns 1.00
pr (?i)the Compiled 416,752.52 ns 2,813.352 ns 1.02
main (?m)^Sherlock(...)rlock Holmes$ [37] Compiled 49,308.97 ns 534.313 ns 1.00
pr (?m)^Sherlock(...)rlock Holmes$ [37] Compiled 49,340.07 ns 556.603 ns 1.00
main (?s).* Compiled 53.05 ns 0.476 ns 1.00
pr (?s).* Compiled 52.97 ns 0.200 ns 1.00
main [^\\n]* Compiled 813,008.75 ns 8,003.114 ns 1.00
pr [^\\n]* Compiled 806,846.97 ns 5,795.755 ns 0.99
main [a-q][^u-z]{13}x Compiled 35,882.13 ns 588.960 ns 1.00
pr [a-q][^u-z]{13}x Compiled 36,172.07 ns 338.241 ns 1.01
main [a-zA-Z]+ing Compiled 3,534,267.01 ns 21,696.203 ns 1.00
pr [a-zA-Z]+ing Compiled 3,520,875.87 ns 29,773.484 ns 1.00
main \b\w+n\b Compiled 8,315,633.51 ns 106,259.166 ns 1.00
pr \b\w+n\b Compiled 8,075,193.70 ns 114,927.130 ns 0.97
main \p{L} Compiled 10,449,981.19 ns 114,976.724 ns 1.00
pr \p{L} Compiled 10,313,895.58 ns 81,089.130 ns 0.99
main \p{Ll} Compiled 10,180,677.31 ns 190,903.367 ns 1.00
pr \p{Ll} Compiled 10,283,964.06 ns 226,219.747 ns 1.01
main \p{Lu} Compiled 491,024.59 ns 16,920.356 ns 1.00
pr \p{Lu} Compiled 479,731.83 ns 3,335.590 ns 0.98
main \s[a-zA-Z]{0,12}ing\s Compiled 3,868,198.99 ns 7,235.956 ns 1.00
pr \s[a-zA-Z]{0,12}ing\s Compiled 3,893,177.63 ns 20,065.318 ns 1.01
main \w+ Compiled 5,106,160.01 ns 179,273.894 ns 1.00
pr \w+ Compiled 4,981,989.76 ns 52,194.406 ns 0.98
main \w+\s+Holmes Compiled 3,027,514.95 ns 7,991.676 ns 1.00
pr \w+\s+Holmes Compiled 3,067,955.66 ns 24,257.065 ns 1.01
main \w+\s+Holmes\s+\w+ Compiled 3,063,962.72 ns 55,800.505 ns 1.00
pr \w+\s+Holmes\s+\w+ Compiled 3,023,497.71 ns 7,488.420 ns 0.99
main aei Compiled 53,782.69 ns 245.684 ns 1.00
pr aei Compiled 53,429.59 ns 235.059 ns 0.99
main aqj Compiled 41,417.83 ns 480.492 ns 1.00
pr aqj Compiled 41,417.82 ns 366.348 ns 1.00
main Holmes Compiled 62,375.99 ns 398.244 ns 1.00
pr Holmes Compiled 60,441.77 ns 1,310.037 ns 0.97
main Holmes.{0,25}(...).{0,25}Holmes [39] Compiled 65,651.09 ns 447.743 ns 1.00
pr Holmes.{0,25}(...).{0,25}Holmes [39] Compiled 66,178.52 ns 251.919 ns 1.01
main Sher[a-z]+|Hol[a-z]+ Compiled 69,989.95 ns 314.014 ns 1.00
pr Sher[a-z]+|Hol[a-z]+ Compiled 70,662.48 ns 748.390 ns 1.01
main Sherlock Compiled 47,965.71 ns 1,501.142 ns 1.00
pr Sherlock Compiled 46,314.21 ns 497.869 ns 0.97
main Sherlock Holmes Compiled 47,004.90 ns 492.846 ns 1.00
pr Sherlock Holmes Compiled 47,470.82 ns 828.347 ns 1.01
main Sherlock\s+Holmes Compiled 49,807.98 ns 669.379 ns 1.00
pr Sherlock\s+Holmes Compiled 48,251.21 ns 1,189.570 ns 0.97
main Sherlock|Holmes Compiled 68,393.54 ns 1,048.419 ns 1.00
pr Sherlock|Holmes Compiled 68,868.75 ns 1,157.104 ns 1.01
main Sherlock|Holmes|Watson Compiled 92,706.84 ns 1,291.153 ns 1.00
pr Sherlock|Holmes|Watson Compiled 89,613.81 ns 316.457 ns 0.97
main Sherlock|Holm(...)er|John|Baker [45] Compiled 190,483.91 ns 2,410.551 ns 1.00
pr Sherlock|Holm(...)er|John|Baker [45] Compiled 186,409.86 ns 938.477 ns 0.98
main Sherlock|Street Compiled 38,216.45 ns 186.622 ns 1.00
pr Sherlock|Street Compiled 38,127.93 ns 192.193 ns 1.00
main the Compiled 328,277.13 ns 1,606.118 ns 1.00
pr the Compiled 324,914.77 ns 953.162 ns 0.99
main The Compiled 70,545.06 ns 329.177 ns 1.00
pr The Compiled 71,840.54 ns 1,164.975 ns 1.02
main the\s+\w+ Compiled 444,779.24 ns 4,650.472 ns 1.00
pr the\s+\w+ Compiled 443,691.85 ns 4,464.689 ns 1.00
main zqj Compiled 46,169.68 ns 547.976 ns 1.00
pr zqj Compiled 45,307.75 ns 450.939 ns 0.98

Perf_Regex_Industry_BoostDocs_Simple

Toolchain Id Options Mean Error Ratio
main 0 Compiled 33.32 ns 0.133 ns 1.00
pr 0 Compiled 33.41 ns 0.168 ns 1.00
main 1 Compiled 56.37 ns 0.475 ns 1.00
pr 1 Compiled 55.19 ns 0.208 ns 0.98
main 2 Compiled 63.36 ns 0.222 ns 1.00
pr 2 Compiled 65.02 ns 1.170 ns 1.02
main 3 Compiled 92.81 ns 0.900 ns 1.00
pr 3 Compiled 94.12 ns 0.797 ns 1.01
main 4 Compiled 82.25 ns 0.578 ns 1.00
pr 4 Compiled 84.74 ns 0.503 ns 1.03
main 5 Compiled 84.34 ns 1.879 ns 1.00
pr 5 Compiled 83.15 ns 1.085 ns 0.99
main 6 Compiled 37.36 ns 0.138 ns 1.00
pr 6 Compiled 37.42 ns 0.397 ns 1.00
main 7 Compiled 37.20 ns 0.173 ns 1.00
pr 7 Compiled 36.77 ns 0.125 ns 0.99
main 8 Compiled 37.29 ns 0.141 ns 1.00
pr 8 Compiled 36.89 ns 0.109 ns 0.99
main 9 Compiled 35.88 ns 0.300 ns 1.00
pr 9 Compiled 35.32 ns 0.102 ns 0.98
main 10 Compiled 37.01 ns 0.389 ns 1.00
pr 10 Compiled 36.72 ns 0.299 ns 0.99
main 11 Compiled 36.26 ns 0.321 ns 1.00
pr 11 Compiled 36.70 ns 0.426 ns 1.01
main 12 Compiled 41.44 ns 0.502 ns 1.00
pr 12 Compiled 40.61 ns 0.408 ns 0.98
main 13 Compiled 41.33 ns 0.438 ns 1.00
pr 13 Compiled 40.55 ns 0.528 ns 0.98

Perf_Regex_Industry_Mariomkas

Method Toolchain Pattern Options Mean Error Ratio
Ctor main (?:(?:250-5]?[0-9][0-9]) [87] Compiled 23.50 μs 0.554 μs 1.00
Ctor pr (?:(?:250-5]?[0-9][0-9]) [87] Compiled 22.99 μs 0.209 μs 0.98
Count main (?:(?:250-5]?[0-9][0-9]) [87] Compiled 3,335.45 μs 34.881 μs 1.00
Count pr (?:(?:250-5]?[0-9][0-9]) [87] Compiled 3,328.27 μs 27.286 μs 1.00
Ctor main [\w]+://[^/\s(...)?(?:#[^\\s]*)? [51] Compiled 20.95 μs 0.316 μs 1.00
Ctor pr [\w]+://[^/\s(...)?(?:#[^\\s]*)? [51] Compiled 20.55 μs 0.184 μs 0.98
Count main [\w]+://[^/\s(...)?(?:#[^\\s]*)? [51] Compiled 1,264.20 μs 15.433 μs 1.00
Count pr [\w]+://[^/\s(...)?(?:#[^\\s]*)? [51] Compiled 1,273.16 μs 13.892 μs 1.01
Ctor main [\w\.+-]+@[\w\.-]+\.[\w\.-]+ Compiled 16.57 μs 0.235 μs 1.00
Ctor pr [\w\.+-]+@[\w\.-]+\.[\w\.-]+ Compiled 16.25 μs 0.281 μs 0.98
Count main [\w\.+-]+@[\w\.-]+\.[\w\.-]+ Compiled 510.35 μs 8.704 μs 1.00
Count pr [\w\.+-]+@[\w\.-]+\.[\w\.-]+ Compiled 512.56 μs 7.421 μs 1.00

Perf_Regex_Industry_Leipzig

Toolchain Pattern Options Mean Error Ratio
main .{0,2}(Tom|Sawyer|Huckleberry|Finn) Compiled 257.041 ms 5.7381 ms 1.00
pr .{0,2}(Tom|Sawyer|Huckleberry|Finn) Compiled 246.693 ms 1.9807 ms 0.96
main .{2,4}(Tom|Sawyer|Huckleberry|Finn) Compiled 302.817 ms 2.4238 ms 1.00
pr .{2,4}(Tom|Sawyer|Huckleberry|Finn) Compiled 328.873 ms 36.9452 ms 1.09
main (?i)Tom|Sawyer|Huckleberry|Finn Compiled 23.502 ms 0.2097 ms 1.00
pr (?i)Tom|Sawyer|Huckleberry|Finn Compiled 22.848 ms 0.2486 ms 0.97
main (?i)Twain Compiled 3.744 ms 0.0451 ms 1.00
pr (?i)Twain Compiled 3.674 ms 0.0263 ms 0.98
main ([A-Za-z]awyer|[A-Za-z]inn)\s Compiled 11.619 ms 0.1049 ms 1.00
pr ([A-Za-z]awyer|[A-Za-z]inn)\s Compiled 11.584 ms 0.1061 ms 1.00
main [a-z]shing Compiled 2.657 ms 0.0258 ms 1.00
pr [a-z]shing Compiled 2.667 ms 0.0129 ms 1.00
main \p{Sm} Compiled 2.477 ms 0.0326 ms 1.00
pr \p{Sm} Compiled 2.470 ms 0.0341 ms 1.00
main Huck[a-zA-Z]+|Saw[a-zA-Z]+ Compiled 2.586 ms 0.0433 ms 1.00
pr Huck[a-zA-Z]+|Saw[a-zA-Z]+ Compiled 2.730 ms 0.0408 ms 1.06
main Tom.{10,25}river|river.{10,25}Tom Compiled 7.973 ms 0.0958 ms 1.00
pr Tom.{10,25}river|river.{10,25}Tom Compiled 7.964 ms 0.1412 ms 1.00
main Tom|Sawyer|Huckleberry|Finn Compiled 4.381 ms 0.0452 ms 1.00
pr Tom|Sawyer|Huckleberry|Finn Compiled 4.468 ms 0.0550 ms 1.02
main Twain Compiled 2.666 ms 0.0548 ms 1.00
pr Twain Compiled 2.645 ms 0.0357 ms 0.99

@MihaZupan MihaZupan added this to the 9.0.0 milestone Sep 12, 2023
@MihaZupan MihaZupan self-assigned this Sep 12, 2023
@ghost commented Sep 12, 2023

Tagging subscribers to this area: @dotnet/area-system-buffers
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: MihaZupan
Assignees: MihaZupan
Labels: area-System.Buffers

Milestone: 9.0.0

@stephentoub (Member)

Do any of those benchmarks deal with really small inputs? That's where we'd expect to see a regression, right?

@MihaZupan (Member, Author)

I'm seeing numbers along these lines:

Method Toolchain Length Mean Ratio
IndexOfAny main 1 1.919 ns 1.00
IndexOfAny pr 1 3.324 ns 1.74
IndexOfAny main 7 7.145 ns 1.00
IndexOfAny pr 7 7.517 ns 1.05
IndexOfAny main 8 4.040 ns 1.00
IndexOfAny pr 8 3.392 ns 0.84
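
The lengths in this table sit on either side of the 8-element threshold mentioned in the PR description. As a rough sketch of how one might exercise the same lengths, assuming .NET 8 or later: the value set and inputs below are invented for illustration, since the benchmark's actual setup is not shown in this thread. `SearchValues.Create` and the `IndexOfAny` overload that takes a `SearchValues<char>` are the public API; everything else is hypothetical.

```csharp
using System;
using System.Buffers;

static class SmallInputSketch
{
    // Hypothetical value set; the values used in the benchmark are not shown in the thread.
    static readonly SearchValues<char> s_values = SearchValues.Create("xyz");

    // Returns the index of the first character of input that is in the set, or -1.
    public static int Find(ReadOnlySpan<char> input) => input.IndexOfAny(s_values);

    public static void Run()
    {
        // Lengths taken from the table above: two below the threshold, one at it.
        foreach (int length in new[] { 1, 7, 8 })
        {
            // Pad with non-matching characters so the match is the last element.
            string input = new string('a', length - 1) + "x";
            Console.WriteLine($"Length {length}: index {Find(input)}");
        }
    }
}
```

This only checks that the call returns the same index on both sides of the threshold; measuring the per-call cost would need a proper benchmark harness.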

@stephentoub stephentoub merged commit cc89f38 into dotnet:main Sep 19, 2023
170 checks passed
@ghost ghost locked as resolved and limited conversation to collaborators Oct 19, 2023
2 participants