
Optimize Ascii.Equals for small values #93191

Closed
yesmey wants to merge 5 commits


yesmey
Contributor

@yesmey yesmey commented Oct 8, 2023

An attempt to optimize Ascii.Equals for smaller values

Changes include:

  • SIMD within a 64-bit register (SWAR)
  • Use better widening for the 128-bit vector path (the same approach as the 256/512-bit paths)
  • Fold the ASCII check into the same comparison (by masking) for the 256/512-bit paths
  • Replaced Sse2.UnpackLow with Vector128.WidenLower and removed an unused Vector64 path (Vector128.WidenLower uses vpmovzxbw when AVX2 is available)
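A minimal sketch of the general SWAR idea from the first bullet, written for illustration and not taken from the PR's diff. For byte buffers of 8 to 16 bytes, two overlapping 64-bit loads per side cover every byte. The mismatch bits and the non-ASCII bits (bit 7 of each byte) are then OR-ed together, so a single comparison against zero answers both questions, which is also the idea behind folding the ASCII check into the equality comparison. The helper names `AsciiEqualsSwar` and `AsciiEqualsScalar` are hypothetical; the assumed semantics (false if the lengths differ or any byte is non-ASCII) are meant to mirror Ascii.Equals.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Usage: compare two 12-byte ASCII buffers.
byte[] a = Encoding.ASCII.GetBytes("hello, world");
byte[] b = Encoding.ASCII.GetBytes("hello, world");
Console.WriteLine(AsciiEqualsSwar(a, b)); // True

// Hypothetical helper: lengths 8..16 take the SWAR path with two
// overlapping 64-bit loads per side; other lengths use a scalar loop.
static bool AsciiEqualsSwar(ReadOnlySpan<byte> left, ReadOnlySpan<byte> right)
{
    if (left.Length != right.Length)
        return false;

    if (left.Length < sizeof(ulong) || left.Length > 2 * sizeof(ulong))
        return AsciiEqualsScalar(left, right);

    const ulong HighBits = 0x8080_8080_8080_8080UL; // bit 7 of every byte

    // The first 8 bytes and the last 8 bytes together cover every byte
    // when 8 <= length <= 16 (they overlap when length < 16).
    int tail = left.Length - sizeof(ulong);
    ulong l0 = BinaryPrimitives.ReadUInt64LittleEndian(left);
    ulong r0 = BinaryPrimitives.ReadUInt64LittleEndian(right);
    ulong l1 = BinaryPrimitives.ReadUInt64LittleEndian(left.Slice(tail));
    ulong r1 = BinaryPrimitives.ReadUInt64LittleEndian(right.Slice(tail));

    // Any differing bit, or any byte of 'left' with bit 7 set, makes the
    // combined value non-zero. Checking only 'left' for non-ASCII is enough:
    // if the buffers are equal, 'right' holds the same bytes.
    ulong mismatch = (l0 ^ r0) | (l1 ^ r1) | ((l0 | l1) & HighBits);
    return mismatch == 0;
}

// Hypothetical scalar fallback with the same semantics.
static bool AsciiEqualsScalar(ReadOnlySpan<byte> left, ReadOnlySpan<byte> right)
{
    if (left.Length != right.Length)
        return false;

    for (int i = 0; i < left.Length; i++)
    {
        if (left[i] != right[i] || left[i] > 0x7F)
            return false;
    }

    return true;
}
```

The overlapping loads trade a few redundant byte comparisons for the absence of any per-length branching inside the 8..16 range.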

Now that the code is shared across the different vector sizes, it should be easy to convert to ISimdVector :), but ISimdVector doesn't have any widening methods yet.
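The byte-to-char widening discussed above can be sketched with the cross-platform Vector128 helpers. This is an illustration under stated assumptions, not the PR's code: it assumes .NET 7 or later (for Vector128.WidenLower, Vector128.LoadUnsafe and the vector operators), a little-endian platform such as x64 or Arm64, and exactly 8 elements on each side. Eight bytes are read as one 64-bit value, zero-extended into eight 16-bit lanes, and compared against eight UTF-16 chars, with the ASCII check masked into the same comparison. `AsciiEquals8` is a hypothetical name.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Text;

// Usage: compare 8 chars with the same 8 characters encoded as ASCII bytes.
byte[] sample = Encoding.ASCII.GetBytes("Ascii!?.");
Console.WriteLine(AsciiEquals8("Ascii!?.", sample)); // True

// Hypothetical helper: compares exactly 8 chars with exactly 8 bytes.
static bool AsciiEquals8(ReadOnlySpan<char> chars, ReadOnlySpan<byte> bytes)
{
    if (chars.Length != 8 || bytes.Length != 8)
        throw new ArgumentException("This sketch handles exactly 8 elements.");

    // Read the 8 bytes as one 64-bit value, place it in the low half of a
    // 128-bit vector, and zero-extend each byte to a 16-bit lane.
    ulong packed = MemoryMarshal.Read<ulong>(bytes);
    Vector128<ushort> widened = Vector128.WidenLower(Vector128.CreateScalar(packed).AsByte());

    // Load the 8 UTF-16 code units (16 bytes) as eight 16-bit lanes.
    Vector128<ushort> utf16 = Vector128.LoadUnsafe(
        ref MemoryMarshal.GetReference(MemoryMarshal.Cast<char, ushort>(chars)));

    // A lane is non-zero if the values differ or the byte has bit 7 set
    // (non-ASCII); both checks collapse into a single all-zero test.
    Vector128<ushort> mismatch = (widened ^ utf16) | (widened & Vector128.Create((ushort)0x80));
    return mismatch == Vector128<ushort>.Zero;
}
```

The mask matters for inputs such as the byte 0xE9 and the char '\u00E9': they are equal after widening, but neither is ASCII, so the comparison must still fail.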

Benchmarks:

BenchmarkDotNet v0.13.10-nightly.20231019.90, Windows 11 (10.0.22631.2506/23H2/2023Update/SunValley3)
AMD Ryzen 7 7800X3D, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-alpha.1.23553.1
  [Host]     : .NET 8.0.0 (8.0.23.47906), X64 RyuJIT AVX2
  Job-DWACHG : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-UKFQFX : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true  IterationTime=250.0000 ms  
LaunchCount=3  MaxIterationCount=20  MemoryRandomization=True  
MinIterationCount=15  WarmupCount=1  

| Method | Job | Toolchain | Size | Mean (ns) | Error (ns) | StdDev (ns) | Median (ns) | Min (ns) | Max (ns) | Ratio | RatioSD | Allocated | Alloc Ratio |
|---|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Equals_Bytes | Job-DWACHG | main | 6 | 2.807 | 0.0136 | 0.0258 | 2.801 | 2.767 | 2.879 | 1.00 | 0.00 | - | NA |
| Equals_Bytes | Job-UKFQFX | PR | 6 | 2.857 | 0.0152 | 0.0289 | 2.848 | 2.817 | 2.945 | 1.02 | 0.01 | - | NA |
| Equals_Chars | Job-DWACHG | main | 6 | 3.426 | 0.1181 | 0.2642 | 3.334 | 3.014 | 4.005 | 1.00 | 0.00 | - | NA |
| Equals_Chars | Job-UKFQFX | PR | 6 | 1.926 | 0.0083 | 0.0157 | 1.920 | 1.913 | 2.005 | 0.56 | 0.04 | - | NA |
| Equals_Bytes_Chars | Job-DWACHG | main | 6 | 3.456 | 0.0905 | 0.2025 | 3.389 | 3.200 | 4.054 | 1.00 | 0.00 | - | NA |
| Equals_Bytes_Chars | Job-UKFQFX | PR | 6 | 2.914 | 0.0304 | 0.0578 | 2.896 | 2.848 | 3.104 | 0.84 | 0.05 | - | NA |
| Equals_DifferentCase_Bytes | Job-DWACHG | main | 6 | 1.330 | 0.0036 | 0.0069 | 1.329 | 1.322 | 1.348 | 1.00 | 0.00 | - | NA |
| Equals_DifferentCase_Bytes | Job-UKFQFX | PR | 6 | 1.330 | 0.0038 | 0.0073 | 1.328 | 1.322 | 1.346 | 1.00 | 0.01 | - | NA |
| Equals_DifferentCase_Chars | Job-DWACHG | main | 6 | 1.333 | 0.0048 | 0.0091 | 1.327 | 1.323 | 1.353 | 1.00 | 0.00 | - | NA |
| Equals_DifferentCase_Chars | Job-UKFQFX | PR | 6 | 1.333 | 0.0043 | 0.0081 | 1.330 | 1.324 | 1.348 | 1.00 | 0.01 | - | NA |
| Equals_DifferentCase_Bytes_Chars | Job-DWACHG | main | 6 | 1.330 | 0.0035 | 0.0066 | 1.326 | 1.324 | 1.344 | 1.00 | 0.00 | - | NA |
| Equals_DifferentCase_Bytes_Chars | Job-UKFQFX | PR | 6 | 1.327 | 0.0031 | 0.0059 | 1.325 | 1.321 | 1.346 | 1.00 | 0.01 | - | NA |
| Equals_Bytes | Job-DWACHG | main | 9 | 4.017 | 0.0563 | 0.1199 | 3.987 | 3.863 | 4.306 | 1.00 | 0.00 | - | NA |
| Equals_Bytes | Job-UKFQFX | PR | 9 | 1.930 | 0.0054 | 0.0103 | 1.928 | 1.917 | 1.948 | 0.48 | 0.01 | - | NA |
| Equals_Chars | Job-DWACHG | main | 9 | 2.121 | 0.0045 | 0.0086 | 2.118 | 2.111 | 2.144 | 1.00 | 0.00 | - | NA |
| Equals_Chars | Job-UKFQFX | PR | 9 | 2.324 | 0.0054 | 0.0103 | 2.321 | 2.313 | 2.347 | 1.10 | 0.01 | - | NA |
| Equals_Bytes_Chars | Job-DWACHG | main | 9 | 4.408 | 0.0069 | 0.0132 | 4.404 | 4.391 | 4.436 | 1.00 | 0.00 | - | NA |
| Equals_Bytes_Chars | Job-UKFQFX | PR | 9 | 2.135 | 0.0073 | 0.0139 | 2.131 | 2.118 | 2.198 | 0.48 | 0.00 | - | NA |
| Equals_DifferentCase_Bytes | Job-DWACHG | main | 9 | 1.326 | 0.0033 | 0.0062 | 1.324 | 1.319 | 1.344 | 1.00 | 0.00 | - | NA |
| Equals_DifferentCase_Bytes | Job-UKFQFX | PR | 9 | 1.333 | 0.0041 | 0.0077 | 1.332 | 1.323 | 1.350 | 1.01 | 0.01 | - | NA |
| Equals_DifferentCase_Chars | Job-DWACHG | main | 9 | 1.528 | 0.0040 | 0.0075 | 1.525 | 1.521 | 1.550 | 1.00 | 0.00 | - | NA |
| Equals_DifferentCase_Chars | Job-UKFQFX | PR | 9 | 1.533 | 0.0069 | 0.0132 | 1.529 | 1.519 | 1.586 | 1.00 | 0.01 | - | NA |
| Equals_DifferentCase_Bytes_Chars | Job-DWACHG | main | 9 | 1.339 | 0.0064 | 0.0121 | 1.340 | 1.321 | 1.360 | 1.00 | 0.00 | - | NA |
| Equals_DifferentCase_Bytes_Chars | Job-UKFQFX | PR | 9 | 1.560 | 0.0140 | 0.0267 | 1.562 | 1.521 | 1.603 | 1.16 | 0.02 | - | NA |
| Equals_Bytes | Job-DWACHG | main | 15 | 5.272 | 0.0369 | 0.0703 | 5.238 | 5.204 | 5.463 | 1.00 | 0.00 | - | NA |
| Equals_Bytes | Job-UKFQFX | PR | 15 | 1.927 | 0.0034 | 0.0064 | 1.925 | 1.919 | 1.944 | 0.37 | 0.00 | - | NA |
| Equals_Chars | Job-DWACHG | main | 15 | 2.139 | 0.0276 | 0.0558 | 2.125 | 2.117 | 2.501 | 1.00 | 0.00 | - | NA |
| Equals_Chars | Job-UKFQFX | PR | 15 | 2.318 | 0.0044 | 0.0083 | 2.315 | 2.310 | 2.345 | 1.08 | 0.03 | - | NA |
| Equals_Bytes_Chars | Job-DWACHG | main | 15 | 6.786 | 0.0070 | 0.0132 | 6.780 | 6.773 | 6.826 | 1.00 | 0.00 | - | NA |
| Equals_Bytes_Chars | Job-UKFQFX | PR | 15 | 2.145 | 0.0072 | 0.0137 | 2.139 | 2.128 | 2.177 | 0.32 | 0.00 | - | NA |
| Equals_DifferentCase_Bytes | Job-DWACHG | main | 15 | 1.329 | 0.0031 | 0.0059 | 1.327 | 1.324 | 1.345 | 1.00 | 0.00 | - | NA |
| Equals_DifferentCase_Bytes | Job-UKFQFX | PR | 15 | 1.331 | 0.0038 | 0.0071 | 1.329 | 1.323 | 1.349 | 1.00 | 0.01 | - | NA |
| Equals_DifferentCase_Chars | Job-DWACHG | main | 15 | 1.654 | 0.0809 | 0.1635 | 1.542 | 1.522 | 2.009 | 1.00 | 0.00 | - | NA |
| Equals_DifferentCase_Chars | Job-UKFQFX | PR | 15 | 1.533 | 0.0080 | 0.0151 | 1.527 | 1.520 | 1.603 | 0.93 | 0.09 | - | NA |
| Equals_DifferentCase_Bytes_Chars | Job-DWACHG | main | 15 | 1.347 | 0.0092 | 0.0175 | 1.345 | 1.324 | 1.380 | 1.00 | 0.00 | - | NA |
| Equals_DifferentCase_Bytes_Chars | Job-UKFQFX | PR | 15 | 1.548 | 0.0098 | 0.0186 | 1.538 | 1.525 | 1.606 | 1.15 | 0.02 | - | NA |
| Equals_Bytes | Job-DWACHG | main | 17 | 2.122 | 0.0054 | 0.0102 | 2.120 | 2.108 | 2.148 | 1.00 | 0.00 | - | NA |
| Equals_Bytes | Job-UKFQFX | PR | 17 | 2.322 | 0.0058 | 0.0110 | 2.317 | 2.310 | 2.349 | 1.09 | 0.01 | - | NA |
| Equals_Chars | Job-DWACHG | main | 17 | 2.192 | 0.0500 | 0.0952 | 2.127 | 2.117 | 2.345 | 1.00 | 0.00 | - | NA |
| Equals_Chars | Job-UKFQFX | PR | 17 | 2.323 | 0.0054 | 0.0102 | 2.319 | 2.313 | 2.348 | 1.06 | 0.04 | - | NA |
| Equals_Bytes_Chars | Job-DWACHG | main | 17 | 2.715 | 0.0056 | 0.0107 | 2.712 | 2.703 | 2.743 | 1.00 | 0.00 | - | NA |
| Equals_Bytes_Chars | Job-UKFQFX | PR | 17 | 2.327 | 0.0080 | 0.0152 | 2.322 | 2.313 | 2.376 | 0.86 | 0.01 | - | NA |
| Equals_DifferentCase_Bytes | Job-DWACHG | main | 17 | 1.528 | 0.0040 | 0.0077 | 1.525 | 1.521 | 1.547 | 1.00 | 0.00 | - | NA |
| Equals_DifferentCase_Bytes | Job-UKFQFX | PR | 17 | 1.530 | 0.0068 | 0.0130 | 1.525 | 1.521 | 1.591 | 1.00 | 0.01 | - | NA |
| Equals_DifferentCase_Chars | Job-DWACHG | main | 17 | 1.557 | 0.0093 | 0.0177 | 1.556 | 1.522 | 1.589 | 1.00 | 0.00 | - | NA |
| Equals_DifferentCase_Chars | Job-UKFQFX | PR | 17 | 1.569 | 0.0127 | 0.0242 | 1.566 | 1.527 | 1.622 | 1.01 | 0.01 | - | NA |
| Equals_DifferentCase_Bytes_Chars | Job-DWACHG | main | 17 | 1.729 | 0.0043 | 0.0082 | 1.727 | 1.720 | 1.746 | 1.00 | 0.00 | - | NA |
| Equals_DifferentCase_Bytes_Chars | Job-UKFQFX | PR | 17 | 1.732 | 0.0053 | 0.0100 | 1.729 | 1.720 | 1.761 | 1.00 | 0.01 | - | NA |
| Equals_Bytes | Job-DWACHG | main | 128 | 2.514 | 0.1013 | 0.2159 | 2.422 | 2.281 | 2.968 | 1.00 | 0.00 | - | NA |
| Equals_Bytes | Job-UKFQFX | PR | 128 | 2.502 | 0.0410 | 0.0917 | 2.476 | 2.378 | 2.730 | 1.00 | 0.08 | - | NA |
| Equals_Chars | Job-DWACHG | main | 128 | 4.921 | 0.1212 | 0.2712 | 5.028 | 4.173 | 5.306 | 1.00 | 0.00 | - | NA |
| Equals_Chars | Job-UKFQFX | PR | 128 | 6.005 | 0.2843 | 0.6359 | 6.013 | 4.854 | 9.322 | 1.22 | 0.14 | - | NA |
| Equals_Bytes_Chars | Job-DWACHG | main | 128 | 3.044 | 0.0704 | 0.1575 | 2.973 | 2.868 | 3.463 | 1.00 | 0.00 | - | NA |
| Equals_Bytes_Chars | Job-UKFQFX | PR | 128 | 3.815 | 0.6698 | 1.4981 | 3.271 | 2.903 | 8.177 | 1.25 | 0.50 | - | NA |
| Equals_DifferentCase_Bytes | Job-DWACHG | main | 128 | 1.545 | 0.0173 | 0.0337 | 1.538 | 1.523 | 1.748 | 1.00 | 0.00 | - | NA |
| Equals_DifferentCase_Bytes | Job-UKFQFX | PR | 128 | 1.535 | 0.0049 | 0.0094 | 1.534 | 1.524 | 1.557 | 0.99 | 0.02 | - | NA |
| Equals_DifferentCase_Chars | Job-DWACHG | main | 128 | 1.578 | 0.0125 | 0.0238 | 1.582 | 1.534 | 1.619 | 1.00 | 0.00 | - | NA |
| Equals_DifferentCase_Chars | Job-UKFQFX | PR | 128 | 1.576 | 0.0120 | 0.0228 | 1.579 | 1.532 | 1.638 | 1.00 | 0.02 | - | NA |
| Equals_DifferentCase_Bytes_Chars | Job-DWACHG | main | 128 | 1.958 | 0.0162 | 0.0307 | 1.953 | 1.910 | 2.032 | 1.00 | 0.00 | - | NA |
| Equals_DifferentCase_Bytes_Chars | Job-UKFQFX | PR | 128 | 1.954 | 0.0267 | 0.0515 | 1.934 | 1.890 | 2.081 | 1.00 | 0.03 | - | NA |

Benchmark code: the existing System.Text.Perf_Ascii benchmarks in dotnet/performance, with different sizes added.
dotnet run -c Release -f net8.0 --filter System.Text.Perf_Ascii.Equals_* --memoryRandomization --launchCount 5 --corerun ... --artifacts ...

- 64bit swar
- Improve 128 widening on avx and remove unused Vector64 widen
- Mask the ascii check into the same cmp
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Oct 8, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Oct 8, 2023
@ghost

ghost commented Oct 20, 2023

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

Issue Details


Benchmarks:

BenchmarkDotNet v0.13.9-nightly.20230908.70, Windows 11 (10.0.22621.2283/22H2/2022Update/SunValley2)
AMD Ryzen 7 7800X3D, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-alpha.1.23504.14
  [Host]     : .NET 8.0.0 (8.0.23.41904), X64 RyuJIT AVX2
  Job-GFFDUB : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-NUJPVG : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true  IterationTime=250.0000 ms  
LaunchCount=5  MaxIterationCount=20  MemoryRandomization=True  
MinIterationCount=15  WarmupCount=1  

| Method | Job | Toolchain | Size | Mean (ns) | Error (ns) | StdDev (ns) | Median (ns) | Ratio |
|---|---|---|---:|---:|---:|---:|---:|---:|
| Equals_Bytes | Job-GFFDUB | main | 7 | 3.164 | 0.0174 | 0.0443 | 3.158 | 1.00 |
| Equals_Bytes | Job-NUJPVG | PR | 7 | 3.310 | 0.0426 | 0.1115 | 3.274 | 1.05 |
| Equals_Chars | Job-GFFDUB | main | 7 | 3.616 | 0.0146 | 0.0369 | 3.601 | 1.00 |
| Equals_Chars | Job-NUJPVG | PR | 7 | 1.766 | 0.0031 | 0.0077 | 1.762 | 0.49 |
| Equals_Bytes_Chars | Job-GFFDUB | main | 7 | 3.174 | 0.0076 | 0.0191 | 3.176 | 1.00 |
| Equals_Bytes_Chars | Job-NUJPVG | PR | 7 | 3.257 | 0.0066 | 0.0166 | 3.257 | 1.03 |
| Equals_DifferentCase_Bytes | Job-GFFDUB | main | 7 | 1.356 | 0.0031 | 0.0078 | 1.355 | 1.00 |
| Equals_DifferentCase_Bytes | Job-NUJPVG | PR | 7 | 1.355 | 0.0028 | 0.0072 | 1.352 | 1.00 |
| Equals_DifferentCase_Chars | Job-GFFDUB | main | 7 | 1.357 | 0.0030 | 0.0077 | 1.355 | 1.00 |
| Equals_DifferentCase_Chars | Job-NUJPVG | PR | 7 | 1.356 | 0.0036 | 0.0091 | 1.352 | 1.00 |
| Equals_DifferentCase_Bytes_Chars | Job-GFFDUB | main | 7 | 1.353 | 0.0030 | 0.0076 | 1.349 | 1.00 |
| Equals_DifferentCase_Bytes_Chars | Job-NUJPVG | PR | 7 | 1.364 | 0.0157 | 0.0412 | 1.353 | 1.01 |
| Equals_Bytes | Job-GFFDUB | main | 9 | 4.063 | 0.0459 | 0.1310 | 4.037 | 1.00 |
| Equals_Bytes | Job-NUJPVG | PR | 9 | 1.765 | 0.0026 | 0.0067 | 1.763 | 0.43 |
| Equals_Chars | Job-GFFDUB | main | 9 | 2.158 | 0.0057 | 0.0144 | 2.151 | 1.00 |
| Equals_Chars | Job-NUJPVG | PR | 9 | 2.154 | 0.0032 | 0.0081 | 2.151 | 1.00 |
| Equals_Bytes_Chars | Job-GFFDUB | main | 9 | 3.961 | 0.0303 | 0.0767 | 3.934 | 1.00 |
| Equals_Bytes_Chars | Job-NUJPVG | PR | 9 | 2.167 | 0.0032 | 0.0081 | 2.166 | 0.55 |
| Equals_DifferentCase_Bytes | Job-GFFDUB | main | 9 | 1.353 | 0.0026 | 0.0065 | 1.351 | 1.00 |
| Equals_DifferentCase_Bytes | Job-NUJPVG | PR | 9 | 1.354 | 0.0029 | 0.0072 | 1.351 | 1.00 |
| Equals_DifferentCase_Chars | Job-GFFDUB | main | 9 | 1.555 | 0.0034 | 0.0086 | 1.551 | 1.00 |
| Equals_DifferentCase_Chars | Job-NUJPVG | PR | 9 | 1.553 | 0.0024 | 0.0061 | 1.551 | 1.00 |
| Equals_DifferentCase_Bytes_Chars | Job-GFFDUB | main | 9 | 1.354 | 0.0028 | 0.0070 | 1.351 | 1.00 |
| Equals_DifferentCase_Bytes_Chars | Job-NUJPVG | PR | 9 | 1.565 | 0.0024 | 0.0060 | 1.564 | 1.16 |
| Equals_Bytes | Job-GFFDUB | main | 15 | 5.284 | 0.0313 | 0.0792 | 5.258 | 1.00 |
| Equals_Bytes | Job-NUJPVG | PR | 15 | 1.762 | 0.0014 | 0.0036 | 1.761 | 0.33 |
| Equals_Chars | Job-GFFDUB | main | 15 | 2.156 | 0.0040 | 0.0102 | 2.152 | 1.00 |
| Equals_Chars | Job-NUJPVG | PR | 15 | 2.155 | 0.0043 | 0.0110 | 2.149 | 1.00 |
| Equals_Bytes_Chars | Job-GFFDUB | main | 15 | 5.329 | 0.0601 | 0.1572 | 5.284 | 1.00 |
| Equals_Bytes_Chars | Job-NUJPVG | PR | 15 | 2.167 | 0.0033 | 0.0084 | 2.164 | 0.41 |
| Equals_DifferentCase_Bytes | Job-GFFDUB | main | 15 | 1.356 | 0.0044 | 0.0111 | 1.353 | 1.00 |
| Equals_DifferentCase_Bytes | Job-NUJPVG | PR | 15 | 1.355 | 0.0022 | 0.0055 | 1.354 | 1.00 |
| Equals_DifferentCase_Chars | Job-GFFDUB | main | 15 | 1.555 | 0.0032 | 0.0081 | 1.553 | 1.00 |
| Equals_DifferentCase_Chars | Job-NUJPVG | PR | 15 | 1.559 | 0.0148 | 0.0388 | 1.550 | 1.00 |
| Equals_DifferentCase_Bytes_Chars | Job-GFFDUB | main | 15 | 1.331 | 0.0187 | 0.0472 | 1.351 | 1.00 |
| Equals_DifferentCase_Bytes_Chars | Job-NUJPVG | PR | 15 | 1.565 | 0.0023 | 0.0059 | 1.564 | 1.18 |
| Equals_Bytes | Job-GFFDUB | main | 17 | 2.154 | 0.0040 | 0.0101 | 2.151 | 1.00 |
| Equals_Bytes | Job-NUJPVG | PR | 17 | 2.154 | 0.0047 | 0.0119 | 2.149 | 1.00 |
| Equals_Chars | Job-GFFDUB | main | 17 | 2.193 | 0.0312 | 0.0793 | 2.152 | 1.00 |
| Equals_Chars | Job-NUJPVG | PR | 17 | 2.152 | 0.0072 | 0.0183 | 2.147 | 0.98 |
| Equals_Bytes_Chars | Job-GFFDUB | main | 17 | 2.747 | 0.0029 | 0.0074 | 2.746 | 1.00 |
| Equals_Bytes_Chars | Job-NUJPVG | PR | 17 | 2.355 | 0.0058 | 0.0146 | 2.350 | 0.86 |
| Equals_DifferentCase_Bytes | Job-GFFDUB | main | 17 | 1.552 | 0.0028 | 0.0070 | 1.549 | 1.00 |
| Equals_DifferentCase_Bytes | Job-NUJPVG | PR | 17 | 1.551 | 0.0023 | 0.0057 | 1.548 | 1.00 |
| Equals_DifferentCase_Chars | Job-GFFDUB | main | 17 | 1.568 | 0.0074 | 0.0186 | 1.564 | 1.00 |
| Equals_DifferentCase_Chars | Job-NUJPVG | PR | 17 | 1.570 | 0.0067 | 0.0169 | 1.564 | 1.00 |
| Equals_DifferentCase_Bytes_Chars | Job-GFFDUB | main | 17 | 1.748 | 0.0029 | 0.0072 | 1.747 | 1.00 |
| Equals_DifferentCase_Bytes_Chars | Job-NUJPVG | PR | 17 | 1.756 | 0.0041 | 0.0104 | 1.751 | 1.00 |
| Equals_Bytes | Job-GFFDUB | main | 31 | 2.151 | 0.0038 | 0.0096 | 2.149 | 1.00 |
| Equals_Bytes | Job-NUJPVG | PR | 31 | 2.154 | 0.0032 | 0.0081 | 2.150 | 1.00 |
| Equals_Chars | Job-GFFDUB | main | 31 | 2.278 | 0.0397 | 0.1004 | 2.340 | 1.00 |
| Equals_Chars | Job-NUJPVG | PR | 31 | 2.151 | 0.0050 | 0.0127 | 2.148 | 0.95 |
| Equals_Bytes_Chars | Job-GFFDUB | main | 31 | 3.346 | 0.0095 | 0.0240 | 3.335 | 1.00 |
| Equals_Bytes_Chars | Job-NUJPVG | PR | 31 | 2.351 | 0.0079 | 0.0199 | 2.343 | 0.70 |
| Equals_DifferentCase_Bytes | Job-GFFDUB | main | 31 | 1.551 | 0.0023 | 0.0059 | 1.549 | 1.00 |
| Equals_DifferentCase_Bytes | Job-NUJPVG | PR | 31 | 1.548 | 0.0024 | 0.0061 | 1.549 | 1.00 |
| Equals_DifferentCase_Chars | Job-GFFDUB | main | 31 | 1.563 | 0.0058 | 0.0146 | 1.563 | 1.00 |
| Equals_DifferentCase_Chars | Job-NUJPVG | PR | 31 | 1.567 | 0.0052 | 0.0132 | 1.565 | 1.00 |
| Equals_DifferentCase_Bytes_Chars | Job-GFFDUB | main | 31 | 1.744 | 0.0031 | 0.0079 | 1.742 | 1.00 |
| Equals_DifferentCase_Bytes_Chars | Job-NUJPVG | PR | 31 | 1.758 | 0.0056 | 0.0141 | 1.753 | 1.01 |
| Equals_Bytes | Job-GFFDUB | main | 33 | 2.236 | 0.0352 | 0.0889 | 2.194 | 1.00 |
| Equals_Bytes | Job-NUJPVG | PR | 33 | 2.148 | 0.0081 | 0.0204 | 2.141 | 0.96 |
| Equals_Chars | Job-GFFDUB | main | 33 | 2.755 | 0.0436 | 0.1272 | 2.765 | 1.00 |
| Equals_Chars | Job-NUJPVG | PR | 33 | 2.714 | 0.0392 | 0.1157 | 2.743 | 0.99 |
| Equals_Bytes_Chars | Job-GFFDUB | main | 33 | 2.838 | 0.0834 | 0.2108 | 2.713 | 1.00 |
| Equals_Bytes_Chars | Job-NUJPVG | PR | 33 | 2.597 | 0.0246 | 0.0639 | 2.600 | 0.92 |
| Equals_DifferentCase_Bytes | Job-GFFDUB | main | 33 | 1.541 | 0.0027 | 0.0067 | 1.538 | 1.00 |
| Equals_DifferentCase_Bytes | Job-NUJPVG | PR | 33 | 1.562 | 0.0589 | 0.1542 | 1.542 | 1.01 |
| Equals_DifferentCase_Chars | Job-GFFDUB | main | 33 | 1.652 | 0.0124 | 0.0314 | 1.652 | 1.00 |
| Equals_DifferentCase_Chars | Job-NUJPVG | PR | 33 | 1.653 | 0.0136 | 0.0343 | 1.660 | 1.00 |
| Equals_DifferentCase_Bytes_Chars | Job-GFFDUB | main | 33 | 1.958 | 0.0124 | 0.0313 | 1.945 | 1.00 |
| Equals_DifferentCase_Bytes_Chars | Job-NUJPVG | PR | 33 | 1.837 | 0.0129 | 0.0326 | 1.839 | 0.94 |
| Equals_Bytes | Job-GFFDUB | main | 63 | 2.292 | 0.0318 | 0.0826 | 2.333 | 1.00 |
| Equals_Bytes | Job-NUJPVG | PR | 63 | 2.183 | 0.0339 | 0.0888 | 2.168 | 0.95 |
| Equals_Chars | Job-GFFDUB | main | 63 | 2.731 | 0.0204 | 0.0516 | 2.737 | 1.00 |
| Equals_Chars | Job-NUJPVG | PR | 63 | 2.810 | 0.1053 | 0.2936 | 2.760 | 1.04 |
| Equals_Bytes_Chars | Job-GFFDUB | main | 63 | 2.736 | 0.0199 | 0.0503 | 2.737 | 1.00 |
| Equals_Bytes_Chars | Job-NUJPVG | PR | 63 | 2.642 | 0.0146 | 0.0368 | 2.644 | 0.97 |
| Equals_DifferentCase_Bytes | Job-GFFDUB | main | 63 | 1.544 | 0.0032 | 0.0081 | 1.543 | 1.00 |
| Equals_DifferentCase_Bytes | Job-NUJPVG | PR | 63 | 1.544 | 0.0028 | 0.0070 | 1.541 | 1.00 |
| Equals_DifferentCase_Chars | Job-GFFDUB | main | 63 | 1.682 | 0.0131 | 0.0332 | 1.682 | 1.00 |
| Equals_DifferentCase_Chars | Job-NUJPVG | PR | 63 | 1.680 | 0.0181 | 0.0457 | 1.677 | 1.00 |
| Equals_DifferentCase_Bytes_Chars | Job-GFFDUB | main | 63 | 1.980 | 0.0129 | 0.0327 | 1.972 | 1.00 |
| Equals_DifferentCase_Bytes_Chars | Job-NUJPVG | PR | 63 | 1.843 | 0.0142 | 0.0360 | 1.844 | 0.93 |
| Equals_Bytes | Job-GFFDUB | main | 65 | 2.339 | 0.0335 | 0.0989 | 2.314 | 1.00 |
| Equals_Bytes | Job-NUJPVG | PR | 65 | 2.630 | 0.2313 | 0.6820 | 2.446 | 1.13 |
| Equals_Chars | Job-GFFDUB | main | 65 | 4.042 | 0.3730 | 1.0997 | 3.850 | 1.00 |
| Equals_Chars | Job-NUJPVG | PR | 65 | 3.904 | 0.1097 | 0.3234 | 3.885 | 1.00 |
| Equals_Bytes_Chars | Job-GFFDUB | main | 65 | 3.704 | 0.6477 | 1.8791 | 3.161 | 1.00 |
| Equals_Bytes_Chars | Job-NUJPVG | PR | 65 | 3.315 | 0.2402 | 0.7082 | 3.216 | 0.99 |
| Equals_DifferentCase_Bytes | Job-GFFDUB | main | 65 | 1.550 | 0.0077 | 0.0196 | 1.547 | 1.00 |
| Equals_DifferentCase_Bytes | Job-NUJPVG | PR | 65 | 1.592 | 0.0402 | 0.1052 | 1.547 | 1.03 |
| Equals_DifferentCase_Chars | Job-GFFDUB | main | 65 | 1.665 | 0.0163 | 0.0421 | 1.664 | 1.00 |
| Equals_DifferentCase_Chars | Job-NUJPVG | PR | 65 | 1.669 | 0.0117 | 0.0296 | 1.665 | 1.00 |
| Equals_DifferentCase_Bytes_Chars | Job-GFFDUB | main | 65 | 1.949 | 0.0217 | 0.0587 | 1.931 | 1.00 |
| Equals_DifferentCase_Bytes_Chars | Job-NUJPVG | PR | 65 | 2.052 | 0.1793 | 0.5288 | 1.978 | 1.06 |


Author: yesmey
Assignees: -
Labels:

area-System.Text.Encoding, community-contribution

Milestone: -

@adamsitnik
Member

@gfoidl would you be interested in reviewing this PR?

Member

@gfoidl gfoidl left a comment


I think there's a bug when Vector128.IsHardwareAccelerated is false; see the review comments for the rationale.

@adamsitnik adamsitnik added the needs-author-action An issue or pull request that requires more info or actions from the author. label Nov 6, 2023
@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Nov 9, 2023
Member

@gfoidl gfoidl left a comment


Assuming CI will pass --> LGTM

@yesmey
Contributor Author

yesmey commented Nov 9, 2023

Updated benchmarks. Note that the regressions come mainly from different codegen when returning the bool condition directly versus returning true/false. See comment #93191 (comment)
Example of regression:
Before: https://godbolt.org/z/Gnsc84Gsf
After: https://godbolt.org/z/8dMhhPMTn

Member

@adamsitnik adamsitnik left a comment


The Ascii changes LGTM, but please resolve the merge conflict and address the test comments before I hit the merge button.

Thank you for your contribution @yesmey !

@gfoidl big thanks for your review!

{
yield return new object[] { new string(i, i), string.Create(i, i, (destination, iteration) =>
if (chr != '?')
Member


This comment was valuable; please restore it.

Suggested change
if (chr != '?')
if (chr != '?') // ASCIIEncoding maps invalid ASCII to ?

public static IEnumerable<object[]> ValidAsciiInputs
{
get
{
yield return new object[] { "test" };

for (char textLength = (char)0; textLength <= 127; textLength++)
foreach (int textLength in BufferLengths)
Member


That is a lot of test cases; I am not sure how long it will take to run all of them with debug builds.

Before your change, this test was covering strings that were 0 to 127 chars long. Now the max length is reduced to 33. I am not convinced that this is the right thing to do, please revert this particular change or convince me that it's the right thing to do.

@@ -55,15 +75,18 @@ public static IEnumerable<object[]> DifferentInputs
{
yield return new object[] { "tak", "nie" };

for (char i = (char)1; i <= 127; i++)
foreach (int textLength in BufferLengths)
Member


Same as above: I don't see the benefit of this change.

char left = i;
char right = char.IsAsciiLetterUpper(left) ? char.ToLower(left) : char.IsAsciiLetterLower(left) ? char.ToUpper(left) : left;
yield return new object[] { new string(left, i), new string(right, i) };
for (char chr = (char)0; chr <= 127; chr++)
Member


I see the value of this change, but we should decrease the number of test cases. How about focusing only on ASCII letters? For other characters, we would be duplicating the work of other tests.
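One possible shape for the reviewer's suggestion, sketched under assumptions: case-flipped pairs are generated only for the ASCII letters A to Z, so non-letter characters are left to the other Ascii.Equals tests. The `lengths` array below is hypothetical because the PR's BufferLengths values aren't shown here, and `DifferentCaseLetterInputs` is a hypothetical name, not the test's actual member.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical lengths; the actual BufferLengths values from the PR are not shown.
int[] lengths = { 1, 7, 8, 9, 15, 16, 17, 31, 32, 33 };

List<object[]> cases = DifferentCaseLetterInputs(lengths).ToList();
Console.WriteLine(cases.Count); // 260 (26 letters x 10 lengths)

// Yields one pair per (length, letter): an all-uppercase string and the
// matching all-lowercase string. Only A-Z is covered.
static IEnumerable<object[]> DifferentCaseLetterInputs(IEnumerable<int> sizes)
{
    foreach (int length in sizes)
    {
        for (char upper = 'A'; upper <= 'Z'; upper++)
        {
            char lower = char.ToLowerInvariant(upper);
            yield return new object[] { new string(upper, length), new string(lower, length) };
        }
    }
}
```

This yields 26 cases per length instead of 128, which addresses the test-count concern while keeping every letter covered.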

@adamsitnik adamsitnik added the needs-author-action An issue or pull request that requires more info or actions from the author. label Dec 11, 2023
@ghost ghost added the no-recent-activity label Dec 25, 2023
@ghost

ghost commented Dec 25, 2023

This pull request has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

@ghost

ghost commented Jan 8, 2024

This pull request will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the pull request, but please note that it will be locked if it remains inactive for another 30 days.

@ghost ghost closed this Jan 8, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Feb 8, 2024
This pull request was closed.
Labels
area-System.Text.Encoding community-contribution Indicates that the PR has been added by a community member needs-author-action An issue or pull request that requires more info or actions from the author. no-recent-activity tenet-performance Performance related issue

5 participants