SequenceReader performance; elide bounds checks and book-keeping #82765

Open; 23 commits to be merged into base: main

mgravell
Member

@mgravell mgravell commented Feb 28, 2023

context: https://twitter.com/marcgravell/status/1629955240895078405 (read back for @davidfowl mention)

no direct side-by-side benchmark yet (what's a good way to arrange that here? I'm all ears), but a proof-of-concept benchmark (see link above) achieved a ratio of ~0.7 vs the existing implementation (note: that number was in part based on the now-removed ref T changes)

purpose: improved performance in SequenceReader

approach:

  • reduced book-keeping by usually only manipulating a single index
  • remove bounds-checking (since we're checking the bounds ourselves) via Unsafe.Add(ref T, int)
  • when possible (net7+) store the current span's ref T directly; otherwise, store the ReadOnlySpan<T> as before, and use a JIT-friendly accessor to expose the same API to the implementation code
  • deconstruct SequencePosition for packing purposes
  • elide bounds checks in search by slicing for [0,len) loop
  • simplify position calculation
  • in search loop, avoid copying out the T value (of unknown size); use the ref instead
  • don't store/maintain "next" position; refactor next buffer fetch (removes need to retain "next")
  • reduce size to 72 bytes from 96 bytes (note: in theory we don't actually need _currentPositionInteger (by obtaining the outer span and initializing _currentSpanIndex accordingly), but because of padding that wouldn't help unless we were able to split out the _currentSpan into int plus ref T via unsafe add etc, which would allow us to drop to 64 bytes)
  • remove bounds checks in tryget/trypeek via hoist/uint check
  • use non-short-circuiting & in Advance, since we expect both sides to pass in the happy path (and both are safe)
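As a rough illustration of the last bullet, here is a minimal sketch of a fast-path check that uses the non-short-circuiting `&` operator. It is not the PR's actual `Advance` code: the class name, method name, and parameters are hypothetical stand-ins for the reader's fields, and the slow path (moving to the next segment) is left to the caller.

```csharp
using System;

public static class AdvanceSketch
{
    // Hypothetical stand-in for the reader's state: 'index' plays the role of
    // _currentSpanIndex and 'length' the role of _currentSpan.Length.
    public static bool TryAdvanceWithinSpan(ref int index, int length, long count)
    {
        long remaining = length - index;

        // Both comparisons are cheap and have no side effects, so '&' evaluates
        // both unconditionally. In the happy path both are true anyway, and the
        // JIT may be able to emit a single conditional branch instead of two.
        if ((count >= 0) & (count < remaining))
        {
            index += (int)count;
            return true;
        }

        // Negative counts, or advances that reach or cross the end of the
        // current span, are left to a (not shown) slow path.
        return false;
    }
}
```

Whether this actually produces better machine code depends on the runtime and JIT version, so it is worth measuring rather than assuming.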

benchmarks are here

(all payloads are 1024 bytes, split into chunks via SegmentLength; if SegmentLength is negative, the payload is a single vector rather than a multi-segment sequence)
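For readers unfamiliar with how such payloads are arranged, the following is a minimal sketch of one way to split a byte array into a multi-segment `ReadOnlySequence<byte>`. It is not taken from the linked benchmark project: `ChunkSegment` and `PayloadFactory` are hypothetical names, and treating a negative `segmentLength` as "wrap the array directly" is an assumption based on the description above.

```csharp
using System;
using System.Buffers;

// Hypothetical linked-list node; ReadOnlySequenceSegment<T> exposes
// Memory, Next and RunningIndex with protected setters.
public sealed class ChunkSegment : ReadOnlySequenceSegment<byte>
{
    public ChunkSegment(ReadOnlyMemory<byte> memory) => Memory = memory;

    public ChunkSegment Append(ReadOnlyMemory<byte> memory)
    {
        // RunningIndex is the total length of all preceding segments.
        var next = new ChunkSegment(memory) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}

public static class PayloadFactory
{
    public static ReadOnlySequence<byte> Create(byte[] payload, int segmentLength)
    {
        if (segmentLength == 0) throw new ArgumentOutOfRangeException(nameof(segmentLength));

        // Assumption: negative means a single array-backed buffer.
        if (segmentLength < 0) return new ReadOnlySequence<byte>(payload);

        ReadOnlyMemory<byte> memory = payload;
        var first = new ChunkSegment(memory.Slice(0, Math.Min(segmentLength, memory.Length)));
        var last = first;
        for (int offset = first.Memory.Length; offset < memory.Length; offset += segmentLength)
        {
            int length = Math.Min(segmentLength, memory.Length - offset);
            last = last.Append(memory.Slice(offset, length));
        }
        return new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length);
    }
}
```

With a 1024-byte payload, `PayloadFactory.Create(payload, 4)` yields 256 four-byte segments, while `PayloadFactory.Create(payload, -1024)` yields a single-segment sequence over the array.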

typical results: https://raw.githubusercontent.com/mgravell/SequenceReaderBench/main/results.md

some similar benchmarks with ref T + int usage (unsafe, etc) don't show significant improvements in any cases

@ghost

ghost commented Feb 28, 2023

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Author: mgravell
Assignees: mgravell
Labels:

area-System.Memory

Milestone: -

@mgravell
Member Author

@stephentoub well, I won't pretend not to be disappointed - however, thoughts on just reducing the book-keeping etc?

@stephentoub
Member

thoughts on just reducing the book-keeping etc?

If there are safe ways of making it faster/smaller/etc., that's great, let's do it. And as I noted, I expect there are ways of writing the code differently such that bounds checks can be eliminated even without using Unsafe.
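One commonly used safe pattern, shown here only as a sketch and not as the code from this PR, is to slice the span once and then loop from 0 to the slice's own `Length`; the JIT can typically prove the index is in range and drop the per-element bounds check. The class and method names below are hypothetical.

```csharp
using System;

public static class SearchSketch
{
    // Returns the index (relative to 'span') of the first element at or after
    // 'start' that equals 'a' or 'b', or -1 if there is none.
    public static int IndexOfAnyFrom(ReadOnlySpan<byte> span, int start, byte a, byte b)
    {
        // One unsigned comparison rejects both negative and too-large starts.
        if ((uint)start > (uint)span.Length) return -1;

        // Slice once, then iterate over [0, remaining.Length): the loop
        // condition itself tells the JIT that remaining[i] is in range.
        ReadOnlySpan<byte> remaining = span.Slice(start);
        for (int i = 0; i < remaining.Length; i++)
        {
            byte value = remaining[i];
            if (value == a || value == b) return start + i;
        }
        return -1;
    }
}
```

No `Unsafe` APIs are involved, so an out-of-range index can still only ever produce an exception, never a read outside the buffer.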

@mgravell mgravell changed the title from "SequenceReader performance; elide bounds checks and book-keeping by using ref-oriented approach" to "SequenceReader performance; elide bounds checks and book-keeping" on Mar 1, 2023
@mgravell
Member Author

mgravell commented Mar 1, 2023

@stephentoub fair enough; I have switched back from the unsafe to indexed, but improved a bunch of other related things :)

@mgravell
Member Author

mgravell commented Mar 1, 2023

disclosure: I'm not currently able to run the tests locally (although it builds cleanly):

Failed to launch testhost with error: System.IO.FileNotFoundException: Path does not exist: C:\Code\runtime\sequencereader\artifacts\bin\testhost\net8.0-windows-Debug-x64\dotnet.exe
File name: 'C:\Code\runtime\sequencereader\artifacts\bin\testhost\net8.0-windows-Debug-x64\dotnet.exe'

I've tried about 30 things, but I can't get that working right now, and I can't find an idiot-proof guide to making it happy.

@stephentoub
Member

disclosure: I'm not currently able to run the tests locally (although it builds cleanly):

How are you building and running tests?

@mgravell
Member Author

mgravell commented Mar 1, 2023

locally, with .net8 preview installed; at command-line, dotnet build in ...\src\libraries\System.Memory\src and ...\src\libraries\System.Memory\tests, then dotnet test in ...\src\libraries\System.Memory\tests, which fails as above; also using the IDE via \src\libraries\System.Memory\System.Memory.sln, but that fails with the same error in the output window

in root, build.cmd works in my main branch, but does not generate the missing artifacts; oddly build.cmd does not work in my branch, failing with a build.ninja failure about a missing C compiler; which is weird because it works everywhere else, including the coreclr build

@stephentoub
Member

Make sure you've done a clean build of the repo from root first, e.g.

git clean -xdf
build clr+libs -rc release

and only then try to iterate on a specific library

cd src\libraries\System.Memory\src
dotnet build
cd ..\tests
dotnet build /t:tests

Note as well that using dotnet like that in the subdirectory is going to use whatever sdk you have installed, whereas the root build via the build script will use a local sdk that's downloaded for the repo. If you want to use that one in the subdirectories as well, you can refer to it directly via the dotnet.cmd/sh that's in the repo's root directory.

@mgravell
Member Author

mgravell commented Mar 1, 2023

done all that (using /t:test - tests wasn't found); no change - it is still trying to use dotnet from ...\artifacts\bin\testhost\net8.0-windows-Debug-x64, which is empty, regardless of dotnet vs dotnet.cmd; the dotnet.cmd from the root seems to be using the program-files version of preview1, so... yeah, no idea what is going on; just:

'"C:\Code\runtime\sequencereader\artifacts\bin\testhost\net8.0-windows-Debug-x64\dotnet.exe"' is not recognized as an internal or external command, operable program or batch file.

(because it does not exist)

@mgravell
Member Author

mgravell commented Mar 2, 2023

interestingly, TryPeek seems to suffer a weird regression only on net7 (fine on net6 and net8); I propose to ignore net7, and I'll re-run the benchmarks only on net8 since this doesn't feel like a backport thing (I'm very intrigued if anyone knows what this regression may be - presumably a JIT change around branches):

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public readonly bool TryPeek(out T value) {
            AssertValidPosition(); // note: [Conditional("DEBUG")]
            if (_currentSpanIndex == _currentSpan.Length) // only true at EOF due to eager read
            {
                value = default;
                return false;
            }
            value = _currentSpan[_currentSpanIndex];
            return true;
        }

results:

|        Method |      Job |  Runtime | SegmentLength |       Mean |    Error |   StdDev |
|-------------- |--------- |--------- |-------------- |-----------:|---------:|---------:|
| SystemTryPeek | .NET 6.0 | .NET 6.0 |         -1024 |   747.2 ns | 14.42 ns | 14.81 ns |
| CustomTryPeek | .NET 6.0 | .NET 6.0 |         -1024 |   687.9 ns |  9.98 ns |  9.33 ns |
| SystemTryPeek | .NET 6.0 | .NET 6.0 |             4 |   749.9 ns | 11.27 ns |  9.99 ns |
| CustomTryPeek | .NET 6.0 | .NET 6.0 |             4 |   692.4 ns | 11.89 ns | 10.54 ns |
| SystemTryPeek | .NET 6.0 | .NET 6.0 |           128 |   742.4 ns | 10.89 ns | 10.19 ns |
| CustomTryPeek | .NET 6.0 | .NET 6.0 |           128 |   690.3 ns |  9.60 ns |  8.98 ns |
| SystemTryPeek | .NET 6.0 | .NET 6.0 |          1024 |   743.5 ns |  5.95 ns |  5.56 ns |
| CustomTryPeek | .NET 6.0 | .NET 6.0 |          1024 |   682.0 ns |  3.82 ns |  3.57 ns |
|               |          |          |               |            |          |          |
| SystemTryPeek | .NET 7.0 | .NET 7.0 |         -1024 |   679.6 ns |  8.15 ns |  7.23 ns |
| CustomTryPeek | .NET 7.0 | .NET 7.0 |         -1024 | 1,121.2 ns | 12.48 ns |  9.74 ns |
| SystemTryPeek | .NET 7.0 | .NET 7.0 |             4 |   683.3 ns | 11.10 ns | 10.38 ns |
| CustomTryPeek | .NET 7.0 | .NET 7.0 |             4 | 1,123.8 ns |  9.98 ns |  8.33 ns |
| SystemTryPeek | .NET 7.0 | .NET 7.0 |           128 |   685.8 ns |  9.32 ns |  8.72 ns |
| CustomTryPeek | .NET 7.0 | .NET 7.0 |           128 | 1,118.5 ns | 10.64 ns |  9.95 ns |
| SystemTryPeek | .NET 7.0 | .NET 7.0 |          1024 |   679.0 ns |  6.24 ns |  5.84 ns |
| CustomTryPeek | .NET 7.0 | .NET 7.0 |          1024 | 1,116.6 ns |  6.57 ns |  5.49 ns |
|               |          |          |               |            |          |          |
| SystemTryPeek | .NET 8.0 | .NET 8.0 |         -1024 |   704.7 ns | 13.94 ns | 25.50 ns |
| CustomTryPeek | .NET 8.0 | .NET 8.0 |         -1024 |   630.5 ns |  8.47 ns |  7.92 ns |
| SystemTryPeek | .NET 8.0 | .NET 8.0 |             4 |   690.5 ns |  9.48 ns |  8.86 ns |
| CustomTryPeek | .NET 8.0 | .NET 8.0 |             4 |   625.4 ns |  5.49 ns |  4.59 ns |
| SystemTryPeek | .NET 8.0 | .NET 8.0 |           128 |   686.4 ns |  8.86 ns |  8.29 ns |
| CustomTryPeek | .NET 8.0 | .NET 8.0 |           128 |   624.5 ns |  5.68 ns |  5.31 ns |
| SystemTryPeek | .NET 8.0 | .NET 8.0 |          1024 |   696.3 ns | 12.07 ns | 10.70 ns |
| CustomTryPeek | .NET 8.0 | .NET 8.0 |          1024 |   631.1 ns |  8.28 ns |  7.74 ns |

(I also yanked the _currentSpan.Length into a field, to see if it was not inlining the .Length, but: no difference)

@stephentoub
Member

I'm not sure why there's a .NET 7 regression, but regardless of target, I suggest trying to pull the fields into locals prior to doing the comparison; you can do it in a way that'll result in smaller code and no bounds check:
SharpLab
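The content behind the SharpLab link is not reproduced in this thread, so the following is only a guess at the general shape of such a change: copy the fields into locals, then use a single unsigned comparison that the JIT can reuse as the bounds check for the indexer. `PeekSketch` is a hypothetical type, not `SequenceReader<T>`.

```csharp
using System;

public ref struct PeekSketch
{
    private readonly ReadOnlySpan<byte> _currentSpan;
    private readonly int _currentSpanIndex;

    public PeekSketch(ReadOnlySpan<byte> span, int startIndex)
    {
        _currentSpan = span;
        _currentSpanIndex = startIndex;
    }

    public readonly bool TryPeek(out byte value)
    {
        // Hoist the fields into locals so the comparison and the indexer
        // operate on the same values.
        ReadOnlySpan<byte> span = _currentSpan;
        int index = _currentSpanIndex;

        // The unsigned compare covers both index < 0 and index >= Length,
        // which is the same test the indexer would otherwise perform.
        if ((uint)index < (uint)span.Length)
        {
            value = span[index];
            return true;
        }

        value = default;
        return false;
    }
}
```

Smaller generated code does not guarantee a faster benchmark on every runtime, so any variant like this should be measured per target framework.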

@mgravell
Member Author

mgravell commented Mar 2, 2023

@stephentoub ironically despite the demonstrably better opcode, that tweak actually seems to react negatively in tests, on net7 and net8 - again like I'm hitting some JIT regression; if you look at net6, it looks great! but:

  • on net7, both approaches regress vs baseline, the hoist+uint regressing less (i.e. the better of the two)
  • on net8, my pre-hoist version improves, and the hoist+uint regresses significantly

should I flag these somewhere else? I can put a fully runnable version somewhere

|         Method |      Job |  Runtime | SegmentLength |       Mean |    Error |   StdDev | Ratio | RatioSD |
|--------------- |--------- |--------- |-------------- |-----------:|---------:|---------:|------:|--------:|
|  SystemTryPeek | .NET 6.0 | .NET 6.0 |         -1024 |   743.3 ns | 11.42 ns | 10.69 ns |  1.00 |    0.00 |
|  CustomTryPeek | .NET 6.0 | .NET 6.0 |         -1024 |   675.4 ns |  1.86 ns |  1.55 ns |  0.91 |    0.01 |
| CustomTryPeek2 | .NET 6.0 | .NET 6.0 |         -1024 |   622.8 ns |  5.77 ns |  5.12 ns |  0.84 |    0.01 |
|                |          |          |               |            |          |          |       |         |
|  SystemTryPeek | .NET 6.0 | .NET 6.0 |             4 |   734.8 ns |  8.49 ns |  7.53 ns |  1.00 |    0.00 |
|  CustomTryPeek | .NET 6.0 | .NET 6.0 |             4 |   673.0 ns |  1.06 ns |  0.99 ns |  0.92 |    0.01 |
| CustomTryPeek2 | .NET 6.0 | .NET 6.0 |             4 |   620.6 ns |  5.98 ns |  5.00 ns |  0.84 |    0.01 |
|                |          |          |               |            |          |          |       |         |
|  SystemTryPeek | .NET 6.0 | .NET 6.0 |           128 |   744.8 ns | 10.64 ns |  9.95 ns |  1.00 |    0.00 |
|  CustomTryPeek | .NET 6.0 | .NET 6.0 |           128 |   681.8 ns |  8.88 ns |  8.31 ns |  0.92 |    0.02 |
| CustomTryPeek2 | .NET 6.0 | .NET 6.0 |           128 |   629.0 ns |  9.74 ns |  9.11 ns |  0.84 |    0.01 |
|                |          |          |               |            |          |          |       |         |
|  SystemTryPeek | .NET 6.0 | .NET 6.0 |          1024 |   732.0 ns |  3.34 ns |  2.96 ns |  1.00 |    0.00 |
|  CustomTryPeek | .NET 6.0 | .NET 6.0 |          1024 |   678.1 ns |  4.33 ns |  4.05 ns |  0.93 |    0.01 |
| CustomTryPeek2 | .NET 6.0 | .NET 6.0 |          1024 |   618.9 ns |  2.27 ns |  1.89 ns |  0.85 |    0.00 |
|                |          |          |               |            |          |          |       |         |
|  SystemTryPeek | .NET 7.0 | .NET 7.0 |         -1024 |   672.4 ns |  2.22 ns |  1.97 ns |  1.00 |    0.00 |
|  CustomTryPeek | .NET 7.0 | .NET 7.0 |         -1024 | 1,112.8 ns |  9.43 ns |  8.82 ns |  1.65 |    0.01 |
| CustomTryPeek2 | .NET 7.0 | .NET 7.0 |         -1024 |   891.5 ns |  2.13 ns |  1.89 ns |  1.33 |    0.01 |
|                |          |          |               |            |          |          |       |         |
|  SystemTryPeek | .NET 7.0 | .NET 7.0 |             4 |   673.8 ns |  3.40 ns |  2.84 ns |  1.00 |    0.00 |
|  CustomTryPeek | .NET 7.0 | .NET 7.0 |             4 | 1,143.1 ns | 22.19 ns | 24.66 ns |  1.69 |    0.03 |
| CustomTryPeek2 | .NET 7.0 | .NET 7.0 |             4 |   895.9 ns | 10.14 ns |  8.99 ns |  1.33 |    0.01 |
|                |          |          |               |            |          |          |       |         |
|  SystemTryPeek | .NET 7.0 | .NET 7.0 |           128 |   676.0 ns |  4.09 ns |  3.63 ns |  1.00 |    0.00 |
|  CustomTryPeek | .NET 7.0 | .NET 7.0 |           128 | 1,123.0 ns | 17.91 ns | 16.76 ns |  1.66 |    0.03 |
| CustomTryPeek2 | .NET 7.0 | .NET 7.0 |           128 |   895.3 ns |  6.10 ns |  5.41 ns |  1.32 |    0.01 |
|                |          |          |               |            |          |          |       |         |
|  SystemTryPeek | .NET 7.0 | .NET 7.0 |          1024 |   682.5 ns |  8.38 ns |  7.84 ns |  1.00 |    0.00 |
|  CustomTryPeek | .NET 7.0 | .NET 7.0 |          1024 | 1,125.3 ns | 18.56 ns | 17.36 ns |  1.65 |    0.03 |
| CustomTryPeek2 | .NET 7.0 | .NET 7.0 |          1024 |   899.3 ns |  9.17 ns |  8.57 ns |  1.32 |    0.02 |
|                |          |          |               |            |          |          |       |         |
|  SystemTryPeek | .NET 8.0 | .NET 8.0 |         -1024 |   670.7 ns |  1.06 ns |  0.94 ns |  1.00 |    0.00 |
|  CustomTryPeek | .NET 8.0 | .NET 8.0 |         -1024 |   619.8 ns |  4.84 ns |  4.29 ns |  0.92 |    0.01 |
| CustomTryPeek2 | .NET 8.0 | .NET 8.0 |         -1024 | 1,116.0 ns | 13.13 ns | 10.96 ns |  1.66 |    0.02 |
|                |          |          |               |            |          |          |       |         |
|  SystemTryPeek | .NET 8.0 | .NET 8.0 |             4 |   673.0 ns |  1.18 ns |  0.98 ns |  1.00 |    0.00 |
|  CustomTryPeek | .NET 8.0 | .NET 8.0 |             4 |   625.0 ns |  7.32 ns |  6.85 ns |  0.93 |    0.01 |
| CustomTryPeek2 | .NET 8.0 | .NET 8.0 |             4 | 1,131.7 ns | 11.41 ns | 10.67 ns |  1.68 |    0.01 |
|                |          |          |               |            |          |          |       |         |
|  SystemTryPeek | .NET 8.0 | .NET 8.0 |           128 |   683.7 ns |  6.76 ns |  6.32 ns |  1.00 |    0.00 |
|  CustomTryPeek | .NET 8.0 | .NET 8.0 |           128 |   633.2 ns |  9.56 ns |  8.94 ns |  0.93 |    0.01 |
| CustomTryPeek2 | .NET 8.0 | .NET 8.0 |           128 | 1,132.7 ns | 17.05 ns | 15.11 ns |  1.66 |    0.03 |
|                |          |          |               |            |          |          |       |         |
|  SystemTryPeek | .NET 8.0 | .NET 8.0 |          1024 |   683.4 ns | 10.02 ns |  9.37 ns |  1.00 |    0.00 |
|  CustomTryPeek | .NET 8.0 | .NET 8.0 |          1024 |   629.9 ns | 10.97 ns | 10.26 ns |  0.92 |    0.02 |
| CustomTryPeek2 | .NET 8.0 | .NET 8.0 |          1024 | 1,136.0 ns | 21.21 ns | 20.84 ns |  1.66 |    0.03 |

fortunately "read" is fine with the hoist everywhere:

|         Method |      Job |  Runtime | SegmentLength |     Mean |     Error |    StdDev | Ratio | RatioSD |
|--------------- |--------- |--------- |-------------- |---------:|----------:|----------:|------:|--------:|
|  SystemTryRead | .NET 6.0 | .NET 6.0 |         -1024 | 1.596 us | 0.0177 us | 0.0166 us |  1.00 |    0.00 |
|  CustomTryRead | .NET 6.0 | .NET 6.0 |         -1024 | 1.557 us | 0.0088 us | 0.0083 us |  0.98 |    0.01 |
| CustomTryRead2 | .NET 6.0 | .NET 6.0 |         -1024 | 1.344 us | 0.0014 us | 0.0011 us |  0.84 |    0.01 |
|                |          |          |               |          |           |           |       |         |
|  SystemTryRead | .NET 6.0 | .NET 6.0 |             4 | 4.028 us | 0.0630 us | 0.0590 us |  1.00 |    0.00 |
|  CustomTryRead | .NET 6.0 | .NET 6.0 |             4 | 2.628 us | 0.0301 us | 0.0281 us |  0.65 |    0.01 |
| CustomTryRead2 | .NET 6.0 | .NET 6.0 |             4 | 2.552 us | 0.0234 us | 0.0219 us |  0.63 |    0.01 |
|                |          |          |               |          |           |           |       |         |
|  SystemTryRead | .NET 6.0 | .NET 6.0 |           128 | 1.697 us | 0.0128 us | 0.0120 us |  1.00 |    0.00 |
|  CustomTryRead | .NET 6.0 | .NET 6.0 |           128 | 1.870 us | 0.0169 us | 0.0150 us |  1.10 |    0.01 |
| CustomTryRead2 | .NET 6.0 | .NET 6.0 |           128 | 1.454 us | 0.0031 us | 0.0026 us |  0.86 |    0.01 |
|                |          |          |               |          |           |           |       |         |
|  SystemTryRead | .NET 6.0 | .NET 6.0 |          1024 | 1.480 us | 0.0132 us | 0.0124 us |  1.00 |    0.00 |
|  CustomTryRead | .NET 6.0 | .NET 6.0 |          1024 | 1.657 us | 0.0063 us | 0.0056 us |  1.12 |    0.01 |
| CustomTryRead2 | .NET 6.0 | .NET 6.0 |          1024 | 1.345 us | 0.0074 us | 0.0062 us |  0.91 |    0.01 |
|                |          |          |               |          |           |           |       |         |
|  SystemTryRead | .NET 7.0 | .NET 7.0 |         -1024 | 1.560 us | 0.0111 us | 0.0099 us |  1.00 |    0.00 |
|  CustomTryRead | .NET 7.0 | .NET 7.0 |         -1024 | 1.574 us | 0.0051 us | 0.0042 us |  1.01 |    0.01 |
| CustomTryRead2 | .NET 7.0 | .NET 7.0 |         -1024 | 1.147 us | 0.0227 us | 0.0212 us |  0.74 |    0.02 |
|                |          |          |               |          |           |           |       |         |
|  SystemTryRead | .NET 7.0 | .NET 7.0 |             4 | 3.971 us | 0.0777 us | 0.0864 us |  1.00 |    0.00 |
|  CustomTryRead | .NET 7.0 | .NET 7.0 |             4 | 2.501 us | 0.0488 us | 0.0432 us |  0.63 |    0.02 |
| CustomTryRead2 | .NET 7.0 | .NET 7.0 |             4 | 2.628 us | 0.0496 us | 0.0509 us |  0.66 |    0.02 |
|                |          |          |               |          |           |           |       |         |
|  SystemTryRead | .NET 7.0 | .NET 7.0 |           128 | 1.681 us | 0.0065 us | 0.0060 us |  1.00 |    0.00 |
|  CustomTryRead | .NET 7.0 | .NET 7.0 |           128 | 1.734 us | 0.0033 us | 0.0026 us |  1.03 |    0.00 |
| CustomTryRead2 | .NET 7.0 | .NET 7.0 |           128 | 1.232 us | 0.0029 us | 0.0026 us |  0.73 |    0.00 |
|                |          |          |               |          |           |           |       |         |
|  SystemTryRead | .NET 7.0 | .NET 7.0 |          1024 | 1.566 us | 0.0080 us | 0.0071 us |  1.00 |    0.00 |
|  CustomTryRead | .NET 7.0 | .NET 7.0 |          1024 | 1.722 us | 0.0031 us | 0.0027 us |  1.10 |    0.01 |
| CustomTryRead2 | .NET 7.0 | .NET 7.0 |          1024 | 1.138 us | 0.0183 us | 0.0172 us |  0.73 |    0.01 |
|                |          |          |               |          |           |           |       |         |
|  SystemTryRead | .NET 8.0 | .NET 8.0 |         -1024 | 1.385 us | 0.0260 us | 0.0319 us |  1.00 |    0.00 |
|  CustomTryRead | .NET 8.0 | .NET 8.0 |         -1024 | 1.365 us | 0.0217 us | 0.0203 us |  0.98 |    0.02 |
| CustomTryRead2 | .NET 8.0 | .NET 8.0 |         -1024 | 1.135 us | 0.0056 us | 0.0052 us |  0.81 |    0.02 |
|                |          |          |               |          |           |           |       |         |
|  SystemTryRead | .NET 8.0 | .NET 8.0 |             4 | 3.519 us | 0.0283 us | 0.0251 us |  1.00 |    0.00 |
|  CustomTryRead | .NET 8.0 | .NET 8.0 |             4 | 2.917 us | 0.0302 us | 0.0267 us |  0.83 |    0.01 |
| CustomTryRead2 | .NET 8.0 | .NET 8.0 |             4 | 2.295 us | 0.0203 us | 0.0180 us |  0.65 |    0.01 |
|                |          |          |               |          |           |           |       |         |
|  SystemTryRead | .NET 8.0 | .NET 8.0 |           128 | 1.482 us | 0.0193 us | 0.0181 us |  1.00 |    0.00 |
|  CustomTryRead | .NET 8.0 | .NET 8.0 |           128 | 1.453 us | 0.0068 us | 0.0060 us |  0.98 |    0.01 |
| CustomTryRead2 | .NET 8.0 | .NET 8.0 |           128 | 1.256 us | 0.0039 us | 0.0033 us |  0.85 |    0.01 |
|                |          |          |               |          |           |           |       |         |
|  SystemTryRead | .NET 8.0 | .NET 8.0 |          1024 | 1.348 us | 0.0107 us | 0.0095 us |  1.00 |    0.00 |
|  CustomTryRead | .NET 8.0 | .NET 8.0 |          1024 | 1.350 us | 0.0058 us | 0.0055 us |  1.00 |    0.01 |
| CustomTryRead2 | .NET 8.0 | .NET 8.0 |          1024 | 1.131 us | 0.0027 us | 0.0026 us |  0.84 |    0.01 |

@adamsitnik
Member

@mgravell Is my understanding correct that it improves perf on .NET 6 and 7, but for an unknown reason regresses on 8? Is there any way I could help you with getting this PR merged (besides the code review ofc)? I could run it on my hardware for .NET 8 main vs PR and get the trace files/disassembly if needed.

@adamsitnik adamsitnik self-assigned this May 22, 2023
@adamsitnik adamsitnik added the tenet-performance Performance related issue label May 22, 2023
@jozkee
Member

jozkee commented Aug 14, 2023

I managed to rebase this PR on top of main and compared the results against main. There are very nice improvements, with the exception of TryRead and TryReadTo, which show a regression, especially in the small segment length case. We may want to hold off on this change until 9.0.

Results
BenchmarkDotNet v0.13.7-nightly.20230717.35, Windows 11 (10.0.22621.2134/22H2/2022Update/SunValley2)
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK 8.0.100-preview.7.23376.3
  [Host]     : .NET 8.0.0 (8.0.23.37506), X64 RyuJIT AVX2
  Job-VDEYSP : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-JFPICV : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1  


|               Method |        Job |  Toolchain | SegmentLength |        Mean |     Error |    StdDev |      Median |         Min |         Max | Ratio | MannWhitney(3ms) | RatioSD | Allocated | Alloc Ratio |
|--------------------- |----------- |----------- |-------------- |------------:|----------:|----------:|------------:|------------:|------------:|------:|----------------- |--------:|----------:|------------:|
| SystemAdvancePastAny | Job-VDEYSP | feature    |         -1024 |   235.90 ns |  2.218 ns |  1.852 ns |   236.46 ns |   232.99 ns |   238.75 ns |  0.55 |             Same |    0.01 |         - |          NA |
| SystemAdvancePastAny | Job-JFPICV | main       |         -1024 |   432.11 ns |  2.496 ns |  2.335 ns |   432.02 ns |   428.73 ns |   436.98 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
| SystemAdvancePastAny | Job-VDEYSP | feature    |             4 | 1,675.89 ns |  4.426 ns |  3.923 ns | 1,674.76 ns | 1,671.46 ns | 1,684.04 ns |  0.94 |             Same |    0.00 |         - |          NA |
| SystemAdvancePastAny | Job-JFPICV | main       |             4 | 1,785.57 ns |  7.547 ns |  6.690 ns | 1,785.14 ns | 1,775.78 ns | 1,798.89 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
| SystemAdvancePastAny | Job-VDEYSP | feature    |           128 |   313.67 ns |  3.434 ns |  3.212 ns |   313.32 ns |   310.05 ns |   320.11 ns |  0.62 |             Same |    0.01 |         - |          NA |
| SystemAdvancePastAny | Job-JFPICV | main       |           128 |   507.31 ns |  1.234 ns |  1.093 ns |   507.05 ns |   506.04 ns |   509.61 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
| SystemAdvancePastAny | Job-VDEYSP | feature    |          1024 |   233.57 ns |  1.798 ns |  1.682 ns |   232.56 ns |   232.03 ns |   237.22 ns |  0.54 |             Same |    0.00 |         - |          NA |
| SystemAdvancePastAny | Job-JFPICV | main       |          1024 |   435.17 ns |  0.879 ns |  0.822 ns |   434.92 ns |   434.27 ns |   436.68 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|        SystemAdvance | Job-VDEYSP | feature    |         -1024 |   442.82 ns |  6.444 ns |  6.027 ns |   441.80 ns |   436.09 ns |   453.74 ns |  1.00 |             Same |    0.01 |         - |          NA |
|        SystemAdvance | Job-JFPICV | main       |         -1024 |   440.75 ns |  1.499 ns |  1.171 ns |   440.77 ns |   438.91 ns |   442.45 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|        SystemAdvance | Job-VDEYSP | feature    |             4 | 1,939.14 ns |  6.711 ns |  6.278 ns | 1,938.32 ns | 1,930.90 ns | 1,949.90 ns |  0.97 |             Same |    0.01 |         - |          NA |
|        SystemAdvance | Job-JFPICV | main       |             4 | 1,992.90 ns |  6.847 ns |  6.070 ns | 1,993.70 ns | 1,985.12 ns | 2,006.13 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|        SystemAdvance | Job-VDEYSP | feature    |           128 |   514.84 ns |  1.342 ns |  1.190 ns |   514.68 ns |   513.30 ns |   517.49 ns |  0.96 |             Same |    0.00 |         - |          NA |
|        SystemAdvance | Job-JFPICV | main       |           128 |   537.23 ns |  1.004 ns |  0.839 ns |   537.03 ns |   536.11 ns |   538.77 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|        SystemAdvance | Job-VDEYSP | feature    |          1024 |   439.22 ns |  1.397 ns |  1.238 ns |   438.82 ns |   437.83 ns |   441.31 ns |  0.99 |             Same |    0.00 |         - |          NA |
|        SystemAdvance | Job-JFPICV | main       |          1024 |   443.68 ns |  1.115 ns |  1.043 ns |   443.44 ns |   442.62 ns |   445.81 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|       SystemPosition | Job-VDEYSP | feature    |         -1024 |   224.71 ns |  0.797 ns |  0.746 ns |   224.38 ns |   224.01 ns |   226.05 ns |  0.33 |             Same |    0.00 |         - |          NA |
|       SystemPosition | Job-JFPICV | main       |         -1024 |   681.10 ns |  2.445 ns |  2.287 ns |   681.14 ns |   677.98 ns |   686.19 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|       SystemPosition | Job-VDEYSP | feature    |             4 |   450.55 ns |  1.117 ns |  1.045 ns |   450.19 ns |   449.03 ns |   452.32 ns |  0.25 |             Same |    0.00 |         - |          NA |
|       SystemPosition | Job-JFPICV | main       |             4 | 1,772.73 ns | 14.692 ns | 13.743 ns | 1,779.07 ns | 1,745.56 ns | 1,790.50 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|       SystemPosition | Job-VDEYSP | feature    |           128 |   227.25 ns |  0.687 ns |  0.643 ns |   226.90 ns |   226.65 ns |   228.37 ns |  0.13 |             Same |    0.00 |         - |          NA |
|       SystemPosition | Job-JFPICV | main       |           128 | 1,728.64 ns |  3.115 ns |  2.914 ns | 1,728.06 ns | 1,725.46 ns | 1,734.62 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|       SystemPosition | Job-VDEYSP | feature    |          1024 |   227.99 ns |  0.721 ns |  0.675 ns |   227.68 ns |   227.40 ns |   229.30 ns |  0.33 |             Same |    0.00 |         - |          NA |
|       SystemPosition | Job-JFPICV | main       |          1024 |   682.82 ns |  2.620 ns |  2.451 ns |   682.41 ns |   679.96 ns |   687.86 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|        SystemTryPeek | Job-VDEYSP | feature    |         -1024 |   434.19 ns |  0.631 ns |  0.527 ns |   434.01 ns |   433.71 ns |   435.55 ns |  0.95 |             Same |    0.01 |         - |          NA |
|        SystemTryPeek | Job-JFPICV | main       |         -1024 |   456.50 ns |  8.852 ns |  8.694 ns |   454.83 ns |   448.30 ns |   478.09 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|        SystemTryPeek | Job-VDEYSP | feature    |             4 |   443.23 ns |  2.940 ns |  2.606 ns |   442.49 ns |   440.53 ns |   448.91 ns |  0.99 |             Same |    0.02 |         - |          NA |
|        SystemTryPeek | Job-JFPICV | main       |             4 |   449.71 ns |  7.907 ns |  7.396 ns |   447.17 ns |   440.26 ns |   462.55 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|        SystemTryPeek | Job-VDEYSP | feature    |           128 |   444.70 ns |  2.642 ns |  2.206 ns |   444.63 ns |   441.52 ns |   448.45 ns |  0.98 |             Same |    0.01 |         - |          NA |
|        SystemTryPeek | Job-JFPICV | main       |           128 |   452.82 ns |  5.928 ns |  5.545 ns |   453.32 ns |   442.99 ns |   462.30 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|        SystemTryPeek | Job-VDEYSP | feature    |          1024 |   438.19 ns |  1.099 ns |  1.028 ns |   438.31 ns |   436.65 ns |   439.89 ns |  0.94 |             Same |    0.01 |         - |          NA |
|        SystemTryPeek | Job-JFPICV | main       |          1024 |   468.28 ns |  7.365 ns |  6.889 ns |   471.56 ns |   459.28 ns |   478.39 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|      SystemTryReadTo | Job-VDEYSP | feature    |         -1024 |    85.16 ns |  0.258 ns |  0.241 ns |    85.06 ns |    84.88 ns |    85.60 ns |  1.04 |             Same |    0.00 |         - |          NA |
|      SystemTryReadTo | Job-JFPICV | main       |         -1024 |    82.04 ns |  0.482 ns |  0.451 ns |    82.10 ns |    80.68 ns |    82.57 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|      SystemTryReadTo | Job-VDEYSP | feature    |             4 | 3,482.61 ns | 12.050 ns | 11.272 ns | 3,481.28 ns | 3,461.31 ns | 3,504.64 ns |  1.82 |             Same |    0.00 |         - |          NA |
|      SystemTryReadTo | Job-JFPICV | main       |             4 | 1,913.75 ns |  4.098 ns |  3.633 ns | 1,911.78 ns | 1,910.57 ns | 1,921.80 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|      SystemTryReadTo | Job-VDEYSP | feature    |           128 |   168.54 ns |  0.508 ns |  0.450 ns |   168.56 ns |   167.83 ns |   169.15 ns |  1.33 |             Same |    0.02 |         - |          NA |
|      SystemTryReadTo | Job-JFPICV | main       |           128 |   126.69 ns |  1.687 ns |  1.578 ns |   126.93 ns |   123.68 ns |   129.13 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|      SystemTryReadTo | Job-VDEYSP | feature    |          1024 |    87.80 ns |  0.252 ns |  0.236 ns |    87.79 ns |    87.50 ns |    88.17 ns |  1.36 |             Same |    0.01 |         - |          NA |
|      SystemTryReadTo | Job-JFPICV | main       |          1024 |    64.73 ns |  0.420 ns |  0.392 ns |    64.69 ns |    64.24 ns |    65.47 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|        SystemTryRead | Job-VDEYSP | feature    |         -1024 |   446.80 ns |  1.273 ns |  1.128 ns |   446.39 ns |   445.46 ns |   448.88 ns |  0.49 |             Same |    0.00 |         - |          NA |
|        SystemTryRead | Job-JFPICV | main       |         -1024 |   906.08 ns |  2.883 ns |  2.556 ns |   905.21 ns |   903.20 ns |   912.33 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|        SystemTryRead | Job-VDEYSP | feature    |             4 | 1,739.64 ns |  4.863 ns |  4.311 ns | 1,739.64 ns | 1,732.15 ns | 1,747.22 ns |  1.08 |             Same |    0.00 |         - |          NA |
|        SystemTryRead | Job-JFPICV | main       |             4 | 1,609.45 ns |  4.496 ns |  3.986 ns | 1,609.09 ns | 1,603.75 ns | 1,615.48 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|        SystemTryRead | Job-VDEYSP | feature    |           128 |   535.58 ns |  8.214 ns |  6.859 ns |   539.07 ns |   520.98 ns |   543.78 ns |  0.56 |             Same |    0.01 |         - |          NA |
|        SystemTryRead | Job-JFPICV | main       |           128 |   962.53 ns |  2.895 ns |  2.708 ns |   962.00 ns |   958.22 ns |   967.11 ns |  1.00 |             Base |    0.00 |         - |          NA |
|                      |            |            |               |             |           |           |             |             |             |       |                  |         |           |             |
|        SystemTryRead | Job-VDEYSP | feature    |          1024 |   457.04 ns |  3.138 ns |  2.620 ns |   457.66 ns |   453.96 ns |   462.61 ns |  0.50 |             Same |    0.01 |         - |          NA |
|        SystemTryRead | Job-JFPICV | main       |          1024 |   911.87 ns | 14.463 ns | 13.529 ns |   906.81 ns |   897.46 ns |   936.71 ns |  1.00 |             Base |    0.00 |         - |          NA |

@jozkee jozkee modified the milestones: 8.0.0, 9.0.0 Aug 14, 2023
@jozkee
Member

jozkee commented Aug 17, 2023

@EgorBo @tannergooding I'm trying to understand the cause of a regression in SequenceReader.TryRead (which this PR modifies) when the segments are small, i.e. the longer the loop, the bigger the regression. Do you have any advice on what may be causing it?

The benchmark in question:

[Benchmark]
public int SystemTryRead()
{
    var reader = new System.Buffers.SequenceReader<int>(payload);
    int count = 0;
    while (reader.TryRead(out _))
    {
        count++;
    }
    return count;
}
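
For context, the `payload` field is not shown in this comment. Below is a minimal sketch of how a multi-segment `ReadOnlySequence<int>` with small segments might be built for this kind of loop. `IntSegment`, `PayloadSketch`, `Build` and `CountWithTryRead` are hypothetical names used only for illustration, not the benchmark's actual setup.

```csharp
using System;
using System.Buffers;

// Hypothetical helper (not from the PR): one link in a chain of buffers.
internal sealed class IntSegment : ReadOnlySequenceSegment<int>
{
    public IntSegment(ReadOnlyMemory<int> memory) => Memory = memory;

    // Appends a new segment after this one and returns it.
    public IntSegment Append(ReadOnlyMemory<int> memory)
    {
        var next = new IntSegment(memory)
        {
            // RunningIndex is the total length of all preceding segments.
            RunningIndex = RunningIndex + Memory.Length
        };
        Next = next;
        return next;
    }
}

internal static class PayloadSketch
{
    // Assumption: the payload is segmentCount segments of segmentSize ints each.
    public static ReadOnlySequence<int> Build(int segmentCount, int segmentSize)
    {
        if (segmentCount < 1) throw new ArgumentOutOfRangeException(nameof(segmentCount));
        if (segmentSize < 1) throw new ArgumentOutOfRangeException(nameof(segmentSize));

        var first = new IntSegment(new int[segmentSize]);
        var last = first;
        for (int i = 1; i < segmentCount; i++)
        {
            last = last.Append(new int[segmentSize]);
        }
        return new ReadOnlySequence<int>(first, 0, last, last.Memory.Length);
    }

    // Same loop shape as the SystemTryRead benchmark above.
    public static int CountWithTryRead(ReadOnlySequence<int> payload)
    {
        var reader = new SequenceReader<int>(payload);
        int count = 0;
        while (reader.TryRead(out _))
        {
            count++;
        }
        return count;
    }
}
```

With small segments the reader crosses a segment boundary every few items, so the cost of fetching the next span is paid far more often per item read than with large segments.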

New TryRead (inlined in SystemTryRead benchmark)

Disassembly:
; System.Memory.SequenceReaderBenchmark.SystemTryRead()
       push      r15
       push      r14
       push      r13
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,0A0
       vzeroupper
       vxorps    xmm4,xmm4,xmm4
       vmovdqa   xmmword ptr [rsp+20],xmm4
       vmovdqa   xmmword ptr [rsp+30],xmm4
       mov       rax,0FFFFFFFFFFFFFFA0
M00_L00:
       vmovdqa   xmmword ptr [rsp+rax+0A0],xmm4
       vmovdqa   xmmword ptr [rsp+rax+0B0],xmm4
       vmovdqa   xmmword ptr [rsp+rax+0C0],xmm4
       add       rax,30
       jne       short M00_L00
       vmovdqu   xmm0,xmmword ptr [rcx+10]
       vmovdqu   xmmword ptr [rsp+40],xmm0
       mov       rdx,[rcx+20]
       mov       [rsp+50],rdx
       vmovdqu   xmm0,xmmword ptr [rsp+40]
       vmovdqu   xmmword ptr [rsp+78],xmm0
       mov       rcx,[rsp+50]
       mov       [rsp+88],rcx
       mov       rcx,[rsp+40]
       mov       edx,[rsp+50]
       and       edx,7FFFFFFF
       mov       [rsp+58],rcx
       mov       [rsp+70],edx
       mov       r8,[rsp+40]
       test      r8,r8
       je        near ptr M00_L17
       mov       ebx,[rsp+50]
       mov       esi,[rsp+54]
       xor       edi,edi
       cmp       r8,[rsp+48]
       setne     dil
       mov       ecx,ebx
       or        ecx,esi
       jl        near ptr M00_L15
       mov       rax,r8
       mov       rcx,offset MT_System.Buffers.ReadOnlySequenceSegment`1[[System.Int32, System.Private.CoreLib]]
       cmp       [rax],rcx
       jne       near ptr M00_L10
M00_L01:
       mov       rcx,[rax+18]
       mov       ebp,[rax+20]
       mov       r14d,[rax+24]
       xor       r15d,r15d
       xor       r13d,r13d
       test      rcx,rcx
       je        short M00_L03
       mov       rax,[rcx]
       test      dword ptr [rax],80000000
       je        near ptr M00_L11
       lea       r15,[rcx+10]
       mov       r13d,[rcx+8]
M00_L02:
       and       ebp,7FFFFFFF
       mov       ecx,ebp
       mov       eax,r14d
       add       rax,rcx
       mov       edx,r13d
       cmp       rax,rdx
       ja        near ptr M00_L13
       lea       r15,[r15+rcx*4]
       mov       r13d,r14d
M00_L03:
       test      edi,edi
       je        near ptr M00_L12
       cmp       ebx,r13d
       ja        near ptr M00_L13
       mov       ecx,ebx
       lea       rcx,[r15+rcx*4]
       sub       r13d,ebx
       mov       [rsp+30],rcx
       mov       [rsp+38],r13d
M00_L04:
       vmovdqu   xmm0,xmmword ptr [rsp+30]
       vmovdqu   xmmword ptr [rsp+90],xmm0
       xor       ecx,ecx
       mov       [rsp+74],ecx
       mov       [rsp+68],rcx
       mov       rcx,[rsp+40]
       cmp       rcx,[rsp+48]
       je        near ptr M00_L18
       mov       qword ptr [rsp+60],0FFFFFFFFFFFFFFFF
       cmp       dword ptr [rsp+98],0
       je        near ptr M00_L16
M00_L05:
       xor       edi,edi
M00_L06:
       mov       ecx,[rsp+74]
       mov       rax,[rsp+90]
       mov       edx,[rsp+98]
       cmp       ecx,edx
       jae       short M00_L08
       mov       r8d,ecx
       mov       eax,[rax+r8*4]
       inc       ecx
       mov       [rsp+74],ecx
       cmp       ecx,edx
       je        short M00_L09
M00_L07:
       inc       edi
       jmp       short M00_L06
M00_L08:
       mov       eax,edi
       add       rsp,0A0
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       pop       r13
       pop       r14
       pop       r15
       ret
M00_L09:
       lea       rcx,[rsp+58]
       call      qword ptr [7FFB2D246100]; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].TryGetNextSpan()
       jmp       short M00_L07
M00_L10:
       mov       rdx,r8
       call      qword ptr [7FFB2CD343F0]; System.Runtime.CompilerServices.CastHelpers.ChkCastClassSpecial(Void*, System.Object)
       jmp       near ptr M00_L01
M00_L11:
       lea       rdx,[rsp+20]
       mov       rax,[rcx]
       mov       rax,[rax+40]
       call      qword ptr [rax+28]
       mov       r15,[rsp+20]
       mov       r13d,[rsp+28]
       jmp       near ptr M00_L02
M00_L12:
       sub       esi,ebx
       mov       eax,ebx
       mov       ecx,esi
       add       rax,rcx
       mov       ecx,r13d
       cmp       rax,rcx
       jbe       short M00_L14
M00_L13:
       call      qword ptr [7FFB2D0657B8]
       int       3
M00_L14:
       mov       ecx,ebx
       lea       rcx,[r15+rcx*4]
       mov       [rsp+30],rcx
       mov       [rsp+38],esi
       jmp       near ptr M00_L04
M00_L15:
       lea       rcx,[rsp+40]
       lea       rdx,[rsp+30]
       mov       r9d,edi
       call      qword ptr [7FFB2D0E4300]
       jmp       near ptr M00_L04
M00_L16:
       lea       rcx,[rsp+58]
       call      qword ptr [7FFB2D246100]; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].TryGetNextSpan()
       jmp       near ptr M00_L05
M00_L17:
       vxorps    xmm0,xmm0,xmm0
       vmovdqu   xmmword ptr [rsp+30],xmm0
       jmp       near ptr M00_L04
M00_L18:
       movsxd    rcx,dword ptr [rsp+98]
       mov       [rsp+60],rcx
       jmp       near ptr M00_L05
; Total bytes of code 632
; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].TryGetNextSpan()
       push      r15
       push      r14
       push      r13
       push      r12
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,38
       xor       eax,eax
       mov       [rsp+28],rax
       mov       rbx,rcx
       mov       esi,[rbx+40]
       lea       rdx,[rbx+20]
       mov       rcx,rbx
       lea       rdi,[rcx+38]
       mov       rbp,[rdx+8]
       mov       r14d,[rdx+14]
       and       r14d,7FFFFFFF
       mov       r15d,1
       mov       rdx,[rcx]
       mov       rcx,offset MT_System.Buffers.ReadOnlySequenceSegment`1[[System.Int32, System.Private.CoreLib]]
       call      qword ptr [7FFB2CD34360]; System.Runtime.CompilerServices.CastHelpers.IsInstanceOfClass(Void*, System.Object)
       mov       r13,rax
       test      r13,r13
       je        short M01_L04
M01_L00:
       cmp       r13,rbp
       je        short M01_L04
       mov       r13,[r13+8]
       mov       rcx,rbx
       mov       rdx,r13
       call      CORINFO_HELP_CHECKED_ASSIGN_REF
       test      r13,r13
       je        short M01_L04
       mov       rcx,[r13+18]
       mov       r12d,[r13+20]
       mov       r15d,[r13+24]
       xor       eax,eax
       xor       edx,edx
       test      rcx,rcx
       je        short M01_L02
       mov       rax,[rcx]
       test      dword ptr [rax],80000000
       je        near ptr M01_L11
       lea       rax,[rcx+10]
       mov       edx,[rcx+8]
M01_L01:
       and       r12d,7FFFFFFF
       mov       ecx,r12d
       mov       r8d,r15d
       add       r8,rcx
       mov       edx,edx
       cmp       r8,rdx
       ja        near ptr M01_L09
       lea       rax,[rax+rcx*4]
       mov       edx,r15d
M01_L02:
       mov       [rdi],rax
       mov       [rdi+8],edx
       cmp       r13,rbp
       je        short M01_L07
M01_L03:
       cmp       dword ptr [rdi+8],0
       je        short M01_L08
       xor       r15d,r15d
M01_L04:
       test      r15d,r15d
       jne       short M01_L05
       xor       eax,eax
       mov       [rbx+1C],eax
       mov       [rbx+18],eax
       movsxd    rax,esi
       add       [rbx+10],rax
       mov       eax,1
       add       rsp,38
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       pop       r12
       pop       r13
       pop       r14
       pop       r15
       ret
M01_L05:
       cmp       r15d,2
       je        short M01_L10
       mov       [rbx+1C],esi
M01_L06:
       xor       eax,eax
       add       rsp,38
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       pop       r12
       pop       r13
       pop       r14
       pop       r15
       ret
M01_L07:
       mov       eax,r14d
       cmp       eax,[rdi+8]
       ja        short M01_L09
       mov       rax,[rdi]
       mov       [rdi],rax
       mov       [rdi+8],r14d
       jmp       short M01_L03
M01_L08:
       mov       r15d,2
       jmp       near ptr M01_L00
M01_L09:
       call      qword ptr [7FFB2D0657B8]
       int       3
M01_L10:
       xor       eax,eax
       mov       [rbx+1C],eax
       mov       [rbx+18],eax
       movsxd    rax,esi
       add       [rbx+10],rax
       jmp       short M01_L06
M01_L11:
       lea       rdx,[rsp+28]
       mov       rax,[rcx]
       mov       rax,[rax+40]
       call      qword ptr [rax+28]
       mov       rax,[rsp+28]
       mov       edx,[rsp+30]
       jmp       near ptr M01_L01
; Total bytes of code 364

Old TryRead

Disassembly:
; System.Memory.SequenceReaderBenchmark.SystemTryRead()
       push      r15
       push      r14
       push      r13
       push      r12
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,0D8
       vzeroupper
       xor       eax,eax
       mov       [rsp+38],rax
       vxorps    xmm4,xmm4,xmm4
       mov       rax,0FFFFFFFFFFFFFF70
M00_L00:
       vmovdqa   xmmword ptr [rsp+rax+0D0],xmm4
       vmovdqa   xmmword ptr [rsp+rax+0E0],xmm4
       vmovdqa   xmmword ptr [rsp+rax+0F0],xmm4
       add       rax,30
       jne       short M00_L00
       mov       [rsp+0D0],rax
       mov       rbx,[rcx+10]
       mov       rsi,[rcx+18]
       mov       edi,[rcx+20]
       mov       ebp,[rcx+24]
       mov       [rsp+0B0],rbx
       mov       [rsp+0B8],rsi
       mov       [rsp+0C0],edi
       mov       [rsp+0C4],ebp
       mov       ecx,edi
       and       ecx,7FFFFFFF
       mov       [rsp+90],rbx
       mov       [rsp+98],ecx
       mov       qword ptr [rsp+78],0FFFFFFFFFFFFFFFF
       vxorps    xmm0,xmm0,xmm0
       vmovdqu   xmmword ptr [rsp+0A0],xmm0
       mov       rcx,rbx
       test      rcx,rcx
       je        near ptr M00_L16
       xor       r14d,r14d
       cmp       rbx,rsi
       setne     r14b
       test      edi,edi
       jl        near ptr M00_L26
       test      ebp,ebp
       jl        near ptr M00_L25
       mov       rcx,rbx
       mov       rdx,offset MT_System.Buffers.ReadOnlySequenceSegment`1[[System.Int32, System.Private.CoreLib]]
       cmp       [rcx],rdx
       jne       near ptr M00_L04
M00_L01:
       mov       rcx,rbx
       mov       rdx,[rcx+18]
       mov       r15d,[rcx+20]
       mov       r13d,[rcx+24]
       xor       r12d,r12d
       xor       ecx,ecx
       mov       [rsp+30],rdx
       test      rdx,rdx
       je        short M00_L03
       mov       rcx,[rdx]
       test      dword ptr [rcx],80000000
       je        near ptr M00_L23
       lea       r12,[rdx+10]
       mov       ecx,[rdx+8]
M00_L02:
       and       r15d,7FFFFFFF
       mov       edx,r15d
       mov       eax,r13d
       add       rax,rdx
       mov       ecx,ecx
       cmp       rax,rcx
       ja        near ptr M00_L21
       lea       r12,[r12+rdx*4]
       mov       ecx,r13d
M00_L03:
       mov       [rsp+68],r12
       mov       [rsp+70],ecx
       test      r14d,r14d
       je        near ptr M00_L24
       cmp       edi,[rsp+70]
       ja        near ptr M00_L21
       mov       rcx,[rsp+68]
       mov       edx,edi
       lea       rcx,[rcx+rdx*4]
       mov       edx,[rsp+70]
       sub       edx,edi
       mov       [rsp+68],rcx
       mov       [rsp+70],edx
       mov       rcx,[rbx+8]
       mov       [rsp+0A0],rcx
       xor       ecx,ecx
       mov       [rsp+0A8],ecx
       jmp       near ptr M00_L16
M00_L04:
       mov       rcx,rdx
       mov       rdx,rbx
       call      qword ptr [7FFB2CD543F0]; System.Runtime.CompilerServices.CastHelpers.ChkCastClassSpecial(Void*, System.Object)
       jmp       near ptr M00_L01
M00_L05:
       inc       r15d
M00_L06:
       cmp       byte ptr [rsp+8C],0
       je        near ptr M00_L22
       mov       rcx,[rsp+0C8]
       mov       edx,[rsp+0D0]
       mov       eax,[rsp+88]
       mov       r8d,eax
       cmp       r8d,edx
       jae       near ptr M00_L34
       mov       edx,r8d
       mov       ecx,[rcx+rdx*4]
       inc       eax
       mov       [rsp+88],eax
       mov       rcx,[rsp+80]
       inc       rcx
       mov       [rsp+80],rcx
       mov       ecx,[rsp+88]
       mov       edx,[rsp+0D0]
       cmp       ecx,edx
       jl        short M00_L05
       mov       rcx,[rsp+0B0]
       mov       r13,[rsp+0B8]
       cmp       rcx,r13
       je        near ptr M00_L18
       jmp       short M00_L09
M00_L07:
       mov       eax,1
M00_L08:
       mov       [rsp+0A0],rbp
       xor       edx,edx
       mov       [rsp+0A8],edx
       test      eax,eax
       je        near ptr M00_L18
       jmp       near ptr M00_L14
M00_L09:
       mov       r12,[rsp+0A0]
       mov       esi,[rsp+0A8]
       mov       ebx,esi
       mov       r13,[rsp+0B8]
       mov       ecx,[rsp+0C0]
       mov       edi,[rsp+0C4]
       mov       rdx,[rsp+0A0]
       xor       ebp,ebp
       test      rdx,rdx
       je        near ptr M00_L28
       sar       ecx,1F
       mov       eax,edi
       sar       eax,1F
       lea       ecx,[rax+rcx*2]
       mov       eax,ecx
       neg       eax
       and       edi,7FFFFFFF
       test      eax,eax
       jne       near ptr M00_L30
       mov       rax,rdx
       mov       rcx,offset MT_System.Buffers.ReadOnlySequenceSegment`1[[System.Int32, System.Private.CoreLib]]
       cmp       [rax],rcx
       jne       near ptr M00_L15
M00_L10:
       cmp       rax,r13
       je        near ptr M00_L29
       mov       rbp,[rax+8]
       test      rbp,rbp
       je        near ptr M00_L20
       mov       rcx,[rax+18]
       mov       r13d,[rax+20]
       mov       r14d,[rax+24]
       cmp       esi,r14d
       ja        near ptr M00_L19
       add       r13d,esi
       sub       r14d,esi
       jmp       near ptr M00_L07
M00_L11:
       xor       edx,edx
       xor       eax,eax
       test      rcx,rcx
       je        short M00_L13
       mov       rdx,[rcx]
       test      dword ptr [rdx],80000000
       je        near ptr M00_L33
       lea       rdx,[rcx+10]
       mov       eax,[rcx+8]
M00_L12:
       and       r13d,7FFFFFFF
       mov       ecx,r13d
       mov       r8d,r14d
       add       r8,rcx
       mov       eax,eax
       cmp       r8,rax
       ja        near ptr M00_L21
       lea       rdx,[rdx+rcx*4]
       mov       eax,r14d
M00_L13:
       mov       [rsp+0C8],rdx
       mov       [rsp+0D0],eax
       xor       edx,edx
       mov       [rsp+88],edx
       jmp       near ptr M00_L05
M00_L14:
       mov       [rsp+90],r12
       mov       [rsp+98],ebx
       test      r14d,r14d
       jle       near ptr M00_L32
       jmp       short M00_L11
M00_L15:
       call      qword ptr [7FFB2CD543F0]; System.Runtime.CompilerServices.CastHelpers.ChkCastClassSpecial(Void*, System.Object)
       jmp       near ptr M00_L10
M00_L16:
       vmovdqu   xmm0,xmmword ptr [rsp+68]
       vmovdqu   xmmword ptr [rsp+0C8],xmm0
       xor       ecx,ecx
       cmp       dword ptr [rsp+70],0
       setg      cl
       mov       [rsp+8C],cl
       cmp       byte ptr [rsp+8C],0
       je        near ptr M00_L27
M00_L17:
       xor       r15d,r15d
       jmp       near ptr M00_L06
M00_L18:
       mov       byte ptr [rsp+8C],0
       jmp       near ptr M00_L05
M00_L19:
       mov       ecx,21
       call      qword ptr [7FFB2D085B18]
       int       3
M00_L20:
       call      qword ptr [7FFB2D264F18]
       int       3
M00_L21:
       call      qword ptr [7FFB2D0857B8]
       int       3
M00_L22:
       mov       eax,r15d
       add       rsp,0D8
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       pop       r12
       pop       r13
       pop       r14
       pop       r15
       ret
M00_L23:
       lea       rdx,[rsp+58]
       mov       rcx,[rsp+30]
       mov       rax,[rcx]
       mov       rax,[rax+40]
       call      qword ptr [rax+28]
       mov       r12,[rsp+58]
       mov       ecx,[rsp+60]
       jmp       near ptr M00_L02
M00_L24:
       sub       ebp,edi
       mov       edx,edi
       mov       ecx,ebp
       add       rdx,rcx
       mov       ecx,[rsp+70]
       cmp       rdx,rcx
       ja        short M00_L21
       mov       rdx,[rsp+68]
       mov       ecx,edi
       lea       rdx,[rdx+rcx*4]
       mov       [rsp+68],rdx
       mov       [rsp+70],ebp
       jmp       near ptr M00_L16
M00_L25:
       test      r14d,r14d
       jne       short M00_L20
       mov       rdx,rbx
       mov       rcx,offset MT_System.Int32[]
       call      qword ptr [7FFB2CD54390]
       mov       ecx,ebp
       and       ecx,7FFFFFFF
       sub       ecx,edi
       mov       edx,edi
       mov       r8d,ecx
       add       rdx,r8
       mov       r8d,[rbx+8]
       cmp       rdx,r8
       ja        near ptr M00_L21
       mov       edx,edi
       lea       rdx,[rbx+rdx*4+10]
       mov       [rsp+68],rdx
       mov       [rsp+70],ecx
       jmp       near ptr M00_L16
M00_L26:
       mov       [rsp+20],r14d
       lea       rcx,[rsp+68]
       mov       rdx,rbx
       mov       r8d,edi
       mov       r9d,ebp
       call      qword ptr [7FFB2D1044B0]
       jmp       near ptr M00_L16
M00_L27:
       cmp       rbx,rsi
       je        near ptr M00_L17
       mov       byte ptr [rsp+8C],1
       lea       rcx,[rsp+78]
       call      qword ptr [7FFB2D266250]; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].GetNextSpan()
       jmp       near ptr M00_L17
M00_L28:
       xor       ecx,ecx
       xor       r13d,r13d
       xor       r14d,r14d
       xor       eax,eax
       jmp       near ptr M00_L08
M00_L29:
       mov       rcx,[rax+18]
       mov       r13d,[rax+20]
       mov       r14d,[rax+24]
       sub       edi,esi
       mov       edx,esi
       mov       eax,edi
       add       rdx,rax
       mov       eax,r14d
       cmp       rdx,rax
       ja        near ptr M00_L19
       add       esi,r13d
       mov       r13d,esi
       mov       r14d,edi
       jmp       near ptr M00_L07
M00_L30:
       cmp       rdx,r13
       jne       near ptr M00_L20
       cmp       eax,1
       jne       short M00_L31
       mov       rcx,offset MT_System.Int32[]
       call      qword ptr [7FFB2CD54390]
       mov       r14d,edi
       sub       r14d,esi
       mov       edx,esi
       mov       ecx,r14d
       add       rdx,rcx
       mov       ecx,[rax+8]
       cmp       rdx,rcx
       ja        near ptr M00_L21
       mov       rcx,rax
       mov       r13d,esi
       jmp       near ptr M00_L07
M00_L31:
       mov       rcx,offset MT_System.Buffers.MemoryManager`1[[System.Int32, System.Private.CoreLib]]
       call      qword ptr [7FFB2CD543D8]; System.Runtime.CompilerServices.CastHelpers.ChkCastClass(Void*, System.Object)
       mov       rcx,rax
       lea       rdx,[rsp+48]
       mov       rax,[rax]
       mov       rax,[rax+40]
       call      qword ptr [rax+20]
       mov       r14d,edi
       sub       r14d,esi
       mov       edx,esi
       mov       ecx,r14d
       add       rdx,rcx
       mov       ecx,[rsp+54]
       cmp       rdx,rcx
       ja        near ptr M00_L21
       mov       rcx,[rsp+48]
       mov       r13d,esi
       add       r13d,[rsp+50]
       jmp       near ptr M00_L07
M00_L32:
       vxorps    xmm0,xmm0,xmm0
       vmovdqu   xmmword ptr [rsp+0C8],xmm0
       xor       edx,edx
       mov       [rsp+88],edx
       jmp       near ptr M00_L09
M00_L33:
       lea       rdx,[rsp+38]
       mov       rax,[rcx]
       mov       rax,[rax+40]
       call      qword ptr [rax+28]
       mov       rdx,[rsp+38]
       mov       eax,[rsp+40]
       jmp       near ptr M00_L12
M00_L34:
       call      CORINFO_HELP_RNGCHKFAIL
       int       3
; Total bytes of code 1477
; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].GetNextSpan()
       push      r15
       push      r14
       push      r13
       push      r12
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,48
       vxorps    xmm4,xmm4,xmm4
       vmovdqa   xmmword ptr [rsp+20],xmm4
       vmovdqa   xmmword ptr [rsp+30],xmm4
       mov       rbx,rcx
       mov       rcx,[rbx+38]
       mov       rsi,[rbx+40]
       cmp       rcx,rsi
       je        near ptr M02_L07
M02_L00:
       mov       rdx,[rbx+28]
       mov       rdi,rdx
       mov       ebp,[rbx+30]
       mov       rsi,[rbx+40]
       mov       ecx,[rbx+48]
       mov       r14d,[rbx+4C]
       lea       r15,[rbx+28]
       xor       r13d,r13d
       test      rdx,rdx
       je        near ptr M02_L08
       sar       ecx,1F
       mov       eax,r14d
       sar       eax,1F
       lea       ecx,[rax+rcx*2]
       mov       r12d,ecx
       neg       r12d
       mov       eax,[r15+8]
       mov       [rsp+40],eax
       and       r14d,7FFFFFFF
       test      r12d,r12d
       jne       near ptr M02_L11
       mov       r12,rdx
       mov       rcx,offset MT_System.Buffers.ReadOnlySequenceSegment`1[[System.Int32, System.Private.CoreLib]]
       cmp       [r12],rcx
       jne       near ptr M02_L06
M02_L01:
       cmp       r12,rsi
       je        near ptr M02_L10
       mov       r13,[r12+8]
       test      r13,r13
       je        near ptr M02_L14
       mov       rsi,[r12+18]
       mov       r14d,[r12+20]
       mov       r12d,[r12+24]
       mov       eax,[rsp+40]
       cmp       eax,r12d
       ja        near ptr M02_L13
       add       r14d,eax
       sub       r12d,eax
M02_L02:
       mov       dword ptr [rsp+44],1
M02_L03:
       mov       rcx,r15
       mov       rdx,r13
       call      CORINFO_HELP_CHECKED_ASSIGN_REF
       xor       ecx,ecx
       mov       [r15+8],ecx
       cmp       dword ptr [rsp+44],0
       je        near ptr M02_L07
       lea       rcx,[rbx+18]
       mov       rdx,rdi
       call      CORINFO_HELP_CHECKED_ASSIGN_REF
       mov       [rbx+20],ebp
       test      r12d,r12d
       jle       near ptr M02_L09
       xor       edx,edx
       xor       ecx,ecx
       test      rsi,rsi
       je        short M02_L05
       mov       rdx,[rsi]
       test      dword ptr [rdx],80000000
       je        near ptr M02_L15
       lea       rdx,[rsi+10]
       mov       ecx,[rsi+8]
M02_L04:
       and       r14d,7FFFFFFF
       mov       eax,r14d
       mov       r8d,r12d
       add       r8,rax
       mov       ecx,ecx
       cmp       r8,rcx
       ja        near ptr M02_L16
       lea       rdx,[rdx+rax*4]
       mov       ecx,r12d
M02_L05:
       mov       [rbx+50],rdx
       mov       [rbx+58],ecx
       xor       edx,edx
       mov       [rbx+10],edx
       add       rsp,48
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       pop       r12
       pop       r13
       pop       r14
       pop       r15
       ret
M02_L06:
       call      qword ptr [7FFB2CD543F0]; System.Runtime.CompilerServices.CastHelpers.ChkCastClassSpecial(Void*, System.Object)
       mov       r12,rax
       jmp       near ptr M02_L01
M02_L07:
       mov       byte ptr [rbx+14],0
       add       rsp,48
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       pop       r12
       pop       r13
       pop       r14
       pop       r15
       ret
M02_L08:
       xor       esi,esi
       xor       r14d,r14d
       xor       r12d,r12d
       xor       eax,eax
       xor       edx,edx
       mov       [rsp+44],edx
       jmp       near ptr M02_L03
M02_L09:
       xor       ecx,ecx
       mov       [rbx+50],rcx
       mov       [rbx+58],rcx
       mov       [rbx+10],ecx
       jmp       near ptr M02_L00
M02_L10:
       mov       rsi,[r12+18]
       mov       edx,[r12+20]
       mov       r12d,[r12+24]
       mov       eax,[rsp+40]
       sub       r14d,eax
       mov       ecx,eax
       mov       r8d,r14d
       add       rcx,r8
       mov       r8d,r12d
       cmp       rcx,r8
       ja        near ptr M02_L13
       add       eax,edx
       mov       r12d,r14d
       mov       r14d,eax
       jmp       near ptr M02_L02
M02_L11:
       cmp       rdx,rsi
       jne       near ptr M02_L14
       cmp       r12d,1
       jne       short M02_L12
       mov       rcx,offset MT_System.Int32[]
       call      qword ptr [7FFB2CD54390]
       mov       esi,[rsp+40]
       mov       r12d,r14d
       sub       r12d,esi
       mov       edx,esi
       mov       ecx,r12d
       add       rdx,rcx
       mov       ecx,[rax+8]
       cmp       rdx,rcx
       ja        near ptr M02_L16
       mov       r14d,esi
       mov       rsi,rax
       jmp       near ptr M02_L02
M02_L12:
       mov       rcx,offset MT_System.Buffers.MemoryManager`1[[System.Int32, System.Private.CoreLib]]
       call      qword ptr [7FFB2CD543D8]; System.Runtime.CompilerServices.CastHelpers.ChkCastClass(Void*, System.Object)
       mov       rcx,rax
       lea       rdx,[rsp+30]
       mov       rax,[rax]
       mov       rax,[rax+40]
       call      qword ptr [rax+20]
       mov       esi,[rsp+40]
       mov       r12d,r14d
       sub       r12d,esi
       mov       ecx,esi
       mov       eax,r12d
       add       rcx,rax
       mov       eax,[rsp+3C]
       cmp       rcx,rax
       ja        short M02_L16
       mov       rcx,[rsp+30]
       mov       r14d,esi
       add       r14d,[rsp+38]
       mov       rsi,rcx
       jmp       near ptr M02_L02
M02_L13:
       mov       ecx,21
       call      qword ptr [7FFB2D085B18]
       int       3
M02_L14:
       call      qword ptr [7FFB2D264F18]
       int       3
M02_L15:
       lea       rdx,[rsp+20]
       mov       rcx,rsi
       mov       rax,[rsi]
       mov       rax,[rax+40]
       call      qword ptr [rax+28]
       mov       rdx,[rsp+20]
       mov       ecx,[rsp+28]
       jmp       near ptr M02_L04
M02_L16:
       call      qword ptr [7FFB2D0857B8]
       int       3
; Total bytes of code 706

@jozkee
Member

jozkee commented Aug 17, 2023

TryReadTo is also affected: it is 1.8x slower.

[Benchmark]
public void SystemTryReadTo()
{
    var reader = new System.Buffers.SequenceReader<int>(payload);
    if (reader.TryReadTo(out ReadOnlySpan<int> _, 255)) Throw();
}
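
For readers who want to try the call outside the benchmark harness, here is a minimal sketch of what the benchmarked method exercises. The `payload` field and the `Throw()` helper are defined elsewhere in the benchmark class and are not shown here, so the single-segment array below is only an assumed stand-in, not the real benchmark data.

```csharp
using System;
using System.Buffers;

// Assumption: a stand-in for the benchmark's `payload`, which is not shown in this comment.
// This one is a single array-backed segment that happens to contain the delimiter.
var payload = new ReadOnlySequence<int>(new[] { 1, 2, 3, 4, 255, 6 });

var reader = new SequenceReader<int>(payload);

// TryReadTo scans forward for the delimiter (255), hands back the items before it,
// and by default advances the reader past the delimiter.
bool found = reader.TryReadTo(out ReadOnlySpan<int> before, 255);
int beforeLength = before.Length;
long consumed = reader.Consumed;

Console.WriteLine($"found={found}, before.Length={beforeLength}, consumed={consumed}");
```

A multi-segment payload would additionally go through the next-segment path (`TryGetNextSpan` in the new disassembly, `GetNextSpan` in the old one), which this single-segment sketch never reaches.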

New TryReadTo

Disassembly:
; System.Memory.SequenceReaderBenchmark.SystemTryReadTo()
       push      r15
       push      r14
       push      r13
       push      r12
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,198
       vzeroupper
       xor       eax,eax
       mov       [rsp+28],rax
       vxorps    xmm4,xmm4,xmm4
       vmovdqa   xmmword ptr [rsp+30],xmm4
       mov       rax,0FFFFFFFFFFFFFEB0
M00_L00:
       vmovdqa   xmmword ptr [rsp+rax+190],xmm4
       vmovdqa   xmmword ptr [rsp+rax+1A0],xmm4
       vmovdqa   xmmword ptr [rsp+rax+1B0],xmm4
       add       rax,30
       jne       short M00_L00
       mov       [rsp+190],rax
       vmovdqu   xmm0,xmmword ptr [rcx+10]
       vmovdqu   xmmword ptr [rsp+128],xmm0
       mov       rdx,[rcx+20]
       mov       [rsp+138],rdx
       vmovdqu   xmm0,xmmword ptr [rsp+128]
       vmovdqu   xmmword ptr [rsp+170],xmm0
       mov       rcx,[rsp+138]
       mov       [rsp+180],rcx
       mov       rcx,[rsp+128]
       mov       edx,[rsp+138]
       and       edx,7FFFFFFF
       mov       [rsp+150],rcx
       mov       [rsp+168],edx
       mov       r8,[rsp+128]
       test      r8,r8
       je        near ptr M00_L26
       mov       ebx,[rsp+138]
       mov       ebp,[rsp+13C]
       xor       r14d,r14d
       cmp       r8,[rsp+130]
       setne     r14b
       mov       ecx,ebx
       or        ecx,ebp
       jl        near ptr M00_L14
       mov       rax,r8
       mov       rcx,offset MT_System.Buffers.ReadOnlySequenceSegment`1[[System.Int32, System.Private.CoreLib]]
       cmp       [rax],rcx
       jne       near ptr M00_L11
M00_L01:
       mov       rcx,[rax+18]
       mov       r15d,[rax+20]
       mov       r13d,[rax+24]
       xor       r12d,r12d
       xor       r8d,r8d
       test      rcx,rcx
       je        short M00_L03
       mov       r8,[rcx]
       test      dword ptr [r8],80000000
       je        near ptr M00_L12
       lea       r12,[rcx+10]
       mov       r8d,[rcx+8]
M00_L02:
       and       r15d,7FFFFFFF
       mov       ecx,r15d
       mov       edx,r13d
       add       rdx,rcx
       mov       r8d,r8d
       cmp       rdx,r8
       ja        near ptr M00_L17
       lea       r12,[r12+rcx*4]
       mov       r8d,r13d
M00_L03:
       test      r14d,r14d
       je        near ptr M00_L13
       cmp       ebx,r8d
       ja        near ptr M00_L17
       mov       ecx,ebx
       lea       rcx,[r12+rcx*4]
       sub       r8d,ebx
       mov       [rsp+118],rcx
       mov       [rsp+120],r8d
M00_L04:
       vmovdqu   xmm0,xmmword ptr [rsp+118]
       vmovdqu   xmmword ptr [rsp+188],xmm0
       xor       ecx,ecx
       mov       [rsp+16C],ecx
       mov       [rsp+160],rcx
       mov       rcx,[rsp+128]
       cmp       rcx,[rsp+130]
       je        near ptr M00_L27
       mov       qword ptr [rsp+158],0FFFFFFFFFFFFFFFF
       cmp       dword ptr [rsp+190],0
       je        near ptr M00_L15
M00_L05:
       mov       ecx,[rsp+16C]
       mov       r15d,[rsp+190]
       cmp       ecx,r15d
       ja        near ptr M00_L17
       mov       r8,[rsp+188]
       mov       edx,ecx
       lea       r13,[r8+rdx*4]
       sub       r15d,ecx
       mov       rcx,r13
       mov       r8d,r15d
       mov       edx,0FF
       call      qword ptr [7FFB2D27CF18]; System.SpanHelpers.NonPackedIndexOfValueType[[System.Int32, System.Private.CoreLib],[System.SpanHelpers+DontNegate`1[[System.Int32, System.Private.CoreLib]], System.Private.CoreLib]](Int32 ByRef, Int32, Int32)
       cmp       eax,0FFFFFFFF
       jne       near ptr M00_L16
       mov       r14d,[rsp+190]
       sub       r14d,[rsp+16C]
       lea       rdi,[rsp+88]
       lea       rsi,[rsp+150]
       mov       ecx,9
       rep movsq
       test      r14d,r14d
       jle       short M00_L06
       movsxd    rdx,r14d
       lea       rcx,[rsp+150]
       call      qword ptr [7FFB2D276FD0]; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].Advance(Int64)
M00_L06:
       lea       rcx,[rsp+188]
       lea       rdx,[rsp+78]
       mov       r8d,[rsp+16C]
       call      qword ptr [7FFB2CD5FA98]; System.ReadOnlySpan`1[[System.Int32, System.Private.CoreLib]].Slice(Int32)
       lea       rcx,[rsp+150]
       call      qword ptr [7FFB2D276E20]; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].get_End()
       test      eax,eax
       jne       short M00_L08
M00_L07:
       vmovdqu   xmm0,xmmword ptr [rsp+78]
       vmovdqu   xmmword ptr [rsp+48],xmm0
       lea       rcx,[rsp+48]
       mov       edx,0FF
       call      qword ptr [7FFB2D27C300]; System.MemoryExtensions.IndexOf[[System.Int32, System.Private.CoreLib]](System.ReadOnlySpan`1<Int32>, Int32)
       cmp       eax,0FFFFFFFF
       jne       near ptr M00_L21
       mov       edx,[rsp+80]
       lea       rcx,[rsp+150]
       call      qword ptr [7FFB2D276FE8]; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].AdvanceCurrentSpan(Int64)
       vmovdqu   xmm0,xmmword ptr [rsp+188]
       vmovdqu   xmmword ptr [rsp+78],xmm0
       lea       rcx,[rsp+150]
       call      qword ptr [7FFB2D276E20]; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].get_End()
       test      eax,eax
       je        short M00_L07
M00_L08:
       lea       rdi,[rsp+150]
       lea       rsi,[rsp+88]
       mov       ecx,9
       rep movsq
       vxorps    xmm0,xmm0,xmm0
       vmovdqu   xmmword ptr [rsp+0F0],xmm0
       vmovdqu   xmmword ptr [rsp+0F8],xmm0
       vxorps    xmm0,xmm0,xmm0
       vmovdqu   xmmword ptr [rsp+140],xmm0
       xor       esi,esi
M00_L09:
       test      esi,esi
       jne       near ptr M00_L25
M00_L10:
       add       rsp,198
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       pop       r12
       pop       r13
       pop       r14
       pop       r15
       ret
M00_L11:
       mov       rdx,r8
       call      qword ptr [7FFB2CD543F0]; System.Runtime.CompilerServices.CastHelpers.ChkCastClassSpecial(Void*, System.Object)
       jmp       near ptr M00_L01
M00_L12:
       lea       rdx,[rsp+108]
       mov       rax,[rcx]
       mov       rax,[rax+40]
       call      qword ptr [rax+28]
       mov       r12,[rsp+108]
       mov       r8d,[rsp+110]
       jmp       near ptr M00_L02
M00_L13:
       sub       ebp,ebx
       mov       ecx,ebx
       mov       edx,ebp
       add       rcx,rdx
       mov       edx,r8d
       cmp       rcx,rdx
       ja        short M00_L17
       mov       ecx,ebx
       lea       rcx,[r12+rcx*4]
       mov       [rsp+118],rcx
       mov       [rsp+120],ebp
       jmp       near ptr M00_L04
M00_L14:
       lea       rcx,[rsp+128]
       lea       rdx,[rsp+118]
       mov       r9d,r14d
       call      qword ptr [7FFB2D0C50B0]
       jmp       near ptr M00_L04
M00_L15:
       lea       rcx,[rsp+150]
       call      qword ptr [7FFB2D276FA0]; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].TryGetNextSpan()
       jmp       near ptr M00_L05
M00_L16:
       lea       rbx,[rsp+140]
       test      eax,eax
       je        short M00_L19
       cmp       eax,r15d
       jbe       short M00_L18
M00_L17:
       call      qword ptr [7FFB2D0857B8]
       int       3
M00_L18:
       mov       ecx,eax
       jmp       short M00_L20
M00_L19:
       xor       r13d,r13d
       xor       ecx,ecx
M00_L20:
       mov       [rbx],r13
       mov       [rbx+8],ecx
       mov       ecx,[rsp+16C]
       lea       ecx,[rcx+rax+1]
       mov       [rsp+16C],ecx
       mov       ecx,[rsp+16C]
       cmp       ecx,[rsp+190]
       jne       near ptr M00_L25
       lea       rcx,[rsp+150]
       call      qword ptr [7FFB2D276FA0]; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].TryGetNextSpan()
       jmp       near ptr M00_L25
M00_L21:
       test      eax,eax
       jle       short M00_L22
       movsxd    rdx,eax
       lea       rcx,[rsp+150]
       call      qword ptr [7FFB2D276FE8]; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].AdvanceCurrentSpan(Int64)
M00_L22:
       mov       rbx,[rsp+88]
       mov       esi,[rsp+0A0]
       add       esi,[rsp+0A4]
       lea       rcx,[rsp+150]
       lea       rdx,[rsp+68]
       call      qword ptr [7FFB2D276E68]
       mov       [rsp+38],rbx
       mov       [rsp+40],esi
       vmovdqu   xmm0,xmmword ptr [rsp+68]
       vmovdqu   xmmword ptr [rsp+28],xmm0
       lea       r8,[rsp+38]
       lea       r9,[rsp+28]
       lea       rcx,[rsp+170]
       lea       rdx,[rsp+0F0]
       call      qword ptr [7FFB2D0C4F60]
       lea       rcx,[rsp+150]
       mov       edx,1
       call      qword ptr [7FFB2D276FD0]; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].Advance(Int64)
       lea       rbx,[rsp+140]
       mov       rcx,[rsp+0F0]
       cmp       rcx,[rsp+0F8]
       je        short M00_L23
       lea       rcx,[rsp+0F0]
       call      qword ptr [7FFB2D27DE78]
       mov       rdx,rax
       vxorps    xmm0,xmm0,xmm0
       vmovdqu   xmmword ptr [rsp+58],xmm0
       lea       rcx,[rsp+58]
       call      qword ptr [7FFB2CD5F870]
       vmovdqu   xmm0,xmmword ptr [rsp+58]
       vmovdqu   xmmword ptr [rsp+0D0],xmm0
       jmp       short M00_L24
M00_L23:
       lea       rcx,[rsp+0F0]
       lea       rdx,[rsp+0E0]
       call      qword ptr [7FFB2D0C5068]
       lea       rcx,[rsp+0E0]
       lea       rdx,[rsp+0D0]
       call      qword ptr [7FFB2D275B48]; System.ReadOnlyMemory`1[[System.Int32, System.Private.CoreLib]].get_Span()
M00_L24:
       mov       rax,[rsp+0D0]
       mov       [rbx],rax
       mov       eax,[rsp+0D8]
       mov       [rbx+8],eax
       mov       esi,1
       jmp       near ptr M00_L09
M00_L25:
       call      qword ptr [7FFB2D0C5368]
       jmp       near ptr M00_L10
M00_L26:
       vxorps    xmm0,xmm0,xmm0
       vmovdqu   xmmword ptr [rsp+118],xmm0
       jmp       near ptr M00_L04
M00_L27:
       movsxd    rcx,dword ptr [rsp+190]
       mov       [rsp+158],rcx
       jmp       near ptr M00_L05
; Total bytes of code 1407
; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].TryGetNextSpan()
       push      r15
       push      r14
       push      r13
       push      r12
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,38
       xor       eax,eax
       mov       [rsp+28],rax
       mov       rbx,rcx
       mov       esi,[rbx+40]
       lea       rdx,[rbx+20]
       mov       rcx,rbx
       lea       rdi,[rcx+38]
       mov       rbp,[rdx+8]
       mov       r14d,[rdx+14]
       and       r14d,7FFFFFFF
       mov       r15d,1
       mov       rdx,[rcx]
       mov       rcx,offset MT_System.Buffers.ReadOnlySequenceSegment`1[[System.Int32, System.Private.CoreLib]]
       call      qword ptr [7FFB2CD54360]; System.Runtime.CompilerServices.CastHelpers.IsInstanceOfClass(Void*, System.Object)
       mov       r13,rax
       test      r13,r13
       je        short M08_L04
M08_L00:
       cmp       r13,rbp
       je        short M08_L04
       mov       r13,[r13+8]
       mov       rcx,rbx
       mov       rdx,r13
       call      CORINFO_HELP_CHECKED_ASSIGN_REF
       test      r13,r13
       je        short M08_L04
       mov       rcx,[r13+18]
       mov       r12d,[r13+20]
       mov       r15d,[r13+24]
       xor       eax,eax
       xor       edx,edx
       test      rcx,rcx
       je        short M08_L02
       mov       rax,[rcx]
       test      dword ptr [rax],80000000
       je        near ptr M08_L08
       lea       rax,[rcx+10]
       mov       edx,[rcx+8]
M08_L01:
       and       r12d,7FFFFFFF
       mov       ecx,r12d
       mov       r8d,r15d
       add       r8,rcx
       mov       edx,edx
       cmp       r8,rdx
       ja        near ptr M08_L10
       lea       rax,[rax+rcx*4]
       mov       edx,r15d
M08_L02:
       mov       [rdi],rax
       mov       [rdi+8],edx
       cmp       r13,rbp
       je        short M08_L05
M08_L03:
       cmp       dword ptr [rdi+8],0
       je        short M08_L09
       xor       r15d,r15d
M08_L04:
       test      r15d,r15d
       jne       short M08_L06
       xor       eax,eax
       mov       [rbx+1C],eax
       mov       [rbx+18],eax
       movsxd    rax,esi
       add       [rbx+10],rax
       mov       eax,1
       add       rsp,38
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       pop       r12
       pop       r13
       pop       r14
       pop       r15
       ret
M08_L05:
       mov       eax,r14d
       cmp       eax,[rdi+8]
       ja        short M08_L10
       mov       rax,[rdi]
       mov       [rdi],rax
       mov       [rdi+8],r14d
       jmp       short M08_L03
M08_L06:
       cmp       r15d,2
       je        short M08_L11
       mov       [rbx+1C],esi
M08_L07:
       xor       eax,eax
       add       rsp,38
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       pop       r12
       pop       r13
       pop       r14
       pop       r15
       ret
M08_L08:
       lea       rdx,[rsp+28]
       mov       rax,[rcx]
       mov       rax,[rax+40]
       call      qword ptr [rax+28]
       mov       rax,[rsp+28]
       mov       edx,[rsp+30]
       jmp       near ptr M08_L01
M08_L09:
       mov       r15d,2
       jmp       near ptr M08_L00
M08_L10:
       call      qword ptr [7FFB2D0857B8]
       int       3
M08_L11:
       xor       eax,eax
       mov       [rbx+1C],eax
       mov       [rbx+18],eax
       movsxd    rax,esi
       add       [rbx+10],rax
       jmp       short M08_L07
; Total bytes of code 364

Old TryReadTo

Disassembly:
; System.Memory.SequenceReaderBenchmark.SystemTryReadTo()
       push      r15
       push      r14
       push      r13
       push      r12
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,208
       vzeroupper
       vxorps    xmm4,xmm4,xmm4
       vmovdqa   xmmword ptr [rsp+30],xmm4
       vmovdqa   xmmword ptr [rsp+40],xmm4
       mov       rax,0FFFFFFFFFFFFFE50
M00_L00:
       vmovdqa   xmmword ptr [rsp+rax+200],xmm4
       vmovdqa   xmmword ptr [rsp+rax+210],xmm4
       vmovdqa   xmmword ptr [rsp+rax+220],xmm4
       add       rax,30
       jne       short M00_L00
       mov       [rsp+200],rax
       mov       rbx,[rcx+10]
       mov       rbp,[rcx+18]
       mov       r14d,[rcx+20]
       mov       r15d,[rcx+24]
       mov       [rsp+1E0],rbx
       mov       [rsp+1E8],rbp
       mov       [rsp+1F0],r14d
       mov       [rsp+1F4],r15d
       mov       ecx,r14d
       and       ecx,7FFFFFFF
       mov       [rsp+1C0],rbx
       mov       [rsp+1C8],ecx
       mov       qword ptr [rsp+1A8],0FFFFFFFFFFFFFFFF
       vxorps    xmm0,xmm0,xmm0
       vmovdqu   xmmword ptr [rsp+1D0],xmm0
       mov       rcx,rbx
       test      rcx,rcx
       je        near ptr M00_L10
       xor       r13d,r13d
       cmp       rbx,rbp
       setne     r13b
       test      r14d,r14d
       jl        near ptr M00_L33
       test      r15d,r15d
       jl        near ptr M00_L31
       mov       rcx,rbx
       mov       rdx,offset MT_System.Buffers.ReadOnlySequenceSegment`1[[System.Int32, System.Private.CoreLib]]
       cmp       [rcx],rdx
       jne       near ptr M00_L04
M00_L01:
       mov       rcx,rbx
       mov       r8,[rcx+18]
       mov       r12d,[rcx+20]
       mov       esi,[rcx+24]
       xor       ecx,ecx
       xor       edx,edx
       test      r8,r8
       je        short M00_L03
       mov       rcx,[r8]
       test      dword ptr [rcx],80000000
       je        near ptr M00_L29
       lea       rcx,[r8+10]
       mov       edx,[r8+8]
M00_L02:
       and       r12d,7FFFFFFF
       mov       r8d,r12d
       mov       eax,esi
       add       rax,r8
       mov       edx,edx
       cmp       rax,rdx
       ja        near ptr M00_L21
       lea       rcx,[rcx+r8*4]
       mov       edx,esi
M00_L03:
       mov       [rsp+188],rcx
       mov       [rsp+190],edx
       test      r13d,r13d
       je        near ptr M00_L30
       cmp       r14d,[rsp+190]
       ja        near ptr M00_L21
       mov       rcx,[rsp+188]
       mov       r8d,r14d
       lea       rcx,[rcx+r8*4]
       mov       r8d,[rsp+190]
       sub       r8d,r14d
       mov       [rsp+188],rcx
       mov       [rsp+190],r8d
       mov       rcx,[rbx+8]
       mov       [rsp+1D0],rcx
       xor       ecx,ecx
       mov       [rsp+1D8],ecx
       jmp       near ptr M00_L10
M00_L04:
       mov       rcx,rdx
       mov       rdx,rbx
       call      qword ptr [7FFB2CD243F0]; System.Runtime.CompilerServices.CastHelpers.ChkCastClassSpecial(Void*, System.Object)
       jmp       near ptr M00_L01
M00_L05:
       mov       edx,r8d
       lea       rcx,[rcx+rdx*4]
       sub       esi,r8d
       cmp       byte ptr [rsp+1BC],0
       je        short M00_L07
M00_L06:
       mov       r8d,esi
       mov       edx,0FF
       call      qword ptr [7FFB2D24D7A0]; System.SpanHelpers.NonPackedIndexOfValueType[[System.Int32, System.Private.CoreLib],[System.SpanHelpers+DontNegate`1[[System.Int32, System.Private.CoreLib]], System.Private.CoreLib]](Int32 ByRef, Int32, Int32)
       cmp       eax,0FFFFFFFF
       jne       near ptr M00_L22
       mov       edx,esi
       lea       rcx,[rsp+1A8]
       call      qword ptr [7FFB2D247120]; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].AdvanceCurrentSpan(Int64)
       mov       rcx,[rsp+1F8]
       mov       esi,[rsp+200]
       cmp       byte ptr [rsp+1BC],0
       jne       short M00_L06
M00_L07:
       lea       rdi,[rsp+1A8]
       lea       rsi,[rsp+0E0]
       mov       ecx,0C
       rep movsq
       vxorps    xmm0,xmm0,xmm0
       vmovdqu   xmmword ptr [rsp+160],xmm0
       vmovdqu   xmmword ptr [rsp+168],xmm0
       vxorps    xmm0,xmm0,xmm0
       vmovdqu   xmmword ptr [rsp+198],xmm0
       xor       esi,esi
M00_L08:
       test      esi,esi
       jne       near ptr M00_L28
M00_L09:
       add       rsp,208
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       pop       r12
       pop       r13
       pop       r14
       pop       r15
       ret
M00_L10:
       vmovdqu   xmm0,xmmword ptr [rsp+188]
       vmovdqu   xmmword ptr [rsp+1F8],xmm0
       xor       ecx,ecx
       cmp       dword ptr [rsp+190],0
       setg      cl
       mov       [rsp+1BC],cl
       cmp       byte ptr [rsp+1BC],0
       je        near ptr M00_L34
M00_L11:
       mov       rcx,[rsp+1F8]
       mov       r12d,[rsp+200]
       mov       r8d,[rsp+1B8]
       cmp       r8d,r12d
       ja        near ptr M00_L21
       mov       edx,r8d
       lea       rsi,[rcx+rdx*4]
       sub       r12d,r8d
       mov       rcx,rsi
       mov       r8d,r12d
       mov       edx,0FF
       call      qword ptr [7FFB2D24D7A0]; System.SpanHelpers.NonPackedIndexOfValueType[[System.Int32, System.Private.CoreLib],[System.SpanHelpers+DontNegate`1[[System.Int32, System.Private.CoreLib]], System.Private.CoreLib]](Int32 ByRef, Int32, Int32)
       cmp       eax,0FFFFFFFF
       jne       near ptr M00_L35
       mov       r13d,[rsp+200]
       sub       r13d,[rsp+1B8]
       lea       rdi,[rsp+0E0]
       lea       rsi,[rsp+1A8]
       mov       ecx,0C
       rep movsq
       test      r13d,r13d
       jle       near ptr M00_L17
       movsxd    r13,r13d
       test      r13,0FFFFFFFF80000000
       jne       short M00_L12
       mov       ecx,[rsp+200]
       sub       ecx,[rsp+1B8]
       cmp       ecx,r13d
       jg        near ptr M00_L38
M00_L12:
       test      r13,r13
       jl        near ptr M00_L39
       mov       rcx,r13
       add       rcx,[rsp+1B0]
       mov       [rsp+1B0],rcx
       cmp       byte ptr [rsp+1BC],0
       je        near ptr M00_L16
M00_L13:
       mov       ecx,[rsp+200]
       sub       ecx,[rsp+1B8]
       movsxd    rdx,ecx
       cmp       rdx,r13
       jg        near ptr M00_L19
       mov       edx,ecx
       add       edx,[rsp+1B8]
       mov       [rsp+1B8],edx
       movsxd    rcx,ecx
       sub       r13,rcx
       vmovdqu   xmm0,xmmword ptr [rsp+1E0]
       vmovdqu   xmmword ptr [rsp+90],xmm0
       mov       rcx,[rsp+1F0]
       mov       [rsp+0A0],rcx
       mov       rcx,[rsp+90]
       cmp       rcx,[rsp+98]
       je        near ptr M00_L18
M00_L14:
       mov       rsi,[rsp+1D0]
       mov       edi,[rsp+1D8]
       vmovdqu   xmm0,xmmword ptr [rsp+1E0]
       vmovdqu   xmmword ptr [rsp+90],xmm0
       mov       rcx,[rsp+1F0]
       mov       [rsp+0A0],rcx
       lea       rcx,[rsp+90]
       lea       rdx,[rsp+1D0]
       lea       r8,[rsp+80]
       lea       r9,[rsp+60]
       call      qword ptr [7FFB2D095050]; System.Buffers.ReadOnlySequence`1[[System.Int32, System.Private.CoreLib]].TryGetBuffer(System.SequencePosition ByRef, System.ReadOnlyMemory`1<Int32> ByRef, System.SequencePosition ByRef)
       vmovdqu   xmm0,xmmword ptr [rsp+60]
       vmovdqu   xmmword ptr [rsp+1D0],xmm0
       test      eax,eax
       je        near ptr M00_L18
       mov       [rsp+1C0],rsi
       mov       [rsp+1C8],edi
       cmp       dword ptr [rsp+8C],0
       jle       near ptr M00_L40
       lea       rcx,[rsp+80]
       lea       rdx,[rsp+70]
       call      qword ptr [7FFB2D245B48]; System.ReadOnlyMemory`1[[System.Int32, System.Private.CoreLib]].get_Span()
       vmovdqu   xmm0,xmmword ptr [rsp+70]
       vmovdqu   xmmword ptr [rsp+1F8],xmm0
       xor       ecx,ecx
       mov       [rsp+1B8],ecx
M00_L15:
       test      r13,r13
       je        short M00_L16
       cmp       byte ptr [rsp+1BC],0
       jne       near ptr M00_L13
M00_L16:
       test      r13,r13
       jne       short M00_L20
M00_L17:
       mov       rcx,[rsp+1F8]
       mov       esi,[rsp+200]
       mov       r8d,[rsp+1B8]
       cmp       r8d,esi
       ja        short M00_L21
       jmp       near ptr M00_L05
M00_L18:
       mov       byte ptr [rsp+1BC],0
       jmp       short M00_L15
M00_L19:
       mov       ecx,r13d
       add       ecx,[rsp+1B8]
       mov       [rsp+1B8],ecx
       jmp       short M00_L17
M00_L20:
       mov       rcx,[rsp+1B0]
       sub       rcx,r13
       mov       [rsp+1B0],rcx
       mov       ecx,0F
       call      qword ptr [7FFB2D245D58]
       int       3
M00_L21:
       call      qword ptr [7FFB2D0557B8]
       int       3
M00_L22:
       test      eax,eax
       jle       short M00_L23
       movsxd    rcx,eax
       mov       rax,rcx
       add       rax,[rsp+1B0]
       mov       [rsp+1B0],rax
       add       ecx,[rsp+1B8]
       mov       [rsp+1B8],ecx
       mov       ecx,[rsp+1B8]
       mov       eax,[rsp+200]
       cmp       ecx,eax
       jl        short M00_L23
       lea       rcx,[rsp+1A8]
       call      qword ptr [7FFB2D2470F0]; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].GetNextSpan()
M00_L23:
       vmovdqu   xmm0,xmmword ptr [rsp+1E0]
       vmovdqu   xmmword ptr [rsp+0C8],xmm0
       mov       rcx,[rsp+1F0]
       mov       [rsp+0D8],rcx
       lea       rcx,[rsp+0E0]
       lea       rdx,[rsp+0B8]
       call      qword ptr [7FFB2D246F70]
       lea       rcx,[rsp+1A8]
       lea       rdx,[rsp+0A8]
       call      qword ptr [7FFB2D246F70]
       vmovdqu   xmm0,xmmword ptr [rsp+0B8]
       vmovdqu   xmmword ptr [rsp+40],xmm0
       vmovdqu   xmm0,xmmword ptr [rsp+0A8]
       vmovdqu   xmmword ptr [rsp+30],xmm0
       lea       r8,[rsp+40]
       lea       r9,[rsp+30]
       lea       rcx,[rsp+0C8]
       lea       rdx,[rsp+160]
       call      qword ptr [7FFB2D094F60]
       mov       ecx,[rsp+200]
       sub       ecx,[rsp+1B8]
       cmp       ecx,1
       jle       short M00_L24
       mov       ecx,[rsp+1B8]
       inc       ecx
       mov       [rsp+1B8],ecx
       mov       rcx,[rsp+1B0]
       inc       rcx
       mov       [rsp+1B0],rcx
       jmp       short M00_L25
M00_L24:
       lea       rcx,[rsp+1A8]
       mov       edx,1
       call      qword ptr [7FFB2D247138]; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].AdvanceToNextSpan(Int64)
M00_L25:
       lea       rbx,[rsp+198]
       mov       rcx,[rsp+160]
       cmp       rcx,[rsp+168]
       je        short M00_L26
       lea       rcx,[rsp+160]
       call      qword ptr [7FFB2D24E700]
       mov       rdx,rax
       vxorps    xmm0,xmm0,xmm0
       vmovdqu   xmmword ptr [rsp+50],xmm0
       lea       rcx,[rsp+50]
       call      qword ptr [7FFB2CD2F870]
       vmovdqu   xmm0,xmmword ptr [rsp+50]
       vmovdqu   xmmword ptr [rsp+140],xmm0
       jmp       short M00_L27
M00_L26:
       lea       rcx,[rsp+160]
       lea       rdx,[rsp+150]
       call      qword ptr [7FFB2D095068]
       lea       rcx,[rsp+150]
       lea       rdx,[rsp+140]
       call      qword ptr [7FFB2D245B48]; System.ReadOnlyMemory`1[[System.Int32, System.Private.CoreLib]].get_Span()
M00_L27:
       mov       rax,[rsp+140]
       mov       [rbx],rax
       mov       eax,[rsp+148]
       mov       [rbx+8],eax
       mov       esi,1
       jmp       near ptr M00_L08
M00_L28:
       call      qword ptr [7FFB2D095368]
       jmp       near ptr M00_L09
M00_L29:
       lea       rdx,[rsp+178]
       mov       rcx,r8
       mov       rax,[r8]
       mov       rax,[rax+40]
       call      qword ptr [rax+28]
       mov       rcx,[rsp+178]
       mov       edx,[rsp+180]
       jmp       near ptr M00_L02
M00_L30:
       sub       r15d,r14d
       mov       eax,r14d
       mov       ecx,r15d
       add       rax,rcx
       mov       ecx,[rsp+190]
       cmp       rax,rcx
       ja        near ptr M00_L21
       mov       rax,[rsp+188]
       mov       ecx,r14d
       lea       rax,[rax+rcx*4]
       mov       [rsp+188],rax
       mov       [rsp+190],r15d
       jmp       near ptr M00_L10
M00_L31:
       test      r13d,r13d
       je        short M00_L32
       call      qword ptr [7FFB2D245DB8]
       int       3
M00_L32:
       mov       rdx,rbx
       mov       rcx,offset MT_System.Int32[]
       call      qword ptr [7FFB2CD24390]
       mov       ecx,r15d
       and       ecx,7FFFFFFF
       sub       ecx,r14d
       mov       edx,r14d
       mov       r8d,ecx
       add       rdx,r8
       mov       r8d,[rbx+8]
       cmp       rdx,r8
       ja        near ptr M00_L21
       mov       edx,r14d
       lea       rdx,[rbx+rdx*4+10]
       mov       [rsp+188],rdx
       mov       [rsp+190],ecx
       jmp       near ptr M00_L10
M00_L33:
       mov       [rsp+20],r13d
       lea       rcx,[rsp+188]
       mov       rdx,rbx
       mov       r8d,r14d
       mov       r9d,r15d
       call      qword ptr [7FFB2D095260]
       jmp       near ptr M00_L10
M00_L34:
       cmp       rbx,rbp
       je        near ptr M00_L11
       mov       byte ptr [rsp+1BC],1
       lea       rcx,[rsp+1A8]
       call      qword ptr [7FFB2D2470F0]; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].GetNextSpan()
       jmp       near ptr M00_L11
M00_L35:
       lea       rcx,[rsp+198]
       test      eax,eax
       je        short M00_L36
       cmp       eax,r12d
       ja        near ptr M00_L21
       mov       edx,eax
       jmp       short M00_L37
M00_L36:
       xor       esi,esi
       xor       edx,edx
M00_L37:
       mov       [rcx],rsi
       mov       [rcx+8],edx
       inc       eax
       movsxd    rcx,eax
       mov       rax,rcx
       add       rax,[rsp+1B0]
       mov       [rsp+1B0],rax
       add       ecx,[rsp+1B8]
       mov       [rsp+1B8],ecx
       mov       ecx,[rsp+1B8]
       mov       eax,[rsp+200]
       cmp       ecx,eax
       jl        near ptr M00_L28
       lea       rcx,[rsp+1A8]
       call      qword ptr [7FFB2D2470F0]; System.Buffers.SequenceReader`1[[System.Int32, System.Private.CoreLib]].GetNextSpan()
       jmp       near ptr M00_L28
M00_L38:
       mov       ecx,r13d
       add       ecx,[rsp+1B8]
       mov       [rsp+1B8],ecx
       mov       rcx,r13
       add       rcx,[rsp+1B0]
       mov       [rsp+1B0],rcx
       jmp       near ptr M00_L17
M00_L39:
       mov       ecx,0F
       call      qword ptr [7FFB2D245D58]
       int       3
M00_L40:
       vxorps    xmm0,xmm0,xmm0
       vmovdqu   xmmword ptr [rsp+1F8],xmm0
       xor       ecx,ecx
       mov       [rsp+1B8],ecx
       jmp       near ptr M00_L14
; Total bytes of code 2191
; System.Buffers.ReadOnlySequence`1[[System.Int32, System.Private.CoreLib]].TryGetBuffer(System.SequencePosition ByRef, System.ReadOnlyMemory`1<Int32> ByRef, System.SequencePosition ByRef)
       push      r15
       push      r14
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,38
       xor       eax,eax
       mov       [rsp+28],rax
       mov       rbx,r8
       mov       rsi,r9
       mov       r8,[rdx]
       xor       eax,eax
       mov       [rsi],rax
       mov       [rsi+8],rax
       test      r8,r8
       je        near ptr M04_L03
       mov       eax,[rcx+10]
       sar       eax,1F
       mov       edi,[rcx+14]
       mov       r10d,edi
       sar       r10d,1F
       lea       eax,[r10+rax*2]
       mov       ebp,eax
       neg       ebp
       mov       r14,[rcx+8]
       mov       r15d,[rdx+8]
       and       edi,7FFFFFFF
       test      ebp,ebp
       jne       near ptr M04_L05
       mov       rbp,r8
       mov       rcx,offset MT_System.Buffers.ReadOnlySequenceSegment`1[[System.Int32, System.Private.CoreLib]]
       cmp       [rbp],rcx
       jne       short M04_L02
M04_L00:
       cmp       rbp,r14
       je        short M04_L04
       mov       rdx,[rbp+8]
       test      rdx,rdx
       je        near ptr M04_L06
       mov       rcx,rsi
       call      CORINFO_HELP_CHECKED_ASSIGN_REF
       xor       ecx,ecx
       mov       [rsi+8],ecx
       mov       rdx,[rbp+18]
       mov       r14d,[rbp+20]
       mov       esi,[rbp+24]
       cmp       r15d,esi
       ja        near ptr M04_L11
       add       r14d,r15d
       sub       esi,r15d
       mov       rcx,rbx
       call      CORINFO_HELP_CHECKED_ASSIGN_REF
       mov       [rbx+8],r14d
       mov       [rbx+0C],esi
M04_L01:
       mov       eax,1
       add       rsp,38
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       pop       r14
       pop       r15
       ret
M04_L02:
       mov       rdx,r8
       call      qword ptr [7FFB2CD243F0]; System.Runtime.CompilerServices.CastHelpers.ChkCastClassSpecial(Void*, System.Object)
       mov       rbp,rax
       jmp       short M04_L00
M04_L03:
       xor       eax,eax
       mov       [rbx],rax
       mov       [rbx+8],rax
       add       rsp,38
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       pop       r14
       pop       r15
       ret
M04_L04:
       mov       rdx,[rbp+18]
       mov       r14d,[rbp+20]
       mov       esi,[rbp+24]
       sub       edi,r15d
       mov       ecx,r15d
       mov       eax,edi
       add       rcx,rax
       mov       eax,esi
       cmp       rcx,rax
       ja        near ptr M04_L11
       add       r15d,r14d
       mov       rcx,rbx
       call      CORINFO_HELP_CHECKED_ASSIGN_REF
       mov       [rbx+8],r15d
       mov       [rbx+0C],edi
       jmp       short M04_L01
M04_L05:
       cmp       r8,r14
       je        short M04_L07
M04_L06:
       call      qword ptr [7FFB2D245DB8]
       int       3
M04_L07:
       cmp       ebp,1
       jne       short M04_L08
       mov       rdx,r8
       mov       rcx,offset MT_System.Int32[]
       call      qword ptr [7FFB2CD24390]
       mov       rdx,rax
       sub       edi,r15d
       mov       ecx,r15d
       mov       eax,edi
       add       rcx,rax
       mov       eax,[rdx+8]
       cmp       rcx,rax
       ja        short M04_L09
       mov       rcx,rbx
       call      CORINFO_HELP_CHECKED_ASSIGN_REF
       mov       [rbx+8],r15d
       mov       [rbx+0C],edi
       jmp       near ptr M04_L01
M04_L08:
       mov       rdx,r8
       mov       rcx,offset MT_System.Buffers.MemoryManager`1[[System.Int32, System.Private.CoreLib]]
       call      qword ptr [7FFB2CD243D8]; System.Runtime.CompilerServices.CastHelpers.ChkCastClass(Void*, System.Object)
       mov       rcx,rax
       lea       rdx,[rsp+28]
       mov       rax,[rax]
       mov       rax,[rax+40]
       call      qword ptr [rax+20]
       mov       esi,edi
       sub       esi,r15d
       mov       eax,r15d
       mov       ecx,esi
       add       rax,rcx
       mov       ecx,[rsp+34]
       cmp       rax,rcx
       jbe       short M04_L10
M04_L09:
       call      qword ptr [7FFB2D0557B8]
       int       3
M04_L10:
       mov       rdx,[rsp+28]
       mov       edi,r15d
       add       edi,[rsp+30]
       mov       rcx,rbx
       call      CORINFO_HELP_CHECKED_ASSIGN_REF
       mov       [rbx+8],edi
       mov       [rbx+0C],esi
       jmp       near ptr M04_L01
M04_L11:
       mov       ecx,21
       call      qword ptr [7FFB2D055B18]
       int       3
; Total bytes of code 477

@stephentoub
Member

@jozkee, @mgravell, what are the next steps for this PR? Thanks.

@jozkee
Member

jozkee commented Jan 17, 2024

@stephentoub we need to check whether anything can be done about the TryRead and TryReadTo regressions; if not, we need to evaluate whether this is a worthwhile trade-off.

@stephentoub
Member

@jozkee, @mgravell, what's the plan for this? It's been sitting here for a long time. Are we going to take it or not?

@mgravell
Member Author

I would suggest that, if there's still scope for it, we bump it to net10; it's a bit late in the cycle for nuanced reader changes. Nobody seems massively stoked either way, though, so honestly: I'm also fine with just burning it. I can work with similar local types when needed.

@stephentoub stephentoub modified the milestones: 9.0.0, 10.0.0 Jul 22, 2024
@stephentoub
Member

Ok, let's try to either land it or close it early in 10. Thanks.

5 participants