Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace LINQ's ArrayBuilder, LargeArrayBuilder, and SparseArrayBuilder with new SegmentedArrayBuilder #96570

Merged
merged 3 commits into from
Jan 8, 2024

Conversation

stephentoub
Copy link
Member

Following up on #90459 (comment). FYI, @neuecc.

There's a lot of code involved in these, some of which we special-case to only build into some of the targets. We can get most of the benefit with a simpler solution, which this attempts to provide. This commit removes those three types and replaces them with a SegmentedArrayBuilder. It uses stack space for the first N items (currently 8, hopefully that won't be an issue if some value type T is ridiculously large), and then uses an InlineArray of T[]s to store pool-rented arrays for each new segment it needs to allocate. Calling ToArray then produces an array of the exact right size, copying in the elements from the stack-allocated span and each of the constiuent arrays, then returning everything necessary to the pool.

The changes to ToArray also obviate the need for the old Buffer type. It existed to avoid one array allocation at the end of loading an enumerable, where we'd doubled and doubled and doubled in size, and then eventually have an array with all the data but some extra space. Now that we don't have such an array, we don't need Buffer, and can just the normal Enumerable.ToArray.

The one thing the new scheme doesn't handle as well is when there are multiple sources being added (specifically in Concat / SelectMany). Previously, the code used a complicated scheme to reserve space in the output for partial sources known to be ICollections, so it could use ICollection.CopyTo for each consistuent source. But CopyTo doesn't support partial copies, which means we can only use it if there's enough room available in an individual segment. The implementation thus tries to use it, and falls back to normal enumeration if it can't. For larger enumerations where the cost would end up being more apparent, the expectation is we'd quickly grow to a segment size where the subsequent appends are able to fit into the current segment and can thus still use the ICollection.CopyTo path. Hopefully.

My main motivation here was to reduce the complexity of having all of the various builders.

Method Toolchain count Mean Ratio Allocated Alloc Ratio
Iterator_ToArray \main\corerun.exe 1 77.21 ns 1.00 144 B 1.00
Iterator_ToArray \pr\corerun.exe 1 69.04 ns 0.90 88 B 0.61
Iterator_ToArray \main\corerun.exe 5 135.72 ns 1.00 264 B 1.00
Iterator_ToArray \pr\corerun.exe 5 104.83 ns 0.77 120 B 0.45
Iterator_ToArray \main\corerun.exe 50 583.05 ns 1.00 1200 B 1.00
Iterator_ToArray \pr\corerun.exe 50 530.17 ns 0.91 480 B 0.40
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Tests<>).Assembly).Run(args);

public partial class Tests<T> where T : new()
{
    private T _value = new();

    [Benchmark]
    [Arguments(1)]
    [Arguments(5)]
    [Arguments(50)]
    public T[] Iterator_ToArray(int count) => GetItems(count).ToArray();

    private IEnumerable<T> GetItems(int count)
    {
        for (int i = 0; i < count; i++) yield return _value;
    }
}

[HideColumns("Error", "StdDev", "Median", "RatioSD", "Job")]
[MemoryDiagnoser(false)]
public class Int32Tests : Tests<int> { }

[HideColumns("Error", "StdDev", "Median", "RatioSD", "Job")]
[MemoryDiagnoser(false)]
public class ObjectTests : Tests<object> { }

@stephentoub stephentoub added this to the 9.0.0 milestone Jan 5, 2024
@ghost
Copy link

ghost commented Jan 5, 2024

Tagging subscribers to this area: @dotnet/area-system-linq
See info in area-owners.md if you want to be subscribed.

Issue Details

Following up on #90459 (comment). FYI, @neuecc.

There's a lot of code involved in these, some of which we special-case to only build into some of the targets. We can get most of the benefit with a simpler solution, which this attempts to provide. This commit removes those three types and replaces them with a SegmentedArrayBuilder. It uses stack space for the first N items (currently 8, hopefully that won't be an issue if some value type T is ridiculously large), and then uses an InlineArray of T[]s to store pool-rented arrays for each new segment it needs to allocate. Calling ToArray then produces an array of the exact right size, copying in the elements from the stack-allocated span and each of the constiuent arrays, then returning everything necessary to the pool.

The changes to ToArray also obviate the need for the old Buffer type. It existed to avoid one array allocation at the end of loading an enumerable, where we'd doubled and doubled and doubled in size, and then eventually have an array with all the data but some extra space. Now that we don't have such an array, we don't need Buffer, and can just the normal Enumerable.ToArray.

The one thing the new scheme doesn't handle as well is when there are multiple sources being added (specifically in Concat / SelectMany). Previously, the code used a complicated scheme to reserve space in the output for partial sources known to be ICollections, so it could use ICollection.CopyTo for each consistuent source. But CopyTo doesn't support partial copies, which means we can only use it if there's enough room available in an individual segment. The implementation thus tries to use it, and falls back to normal enumeration if it can't. For larger enumerations where the cost would end up being more apparent, the expectation is we'd quickly grow to a segment size where the subsequent appends are able to fit into the current segment and can thus still use the ICollection.CopyTo path. Hopefully.

My main motivation here was to reduce the complexity of having all of the various builders.

Method Toolchain count Mean Ratio Allocated Alloc Ratio
Iterator_ToArray \main\corerun.exe 1 77.21 ns 1.00 144 B 1.00
Iterator_ToArray \pr\corerun.exe 1 69.04 ns 0.90 88 B 0.61
Iterator_ToArray \main\corerun.exe 5 135.72 ns 1.00 264 B 1.00
Iterator_ToArray \pr\corerun.exe 5 104.83 ns 0.77 120 B 0.45
Iterator_ToArray \main\corerun.exe 50 583.05 ns 1.00 1200 B 1.00
Iterator_ToArray \pr\corerun.exe 50 530.17 ns 0.91 480 B 0.40
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Tests<>).Assembly).Run(args);

public partial class Tests<T> where T : new()
{
    private T _value = new();

    [Benchmark]
    [Arguments(1)]
    [Arguments(5)]
    [Arguments(50)]
    public T[] Iterator_ToArray(int count) => GetItems(count).ToArray();

    private IEnumerable<T> GetItems(int count)
    {
        for (int i = 0; i < count; i++) yield return _value;
    }
}

[HideColumns("Error", "StdDev", "Median", "RatioSD", "Job")]
[MemoryDiagnoser(false)]
public class Int32Tests : Tests<int> { }

[HideColumns("Error", "StdDev", "Median", "RatioSD", "Job")]
[MemoryDiagnoser(false)]
public class ObjectTests : Tests<object> { }
Author: stephentoub
Assignees: -
Labels:

area-System.Linq

Milestone: 9.0.0

There's a lot of code involved in these, some of which we special-case to only build into some of the targets, and it's unnecessarily complicated.  We can get most of the benefit with a simpler solution, which this attempts to provide. This commit removes those three types and replaces them with a SegmentedArrayBuilder.

The changes to ToArray also obviate the need for the old Buffer type. It existed to avoid one array allocation at the end of loading an enumerable, where we'd doubled and doubled and doubled in size, and then eventually have an array with all the data but some extra space. Now that we don't have such an array, we don't need Buffer, and can just the normal Enumerable.ToArray.

The one thing the new scheme doesn't handle as well is when there are multiple sources being added (specifically in Concat / SelectMany). Previously, the code used a complicated scheme to reserve space in the output for partial sources known to be ICollections, so it could use ICollection.CopyTo for each consistuent source. But CopyTo doesn't support partial copies, which means we can only use it if there's enough room available in an individual segment. The implementation thus tries to use it, and falls back to normal enumeration if it can't. For larger enumerations where the cost would end up being more apparent, the expectation is we'd quickly grow to a segment size where the subsequent appends are able to fit into the current segment. Hopefully.
@eiriktsarpalis eiriktsarpalis added the tenet-performance Performance related issue label Jan 8, 2024
Copy link
Member

@eiriktsarpalis eiriktsarpalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good candidate for this year's perf blog it seems? :-)

@stephentoub stephentoub merged commit f44d62c into dotnet:main Jan 8, 2024
109 of 111 checks passed
@stephentoub stephentoub deleted the linqtoarray branch January 8, 2024 19:15
@eerhardt
Copy link
Member

@stephentoub, we are seeing a slight app size regression in the ASP.NET benchmarks that I think might be caused by this change.

The regression happened between 1063fc2...b4ec422.

I got the mstat files before and after, comparing them in sizoscope shows:

image

Is this expected? The size regression is about 2% (roughly 200KB of 10MB).

Here are the the mstat files I got from the benchmark server, in case someone wants to take a look.
BasicMinimalApi.zip

@eerhardt eerhardt mentioned this pull request Jan 10, 2024
@stephentoub
Copy link
Member Author

Is this expected?

I actually expected the opposite.

Do you know where those additional interface implementations are coming from?

The array pool ones are expected, in that we're renting arrays as part of ToArray, so we'll get ArrayPool types for T[]s being created by ToArray.

@stephentoub
Copy link
Member Author

stephentoub commented Jan 10, 2024

Ah, looked at your sizeoscope files... they're all because of using ArrayPool in ToArray now. ArrayPool itself has this, for example:


so for every distinct T used with the pool, you also get a distinct SharedArrayPool<T>.Partitions.Partition[] type and all of the interfaces it implements.

@stephentoub
Copy link
Member Author

stephentoub commented Jan 11, 2024

@MichalStrehovsky, is there any way (in corelib) to say "i promise this T[] will never be cast to any of the interfaces it implements, feel free not to keep that code around if it wouldn't otherwise be needed"? Or is there any kind of analysis we could be doing to automatically see that it's never used in that capacity?

@MichalStrehovsky
Copy link
Member

@MichalStrehovsky, is there any way (in corelib) to say "i promise this T[] will never be cast to any of the interfaces it implements, feel free not to keep that code around if it wouldn't otherwise be needed"?

We don't have that. We could build that; it sounds a bit like it would open a can of worms though.

A different can of worms (with different problems) for consideration:

static T[] AllocateNewFrugalArray<T>(int count)
    => typeof(T).IsValueType ? new T[count] : Unsafe.As<T[]>(new object[count]);

Or is there any kind of analysis we could be doing to automatically see that it's never used in that capacity?

We already do this analysis, but interfaces on reference type arrays are difficult to get rid of due to array covariance. The app might not be using IList<string>, but if it uses IList<object>, that could end up calling IList<string> methods on string[]. It doesn't even matter that IList<T> is not variant because the interface has magic behaviors on arrays.

@stephentoub
Copy link
Member Author

Thanks. I should be able to just use object[].

@neuecc
Copy link

neuecc commented Jan 12, 2024

thanks for great implementation!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Linq tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants