Optimize storage for segment information #51383

Merged
merged 5 commits into from
Feb 23, 2021

Conversation

@sharwell sharwell commented Feb 21, 2021

  • Narrows the gap for using SegmentedArray<int> (and other unmanaged types) from 2.69x the cost of int[] to 1.29x on .NET Framework
  • Reduces the penalty for using SegmentedArray<object> (and other reference types), particularly on .NET Framework

Before this change

net472

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-9700 CPU 3.00GHz, 1 CPU, 8 logical and 8 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4300.0), X64 RyuJIT
  DefaultJob : .NET Framework 4.8 (4.8.4300.0), X64 RyuJIT

| Method | Count | Mean | Error | StdDev | Ratio | RatioSD | Code Size |
|---|---|---|---|---|---|---|---|
| 'int[]' | 100000 | 78.29 μs | 0.133 μs | 0.104 μs | 1.00 | 0.00 | 107 B |
| 'object[]' | 100000 | 180.58 μs | 0.285 μs | 0.238 μs | 2.31 | 0.00 | 113 B |
| SegmentedArray<int> | 100000 | 358.65 μs | 1.645 μs | 1.458 μs | 4.58 | 0.02 | 144 B |
| SegmentedArray<object> | 100000 | 1,381.35 μs | 7.889 μs | 7.379 μs | 17.62 | 0.08 | 224 B |

netcoreapp3.1

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-9700 CPU 3.00GHz, 1 CPU, 8 logical and 8 physical cores
.NET Core SDK=5.0.103
  [Host]     : .NET Core 3.1.12 (CoreCLR 4.700.21.6504, CoreFX 4.700.21.6905), X64 RyuJIT
  DefaultJob : .NET Core 3.1.12 (CoreCLR 4.700.21.6504, CoreFX 4.700.21.6905), X64 RyuJIT

| Method | Count | Mean | Error | StdDev | Ratio | RatioSD | Code Size |
|---|---|---|---|---|---|---|---|
| 'int[]' | 100000 | 133.1 μs | 0.75 μs | 0.70 μs | 1.00 | 0.00 | 107 B |
| 'object[]' | 100000 | 155.7 μs | 0.66 μs | 0.62 μs | 1.17 | 0.01 | 113 B |
| SegmentedArray<int> | 100000 | 221.8 μs | 0.74 μs | 0.69 μs | 1.67 | 0.01 | 219 B |
| SegmentedArray<object> | 100000 | 554.6 μs | 2.43 μs | 2.15 μs | 4.16 | 0.03 | 275 B |

net5.0

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-9700 CPU 3.00GHz, 1 CPU, 8 logical and 8 physical cores
.NET Core SDK=5.0.103
  [Host]     : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT
  DefaultJob : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT

| Method | Count | Mean | Error | StdDev | Ratio | RatioSD | Code Size |
|---|---|---|---|---|---|---|---|
| 'int[]' | 100000 | 133.2 μs | 0.62 μs | 0.58 μs | 1.00 | 0.00 | 107 B |
| 'object[]' | 100000 | 155.7 μs | 0.77 μs | 0.65 μs | 1.17 | 0.01 | 113 B |
| SegmentedArray<int> | 100000 | 202.8 μs | 0.46 μs | 0.41 μs | 1.52 | 0.01 | 213 B |
| SegmentedArray<object> | 100000 | 448.2 μs | 1.89 μs | 1.68 μs | 3.37 | 0.02 | 279 B |

After this change

net472

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-9700 CPU 3.00GHz, 1 CPU, 8 logical and 8 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4300.0), X64 RyuJIT
  DefaultJob : .NET Framework 4.8 (4.8.4300.0), X64 RyuJIT

| Method | Count | Mean | Error | StdDev | Ratio | Code Size |
|---|---|---|---|---|---|---|
| 'int[]' | 100000 | 133.9 μs | 0.25 μs | 0.21 μs | 1.00 | 107 B |
| 'object[]' | 100000 | 157.3 μs | 0.82 μs | 0.73 μs | 1.18 | 113 B |
| SegmentedArray<int> | 100000 | 172.6 μs | 0.65 μs | 0.61 μs | 1.29 | 205 B |
| SegmentedArray<object> | 100000 | 531.9 μs | 1.88 μs | 1.66 μs | 3.97 | 233 B |

netcoreapp3.1

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-9700 CPU 3.00GHz, 1 CPU, 8 logical and 8 physical cores
.NET Core SDK=5.0.103
  [Host]     : .NET Core 3.1.12 (CoreCLR 4.700.21.6504, CoreFX 4.700.21.6905), X64 RyuJIT
  DefaultJob : .NET Core 3.1.12 (CoreCLR 4.700.21.6504, CoreFX 4.700.21.6905), X64 RyuJIT

| Method | Count | Mean | Error | StdDev | Ratio | RatioSD | Code Size |
|---|---|---|---|---|---|---|---|
| 'int[]' | 100000 | 133.2 μs | 0.65 μs | 0.61 μs | 1.00 | 0.00 | 107 B |
| 'object[]' | 100000 | 155.7 μs | 1.04 μs | 0.97 μs | 1.17 | 0.01 | 113 B |
| SegmentedArray<int> | 100000 | 172.3 μs | 0.59 μs | 0.55 μs | 1.29 | 0.01 | 205 B |
| SegmentedArray<object> | 100000 | 486.8 μs | 1.55 μs | 1.37 μs | 3.66 | 0.02 | 233 B |

net5.0

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-9700 CPU 3.00GHz, 1 CPU, 8 logical and 8 physical cores
.NET Core SDK=5.0.103
  [Host]     : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT
  DefaultJob : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT

| Method | Count | Mean | Error | StdDev | Ratio | RatioSD | Code Size |
|---|---|---|---|---|---|---|---|
| 'int[]' | 100000 | 77.69 μs | 0.292 μs | 0.273 μs | 1.00 | 0.00 | 107 B |
| 'object[]' | 100000 | 155.04 μs | 0.427 μs | 0.378 μs | 2.00 | 0.01 | 113 B |
| SegmentedArray<int> | 100000 | 166.89 μs | 0.550 μs | 0.515 μs | 2.15 | 0.01 | 203 B |
| SegmentedArray<object> | 100000 | 434.87 μs | 1.560 μs | 1.459 μs | 5.60 | 0.03 | 232 B |

@sharwell sharwell marked this pull request as ready for review February 21, 2021 06:24
@sharwell sharwell requested review from a team as code owners February 21, 2021 06:24

sharwell commented Feb 21, 2021

I was able to further optimize the net5.0 case by avoiding array variance checks.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-9700 CPU 3.00GHz, 1 CPU, 8 logical and 8 physical cores
.NET Core SDK=5.0.103
  [Host]     : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT
  DefaultJob : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT

| Method | Count | Mean | Error | StdDev | Ratio | RatioSD | Code Size |
|---|---|---|---|---|---|---|---|
| 'int[]' | 100000 | 77.64 μs | 0.440 μs | 0.411 μs | 1.00 | 0.00 | 107 B |
| 'object[]' | 100000 | 154.76 μs | 0.519 μs | 0.486 μs | 1.99 | 0.01 | 113 B |
| SegmentedArray<int> | 100000 | 172.20 μs | 0.702 μs | 0.622 μs | 2.22 | 0.01 | 208 B |
| SegmentedArray<object> | 100000 | 287.53 μs | 0.865 μs | 0.809 μs | 3.70 | 0.02 | 225 B |

The conditional code in the indexer does appear to have influenced the performance of SegmentedArray<int>, but only by a small amount.
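
For context, storing into a reference-type array normally makes the runtime verify that the value is compatible with the array's actual element type, because arrays are covariant. One well-known way to sidestep that check, which is not necessarily the approach taken in this PR, is to keep the references inside a struct wrapper so the array's element type is a value type. A minimal sketch, using the hypothetical names `RefWrapper<T>` and `WrappedStore`:

```csharp
using System;

// Hypothetical wrapper: arrays of structs are not covariant, so stores into
// RefWrapper<T>[] do not need the element type check that T[] stores require.
internal readonly struct RefWrapper<T> where T : class
{
    internal readonly T Value;

    internal RefWrapper(T value) => Value = value;
}

internal static class WrappedStore
{
    // Fills a wrapped array and returns it. Each store copies a struct,
    // so no covariance check is involved.
    internal static RefWrapper<string>[] Fill(int count, string value)
    {
        var items = new RefWrapper<string>[count];
        for (int i = 0; i < items.Length; i++)
        {
            items[i] = new RefWrapper<string>(value);
        }

        return items;
    }
}
```

Whether this helps in a given case is something only a benchmark like the ones above can show.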

private static int SegmentSize
{
[MethodImpl(SegmentedArrayHelper.FastPathMethodImplOptions)]
get

Use expression bodies?

I tried using [method: MethodImpl(...)] with expression bodied properties, but it didn't compile.
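
As far as I know, an attribute written on the property declaration itself cannot use the `method` target, which would explain the compile error. A possible middle ground, sketched below, is to keep the accessor block and give only the accessor an expression body. The class and property names are hypothetical, and `MethodImplOptions.AggressiveInlining` stands in for `SegmentedArrayHelper.FastPathMethodImplOptions`:

```csharp
using System.Runtime.CompilerServices;

internal static class AccessorAttributeSketch
{
    // Block-bodied accessor, the shape used in the snippet under review.
    internal static int BlockBodied
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get
        {
            return 1 << 12;
        }
    }

    // Expression-bodied accessor: the attribute is applied to the accessor,
    // so no explicit attribute target is needed.
    internal static int ExpressionBodiedAccessor
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => 1 << 12;
    }
}
```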

using BenchmarkDotNet.Attributes;
using Microsoft.CodeAnalysis.Collections;

namespace IdeCoreBenchmarks

Really appreciate us checking in benchmarks with the improvements as you've done here.

[MethodImpl(FastPathMethodImplOptions)]
internal static int GetSegmentSize<T>()
{
if (Unsafe.SizeOf<T>() == Unsafe.SizeOf<object>())

This approach means our performance will differ between architectures. For instance, int will be treated as a reference type on 32-bit but not on 64-bit. Why aren't we using typeof(T).IsValueType here?


@sharwell sharwell Feb 22, 2021


The only real input to the whole segmenting algorithm is the value of Unsafe.SizeOf<T>(). For value types, generic specialization in the JIT means the value for any given value type will be inlined. For reference types, the JIT produces one common assembly implementation, so to avoid overhead we want all reference types to use the same field reference for this value (both to avoid a lookup table for all the instantiations, and to improve the ability of the JIT to inline the indexer). Value types that happen to have the same size as a reference type are correct on both possible paths, so it doesn't matter which one we use.

typeof(T).IsValueType isn't a JIT intrinsic prior to net5.0, so if we try to use it for earlier targets we get something like a 20x regression. Unsafe.SizeOf<T>() behaves as an intrinsic (runtime constant) for all targets, so it gives good all-around results.
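
The selection described above can be sketched roughly as follows. This is an illustration, not the code in this PR: the type name `SegmentSizeSketch`, the helper `CalculateSegmentSize`, and the rule that a segment should stay under the 85,000-byte large object heap threshold are all assumptions made for the example. On .NET Framework, `Unsafe` comes from the System.Runtime.CompilerServices.Unsafe package.

```csharp
using System;
using System.Runtime.CompilerServices;

internal static class SegmentSizeSketch
{
    // Assumption for this sketch: keep each segment below the large object heap threshold.
    private const int LargeObjectHeapThreshold = 85000;

    // One shared field used by every element type whose size matches a reference.
    private static readonly int s_referenceSegmentSize =
        CalculateSegmentSize(Unsafe.SizeOf<object>());

    // One field per instantiation for all other element sizes.
    private static class ValueTypeSegment<T>
    {
        internal static readonly int Size = CalculateSegmentSize(Unsafe.SizeOf<T>());
    }

    // Largest power-of-two element count whose total size stays below the threshold.
    internal static int CalculateSegmentSize(int elementSize)
    {
        int size = 2;
        while (size * 2 * elementSize < LargeObjectHeapThreshold)
        {
            size *= 2;
        }

        return size;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    internal static int GetSegmentSize<T>()
    {
        // The comparison depends only on the element size, so a value type with
        // the same size as a reference may take the shared path and still get
        // the correct answer.
        if (Unsafe.SizeOf<T>() == Unsafe.SizeOf<object>())
        {
            return s_referenceSegmentSize;
        }

        return ValueTypeSegment<T>.Size;
    }
}
```

Both paths call the same calculation with the same input size, which is why it does not matter which path a same-sized value type takes.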

There were two situations where the previous code would allow net5.0
execution to access memory outside array bounds:

1. Replacement of an element in _items with a shorter array. Even though
_items is indirectly exposed through SyncRoot, this is not a serious
concern because replacement of an array element means unsafe/bad
bytecode is already running in the process to perform the replacement.

2. A torn read of SegmentedArray<T> could allow a read of the _items
field of one structure and the _length field of a different structure.
This is a concern because it allows concurrent execution to reach a
state where memory safety is violated without a precondition that unsafe
code be running.

This reverts commit 15e1fd3.
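
A minimal sketch of the safer shape implied by the second point, using the hypothetical type `CheckedSegments` with a deliberately tiny segment size. The indexer goes through ordinary array indexing, so the runtime's bounds checks run against the arrays actually reached through `_items`, regardless of what `_length` says. It illustrates the idea only and is not the code in SegmentedArray`1.cs.

```csharp
using System;

internal readonly struct CheckedSegments
{
    private const int SegmentShift = 2;                      // hypothetical: 4 elements per segment
    private const int OffsetMask = (1 << SegmentShift) - 1;

    private readonly int[][] _items;
    private readonly int _length;

    internal CheckedSegments(int length)
    {
        _length = length;
        int segmentCount = (length + OffsetMask) >> SegmentShift;
        _items = new int[segmentCount][];
        for (int i = 0; i < segmentCount; i++)
        {
            _items[i] = new int[1 << SegmentShift];
        }
    }

    internal int Length => _length;

    // Both array accesses are bounds-checked by the runtime, so a mismatched
    // _items/_length pair can produce an exception or a slot in segment padding,
    // but never an access outside an array.
    internal ref int this[int index] => ref _items[index >> SegmentShift][index & OffsetMask];
}
```

The trade-off is the cost of those checks on every access, which is the overhead the reverted commit had tried to remove.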
@sharwell sharwell merged commit fcbb568 into dotnet:master Feb 23, 2021
@sharwell sharwell deleted the faster-segments branch February 23, 2021 19:06
@ghost ghost added this to the Next milestone Feb 23, 2021
@allisonchou allisonchou modified the milestones: Next, 16.10.P2 Mar 29, 2021
6 participants