Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ToDictionary() selectors parallel. #100590

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

lateapexearlyspeed
Copy link
Contributor

Fix #96262

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Apr 3, 2024
@jozkee
Copy link
Member

jozkee commented May 20, 2024

As stated in #96262 (comment), this will likely regress cases with a trivial key selector. @lateapexearlyspeed could you please provide a benchmark to see how bad it hits it so we can do an informed decision?

@jozkee jozkee added the needs-author-action An issue or pull request that requires more info or actions from the author. label May 20, 2024
@lateapexearlyspeed
Copy link
Contributor Author

@jozkee following is benchmark result of trivial key (and element) selector case. Toolchain in "runtime-main" is library before change and toolchain in "runtime" is library after change.

// * Summary *

BenchmarkDotNet=v0.13.1.1603-nightly, OS=Windows 10.0.22631
12th Gen Intel Core i7-12800H, 1 CPU, 20 logical and 14 physical cores
.NET SDK=8.0.202
[Host] : .NET 6.0.31 (6.0.3124.26714), X64 RyuJIT
Job-XVKCDM : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT
Job-ODFYHP : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

Method Job Toolchain ItemCount Mean Error StdDev Median Min Max Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
TestParallelToDictionary Job-XVKCDM \runtime-main\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe 1 42.00 ns 9.363 ns 10.782 ns 35.46 ns 32.76 ns 62.01 ns 1.00 0.00 0.0298 - - 376 B
TestParallelToDictionary Job-ODFYHP \runtime\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe 1 7,778.30 ns 149.193 ns 139.555 ns 7,773.13 ns 7,457.77 ns 8,025.18 ns 214.43 27.42 0.6916 - - 8,896 B
TestParallelToDictionary Job-XVKCDM \runtime-main\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe 10 120.94 ns 2.399 ns 2.356 ns 121.89 ns 116.36 ns 124.39 ns 1.00 0.00 0.0765 - - 960 B
TestParallelToDictionary Job-ODFYHP \runtime\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe 10 8,034.68 ns 190.599 ns 219.495 ns 8,038.56 ns 7,658.90 ns 8,423.41 ns 66.69 2.11 0.7844 - - 9,984 B
TestParallelToDictionary Job-XVKCDM \runtime-main\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe 100 1,054.74 ns 94.978 ns 109.377 ns 1,025.83 ns 905.00 ns 1,335.60 ns 1.00 0.00 0.6014 0.0073 - 7,576 B
TestParallelToDictionary Job-ODFYHP \runtime\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe 100 10,700.76 ns 295.269 ns 328.191 ns 10,768.11 ns 10,152.59 ns 11,369.07 ns 10.20 1.00 1.5175 - - 18,920 B
TestParallelToDictionary Job-XVKCDM \runtime-main\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe 1000 8,536.74 ns 206.045 ns 237.282 ns 8,593.82 ns 8,068.56 ns 8,911.05 ns 1.00 0.00 5.8294 0.6955 - 73,352 B
TestParallelToDictionary Job-ODFYHP \runtime\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe 1000 45,359.53 ns 1,780.828 ns 2,050.804 ns 44,628.87 ns 42,440.57 ns 50,316.83 ns 5.31 0.22 8.3220 1.3587 - 104,118 B
TestParallelToDictionary Job-XVKCDM \runtime-main\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe 10000 175,304.48 ns 5,273.794 ns 6,073.309 ns 174,695.64 ns 166,454.45 ns 186,023.27 ns 1.00 0.00 124.3351 124.3351 124.3351 673,290 B
TestParallelToDictionary Job-ODFYHP \runtime\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe 10000 425,559.34 ns 21,829.962 ns 25,139.417 ns 422,713.75 ns 382,274.84 ns 461,048.75 ns 2.43 0.20 131.2500 131.2500 123.4375 848,901 B

        [Benchmark]
        public void TestParallelToDictionary()
        {
            IEnumerable<int> collection = Enumerable.Range(0, ItemCount);
            Dictionary<int, int> _ = collection.AsParallel().ToDictionary(i => i, i => i);
        }

        [Params(1, 10, 100, 1000, 10000)]
        public int ItemCount { get; set; }

@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label May 29, 2024
@stephentoub
Copy link
Member

@EgorBot -amd

using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Tests>(args: args);

public class Tests
{
        [Benchmark]
        public void TestParallelToDictionary()
        {
            IEnumerable<int> collection = Enumerable.Range(0, ItemCount);
            Dictionary<int, int> _ = collection.AsParallel().ToDictionary(i => i, i => i);
        }

        [Benchmark]
        public void TestSelectParallelToDictionary()
        {
            IEnumerable<int> collection = Enumerable.Range(0, ItemCount);
            Dictionary<int, int> _ = collection.AsParallel().Select(i => i).ToDictionary(i => i, i => i);
        }

        [Benchmark]
        public void TestWhereSelectParallelToDictionary()
        {
            IEnumerable<int> collection = Enumerable.Range(0, ItemCount);
            Dictionary<int, int> _ = collection.AsParallel().Where(i => true).Select(i => i).ToDictionary(i => i, i => i);
        }

        [Params(1, 10, 100, 1000, 10000)]
        public int ItemCount { get; set; }
}

@stephentoub
Copy link
Member

@EgorBo, what's the invocation syntax for your bot?

@EgorBo
Copy link
Member

EgorBo commented May 29, 2024

@EgorBo, what's the invocation syntax for your bot?

I typically use -intel -arm64 -profiler [the rest of the args are directly passed to BDN app e.g. --disasm]
it means "run for Intel (Xeon) and Arm64 (Ampere), with profiler and with the given BDN args".
There is also -amd if you need AMD Epyc (Milano) specific results. All args are optional (-intel is used by default when no target specified, mostly, becuase it's the fastest VM instance I have)

Example: #102805 (comment)

-profiler is not useful when you have multiple [Benchmark]s.

@stephentoub
Copy link
Member

@EgorBot -intel -arm64

using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Tests>(args: args);

public class Tests
{
        [Benchmark]
        public void TestParallelToDictionary()
        {
            IEnumerable<int> collection = Enumerable.Range(0, ItemCount);
            Dictionary<int, int> _ = collection.AsParallel().ToDictionary(i => i, i => i);
        }

        [Benchmark]
        public void TestSelectParallelToDictionary()
        {
            IEnumerable<int> collection = Enumerable.Range(0, ItemCount);
            Dictionary<int, int> _ = collection.AsParallel().Select(i => i).ToDictionary(i => i, i => i);
        }

        [Benchmark]
        public void TestWhereSelectParallelToDictionary()
        {
            IEnumerable<int> collection = Enumerable.Range(0, ItemCount);
            Dictionary<int, int> _ = collection.AsParallel().Where(i => true).Select(i => i).ToDictionary(i => i, i => i);
        }

        [Params(1, 10, 100, 1000, 10000)]
        public int ItemCount { get; set; }
}

@stephentoub
Copy link
Member

@EgorBo, what's the invocation syntax for your bot?

I typically use -intel -arm64 -profiler [the rest of the args are directly passed to BDN app e.g. --disasm] it means "run for Intel (Xeon) and Arm64 (Ampere), with profiler and with the given BDN args". There is also -amd if you need AMD Epyc (Milano) specific results. All args are optional (-intel is used by default when no target specified, mostly, becuase it's the fastest VM instance I have)

Example: #102805 (comment)

-profiler is not useful when you have multiple [Benchmark]s.

Thanks. What am I doing wrong that it's not kicking in?

@EgorBo
Copy link
Member

EgorBo commented May 29, 2024

@EgorBo, what's the invocation syntax for your bot?

I typically use -intel -arm64 -profiler [the rest of the args are directly passed to BDN app e.g. --disasm] it means "run for Intel (Xeon) and Arm64 (Ampere), with profiler and with the given BDN args". There is also -amd if you need AMD Epyc (Milano) specific results. All args are optional (-intel is used by default when no target specified, mostly, becuase it's the fastest VM instance I have)
Example: #102805 (comment)
-profiler is not useful when you have multiple [Benchmark]s.

Thanks. What am I doing wrong that it's not kicking in?

It's correct as is (I already see 3 16-core VMs allocated for these requests😆) - it's just that currently there is no feedback that the jobs have started yet

@stephentoub
Copy link
Member

it's just that currently there is no feedback that the jobs have started yet

Ah! Ok. I'm used to the other bots that post a comment indicating that they've received the request. Thanks.

@EgorBot
Copy link

EgorBot commented May 29, 2024

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-UDSLNT : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-JPHOCC : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
Method Toolchain ItemCount Mean Error Ratio
TestParallelToDictionary Main 1 105.8 ns 0.90 ns 1.00
TestParallelToDictionary PR 1 8,586.0 ns 47.85 ns 81.20
TestSelectParallelToDictionary Main 1 8,611.2 ns 37.32 ns 1.00
TestSelectParallelToDictionary PR 1 8,805.7 ns 72.94 ns 1.02
TestWhereSelectParallelToDictionary Main 1 10,038.1 ns 45.88 ns 1.00
TestWhereSelectParallelToDictionary PR 1 10,348.0 ns 60.22 ns 1.03
TestParallelToDictionary Main 10 390.3 ns 3.63 ns 1.00
TestParallelToDictionary PR 10 9,635.3 ns 64.86 ns 24.69
TestSelectParallelToDictionary Main 10 9,300.1 ns 86.65 ns 1.00
TestSelectParallelToDictionary PR 10 9,630.1 ns 163.45 ns 1.03
TestWhereSelectParallelToDictionary Main 10 10,586.7 ns 67.26 ns 1.00
TestWhereSelectParallelToDictionary PR 10 11,616.0 ns 96.22 ns 1.10
TestParallelToDictionary Main 100 2,414.5 ns 5.49 ns 1.00
TestParallelToDictionary PR 100 14,295.5 ns 110.44 ns 5.92
TestSelectParallelToDictionary Main 100 13,759.9 ns 110.77 ns 1.00
TestSelectParallelToDictionary PR 100 14,346.2 ns 179.50 ns 1.04
TestWhereSelectParallelToDictionary Main 100 15,354.5 ns 148.65 ns 1.00
TestWhereSelectParallelToDictionary PR 100 16,238.8 ns 137.54 ns 1.06
TestParallelToDictionary Main 1000 22,040.0 ns 404.10 ns 1.00
TestParallelToDictionary PR 1000 49,659.3 ns 992.25 ns 2.33
TestSelectParallelToDictionary Main 1000 50,769.5 ns 992.20 ns 1.00
TestSelectParallelToDictionary PR 1000 52,010.0 ns 896.11 ns 1.02
TestWhereSelectParallelToDictionary Main 1000 52,453.1 ns 496.28 ns 1.00
TestWhereSelectParallelToDictionary PR 1000 51,200.5 ns 1,087.75 ns 1.05
TestParallelToDictionary Main 10000 317,249.3 ns 4,007.57 ns 1.00
TestParallelToDictionary PR 10000 598,908.9 ns 3,963.41 ns 1.89
TestSelectParallelToDictionary Main 10000 552,960.2 ns 9,053.46 ns 1.00
TestSelectParallelToDictionary PR 10000 599,659.8 ns 3,316.79 ns 1.08
TestWhereSelectParallelToDictionary Main 10000 560,577.1 ns 5,482.61 ns 1.00
TestWhereSelectParallelToDictionary PR 10000 611,083.7 ns 6,500.18 ns 1.09

BDN_Artifacts.zip

@EgorBot
Copy link

EgorBot commented May 29, 2024

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 16 logical and 8 physical cores
  Job-DMAJRO : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-CCUEBP : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Method Toolchain ItemCount Mean Error Ratio
TestParallelToDictionary Main 1 68.87 ns 0.774 ns 1.00
TestParallelToDictionary PR 1 9,353.72 ns 182.088 ns 135.81
TestSelectParallelToDictionary Main 1 9,613.30 ns 187.263 ns 1.00
TestSelectParallelToDictionary PR 1 9,438.78 ns 185.942 ns 0.98
TestWhereSelectParallelToDictionary Main 1 10,173.87 ns 202.386 ns 1.00
TestWhereSelectParallelToDictionary PR 1 11,065.01 ns 217.562 ns 1.10
TestParallelToDictionary Main 10 235.89 ns 2.557 ns 1.00
TestParallelToDictionary PR 10 9,453.04 ns 182.520 ns 40.04
TestSelectParallelToDictionary Main 10 9,254.72 ns 157.309 ns 1.00
TestSelectParallelToDictionary PR 10 10,072.27 ns 199.426 ns 1.11
TestWhereSelectParallelToDictionary Main 10 10,812.37 ns 264.585 ns 1.00
TestWhereSelectParallelToDictionary PR 10 11,822.02 ns 200.344 ns 1.11
TestParallelToDictionary Main 100 1,676.41 ns 26.051 ns 1.00
TestParallelToDictionary PR 100 13,394.86 ns 256.131 ns 7.99
TestSelectParallelToDictionary Main 100 12,813.12 ns 255.738 ns 1.00
TestSelectParallelToDictionary PR 100 13,425.19 ns 240.527 ns 1.05
TestWhereSelectParallelToDictionary Main 100 13,975.55 ns 271.767 ns 1.00
TestWhereSelectParallelToDictionary PR 100 14,356.32 ns 286.720 ns 1.03
TestParallelToDictionary Main 1000 14,119.17 ns 175.553 ns 1.00
TestParallelToDictionary PR 1000 33,632.66 ns 655.722 ns 2.39
TestSelectParallelToDictionary Main 1000 31,916.77 ns 507.908 ns 1.00
TestSelectParallelToDictionary PR 1000 32,717.81 ns 633.848 ns 1.02
TestWhereSelectParallelToDictionary Main 1000 34,647.44 ns 684.606 ns 1.00
TestWhereSelectParallelToDictionary PR 1000 42,381.94 ns 2,569.036 ns 1.28
TestParallelToDictionary Main 10000 216,167.31 ns 341.834 ns 1.00
TestParallelToDictionary PR 10000 383,979.18 ns 6,465.315 ns 1.78
TestSelectParallelToDictionary Main 10000 375,850.28 ns 4,573.328 ns 1.00
TestSelectParallelToDictionary PR 10000 396,121.31 ns 5,709.440 ns 1.05
TestWhereSelectParallelToDictionary Main 10000 376,878.77 ns 7,402.242 ns 1.00
TestWhereSelectParallelToDictionary PR 10000 391,383.93 ns 7,662.109 ns 1.03

BDN_Artifacts.zip

@ericstj
Copy link
Member

ericstj commented Nov 4, 2024

It looks like @EgorBo's test results show some significant regressions. Am I reading that correctly @jozkee @stephentoub?

@stephentoub
Copy link
Member

Yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Linq.Parallel community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ParallelEnumerable.ToDictionary does not parallelize
6 participants