ParallelEnumerable.ToDictionary does not parallelize #96262
Tagging subscribers to this area: @dotnet/area-system-collections
|
Tagging subscribers to this area: @dotnet/area-system-linq-parallel
|
If memory serves, the decision not to parallelize the selectors is because they're almost always trivial, e.g. just accessing a property from the element to select, and parallelizing it would make the 99% case slower. What led to this issue? I realize it's of course possible for a non-trivial selector to be used, but could you highlight real-world examples of this being done with PLINQ? As is highlighted, a developer can always choose to use a Select instead if they in fact have an expensive selector. |
@stephentoub A pattern I’ve found myself wanting to use fairly often with PLINQ is that I have N items and for each item I want to compute some expensive value. It’s convenient to build a dictionary that maps from the items to the values for use in downstream computations. It’s been a while, but I believe the case that prompted me to post this was related to processing large vectors. I had a set of N long vectors (represented as float arrays), and for each one I needed to compute a statistic that required comparing that vector to M other vectors in a larger set. It felt natural to use PLINQ to build a dictionary mapping the vectors to the statistics and then process from there. Since then, another case I ran into was implementing a genetic-algorithm-style optimization where I had N chromosomes, and for each one I needed to run a fitness function and associate the result with the chromosome. For both of these there are obviously other ways to do it, but I believe it would be preferable from a “least surprise” perspective if the various PLINQ operators were consistent in terms of parallelism. How much overhead would consistent parallelism really add in this case (especially given that we can assume the query already involves parallelism)? I would assume that this would just tack one more selector onto the chain of operations that is already running in parallel. |
We could test the overhead when there's already parallel work happening and this would indeed be appending an additional item. Possibly we could allow the ToDictionary selector to be parallel in that case but not in the case where there's not already parallel work happening. But it would really come down to measurements... there's going to be significant overhead just in the AsParallel().ToDictionary(...) case if the selectors are trivial, as is often the case (I understand it's not true in your cited case). |
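A rough way to start on those measurements, assuming trivial selectors (`i => i` and `i => i * 2`) over a million integers: time the serial `Enumerable.ToDictionary`, the current `AsParallel().ToDictionary(...)`, and the `Select`-based form that moves the selectors into the parallel part of the query. This is a sketch using `Stopwatch` with a single warm-up call per variant, not a rigorous benchmark; the item count, the selectors, and the `Time` helper are illustrative assumptions, and the numbers will vary by machine.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Rough Stopwatch timing with one warm-up call per variant; not a rigorous
// benchmark. Item count and selectors are arbitrary assumptions.
int[] items = Enumerable.Range(0, 1_000_000).ToArray();

// Trivial selectors: the case described above as the common one.
Func<int, int> keySelector = i => i;
Func<int, int> valueSelector = i => i * 2;

// Hypothetical helper: builds once to warm up (JIT, thread pool), then times a second build.
Dictionary<int, int> Time(string label, Func<Dictionary<int, int>> build)
{
    build();
    var sw = Stopwatch.StartNew();
    Dictionary<int, int> result = build();
    sw.Stop();
    Console.WriteLine($"{label,-40} {sw.Elapsed.TotalMilliseconds,8:F1} ms");
    return result;
}

// Serial baseline: no PLINQ at all.
var serial = Time("Enumerable.ToDictionary",
    () => items.ToDictionary(keySelector, valueSelector));

// Current PLINQ form: selectors passed straight to ToDictionary.
var plinq = Time("AsParallel().ToDictionary",
    () => items.AsParallel().ToDictionary(keySelector, valueSelector));

// Workaround form: selectors run inside the parallel Select.
var viaSelect = Time("AsParallel().Select(...).ToDictionary",
    () => items.AsParallel()
               .Select(i => KeyValuePair.Create(keySelector(i), valueSelector(i)))
               .ToDictionary(kvp => kvp.Key, kvp => kvp.Value));
```

With selectors this cheap, the comparison of interest is how much `AsParallel()` costs when there is nothing worth parallelizing. For numbers worth quoting, a harness such as BenchmarkDotNet would be the better tool.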
If a user calls |
There's a ton of inappropriate use of AsParallel out there, where folks add AsParallel to say "make this fast" even if parallelism is the wrong answer. Making such use way slower is indeed problematic. |
Is it at all convincing that, as far as I’ve tested, every other PLINQ operator does parallelize its selector (Select, Where, Sum, Min, Max, Any, All, Average, Count, etc.) and so does incur significant, measurable overhead for someone who just throws in AsParallel() when it isn’t needed? I’m also curious whether there is any usage data showing whether the cost of penalizing folks who are misusing AsParallel().ToDictionary() would be outweighed by the benefit to those who are using it “correctly” and silently not getting the parallelism they’d expect. |
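One way to check the claim about the other operators is to record which threads invoke the selector. The sketch below uses a hypothetical `ThreadsUsed` helper that hands each query an instrumented selector, adds a small `Thread.SpinWait` workload (an arbitrary assumption) so that partitioning is worthwhile, and counts distinct managed thread IDs. The counts depend on core count and scheduling, so treat them as indicative only; given the serial loop described in the issue, the ToDictionary row would be expected to report a single thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

int[] items = Enumerable.Range(0, 100).ToArray();

// Hypothetical helper: runs `query` with an instrumented selector and returns
// how many distinct managed threads invoked that selector.
int ThreadsUsed(Func<Func<int, int>, object> query)
{
    var threadIds = new ConcurrentDictionary<int, bool>();

    int Selector(int i)
    {
        threadIds.TryAdd(Environment.CurrentManagedThreadId, true);
        Thread.SpinWait(20_000); // simulated work so that splitting the input pays off
        return i;
    }

    query(Selector);
    return threadIds.Count;
}

var results = new (string Operator, int Threads)[]
{
    ("Select",       ThreadsUsed(sel => items.AsParallel().Select(sel).ToList())),
    ("Where",        ThreadsUsed(sel => items.AsParallel().Where(i => sel(i) >= 0).ToList())),
    ("Sum",          ThreadsUsed(sel => items.AsParallel().Sum(sel))),
    ("Min",          ThreadsUsed(sel => items.AsParallel().Min(sel))),
    ("Count",        ThreadsUsed(sel => items.AsParallel().Count(i => sel(i) >= 0))),
    ("ToDictionary", ThreadsUsed(sel => items.AsParallel().ToDictionary(sel, sel))),
};

foreach (var (op, threads) in results)
    Console.WriteLine($"{op,-12} selector ran on {threads} thread(s)");
```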
Description
I would expect the following to be a valid pattern, but it does not lead to parallel execution of ComputeExpensiveThing():
Reproduction Steps
Expected behavior
From a parallelism perspective, I would expect
items.AsParallel().ToDictionary(i => ..., i => ...)
to be equivalent to:
items.AsParallel().Select(i => KeyValuePair.Create(..., ...)).ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
For example, ParallelEnumerable.Min() parallelizes its selector.
Actual behavior
keySelector and elementSelector execute serially
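A self-contained version of the two forms from the expected-behavior section, assuming a stand-in `ComputeExpensiveThing` (the issue does not show its body; here it is hypothetical and just spins, then returns `i * i`). Both forms build the same dictionary; on a multi-core machine the `Select` form would be expected to finish noticeably faster, because the expensive call runs in the parallel part of the query rather than in the `ToDictionary` loop. The item count and spin length are arbitrary, and the first timing also absorbs PLINQ start-up costs.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for the issue's ComputeExpensiveThing(): burns CPU so
// that parallelism (or its absence) shows up in wall-clock time.
static long ComputeExpensiveThing(int i)
{
    Thread.SpinWait(20_000);
    return (long)i * i;
}

int[] items = Enumerable.Range(0, 400).ToArray();

var sw = Stopwatch.StartNew();
// Form 1: selectors passed to ToDictionary (they run serially, per the issue).
Dictionary<int, long> direct = items.AsParallel()
    .ToDictionary(i => i, i => ComputeExpensiveThing(i));
TimeSpan directTime = sw.Elapsed;

sw.Restart();
// Form 2 (workaround): the expensive call happens inside Select, which PLINQ
// parallelizes; ToDictionary then only inserts the precomputed pairs.
Dictionary<int, long> viaSelect = items.AsParallel()
    .Select(i => KeyValuePair.Create(i, ComputeExpensiveThing(i)))
    .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
TimeSpan viaSelectTime = sw.Elapsed;

Console.WriteLine($"Selectors in ToDictionary:  {directTime.TotalMilliseconds:F0} ms");
Console.WriteLine($"Select, then ToDictionary:  {viaSelectTime.TotalMilliseconds:F0} ms");
```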
Regression?
No response
Known Workarounds
As shown above, pushing any non-trivial selectors up one layer into a Select call gets around the issue.
Configuration
.NET 8 Windows 11 x64
Other information
ParallelEnumerable.ToDictionary has a serial loop in which the selectors are executed.
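To make the difference concrete, here is a simplified model of a serial selector loop (an approximation of the behavior described above, not the actual System.Linq.Parallel source), next to a hypothetical `ToDictionaryWithParallelSelectors` helper that packages the `Select` workaround. Both helper names are made up for illustration and are not part of .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified model of the behavior described in the issue; NOT the actual
// System.Linq.Parallel source. The parallel query only produces the elements;
// the selectors then run one after another on the thread that enumerates it.
static Dictionary<TKey, TValue> SerialSelectorLoop<TSource, TKey, TValue>(
    ParallelQuery<TSource> source,
    Func<TSource, TKey> keySelector,
    Func<TSource, TValue> valueSelector) where TKey : notnull
{
    var result = new Dictionary<TKey, TValue>();
    foreach (TSource item in source)                        // results are merged back to this thread
        result.Add(keySelector(item), valueSelector(item)); // serial: one selector call at a time
    return result;
}

// Hypothetical helper with the behavior the issue expects: both selectors run
// inside the parallel Select, and ToDictionary only inserts precomputed pairs.
static Dictionary<TKey, TValue> ToDictionaryWithParallelSelectors<TSource, TKey, TValue>(
    ParallelQuery<TSource> source,
    Func<TSource, TKey> keySelector,
    Func<TSource, TValue> valueSelector) where TKey : notnull
{
    return source
        .Select(item => KeyValuePair.Create(keySelector(item), valueSelector(item)))
        .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
}

var words = new[] { "alpha", "beta", "gamma", "delta" };
var serial = SerialSelectorLoop(words.AsParallel(), w => w, w => w.Length);
var parallel = ToDictionaryWithParallelSelectors(words.AsParallel(), w => w, w => w.Length);
Console.WriteLine(string.Join(", ", parallel.OrderBy(kvp => kvp.Key)));
```

Both helpers produce the same dictionary; the only difference is which threads execute `keySelector` and `valueSelector`.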