Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IListProvider<T>.ToList() implementations use list.Add(element) instead of direct array initialization. #80760

Closed
neon-sunset opened this issue Jan 18, 2023 · 14 comments · Fixed by #86796
Assignees
Labels
Milestone

Comments

@neon-sunset
Copy link
Contributor

neon-sunset commented Jan 18, 2023

Description

IListProvider<T>.ToList() implementations in System.Linq for iterators of known lengths use list.Add(element) to initialize the list over directly writing to its underlying array.

This is to not forget to update the implementations once any of the following is done: #80311, #55217 or #80756.

Configuration

All versions

Regression?

No

Analysis

Once we have an API for direct List<T> initialization (aside from new List<T>(collection)), it's a good idea to utilize it in BCL to reduce the overhead.

@neon-sunset neon-sunset added the tenet-performance Performance related issue label Jan 18, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jan 18, 2023
@ghost
Copy link

ghost commented Jan 18, 2023

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

IListProvider<T>.ToList() implementations in System.Linq for iterators of known lengths use list.Add(element) to initialize the list over directly writing to its underlying array.

This is to not forget to update the implementations once #80311 / #80311 or #80756 is closed.

Configuration

All versions

Regression?

No

Analysis

Once we have an API for direct List<T> initialization (aside from new List<T>(collection)), it's a good idea to utilize it in BCL to reduce the overhead.

Author: neon-sunset
Assignees: -
Labels:

area-System.Collections, tenet-performance

Milestone: -

@eiriktsarpalis
Copy link
Member

Which implementations do you specifically have in mind? I'm looking at the implementation of Select and it seems to be specializing for arrays and IList types:

public List<TResult> ToList()
{
int count = _source.Count;
var results = new List<TResult>(count);
for (int i = 0; i < count; i++)
{
results.Add(_selector(_source[i]));
}
return results;
}

If you could identify the implementations that could be optimized we might take a look at accepting a PR that improves things, provided it does not regress assembly size substantially.

@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Jan 18, 2023
@eiriktsarpalis eiriktsarpalis added this to the Future milestone Jan 18, 2023
@tarekgh tarekgh added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jan 18, 2023
@ghost
Copy link

ghost commented Jan 18, 2023

This issue has been marked needs-author-action and may be missing some important information.

@neon-sunset
Copy link
Contributor Author

neon-sunset commented Jan 18, 2023

If you could identify the implementations that could be optimized we might take a look at accepting a PR that improves things, provided it does not regress assembly size substantially.

As you noted, implementations do specialize ToList(). However, for example, RangeIterator's ToList() is 2-3 times slower than it could be because of having to go through list.Add(i) call. Same applies to SelectArrayIterator and other specializations of known lengths, albeit impact is less significant especially when _selector is not inlined with DynamicPGO.

Once BCL gets #55217 (or any of the alternatives), the performance can be improved by removing such overhead. This issue tracks this optimization.

@ghost ghost added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Jan 18, 2023
@eiriktsarpalis
Copy link
Member

However, for example, RangeIterator's ToList() is 2-3 times slower than it could be because of having to go through list.Add(i) call.

Can you clarify? Looks like it does presize the list:

public List<int> ToList()
{
List<int> list = new List<int>(_end - _start);
for (int cur = _start; cur != _end; cur++)
{
list.Add(cur);
}
return list;
}

@stephentoub
Copy link
Member

Looks like it does presize the list:

It presizes the capacity, but not the count, and as such it still goes through the Add call to then fill in each member. They're suggesting that instead of:

     for (int cur = _start; cur != _end; cur++) 
     { 
         list.Add(cur); 
     } 

which will incur a bounds check, a version increment, etc. on every Add, that we could instead do something like that:

CollectionsMarshal.SetCount(list, _end - _start);
Span<int> span = CollectionsMarshal.AsSpan(list);
for (int i = 0; i < span.Length; i++)
{
    span[i] = i + _start;
}

@eiriktsarpalis
Copy link
Member

Sounds reasonable. These components are constrained to the *.SpeedOpt.cs files so I think we could afford to make that improvement there.

@neon-sunset would you be interested in contributing a PR?

@eiriktsarpalis eiriktsarpalis removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Jan 19, 2023
@stephentoub
Copy link
Member

It's not possible to do until SetCount is added, and PR #77794 was closed. Do we have a plan for finishing that?

@eiriktsarpalis
Copy link
Member

I think we do. Seems it was closed by the bot for inactivity.

@neon-sunset
Copy link
Contributor Author

would you be interested in contributing a PR?

Yes, thanks, this was the intention when creating this issue, to not miss it when it becomes possible.

@eiriktsarpalis eiriktsarpalis added the blocked Issue/PR is blocked on something - see comments label Jan 20, 2023
@theodorzoulias
Copy link
Contributor

It is already possible to initialize a List<T> with default(T) values, by using a custom ICollection<T> that has a CopyTo implemented as a no-op. Not as ideal as the CollectionsMarshal.SetCount, but possible.

@brantburnett
Copy link
Contributor

It appears that the blocker has been resolved, CollectionsMarshal.SetCount has been merged. Would anyone mind if I worked on this?

@stephentoub
Copy link
Member

stephentoub commented May 24, 2023

It appears that the blocker has been resolved, CollectionsMarshal.SetCount has been merged. Would anyone mind if I worked on this?

Thanks. You're welcome to if you find opportunity. I think most of it has been done, however. This probably should have been closed as completed.

@stephentoub stephentoub removed the blocked Issue/PR is blocked on something - see comments label May 24, 2023
@brantburnett
Copy link
Contributor

Thanks. You're welcome to if you find opportunity. I think most of it has been done, however. This probably should have been closed as completed.

I'm still finding a few spots I think we can optimize if we feel it's warranted. These are locations where the list has a known size but where we must iterate the source using foreach. The three main spots I see are:

  • SelectIPartitionIterator.ToList (this optimization is applied to ToArray but not ToList)
  • Enumerable.HashSetToList (used by DistinctIterator.ToList and UnionIterator.ToList after the HashSet is built)
  • Lookup<TKey, TElement>.ToList for grouping

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 26, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 30, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Nov 30, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants