Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: CollectionsMarshal.CreateList<T>(T[]) #80311

Closed
Rekkonnect opened this issue Jan 6, 2023 · 15 comments
Closed

[API Proposal]: CollectionsMarshal.CreateList<T>(T[]) #80311

Rekkonnect opened this issue Jan 6, 2023 · 15 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Collections
Milestone

Comments

@Rekkonnect
Copy link
Contributor

Rekkonnect commented Jan 6, 2023

Background and motivation

The System.Collections.Generic.List<T> collection does not provide a way to begin its lifetime from a given underlying array buffer backing the contents of the list. The best options are:

  • for safe paths: to create a new list from a given array and copy the contents manually
  • for dangerously-living people: mimic the layout of the List<T> class and alter the contents through exposed properties

To avoid resorting to suboptimal or outright hacky solutions, this API enables a performant and trustworthy solution, while being hidden under the CollectionsMarshal class. While the unsafe solution proposed above is sometimes viable and works deceivingly well for the given layout of the class, it always relies on a per-framework basis, and may not be considered stable. This is unwanted for a solution to a problem around a very commonly-used data structure.

API Proposal

using System.Collections.Generic;

namespace System.Runtime.InteropServices;

public static class CollectionsMarshal
{
    // Creates a list with the given underlying array buffer, with the Count property being equal to the array's length
    public static List<T> CreateList<T>(T[] arrayBuffer);
    // Creates a list with the given underlying array buffer, with the Count property manually specified
    public static List<T> CreateList<T>(T[] arrayBuffer, int listCount);
}

API Usage

// Create a buffer for 1000 elements
var buffer = new int[1000];
const int count = 578;
// Only fill the first 578 elements
for (int i = 0; i < count; i++)
{
    buffer[i] = i * i - 2;
}

var list = CollectionsMarshal.CreateList(buffer, count);
// list.Count == 578
// list.Capacity == 1000

Alternative Designs

As mentioned above, alternatives include:

  • the safer option to create a new list from a given array and copy the contents manually
  • the dangerous option to mimic the layout of the List<T> class and alter the contents through exposed properties

Risks

No breaking changes. This API provides a safer and built-in way for a performance-oriented improvement on the subject of constructing a list.

@Rekkonnect Rekkonnect added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jan 6, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jan 6, 2023
@ghost
Copy link

ghost commented Jan 6, 2023

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

The [System.Collections.Generic.List<T>](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/List.cs) collection does not provide a way to begin its lifetime from a given underlying array buffer backing the contents of the list. The best options are:

  • for safe paths: to create a new list from a given array and copy the contents manually
  • for dangerously-living people: mimic the layout of the List class and alter the contents through exposed properties

To avoid resorting to suboptimal or outright hacky solutions, this API enables a performant and trustworthy solution, while being hidden under the CollectionsMarshal class. While the unsafe solution proposed above is viable and works 100% for the given layout of the class, it always relies on a per-framework basis. This is unwanted for a solution to a problem around a very commonly-used data structure.

API Proposal

using System.Collections.Generic;

namespace System.Runtime.InteropServices;

public static class CollectionsMarshal
{
    // Creates a list with the given underlying array buffer, with the Count property being equal to the array's length
    public static List<T> CreateList<T>(T[] arrayBuffer);
    // Creates a list with the given underlying array buffer, with the Count property manually specified
    public static List<T> CreateList<T>(T[] arrayBuffer, int listCount);
}

API Usage

// Create a buffer for 1000 elements
var buffer = new int[1000];
const int count = 578;
// Only fill the first 578 elements
for (int i = 0; i < count; i++)
{
    buffer[i] = i * i - 2;
}

var list = CollectionsMarshal.CreateList(buffer, count);
// list.Count == 578
// list.Capacity == 1000

Alternative Designs

As mentioned above, alternatives include:

  • the safer option to create a new list from a given array and copy the contents manually
  • the dangerous option to mimic the layout of the List class and alter the contents through exposed properties

Risks

No breaking changes. This API provides a safer and built-in way for a performance-oriented improvement on the subject of constructing a list.

Author: Rekkonnect
Assignees: -
Labels:

api-suggestion, area-System.Collections

Milestone: -

@AraHaan
Copy link
Member

AraHaan commented Jan 6, 2023

I would argue against over allocating the array with 1k elements if you know for sure that you will only use a specific count of elements. I would change the sample to:

const int count = 578;
var buffer = new int[count];
// Only fill the first 578 elements
for (int i = 0; i < count; i++)
{
    buffer[i] = i * i - 2;
}

var list = CollectionsMarshal.CreateList(buffer, count);
// list.Count == 578
// list.Capacity == 578

Why? because there are cases where you might want to not allocate more than what is needed on the backing buffer.

Also, while this is a thing, why not also allow using properties that returns an ReadOnlySpan's as well to lists like this to where it uses the backing results from the optimized property (that avoids an memcpy).

At least if I remember right ReadOnlySpan<int> avoids memcpy and allocating (since it simply returns a span to the data that the operating system itself allocated that holds the actual array).

Also, lifetimes can be a problem as well. Say for example you use CreateList() to create a static list to a type, and the buffer itself gets GC'd when it leaves scope because the GC does not see that it's used by the static list. Then when you go to use the list you find invalid data stored in it.

@Rekkonnect
Copy link
Contributor Author

For your specific scenario, CollectionsMarshal.CreateList(buffer); should suffice. The example I provided is related to wanting an expandable list. After all, you might not want to initialize the entire buffer, but want to allow expanding your list within the bounds of your buffer.

@Rekkonnect
Copy link
Contributor Author

Rekkonnect commented Jan 6, 2023

Also, lifetimes can be a problem as well. Say for example you use CreateList() to create a static list to a type, and the buffer itself gets GC'd when it leaves scope because the GC does not see that it's used by the static list. Then when you go to use the list you find invalid data stored in it.

Is this a valid concern? If so, one way I believe can counter this is to introduce a pin argument, which would pin the array and prevent it from being collected.

@Sergio0694
Copy link
Contributor

"While the unsafe solution proposed above is viable and works 100% for the given layout of the class"

It is not. It's completely undefined behavior and could cause all sorts of memory corruption issues.
It happens to work today, in the cases you've tested so far. It doesn't mean it'll always work or that it works everywhere.
That approach is not a viable alternative - it's not a "dangerous" workaround, it's flat out broken and shouldn't be used.

@MichalPetryka
Copy link
Contributor

MichalPetryka commented Jan 7, 2023

The only dangerous workaround possible would be doing it with a DynamicMethod that ignores access checks.

@Rekkonnect
Copy link
Contributor Author

Edited the thread to reflect the actual viability of the proposed dangerous workaround

@MichalPetryka
Copy link
Contributor

Edited the thread to reflect the actual viability of the proposed dangerous workaround

Doing Unsafe.As on a class isn't safe even within a single .NET version, it will cause UB even with the layout being the same.

@AraHaan
Copy link
Member

AraHaan commented Jan 8, 2023

Unsafe.As is safe when you cast an object from its interface type to its concrete type when you know for sure that it is that specific type (after you check that the type of the object implements that specific internal interface and it's only used in a single type.

@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Jan 9, 2023
@layomia layomia added this to the Future milestone Jan 9, 2023
@eiriktsarpalis
Copy link
Member

I might be missing something, but couldn't your usage example could be written as follows?

var list = new List<int>(1000);
const int count = 578;
// Only fill the first 578 elements
for (int i = 0; i < count; i++)
{
    list[i] = i * i - 2;
}

// list.Count == 578
// list.Capacity == 1000

What is a use case where you have a buffer whose ownership you need to transfer to a list and that buffer couldn't have been a list in the first place?

@Rekkonnect
Copy link
Contributor Author

That example would throw an index out of range because your list's buffer is 1000 items, but its count is 0. You cannot directly set to the buffer, you have to follow the current size of the list.

@stephentoub
Copy link
Member

You cannot directly set to the buffer

#55217 has been approved for .NET 8 and will let you set the count.

@Rekkonnect
Copy link
Contributor Author

That's the one step. With that addition to the API, the whole operation of setting values directly to the buffer is basically entirely available, alongside the usage of the API getting you the span to the list's buffer.

This API therefore becomes a matter of unsafely constructing a list out of a buffer becoming more convenient.

@eiriktsarpalis
Copy link
Member

Closing in favor of #55217

@eiriktsarpalis eiriktsarpalis closed this as not planned Won't fix, can't repro, duplicate, stale Jan 19, 2023
@MichalPetryka
Copy link
Contributor

Closing in favor of #55217

Do note that this API would be slightly more efficient due to avoiding a single copy, at the cost of being more unsafe.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Collections
Projects
None yet
Development

No branches or pull requests

7 participants