[API Proposal]: BCL additions to support C# 12's "Collection Literals" feature. #87569
Tagging subscribers to this area: @dotnet/area-system-collections
Issue Details
Background and motivation
The C# design group is moving forward with a proposal for a lightweight syntax to construct collections: csharplang/collection-literals.md. We intend to ship this in C# 12 for linear collections (like List, ImmutableArray, HashSet, etc.). We also intend to support this for map collections (like Dictionary<K,V>), though that may only be in 'preview' in C# 12. Part of this proposal involves being able to efficiently construct certain collections (like List), as well as construct immutable collections (which have generally never worked with the existing "new ImmutableXXX() { 1, 2, 3 }" form). To that end, working with @stephentoub and @captainsafia, we've come up with a set of API proposals we'd like to work through with the runtime team to allow these types to "light up" with this language feature. This would be expected to align with the release of C# 12.
API Proposal
The API shape is as follows (with all naming/shaping open to bikeshedding): A new attribute to be placed on a type to specify where to find the method responsible for constructing it:
namespace System.Runtime.CompilerServices
{
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Struct | AttributeTargets.Interface, Inherited = false, AllowMultiple = true)]
public sealed class CollectionLiteralsBuilderAttribute : Attribute
{
/// <summary>Initialize the attribute to refer to the <paramref name="methodName"/> method on the <paramref name="builderType"/> type.</summary>
/// <param name="builderType">The type of the builder to use to construct the collection.</param>
/// <param name="methodName">
/// The method on the builder to use to construct the collection. This must refer to a static method,
/// or if the <paramref name="builderType"/> is also the type of the collection being constructed, it
/// may be an empty string to indicate a constructor should be used.
/// </param>
public CollectionLiteralsBuilderAttribute(Type builderType, string methodName)
{
BuilderType = builderType;
MethodName = methodName;
}
/// <summary>Gets the type of the builder to use to construct the collection.</summary>
public Type BuilderType { get; }
/// <summary>Gets the name of the method on the builder to use to construct the collection.</summary>
public string MethodName { get; }
}
}
For the purposes of this proposal, you can consider this attribute added to every type whose factory method the proposal suggests.
Pattern1: Specialized entrypoints to efficiently construct
|
Author: CyrusNajmabadi
The exact set of APIs I have prototyped here:
namespace System.Collections.Immutable
{
+ [CollectionLiteralsBuilder(typeof(ImmutableCollectionsMarshal), "Create")]
public readonly struct ImmutableArray<T> { ... }
public static class ImmutableHashSet
{
+ public static ImmutableHashSet<T> Create<T>(ReadOnlySpan<T> items);
+ public static ImmutableHashSet<T> Create<T>(IEqualityComparer<T>? equalityComparer, ReadOnlySpan<T> items);
}
+ [CollectionLiteralsBuilder(typeof(ImmutableHashSet), "Create")]
public sealed class ImmutableHashSet<T> { ... }
public static class ImmutableList
{
+ public static ImmutableList<T> Create<T>(ReadOnlySpan<T> items);
}
+ [CollectionLiteralsBuilder(typeof(ImmutableList), "Create")]
public sealed class ImmutableList<T> { ... }
public static class ImmutableQueue
{
+ public static ImmutableQueue<T> Create<T>(ReadOnlySpan<T> items);
}
+ [CollectionLiteralsBuilder(typeof(ImmutableQueue), "Create")]
public sealed class ImmutableQueue<T> { ... }
public static class ImmutableSortedSet
{
+ public static ImmutableSortedSet<T> Create<T>(ReadOnlySpan<T> items);
+ public static ImmutableSortedSet<T> Create<T>(IComparer<T>? comparer, ReadOnlySpan<T> items);
}
+ [CollectionLiteralsBuilder(typeof(ImmutableSortedSet), "Create")]
public sealed class ImmutableSortedSet<T> { ... }
public static class ImmutableStack
{
+ public static ImmutableStack<T> Create<T>(ReadOnlySpan<T> items);
}
+ [CollectionLiteralsBuilder(typeof(ImmutableStack), "Create")]
public sealed class ImmutableStack<T> { ... }
}
namespace System.Runtime.InteropServices
{
public static partial class CollectionsMarshal
{
+ public static Span<T> Create<T>(int count, out List<T> list);
// Alternative: List<T> AsList(T[])
}
public static class ImmutableCollectionsMarshal
{
+ public static Span<T> Create<T>(int length, out ImmutableArray<T> array);
// Alternative: don't add this if we don't add Create for List<T>
}
}
namespace System.Runtime.CompilerServices
{
+ [AttributeUsage(AttributeTargets.Class | AttributeTargets.Interface | AttributeTargets.Struct, Inherited = false, AllowMultiple = true)]
+ public sealed class CollectionLiteralsBuilderAttribute : Attribute
+ {
+ public CollectionLiteralsBuilderAttribute(Type builderType, string methodName);
+ public Type BuilderType { get; }
+ public string MethodName { get; }
+ }
}
Once dotnet/csharplang#7276 lands and we validate we're comfortable with any cross-language impact, we will also want:
namespace System.Collections.Generic
{
// This isn't strictly necessary; it's only for efficiency.
public class HashSet<T>
{
+ public HashSet(ReadOnlySpan<T> collection);
+ public HashSet(ReadOnlySpan<T> collection, IEqualityComparer<T>? comparer);
}
// With the current rules, these are necessary to use Queue/Stack<T> with collection literals at all
public class Queue<T>
{
+ public Queue(ReadOnlySpan<T> collection);
}
public class Stack<T>
{
+ public Stack(ReadOnlySpan<T> collection);
}
// This would just be for consistency. We should also consider moving the AddRange(ReadOnlySpan) extension back to being an instance method on List.
public class List<T>
{
+ public List(ReadOnlySpan<T> collection);
}
} |
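To show what the span-taking constructors above would save: today the usual route is `new Queue<T>(items.ToArray())`, which allocates an intermediate array only to feed the `IEnumerable<T>` constructor. The following sketch approximates the proposed constructors with hypothetical helpers. `SpanCollections`, `CreateQueue`, `CreateStack` and `CreateHashSet` are illustrative names, not proposed or shipping APIs. Each helper presizes the collection and copies elements straight out of the span.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers (not part of the BCL) approximating the proposed
// Queue<T>(ReadOnlySpan<T>), Stack<T>(ReadOnlySpan<T>) and
// HashSet<T>(ReadOnlySpan<T>, IEqualityComparer<T>?) constructors:
// presize the collection, then add elements directly from the span with
// no intermediate T[] or IEnumerable<T> allocation.
public static class SpanCollections
{
    public static Queue<T> CreateQueue<T>(ReadOnlySpan<T> items)
    {
        var queue = new Queue<T>(items.Length);
        foreach (T item in items)
        {
            queue.Enqueue(item); // the first span element ends up at the front
        }
        return queue;
    }

    public static Stack<T> CreateStack<T>(ReadOnlySpan<T> items)
    {
        var stack = new Stack<T>(items.Length);
        foreach (T item in items)
        {
            stack.Push(item); // the last span element ends up on top, as with new Stack<T>(IEnumerable<T>)
        }
        return stack;
    }

    public static HashSet<T> CreateHashSet<T>(ReadOnlySpan<T> items, IEqualityComparer<T>? comparer = null)
    {
        var set = new HashSet<T>(items.Length, comparer);
        foreach (T item in items)
        {
            set.Add(item); // duplicates are ignored, as with the IEnumerable<T> constructor
        }
        return set;
    }
}
```

Passing an explicit type argument (for example, `SpanCollections.CreateQueue<int>(array)`) matters here. Generic type inference does not consider the user-defined `T[]`-to-`ReadOnlySpan<T>` conversion.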
I've marked this as ready for review so we can work through any issues concurrently with everything flowing through language design. Note that this doesn't include anything to do with dictionary collection literals, which we'll handle separately (and possibly not for .NET 8). |
I'm really concerned about making these builders ordinal methods (somehow "hidden" though), instead of some special method marked with |
@huoyaoyuan not sure what you mean by 'ordinal methods'. Can you expand? |
@stephentoub thanks for that. I was sure I missed something:-) |
The |
Why shouldn't these Create methods be callable? The immutable collection ones other than ImmutableArray are all just overloads of existing Create methods and aren't unsafe at all. Only two of the methods in the proposal are "unsafe", and both of them just expose a different shape of functionality that's already available. Further, if you look at my branch, you'll see they're quite useful in certain circumstances; e.g., it allowed me to remove a bespoke helper from LINQ and just use this method instead. |
Is there a reason why the overloads with the equality comparer take that argument in the first position instead of the last position? |
Another alternative for the List and ImmutableArray constructors would be taking ownership of an array directly. This has the advantage of not creating address-taken variables, which I believe can (or could) trouble the JIT a bit depending on inlining and how they are used afterwards.
public static class CollectionsMarshal
{
    // Sketch only: these two overloads differ only by return type, which C# does not allow;
    // shipping versions would need distinct names (e.g. an AsList alongside ImmutableCollectionsMarshal.AsImmutableArray).
    public static List<T> Create<T>(T[] storage);
    public static ImmutableArray<T> Create<T>(T[] storage);
} |
I meant the methods on |
Mainly because that's the order of the existing array-based overloads. This is just creating overloads with spans where there are already arrays.
We could, and it was discussed. That already exists for ImmutableArray: For List it has its own complications. For example, the method would need to validate that the type of the array is actually None of these are deal-breakers. If we decided we didn't care about List backing being exposed as an array, that could tip the balance. Then we'd add an AsList and remove both Create methods on {Immutable}CollectionsMarshal. As a pattern, though, the span-based approach is more flexible. It can represent future cases the array one can't, e.g., where the storage is contiguous but not an array or not the entirety of an array.
They have no power beyond what's already possible with CollectionsMarshal.AsSpan, CollectionsMarshal.SetCount, and ImmutableCollectionsMarshal.AsImmutableArray. |
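To make the point above concrete, here is a minimal sketch of emulating the proposed `CollectionsMarshal.Create<T>(int count, out List<T> list)` with existing APIs. It assumes .NET 8, the release where `CollectionsMarshal.SetCount` first shipped. `ListMarshalSketch`, `CreateList` and `Demo` are hypothetical names. As with other `CollectionsMarshal` APIs, the returned span stays valid only until the list is resized, and every slot should be written before the list is used.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical emulation of the proposed
//   Span<T> CollectionsMarshal.Create<T>(int count, out List<T> list)
// built only from CollectionsMarshal.SetCount and CollectionsMarshal.AsSpan.
public static class ListMarshalSketch
{
    public static Span<T> CreateList<T>(int count, out List<T> list)
    {
        list = new List<T>(count);                // allocate the backing array once
        CollectionsMarshal.SetCount(list, count); // set Count without calling Add per element
        return CollectionsMarshal.AsSpan(list);   // expose the backing storage for direct writes
    }

    // Roughly what `List<int> list = [10, 20, 30];` could lower to under Pattern1.
    public static List<int> Demo()
    {
        Span<int> span = CreateList<int>(3, out List<int> list);
        span[0] = 10;
        span[1] = 20;
        span[2] = 30;
        return list;
    }
}
```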
We discussed this offline too, but I think that this feature has value beyond collection literals. For instance, deserializers should be able to take advantage of such annotations when determining how a specific collection type is meant to be hydrated. We might want to consider naming the attribute in such a way that reflects this. |
If returning a |
What would be the overhead of using |
Rather than being able to allocate the backing array once and write all items directly into it, the compiler would need to first store all the items somewhere else and then call this method to allocate an array that the elements are all copied into. Best case, it's an extra copy. Worst case, it's double the allocation. |
It could, and using GC.KeepAlive will box the struct. |
I thought the JIT avoided the boxing for Anyway, I understand the first issue, so you don't want the exposed type to outlive the stack lifetime? |
That is the concern, yes. We could choose to say it's not enough of a concern, and then as noted in my previous replies, that could shift the balance of what we decide to do. |
Thanks, so this is like Could this pattern be generalized for any type without that allocation for use in collection literals?
string.Create(length, out Span<char>)
ImmutableArray<T>.Create(length, out Span<T>)
I'm not sure how that lambda in the string.Create API helps compared to out Span, which saves an allocation. Also notice that that is not considered an "advanced" API as OP mentions, since it lives in |
I def want this point discussed with ldm though. My personal leaning is still preferring this case have no overhead. |
The runtime needs |
If the delegate supported a state parameter, it could be cached... but it still has the performance cost of a delegate call. Also, unless the delegate returns I'm not sure if a delegate would be the best approach for this problem. |
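For comparison, the `string.Create` overload that actually ships is callback-based: `string.Create<TState>(int length, TState state, SpanAction<char, TState> action)`. The `out Span<char>` form quoted above is hypothetical. The sketch below uses illustrative names (`StringCreateSketch`, `Repeat`, `Surround`). It shows that a `static` lambda with an explicit `state` argument needs no closure, so the compiler can cache the delegate in a static field. Each call still pays for one delegate invocation.

```csharp
using System;

public static class StringCreateSketch
{
    // The string is allocated once and its buffer is handed to the callback as a Span<char>.
    public static string Repeat(char c, int count) =>
        string.Create(count, c, static (span, ch) => span.Fill(ch));

    // Multiple values travel through a tuple state instead of captured variables.
    public static string Surround(string text, char delimiter) =>
        string.Create(text.Length + 2, (Text: text, Delimiter: delimiter), static (span, state) =>
        {
            span[0] = state.Delimiter;
            state.Text.AsSpan().CopyTo(span.Slice(1));
            span[^1] = state.Delimiter;
        });
}
```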
For reasons I already mentioned, I don't foresee us exposing any APIs that encourage mutating String after it's already handed out. If we wanted to enable collection literals to produce strings, we'd do so via the ROS ctor.
The box is optimized away on .NET 6+, but not on Mono to my knowledge, and, more relevant to my comment, not on .NET Framework, which is still important, in particular from a pattern perspective (we don't want to use a pattern that'll actively be more expensive).
If this is important, we should enable the compiler to a) know how to construct collection-initializer collections with a capacity and for List we'll just point to that existing ctor, then populating it will happen with Add{Range} and b) know how to transfer ownership of an array, and for ImmutableArray we'll point it to that. Then we won't expose those two Create methods. This will be simpler, safer, and only incrementally less efficient in the list case for some scenarios. Or we augment rather than replace the kinds of patterns supported, allow for multiple attributes, and let the compiler decide which it wants to use on a case by case basis. |
Sounds good. Can you and Chuck discuss this if it comes up while I'm away? I'd prefer to get the right long-term solution in so we don't regret things we may end up changing soon down the line. Thanks! |
We think that we can leave out the "Literals" token in CollectionLiteralsBuilderAttribute, so just "CollectionBuilderAttribute". There's still work to be done on the List creation pattern (exposing storage, ownership transference, etc.). The ImmutableCollections(ReadOnlySpan) overloads are all good things to add, even without the language feature.
public static class ImmutableHashSet
{
+ public static System.Collections.Immutable.ImmutableHashSet<T> Create<T>(System.ReadOnlySpan<T> items);
}
public static class ImmutableList
{
+ public static System.Collections.Immutable.ImmutableList<T> Create<T>(System.ReadOnlySpan<T> items);
}
public static class ImmutableQueue
{
+ public static System.Collections.Immutable.ImmutableQueue<T> Create<T>(System.ReadOnlySpan<T> items);
}
public static class ImmutableSortedSet
{
+ public static System.Collections.Immutable.ImmutableSortedSet<T> Create<T>(System.ReadOnlySpan<T> items);
}
public static class ImmutableStack
{
+ public static System.Collections.Immutable.ImmutableStack<T> Create<T>(System.ReadOnlySpan<T> items);
} |
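Here is a short usage sketch of the span-based `Create` overloads listed above. It assumes System.Collections.Immutable 8.0 or later, which, as we understand it, is where these overloads shipped. `ImmutableSpanCreateDemo` is a hypothetical wrapper. The point is that the caller can pass any `ReadOnlySpan<T>`, including `stackalloc` memory, without first materializing a `T[]` for the `params` overloads. Element order follows the existing array-based overloads: the stack's top is the last element, and the queue's front is the first.

```csharp
using System;
using System.Collections.Immutable;

public static class ImmutableSpanCreateDemo
{
    public static (ImmutableSortedSet<int> Sorted, ImmutableStack<int> Stack, ImmutableQueue<int> Queue) Build()
    {
        // Any ReadOnlySpan<int> works: an array, a slice, or stack memory as here.
        ReadOnlySpan<int> items = stackalloc int[] { 3, 1, 2 };

        // Explicit <int> selects the ReadOnlySpan<T> overload unambiguously.
        ImmutableSortedSet<int> sorted = ImmutableSortedSet.Create<int>(items); // 1, 2, 3
        ImmutableStack<int> stack = ImmutableStack.Create<int>(items);          // top is 2
        ImmutableQueue<int> queue = ImmutableQueue.Create<int>(items);          // front is 3
        return (sorted, stack, queue);
    }
}
```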
@bartonjs great! Would it make sense to break out the discussion (and the impl) between the "this is trivial, and we can just do it now" (like immutable collections) vs the more complex ownership apis which can happen a little later? It would be nice to have some of the APIs soon for the compiler to light up on. |
@CyrusNajmabadi, the easy ones have already merged: What's left is "just" the attribute (which I didn't want to add until we agreed on exactly what patterns would be supported) and the one or two APIs for List / ImmutableArray depending on what we decide to do with them. Next step is you/me/etc. need to have some follow-up discussions. |
Ok. I'll schedule something for next week! |
This feature would be greatly improved if ref structs like |
While I would also like that restriction removed, how is it relevant to this issue? The C# compiler itself isn't subject to the same constraint for span-related code it itself manufactures. |
namespace System.Runtime.CompilerServices
{
[AttributeUsage(AttributeTargets.Class |
AttributeTargets.Interface |
AttributeTargets.Struct,
Inherited = false,
AllowMultiple = false)]
public sealed class CollectionBuilderAttribute : Attribute
{
public CollectionBuilderAttribute(Type builderType, string methodName);
public Type BuilderType { get; }
public string MethodName { get; }
}
}
namespace System.Collections.Immutable
{
[CollectionBuilder(typeof(ImmutableArray), "Create")]
public readonly partial struct ImmutableArray<T> { }
[CollectionBuilder(typeof(ImmutableHashSet), "Create")]
public sealed partial class ImmutableHashSet<T> { }
[CollectionBuilder(typeof(ImmutableList), "Create")]
public sealed partial class ImmutableList<T> { }
[CollectionBuilder(typeof(ImmutableQueue), "Create")]
public sealed partial class ImmutableQueue<T> { }
[CollectionBuilder(typeof(ImmutableSortedSet), "Create")]
public sealed partial class ImmutableSortedSet<T> { }
[CollectionBuilder(typeof(ImmutableStack), "Create")]
public sealed partial class ImmutableStack<T> { }
} |
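Here is a minimal sketch of the approved attribute applied to a user-defined collection. It assumes C# 12 and .NET 8. `TinyList<T>` and `TinyListBuilder` are hypothetical types. Their shape follows the pattern described in this thread: a static method on a non-generic builder type that takes a `ReadOnlySpan<T>` and returns the collection.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical collection type used only to illustrate the pattern.
[CollectionBuilder(typeof(TinyListBuilder), "Create")]
public sealed class TinyList<T> : IEnumerable<T>
{
    private readonly T[] _items;

    internal TinyList(T[] items) => _items = items;

    public int Count => _items.Length;
    public T this[int index] => _items[index];

    // The compiler derives the element type T from the iteration type.
    public IEnumerator<T> GetEnumerator() => ((IEnumerable<T>)_items).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// Non-generic builder type; the static method takes ReadOnlySpan<T> and returns TinyList<T>.
public static class TinyListBuilder
{
    // The span is only valid for the duration of the call, so copy it.
    public static TinyList<T> Create<T>(ReadOnlySpan<T> items) => new TinyList<T>(items.ToArray());
}

public static class TinyListDemo
{
    // With C# 12, this collection expression compiles to a TinyListBuilder.Create call.
    public static TinyList<int> Make() => [1, 2, 3];
}
```

The builder copies the span rather than keeping a reference to it. The span may point at stack memory or at storage managed by the compiler, and because `ReadOnlySpan<T>` is a `ref struct`, it cannot be stored in a field anyway.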
Do you have cases where you even have multiple options? Or a case where you would feel bad guessing? For example, being able to write the following would be a totally mainline use case in C#:
public IImmutableSet<Whatever> Values { get; } = [a, b, c];
Needing to be explicit about the concrete type here seems like it would just be an unnecessary burden most of the time. In the rare (IMO) case where the default you pick is wrong, the user can always sidestep it by supplying the target themselves, like so:
public IImmutableSet<Whatever> Values { get; } = (ADifferentImmutableSetType<Whatever>)[a, b, c];
But that keeps the common case very reasonable, while making the rare case the one that has to pay the price. |
For the IImmutable* interfaces it might be fine; when I asked my variant of the "are we OK with..." question I meant for collections interfaces in general. Ignoring the fact that current layering says there's actually only one choice, To me, it seems fine to say
ISet<int> values = { 1, 2, 3, 4 };
doesn't compile. The code author can just type the variable to whichever of the four they want, or do a casting assignment. And now they get to decide "general purpose mutable", "fast readonly", "slow mutable but prints in a pretty way", and "whatever reason someone would pick immutable set over frozen set" (not dissing ImmutableSet, I just don't have domain knowledge here to pick one). |
Got it. To me the question is: "is there a reasonable default that is the right choice 95% of the time?" For things like ISet, I'm not sure that exists. For IEnumerable/IList/IReadOnlyList/IDictionary (once we support dictionaries), I think such a default would exist. That said, even if you don't support things like IEnumerable/IList/IReadOnlyList, the Lang/Compiler will (and the interfaces on Dictionary as well), so it's not critical that you do anything there. So really, it's just about IImmutableXXX. :) |
Why would the compiler special-case IEnumerable/IList/IReadOnlyList but not IImmutableXXX? Why are the immutable interfaces important to handle but not important enough to special-case? |
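For reference on the interface question, here is a small sketch of what C# 12 does with the interface targets it handles itself, with no attribute involved, as we understand the shipped rules. Read-only interfaces such as `IEnumerable<T>` and `IReadOnlyList<T>` get a compiler-chosen implementation, and mutable ones such as `IList<T>` get a `List<T>`. `InterfaceTargetsDemo` is an illustrative name. By contrast, whether `IImmutableSet<int> s = [1, 2, 3];` compiles depends on whether that interface is annotated with a builder, which this sketch does not assume either way.

```csharp
using System.Collections.Generic;

public static class InterfaceTargetsDemo
{
    // Read-only interface targets: the compiler picks the concrete type.
    public static IEnumerable<int> Sequence() => [1, 2, 3];
    public static IReadOnlyList<int> ReadOnly() => [1, 2, 3];

    // Mutable interface target: backed by a List<int>, so Add works.
    public static IList<int> Mutable() => [1, 2, 3];
}
```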
EDITED by @stephentoub on 6/26/2023 with updated remaining APIs for review
main...stephentoub:runtime:immutablecollectionsbuilder
For C# 12 / .NET 8, the attribute will only recognize the pattern CollectionType Method(ReadOnlySpan<T>). However, the compiler may special-case system types it cares about to do something more efficient based on its knowledge of how they work, in particular for List<T> (which is otherwise supported via its support for collection initializers) and ImmutableArray<T> (which is otherwise supported, with a copy, via the attribute pointing to Create). We can add more supported patterns in the future.
Background and motivation
The C# design group is moving forward with a proposal for a lightweight syntax to construct collections: csharplang/collection-literals.md
We intend to ship this in C# 12 for linear collections (like List<T>, ImmutableArray<T>, HashSet<T>, etc.). We also intend to support this for map collections (like Dictionary<TKey, TValue>), though that may only be in 'preview' in C# 12.
Part of this proposal involves being able to efficiently construct certain collections (like List<T>), as well as construct immutable collections (which have generally never worked with the existing new ImmutableXXX<int>() { 1, 2, 3 } form). To that end, working with @stephentoub and @captainsafia, we've come up with a set of API proposals we'd like to work through with the runtime team to allow these types to "light up" with this language feature. This would be expected to align with the release of C# 12.
API Proposal
The API shape is as follows (with all naming/shaping open to bikeshedding):
A new attribute to be placed on a type to specify where to find the method responsible for constructing it:
For the purposes of this proposal, you can consider this attribute added to every type whose factory method the proposal suggests.
Pattern1: Specialized entrypoints to efficiently construct List<T> and ImmutableArray<T>.
We would like to be able to construct these two types as efficiently as possible.
These APIs are placed in CollectionsMarshal as they are viewed as too advanced to be something we want to expose directly on the types themselves.
Here, the compiler would ask the runtime to create an instance of these types with an explicit capacity, and also be given direct access to the underlying storage backing these types. The compiler would then translate code like so:
This would be practically the lowest overhead possible, and would allow users to use immutable collections as easily as normal collections.
Note: the above API would incur a copy cost in the case of code involving async/await (as the compiler would need to make its own array that it populated, which it would then copy into the Span given by this API). We could avoid any copy overhead entirely if the above API were instead:
However, there was some reservation about an API (even one in CollectionsMarshal) directly exposing the array to operate over. Language/Compiler will use whatever API is provided here. So if you are comfortable on your end exposing this, then we'll use it. If not, we can always use the Span approach (with the additional cost in async contexts).
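Here is a sketch of the Pattern1 lowering for `ImmutableArray<T>`. It assumes the proposed `ImmutableCollectionsMarshal.Create<T>(int length, out ImmutableArray<T> array)` behaves like the hypothetical `CreateImmutableArray` helper below, which is built on `ImmutableCollectionsMarshal.AsImmutableArray` (shipping in .NET 8). The sketch shows the single allocation and the direct writes. It also shows the ownership concern raised in the comments: anything still holding the span can mutate the "immutable" array.

```csharp
using System;
using System.Collections.Immutable;
using System.Runtime.InteropServices;

// Hypothetical emulation of the proposed
//   Span<T> ImmutableCollectionsMarshal.Create<T>(int length, out ImmutableArray<T> array)
public static class ImmutableArrayMarshalSketch
{
    public static Span<T> CreateImmutableArray<T>(int length, out ImmutableArray<T> array)
    {
        T[] storage = new T[length];
        // No copy: the ImmutableArray wraps `storage` directly.
        array = ImmutableCollectionsMarshal.AsImmutableArray(storage);
        // Writes through this span are visible in `array`, so it must not escape the lowering.
        return storage;
    }

    // Roughly what `ImmutableArray<int> a = [1, 2, 3];` could lower to under Pattern1:
    // one allocation, with elements written directly into the final storage.
    public static ImmutableArray<int> Demo()
    {
        Span<int> span = CreateImmutableArray<int>(3, out ImmutableArray<int> result);
        span[0] = 1;
        span[1] = 2;
        span[2] = 3;
        return result;
    }
}
```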
Pattern2: ReadOnlySpan<T> entrypoints to construct existing collection types.
This would allow lower overhead construction of certain types (no need for intermediary heap allocations), as well as providing a mechanism for constructing Immutable types (without intermediary builder allocations).
API Usage
Alternative Designs
For the List<T>/ImmutableArray<T> APIs, we have the option of returning the underlying array as a span or as an array, e.g.:
The latter would work better in async contexts, but has caused a small amount of concern about directly exposing values that could be placed on the heap. However, this capability is already possible today according to Stephen, so perhaps that is fine.
For the List<T>/ImmutableArray<T> APIs, we could use multiple out-params instead of out+return, i.e.:
If there is only a single method exposed, this is mainly a stylistic difference. If we did want both methods, this would allow for overloads.
Risks
Adding overloads to existing IEnumerable<T>-taking methods introduces ambiguity errors for existing compilers. Specifically:
The C# team is working through a proposal to make this not an error, and to make the ReadOnlySpan version the preferred overload. This strongly matches the intuition we have that if you have overloads like this, they will have the same semantics, just that the latter will be lower overhead.
However, this may cause issues on other compilers that haven't updated, or on older compilers. To that end, we may want to emit the new ReadOnlySpan overload with a modreq so that older compilers do not consider it a viable option. Newer compilers will be ok with it and will then switch which overload they call on recompile.
This work is being tracked here: dotnet/csharplang#7276
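Here is a minimal illustration of the ambiguity described under Risks. It uses a hypothetical `OverloadDemo` class that mirrors an existing `IEnumerable<T>` overload gaining a `ReadOnlySpan<T>` sibling. As we understand C# 12's overload resolution rules, an array argument converts to both parameter types and neither conversion is better. The commented-out call is therefore reported as ambiguous, and callers have to disambiguate explicitly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical API: an existing IEnumerable<T> overload plus a newly added ReadOnlySpan<T> one.
public static class OverloadDemo
{
    public static string Describe(IEnumerable<int> items) => "IEnumerable<int>";
    public static string Describe(ReadOnlySpan<int> items) => "ReadOnlySpan<int>";

    public static (string ViaEnumerable, string ViaSpan) CallBoth()
    {
        int[] array = { 1, 2, 3 };

        // Describe(array);
        // With the C# 12 compiler this is error CS0121 (ambiguous call): int[] converts to
        // IEnumerable<int> (reference conversion) and to ReadOnlySpan<int> (user-defined
        // implicit operator), and neither conversion is better than the other.

        return (Describe((IEnumerable<int>)array), Describe(new ReadOnlySpan<int>(array)));
    }
}
```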