[API Proposal]: Encoding.TryGetBytes/Chars #84425
Tagging subscribers to this area: @dotnet/area-system-text-encoding

Issue Details

API Proposal

```diff
 namespace System.Text;

 public abstract class Encoding
 {
     public int GetBytes(ReadOnlySpan<char> chars, Span<byte> bytes);
+    public bool TryGetBytes(ReadOnlySpan<char> chars, Span<byte> bytes, out int bytesWritten);

     public int GetChars(ReadOnlySpan<byte> bytes, Span<char> chars);
+    public bool TryGetChars(ReadOnlySpan<byte> bytes, Span<char> chars, out int charsWritten);
 }
```

API Usage

```csharp
if (Encoding.UTF8.TryGetBytes(str, destination, out int bytesWritten))
{
    _length += bytesWritten;
    return true;
}

return false;
```
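The API Usage snippet is a fragment of a larger writer. The sketch below shows the scenario the background describes: several strings are composed into one buffer, and the buffer grows whenever the next piece does not fit. It makes two assumptions. First, the helpers TryGetBytesEmulated and Compose are hypothetical. Second, the try-method is emulated with GetByteCount + GetBytes, so the sketch does not depend on the proposed member being available.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the proposed Encoding.TryGetBytes: succeeds only if the
// encoded form of `chars` fits in `bytes`; otherwise returns false without throwing.
static bool TryGetBytesEmulated(Encoding encoding, ReadOnlySpan<char> chars, Span<byte> bytes, out int bytesWritten)
{
    if (encoding.GetByteCount(chars) > bytes.Length)
    {
        bytesWritten = 0;
        return false;
    }

    bytesWritten = encoding.GetBytes(chars, bytes);
    return true;
}

// Hypothetical composer: appends every part into one buffer, doubling the buffer
// whenever the next part does not fit in the space that remains.
static byte[] Compose(string[] parts, out int length)
{
    byte[] buffer = new byte[4];
    length = 0;

    foreach (string part in parts)
    {
        int written;
        while (!TryGetBytesEmulated(Encoding.UTF8, part, buffer.AsSpan(length), out written))
        {
            Array.Resize(ref buffer, buffer.Length * 2);
        }

        length += written;
    }

    return buffer;
}

byte[] result = Compose(new[] { "hello", ", ", "w\u00F6rld" }, out int total);
Console.WriteLine(total); // 13: the ö takes two bytes in UTF-8
```

Doubling keeps the number of resizes logarithmic in the final output size. With the try-pattern, each failed attempt costs a false return rather than a thrown and caught exception.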
I also spoke with @EgorBo about making the UTF8EncodingSealed override of TryGetBytes a JIT intrinsic, and he's prototyped it. When we use TryGetBytes from the Utf8.TryWrite interpolated string handler, this should then enable all of the valid literals created for the interpolated string to be encoded at JIT time once rather than on every append, and also the copies unrolled (effectively turning the implementation of …

But isn't that suboptimal compared to proactively calculating an upper bound via the …
GetMaxByteCount/GetMaxCharCount will typically be a large overestimate. You can't use their results to determine whether the data you have will fit into the span you have: the space remaining in the span might be perfectly sufficient to store the encoded data but still less than those maximums. Maybe I'm not understanding your question?
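The sketch below puts numbers on this. The input string and the 8-byte destination are arbitrary choices for illustration. For ASCII text, UTF8's worst-case estimate is several times the actual encoded size. A span that the max check rejects can therefore still hold the data.

```csharp
using System;
using System.Text;

string text = "hello";

// Worst-case estimate: depends only on the input length, not on its content.
int maxBytes = Encoding.UTF8.GetMaxByteCount(text.Length);

// Exact size: requires scanning the input, but matches what GetBytes will write.
int exactBytes = Encoding.UTF8.GetByteCount(text);

// A destination with 8 bytes of room: smaller than the worst case,
// yet large enough for this particular input.
Span<byte> remaining = stackalloc byte[8];
bool fitsByMax = remaining.Length >= maxBytes;     // false: the max-based check rejects it
bool fitsByExact = remaining.Length >= exactBytes; // true: the data actually fits

Console.WriteLine($"max={maxBytes}, exact={exactBytes}, fitsByMax={fitsByMax}, fitsByExact={fitsByExact}");
```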
Maybe our use cases are different; I typically rent/allocate a span after I calculate the max length.

The use case here is that you're given a buffer to write into. But even if it weren't, the max approach, while fast to compute, can also result in overallocating by 4x (in the case of UTF8).
Corrected the proposal to make these methods virtual.

We discussed the names …

```csharp
namespace System.Text;

public abstract partial class Encoding
{
    public virtual bool TryGetBytes(ReadOnlySpan<char> chars, Span<byte> bytes, out int bytesWritten);
    public virtual bool TryGetChars(ReadOnlySpan<byte> bytes, Span<char> chars, out int charsWritten);
}
```
Background and motivation
In .NET Core 2.1 we added Encoding.GetBytes(ReadOnlySpan<char>, Span<byte>) and Encoding.GetChars(ReadOnlySpan<byte>, Span<char>). They encode/decode the source into the destination and return how many bytes/chars they wrote. However, the destination span must be large enough to store all of the output; if it's too small, the method throws. Encoding doesn't provide a variant that can fail without an exception when the destination is too small. Such a variant is particularly useful when writing out a larger composed output, where you loop to grow the buffer to accommodate more output.

Workarounds today are to first call GetByteCount/GetCharCount to determine how much space is required, or to use alternate APIs for the specific encoding in question, e.g. Utf8.FromUtf16, which is OperationStatus-based.
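The sketch below illustrates both workarounds against a deliberately undersized destination. Utf8 lives in System.Text.Unicode and OperationStatus in System.Buffers. The input string and buffer sizes are arbitrary.

```csharp
using System;
using System.Buffers;
using System.Text;
using System.Text.Unicode;

string source = "hello world";          // 11 ASCII chars -> 11 UTF-8 bytes
Span<byte> small = stackalloc byte[5];  // deliberately too small

// Workaround 1: compute the exact size up front and only encode if it fits.
// This scans the input once to count and again to encode.
bool fits = Encoding.UTF8.GetByteCount(source) <= small.Length;
if (fits)
{
    Encoding.UTF8.GetBytes(source, small);
}

// Workaround 2: the UTF-8-specific, OperationStatus-based transcoder.
// A short destination is reported as DestinationTooSmall rather than thrown,
// and charsRead/bytesWritten report how far the transcode got.
OperationStatus partial = Utf8.FromUtf16(source, small, out int charsRead, out int bytesWritten);

// With a destination sized from GetByteCount, the same call completes.
byte[] full = new byte[Encoding.UTF8.GetByteCount(source)];
OperationStatus done = Utf8.FromUtf16(source, full, out _, out int fullBytes);

Console.WriteLine($"fits={fits}, partial={partial} ({bytesWritten} bytes), done={done} ({fullBytes} bytes)");
```

The first workaround works for any Encoding but costs an extra pass. The second avoids the extra pass but exists only for UTF-8, which is the gap the proposed methods fill.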
API Proposal
The base virtual implementation will use GetByteCount/GetCharCount and GetBytes/GetChars, but implementations like UTF8Encoding can then do better in their overrides.
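The sketch below mirrors the fallback described above. It is written as hypothetical free functions (TryGetBytesFallback, TryGetCharsFallback) rather than Encoding members, so it runs without the proposed API. It follows the proposal's description only; it is not the shipped base implementation.

```csharp
using System;
using System.Text;

// Hypothetical free-function version of the fallback: measure the exact output
// size first, and only encode when the destination can hold all of it.
static bool TryGetBytesFallback(Encoding encoding, ReadOnlySpan<char> chars, Span<byte> bytes, out int bytesWritten)
{
    if (encoding.GetByteCount(chars) <= bytes.Length)
    {
        bytesWritten = encoding.GetBytes(chars, bytes);
        return true;
    }

    bytesWritten = 0; // on failure nothing is written and nothing is thrown
    return false;
}

// Same shape for decoding: count chars first, decode only if they fit.
static bool TryGetCharsFallback(Encoding encoding, ReadOnlySpan<byte> bytes, Span<char> chars, out int charsWritten)
{
    if (encoding.GetCharCount(bytes) <= chars.Length)
    {
        charsWritten = encoding.GetChars(bytes, chars);
        return true;
    }

    charsWritten = 0;
    return false;
}

byte[] utf8 = new byte[16];
bool encoded = TryGetBytesFallback(Encoding.UTF8, "caf\u00E9", utf8, out int byteCount); // é takes 2 bytes: 5 total
bool rejected = !TryGetBytesFallback(Encoding.UTF8, "caf\u00E9", new byte[4], out _);     // 4 bytes is one short

char[] decoded = new char[16];
bool decodedOk = TryGetCharsFallback(Encoding.UTF8, utf8.AsSpan(0, byteCount), decoded, out int charCount);

Console.WriteLine($"{encoded} {byteCount} {rejected} {decodedOk} {charCount}"); // True 5 True True 4
```

The extra counting pass over the input is the overhead that overrides such as UTF8Encoding's can avoid.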
API Usage
Alternative Designs
No response
Risks
No response