[API Proposal]: Encoding.TryGetBytes/Chars #84425

stephentoub · 2023-04-06T16:35:37Z

Background and motivation

In .NET Core 2.1 we added Encoding.GetBytes(ReadOnlySpan<char>, Span<byte>) and Encoding.GetChars(ReadOnlySpan<byte>, Span<char>), which encode/decode the source into the destination and return how many bytes/chars were written. However, the destination span needs to be large enough to store all the written data; if it's too small, the method throws. Encoding doesn't provide a variant that fails without an exception when the destination is too small. Such a variant is particularly useful when writing out a larger composed output, where you loop to grow the buffer to accommodate more output.

Workarounds today are to first call GetByteCount/GetCharCount to determine how much space is required, or to use alternate APIs for the specific encoding in question, e.g. Utf8.FromUtf16, which is an OperationStatus-based API.
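For illustration, the two workarounds above might look roughly like the following sketch. The class and method names (EncodingWorkarounds, TryEncodeWithCount, TryEncodeUtf8) are hypothetical and not part of the proposal; only Encoding.GetByteCount, Encoding.GetBytes, and System.Text.Unicode.Utf8.FromUtf16 are existing APIs.

```csharp
using System;
using System.Buffers;
using System.Text;
using System.Text.Unicode;

public static class EncodingWorkarounds
{
    // Workaround 1 (any Encoding): measure first, then encode.
    // The input is walked twice: once to count, once to encode.
    public static bool TryEncodeWithCount(
        Encoding encoding, ReadOnlySpan<char> chars, Span<byte> bytes, out int bytesWritten)
    {
        if (encoding.GetByteCount(chars) > bytes.Length)
        {
            bytesWritten = 0;
            return false;
        }

        bytesWritten = encoding.GetBytes(chars, bytes);
        return true;
    }

    // Workaround 2 (UTF-8 only): use the OperationStatus-based transcoder.
    // On DestinationTooSmall it may already have written a partial result,
    // so this sketch reports 0 to mirror an all-or-nothing Try pattern.
    public static bool TryEncodeUtf8(
        ReadOnlySpan<char> chars, Span<byte> bytes, out int bytesWritten)
    {
        OperationStatus status = Utf8.FromUtf16(chars, bytes, out _, out bytesWritten);
        if (status == OperationStatus.Done)
        {
            return true;
        }

        bytesWritten = 0;
        return false;
    }
}
```

The first variant works for every encoding at the cost of a second pass over the input; the second avoids that pass but ties the caller to UTF-8.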

API Proposal

namespace System.Text;

public abstract class Encoding
{
    public int GetBytes(ReadOnlySpan<char> chars, Span<byte> bytes);
+   public bool TryGetBytes(ReadOnlySpan<char> chars, Span<byte> bytes, out int bytesWritten);

    public int GetChars(ReadOnlySpan<byte> bytes, Span<char> chars);
+   public bool TryGetChars(ReadOnlySpan<byte> bytes, Span<char> chars, out int charsWritten);
}

The base virtual implementation will use GetByteCount/GetCharCount followed by GetBytes/GetChars, but overrides in implementations like UTF8Encoding can then do better.

API Usage

if (Encoding.UTF8.TryGetBytes(str, destination, out int bytesWritten))
{
    _length += bytesWritten;
    return true;
}

return false;
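As a fuller sketch of the "loop to grow the buffer" scenario from the motivation, the following assumes .NET 8 or later, where TryGetBytes is available. GrowingEncoder and EncodeAll are hypothetical names used only for this example, and the doubling strategy is one possible choice rather than anything the proposal prescribes.

```csharp
using System;
using System.Buffers;
using System.Text;

public static class GrowingEncoder
{
    // Encodes each part back to back into a pooled buffer.
    // Whenever the remaining space is too small, TryGetBytes returns false
    // (no exception) and the buffer is swapped for one at least twice as large.
    public static byte[] EncodeAll(Encoding encoding, params string[] parts)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(16);
        int length = 0;
        try
        {
            foreach (string part in parts)
            {
                int written;
                while (!encoding.TryGetBytes(part, buffer.AsSpan(length), out written))
                {
                    byte[] larger = ArrayPool<byte>.Shared.Rent(buffer.Length * 2);
                    buffer.AsSpan(0, length).CopyTo(larger);
                    ArrayPool<byte>.Shared.Return(buffer);
                    buffer = larger;
                }

                length += written;
            }

            // Copy out the used portion so the pooled array can be returned.
            return buffer.AsSpan(0, length).ToArray();
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

A call such as GrowingEncoder.EncodeAll(Encoding.UTF8, "hello, ", "world") only pays for a regrow when a part actually fails to fit.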

Alternative Designs

No response

Risks

No response


ghost · 2023-04-06T16:35:42Z

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	stephentoub
Assignees:	-
Labels:	`area-System.Text.Encoding`, `api-ready-for-review`
Milestone:	8.0.0

stephentoub · 2023-04-06T16:35:47Z

cc: @GrabYourPitchforks, @EgorBo

stephentoub · 2023-04-07T12:41:32Z

I also spoke with @EgorBo about making the UTF8EncodingSealed override of TryGetBytes a JIT intrinsic, and he's prototyped it. When we use TryGetBytes from the Utf8.TryWrite interpolated string handler, this should enable all of the valid literals created for the interpolated string to be encoded once at JIT time rather than on every append, and the copies to be unrolled as well (effectively turning TryGetBytes("literal", dest) into "literal"u8.TryCopyTo(dest)). Anyone else using Encoding.UTF8.TryGetBytes with a literal would also benefit, whether that literal was supplied directly or exposed via inlining.
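To make the equivalence concrete, here is a small sketch of the two forms side by side. It assumes .NET 8 and C# 11 (for u8 literals); LiteralEncoding and its method names are hypothetical. It only shows that both forms produce the same bytes and says nothing about what the JIT actually emits.

```csharp
using System;
using System.Text;

public static class LiteralEncoding
{
    // Transcodes the UTF-16 literal to UTF-8 at run time
    // (unless the JIT optimizes the call as described above).
    public static bool ViaEncoding(Span<byte> dest, out int bytesWritten) =>
        Encoding.UTF8.TryGetBytes("literal", dest, out bytesWritten);

    // Copies bytes that the C# compiler already encoded as UTF-8.
    public static bool ViaU8Literal(Span<byte> dest, out int bytesWritten)
    {
        ReadOnlySpan<byte> utf8 = "literal"u8;
        if (utf8.TryCopyTo(dest))
        {
            bytesWritten = utf8.Length;
            return true;
        }

        bytesWritten = 0;
        return false;
    }
}
```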

eiriktsarpalis · 2023-04-07T17:50:37Z

> that's particularly useful when writing out a larger composed output where you loop to grow the buffer to accommodate more output.

But isn't that suboptimal compared to proactively calculating an upper bound via the GetMaxByteCount/GetMaxCharCount methods?

stephentoub · 2023-04-07T17:58:19Z

> But isn't that suboptimal compared to proactively calculating an upper bound via the GetMaxByteCount/GetMaxCharCount methods?

GetMaxByteCount/GetMaxCharCount will typically be a large overestimate. You can't use the result of those to determine whether the data you have will fit into a span you have, as the space remaining in the span might be perfectly sufficient to store the encoded data but still less than those maxes.

Maybe I'm not understanding your question?
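A small sketch of that point: a destination can be big enough for the actual encoded data while still being smaller than the worst-case bound. FitCheck and its methods are hypothetical names; GetMaxByteCount and GetByteCount are the existing Encoding APIs.

```csharp
using System;
using System.Text;

public static class FitCheck
{
    // Conservative check: uses the worst-case bound for the given char count.
    public static bool FitsByMax(Encoding encoding, string text, int room) =>
        encoding.GetMaxByteCount(text.Length) <= room;

    // Exact check: counts the bytes this particular text needs.
    public static bool FitsExactly(Encoding encoding, string text, int room) =>
        encoding.GetByteCount(text) <= room;
}
```

For example, with Encoding.UTF8 and the text "hello", five bytes of remaining space pass the exact check but fail the max-based check, so a max-based test would reject a buffer that is in fact sufficient.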

eiriktsarpalis · 2023-04-07T18:03:40Z

Maybe our use cases are different; I typically rent/allocate a span after I calculate the max length.

stephentoub · 2023-04-07T18:11:36Z

The use case here is you're given a buffer to write into.

But even if it weren't, the max approach, while fast to compute, can also result in overallocating by 4x (in the case of UTF8).

bartonjs · 2023-04-20T18:26:24Z

Video

Corrected the proposal to make these methods virtual. Otherwise, looks good as proposed.

We discussed the names source/destination versus the chars/bytes pattern, but since the existing non-Try method has chars/bytes, we went for local consistency.

namespace System.Text;

public abstract partial class Encoding
{
    public virtual bool TryGetBytes(ReadOnlySpan<char> chars, Span<byte> bytes, out int bytesWritten);
    public virtual bool TryGetChars(ReadOnlySpan<byte> bytes, Span<char> chars, out int charsWritten);
}

stephentoub added the area-System.Text.Encoding and api-ready-for-review (API is ready for review, it is NOT ready for implementation) labels Apr 6, 2023

stephentoub added this to the 8.0.0 milestone Apr 6, 2023

stephentoub mentioned this issue Apr 11, 2023

Add internal Encoding.TryGetBytes #84609

Merged

stephentoub self-assigned this Apr 20, 2023

bartonjs added the api-approved (API was approved in API review, it can be implemented) label and removed the api-ready-for-review (API is ready for review, it is NOT ready for implementation) label Apr 20, 2023

stephentoub mentioned this issue Apr 20, 2023

Add public Encoding.TryGetBytes/Chars #85120

Merged

ghost added the in-pr (There is an active PR which will close this issue when it is merged) label Apr 20, 2023

stephentoub closed this as completed in #85120 Apr 23, 2023

ghost removed the in-pr (There is an active PR which will close this issue when it is merged) label Apr 23, 2023

ghost locked as resolved and limited conversation to collaborators May 24, 2023
