-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize HttpUtility.UrlPathEncode allocations. #102823
Conversation
TrayanZapryanov
commented
May 29, 2024
- Omit intermediate string allocations by using Ranges.
- Use stackalloc arrays when input is small enough for encoding and ASCII conversions.
- Remove string.Contains(' ') loop.
Tagging subscribers to this area: @dotnet/ncl |
Benchmark: [MemoryDiagnoser]
public class HttpUtilityBenchmarks
{
[Benchmark(Baseline = true)]
public string UrlTryCreate() => Uri.TryCreate("http://EXAMPLE.NET/d\\u00E9fault.xxx?sdsd=sds", UriKind.Absolute, out var uri) ? uri.Authority : "error";
[Benchmark]
public string UrlPathEncode() => HttpUtility.UrlPathEncode("http://EXAMPLE.NET/d\\u00E9fault.xxx?sdsd=sds");
[Benchmark]
public string UrlPathEncode_PR() => MyHttpUtility.UrlPathEncode("http://EXAMPLE.NET/d\\u00E9fault.xxx?sdsd=sds");
} Results:
|
} | ||
|
||
private static byte[] UrlEncodeNonAscii(byte[] bytes, int offset, int count) | ||
private static string UrlEncodeNonAscii(Span<byte> bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These methods are still allocating quite a bit (the original string even if it's already Ascii, substring and combined string in UrlPathEncodeImpl
...).
The 1 caller of this helper is using Encoding.UTF8
and wants to stop after you see a ?
.
Assuming there's sufficient test coverage, I would change this to instead look something like
private static string UrlPathEncodeImpl(string value)
{
int i = value.AsSpan().IndexOfAnyExceptInRange((char)0x21, (char)0x7F);
if (i < 0)
{
return value;
}
int indexOfQuery = value.IndexOf('?');
if (indexOfQuery >= 0 && indexOfQuery < i)
{
// Everything before the Query is valid ASCII
return value;
}
ReadOnlySpan<char> toEncode = indexOfQuery >= 0
? value.AsSpan(i, indexOfQuery - i)
: value.AsSpan(i);
byte[] bytes = ArrayPool<char>.Shared.Rent(Encoding.UTF8.GetMaxByteCount(toEncode.Length));
char[] chars = ArrayPool<char>.Shared.Rent(bytes.Length * 3);
int utf8Length = Encoding.UTF8.GetBytes(toEncode, bytes);
int charCount = 0;
foreach (byte b in bytes.AsSpan(0, utf8Length))
{
if (IsNonAsciiOrSpaceByte(b))
{
chars[charCount++] = '%';
chars[charCount++] = HexConverter.ToCharLower(b >> 4);
chars[charCount++] = HexConverter.ToCharLower(b);
}
else
{
chars[charCount++] = (char)b;
}
}
ArrayPool<byte>.Shared.Return(bytes);
string result = string.Concat(
value.AsSpan(0, i),
chars.AsSpan(0, charCount),
indexOfQuery >= 0 ? value.AsSpan(indexOfQuery) : ReadOnlySpan<char>.Empty);
ArrayPool<char>.Shared.Return(chars);
return result;
}
This would avoid a bunch of intermediate allocations and transcoding.
schemeAndAuthority = default(Range); | ||
path = Range.All; | ||
queryAndFragment = default(Range); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
schemeAndAuthority = default(Range); | |
path = Range.All; | |
queryAndFragment = default(Range); | |
return UrlPathEncodeImpl(value); |
Otherwise you would end up allocating a substring for value.AsSpan(path).ToString()
+ a substring for the string.Concat
.
} | ||
|
||
return schemeAndAuthority + UrlPathEncodeImpl(path) + queryAndFragment; | ||
return string.Concat(value.AsSpan(schemeAndAuthority), UrlPathEncodeImpl(value.AsSpan(path).ToString()), value.AsSpan(queryAndFragment)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be combined into the whole UrlPathEncodeImpl
to avoid the substring allocations
} | ||
} | ||
|
||
// Attempts to split a URI into its constituent pieces. | ||
// Even if this method returns true, one or more of the out parameters might contain a null or empty string, e.g. if there is no query / fragment. | ||
// Concatenating the pieces back together will form the original input string. | ||
internal static bool TrySplitUriForPathEncode(string input, [NotNullWhen(true)] out string? schemeAndAuthority, [NotNullWhen(true)] out string? path, out string? queryAndFragment) | ||
internal static bool TrySplitUriForPathEncode(string input, out Range schemeAndAuthority, out Range path, out Range queryAndFragment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By forcing everything into spans and ranges, you are now also allocating substrings where you might not have needed to previously. E.g. if pathRange
is Range.All
, you're reallocating the original string.
You could inline ExtractQueryAndFragment
into this method, and always use Substring
for the path portion.
I would also suggest returning Span
instead of Range
as it's easier to use given that the callers only need the range to make the span anyway.
@MihaZupan Sorry I messed git while trying to rebase after merging previous PRs. |
Assuming
|