Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize HttpUtility.UrlPathEncode allocations. #102823

Closed
wants to merge 0 commits into from

Conversation

TrayanZapryanov
Copy link
Contributor

  1. Omit intermediate string allocations by using Ranges.
  2. Use stackalloc arrays when input is small enough for encoding and ASCII conversions.
  3. Remove string.Contains(' ') loop.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label May 29, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

@TrayanZapryanov
Copy link
Contributor Author

Benchmark:

[MemoryDiagnoser]
public class HttpUtilityBenchmarks
{
	[Benchmark(Baseline = true)]
	public string UrlTryCreate() => Uri.TryCreate("http://EXAMPLE.NET/d\\u00E9fault.xxx?sdsd=sds", UriKind.Absolute, out var uri) ? uri.Authority : "error";

	[Benchmark]
	public string UrlPathEncode() => HttpUtility.UrlPathEncode("http://EXAMPLE.NET/d\\u00E9fault.xxx?sdsd=sds");

	[Benchmark]
	public string UrlPathEncode_PR() => MyHttpUtility.UrlPathEncode("http://EXAMPLE.NET/d\\u00E9fault.xxx?sdsd=sds");
}

Results:

Method Mean Error StdDev Ratio RatioSD Gen0 Allocated Alloc Ratio
UrlTryCreate 260.7 ns 2.41 ns 2.01 ns 1.00 0.00 0.0277 232 B 1.00
UrlPathEncode 338.5 ns 6.03 ns 5.64 ns 1.30 0.02 0.0839 704 B 3.03
UrlPathEncode_PR 325.4 ns 4.59 ns 4.30 ns 1.25 0.02 0.0658 552 B 2.38

}

private static byte[] UrlEncodeNonAscii(byte[] bytes, int offset, int count)
private static string UrlEncodeNonAscii(Span<byte> bytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods are still allocating quite a bit (the original string even if it's already Ascii, substring and combined string in UrlPathEncodeImpl ...).

The 1 caller of this helper is using Encoding.UTF8 and wants to stop after you see a ?.
Assuming there's sufficient test coverage, I would change this to instead look something like

private static string UrlPathEncodeImpl(string value)
{
    int i = value.AsSpan().IndexOfAnyExceptInRange((char)0x21, (char)0x7F);
    if (i < 0)
    {
        return value;
    }

    int indexOfQuery = value.IndexOf('?');
    if (indexOfQuery >= 0 && indexOfQuery < i)
    {
        // Everything before the Query is valid ASCII
        return value;
    }

    ReadOnlySpan<char> toEncode = indexOfQuery >= 0
        ? value.AsSpan(i, indexOfQuery - i)
        : value.AsSpan(i);

    byte[] bytes = ArrayPool<char>.Shared.Rent(Encoding.UTF8.GetMaxByteCount(toEncode.Length));
    char[] chars = ArrayPool<char>.Shared.Rent(bytes.Length * 3);

    int utf8Length = Encoding.UTF8.GetBytes(toEncode, bytes);
    int charCount = 0;
    foreach (byte b in bytes.AsSpan(0, utf8Length))
    {
        if (IsNonAsciiOrSpaceByte(b))
        {
            chars[charCount++] = '%';
            chars[charCount++] = HexConverter.ToCharLower(b >> 4);
            chars[charCount++] = HexConverter.ToCharLower(b);
        }
        else
        {
            chars[charCount++] = (char)b;
        }
    }

    ArrayPool<byte>.Shared.Return(bytes);
    string result = string.Concat(
        value.AsSpan(0, i),
        chars.AsSpan(0, charCount),
        indexOfQuery >= 0 ? value.AsSpan(indexOfQuery) : ReadOnlySpan<char>.Empty);
    ArrayPool<char>.Shared.Return(chars);
    return result;
}

This would avoid a bunch of intermediate allocations and transcoding.

Comment on lines 595 to 597
schemeAndAuthority = default(Range);
path = Range.All;
queryAndFragment = default(Range);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
schemeAndAuthority = default(Range);
path = Range.All;
queryAndFragment = default(Range);
return UrlPathEncodeImpl(value);

Otherwise you would end up allocating a substring for value.AsSpan(path).ToString() + a substring for the string.Concat.

}

return schemeAndAuthority + UrlPathEncodeImpl(path) + queryAndFragment;
return string.Concat(value.AsSpan(schemeAndAuthority), UrlPathEncodeImpl(value.AsSpan(path).ToString()), value.AsSpan(queryAndFragment));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be combined into the whole UrlPathEncodeImpl to avoid the substring allocations

}
}

// Attempts to split a URI into its constituent pieces.
// Even if this method returns true, one or more of the out parameters might contain a null or empty string, e.g. if there is no query / fragment.
// Concatenating the pieces back together will form the original input string.
internal static bool TrySplitUriForPathEncode(string input, [NotNullWhen(true)] out string? schemeAndAuthority, [NotNullWhen(true)] out string? path, out string? queryAndFragment)
internal static bool TrySplitUriForPathEncode(string input, out Range schemeAndAuthority, out Range path, out Range queryAndFragment)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By forcing everything into spans and ranges, you are now also allocating substrings where you might not have needed to previously. E.g. if pathRange is Range.All, you're reallocating the original string.

You could inline ExtractQueryAndFragment into this method, and always use Substring for the path portion.
I would also suggest returning Span instead of Range as it's easier to use given that the callers only need the range to make the span anyway.

@TrayanZapryanov
Copy link
Contributor Author

@MihaZupan Sorry I messed git while trying to rebase after merging previous PRs.
Sorry for the stupid question, but can you tell me how you are updating older branches to latest version without closing existing PR.
Meanwhile I will try to open another PR with your comments integrated.

@MihaZupan
Copy link
Member

Assuming upstream is this repo and origin is your fork, my workflow looks something like

git checkout main
git pull upstream

git checkout your-branch
git rebase main // And now fix any conflicts
git push origin -f

@karelz karelz added this to the 9.0.0 milestone Jun 24, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jul 25, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Net community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants