Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Remove temporary ByteArrayHelpers and use Span.Equals #27654

Closed
wants to merge 1 commit into from

Conversation

ahsonkhan
Copy link
Member

@ahsonkhan ahsonkhan commented Mar 2, 2018

{
// TODO #21395:
// Replace with the MemoryExtensions implementation of Equals once it's available
internal static bool EqualsOrdinalAsciiIgnoreCase(string left, ReadOnlySpan<byte> right)
Copy link
Member Author

@ahsonkhan ahsonkhan Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any perf concerns with using the ReadOnlySpan<char> overload available now that will have the "IsAscii" check per iteration which is unnecessary in this case - https://github.com/dotnet/coreclr/blob/47c39edc2fbfbfbddca13ee1a47699ecaaaee204/src/mscorlib/shared/System/Globalization/CompareInfo.cs#L571?

Copy link
Member

@stephentoub stephentoub Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how the changes are valid. Casting a span of ASCII bytes to a span of chars will misinterpret them. Does this pass our tests?

Copy link
Member Author

@ahsonkhan ahsonkhan Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you are definitely right. Here, two bytes would be considered one character rather than individual characters. Tests don't pass. We have to pad the byte span to turn it into a char span.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd either need to Widen the bytes to short/char and compare
https://github.com/aspnet/KestrelHttpServer/blob/300453396a57cd14bcb297627b1507de83dc88ab/src/Kestrel.Core/Internal/Infrastructure/StringUtilities.cs#L12-L109
Or Narrow the string chars to bytes; which is probably dodgier (as would be discarding high byte)

Copy link
Member

@benaadams benaadams Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can vectorize the casing with something like (not a very good version)

public static void LowerCaseSIMD(byte[] data)
{
    var A = new Vector<byte>(65); // A
    var Z = new Vector<byte>(90); // Z

    for (var o = 0; o < data.Length - Vector<byte>.Count; o += Vector<byte>.Count)
    {
        var v = new Vector<byte>(data, o);

        v = Vector.ConditionalSelect(
            Vector.BitwiseAnd(
                Vector.GreaterThanOrEqual(v, A),
                Vector.LessThanOrEqual(v, Z)
            ),
            Vector.BitwiseOr(new Vector<byte>(0x20), v), // 0010 0000
            v
        );

  // Now Vector.Widen

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to see if there was some char to byte conversion happening up-stack, but the data is coming in as bytes... http://source.dot.net/#System.Net.Http/System/Net/Http/Managed/HttpConnection.cs,354

If we "Widen", won't we need to allocate a 2x temporary buffer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we "Widen", won't we need to allocate a 2x temporary buffer?

For the the vector part its two local Vector<short> variables; for shorter a long, int or char; no buffers needed

@ahsonkhan
Copy link
Member Author

As an aside, @stephentoub, this leftover TODO can be cleaned up now, correct?
http://source.dot.net/#System.Net.Http/System/Net/Http/Managed/HttpConnection.cs,565

This issue - dotnet/roslyn#17287 - got resolved.

@geoffkizer
Copy link

geoffkizer commented Mar 2, 2018

The better approach here (at least for TryGetKnownHeader) is to precalculate the header names as Span<byte> instead of string, then do the compare -- so both operands are Span<byte>. We actually sorta have this precalculation today, we're just not using it. I think we filed an issue on this somewhere.

We still need to do a case-insensitive compare here, so we can't just use SequenceEqual. But perhaps we could do some clever vectorization as per @benaadams suggestion above. And we can guarantee the precalculated header name bytes are all lowercase already.

Or we could do something slightly different: Header names usually use consistent casing (i.e. Content-Length, not CONTENT-LENGTH) so we could do a SequenceEqual on the expected casing first, which will typically succeed, and fall back to a more expensive case-insensitive compare if that fails.

@benaadams
Copy link
Member

The better approach here (at least for TryGetKnownHeader) is to precalculate the header names as Span instead of string, then do the compare

Store a byte[] version of the known headers and use that rather than the string for compare? Makes sense, half the data to compare also.

@geoffkizer
Copy link

Yeah, exactly. GitHub ate the <byte> part of Span<byte> in my original comment. Edited it for clarity.

@stephentoub
Copy link
Member

Store a byte[] version of the known headers

We do; that's what @geoffkizer meant when he said "We actually sorta have this precalculation today, we're just not using it". If we have a case-insensitive comparison that works on bytes, then we can use that; otherwise let's just stick with what we have.

@stephentoub
Copy link
Member

stephentoub commented Mar 2, 2018

As an aside, @stephentoub , this leftover TODO can be cleaned up now, correct?
http://source.dot.net/#System.Net.Http/System/Net/Http/Managed/HttpConnection.cs,565 This issue - dotnet/roslyn#17287 - got resolved.

No. It just needs a different issue number now:
dotnet/csharplang#1331

@stephentoub
Copy link
Member

No. It just needs a different issue number now:

Actually, it was already updated; the code you linked to is stale.

@stephentoub
Copy link
Member

@ahsonkhan, at this point is there more to do here other than just delete the TODO comment?

@ahsonkhan
Copy link
Member Author

at this point is there more to do here other than just delete the TODO comment?

Yes, if we decide to keep the internal ByteArrayHelpers method. In which case, I think we can close this PR.

@stephentoub
Copy link
Member

Yes, if we decide to keep the internal ByteArrayHelpers method. In which case, I think we can close this PR.

If we find a good way to avoid it in the future, I'd be happy to see it go away. In the meantime, though, I don't think we have anything better / shared we can use instead.

@stephentoub stephentoub closed this Mar 7, 2018
@stephentoub
Copy link
Member

Thanks, though!

@karelz karelz added this to the 2.1.0 milestone Mar 10, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants