Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: One shot hashing of Stream #62489

Closed
vcsjones opened this issue Dec 7, 2021 · 22 comments · Fixed by #63757
Closed

[API Proposal]: One shot hashing of Stream #62489

vcsjones opened this issue Dec 7, 2021 · 22 comments · Fixed by #63757
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Security

Comments

@vcsjones
Copy link
Member

vcsjones commented Dec 7, 2021

Background and motivation

As of .NET 5 (Hash) and .NET 6 (HMAC) we have static one-shots for computing the hash of bytes, either in the form of an array of a ReadOnlySpan<byte>.

This one shot I think is a good direction for .NET. To re-cap what we have today in terms of APIs:

  1. HASHALG.HashData - one shot of fixed-length buffers, array or span
  2. IncrementalHash - updatable hash
  3. HASHALG.Create - .NET Framework 1.0 design

My personal belief is that number three should be mostly de-emphasized for common use cases. One shots should use option one, and incremental should use two. Create should be there for cases where polymorphic behavior is desired, if ever.

There is one case remaining (I believe) where option one and two don't handle as well as three: hashing streams, because the HashAlgorithm instances have ComputeHash(Stream stream)

It's not uncommon to need to hash a stream all in one go (thus still a one-shot). I would propose that these should also be one shots to further reduce the need for option three.

As a matter of implementation, the goal is not to necessarily use platform APIs to produce the most optimal hashing API (though we can certainly do our best to optimize some certain situations). Rather the goal is to provide an API surface that does not require developers to manage an instance of a hash object.

API Proposal

namespace System.Security.Cryptography {
    public abstract partial class MD5 {
        public static byte[] HashData(Stream source);
        public static int HashData(Stream source, Span<byte> destination);

        public static async ValueTask<byte[]> HashDataAsync(Stream source, CancellationToken cancellationToken = default);
        public static async ValueTask<int> HashDataAsync(Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class SHA1 {
        public static byte[] HashData(Stream source);
        public static int HashData(Stream source, Span<byte> destination);

        public static async ValueTask<byte[]> HashDataAsync(Stream source, CancellationToken cancellationToken = default);
        public static async ValueTask<int> HashDataAsync(Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class SHA256 {
        public static byte[] HashData(Stream source);
        public static int HashData(Stream source, Span<byte> destination);

        public static async ValueTask<byte[]> HashDataAsync(Stream source, CancellationToken cancellationToken = default);
        public static async ValueTask<int> HashDataAsync(Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class SHA384 {
        public static byte[] HashData(Stream source);
        public static int HashData(Stream source, Span<byte> destination);

        public static async ValueTask<byte[]> HashDataAsync(Stream source, CancellationToken cancellationToken = default);
        public static async ValueTask<int> HashDataAsync(Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class SHA512 {
        public static byte[] HashData(Stream source);
        public static int HashData(Stream source, Span<byte> destination);

        public static async ValueTask<byte[]> HashDataAsync(Stream source, CancellationToken cancellationToken = default);
        public static async ValueTask<int> HashDataAsync(Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class HMACMD5 {
        public static byte[] HashData(ReadOnlySpan<byte> key, Stream source);
        public static int HashData(ReadOnlySpan<byte> key, Stream source, Span<byte> destination);

        public static async ValueTask<byte[]> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, CancellationToken cancellationToken = default);
        public static async ValueTask<int> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class HMACSHA1 {
        public static byte[] HashData(ReadOnlySpan<byte> key, Stream source);
        public static int HashData(ReadOnlySpan<byte> key, Stream source, Span<byte> destination);

        public static async ValueTask<byte[]> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, CancellationToken cancellationToken = default);
        public static async ValueTask<int> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class HMACSHA256 {
        public static byte[] HashData(ReadOnlySpan<byte> key, Stream source);
        public static int HashData(ReadOnlySpan<byte> key, Stream source, Span<byte> destination);

        public static async ValueTask<byte[]> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, CancellationToken cancellationToken = default);
        public static async ValueTask<int> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class HMACSHA384 {
        public static byte[] HashData(ReadOnlySpan<byte> key, Stream source);
        public static int HashData(ReadOnlySpan<byte> key, Stream source, Span<byte> destination);

        public static async ValueTask<byte[]> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, CancellationToken cancellationToken = default);
        public static async ValueTask<int> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class HMACSHA512 {
        public static byte[] HashData(ReadOnlySpan<byte> key, Stream source);
        public static int HashData(ReadOnlySpan<byte> key, Stream source, Span<byte> destination);

        public static async ValueTask<byte[]> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, CancellationToken cancellationToken = default);
        public static async ValueTask<int> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }
}

API Usage

static void Example1() {
    using (FileStream fs = File.Open("/please/hash/me", FileMode.Open)) {
        byte[] fileHash = SHA256.HashData(fs);
    }
}

static void Example2() {
    using (FileStream fs = File.Open("/please/hash/me", FileMode.Open)) {
        Span<byte> buffer = stackalloc byte[32];
        int written = SHA256.TryHashData(fs, buffer);
    }
}

static async Task Example3() {
    using (FileStream fs = File.Open("/please/hash/me", FileMode.Open)) {
        byte[] fileHash = await SHA256.HashDataAsync(fs);
    }
}

static async Task Example4() {
    using (FileStream fs = File.Open("/please/hash/me", FileMode.Open)) {
        Memory<byte> existingBuffer = default; // Something reasonable here.
        int written = await SHA256.TryHashDataAsync(fs, existingBuffer);
    }
}

Alternative Designs

"Do nothing" is likely the best alternative to this. It's possible to do this today using IncrementalHash (read from the stream, update, done) or HASHALG.ComputeHash{Async}. Both of these options however require managing an instance of the hash object.

@vcsjones vcsjones added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Dec 7, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Dec 7, 2021
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@vcsjones vcsjones added area-System.Security and removed untriaged New issue has not been triaged by the area owner labels Dec 7, 2021
@ghost
Copy link

ghost commented Dec 7, 2021

Tagging subscribers to this area: @dotnet/area-system-security, @vcsjones, @krwq
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

As of .NET 5 (Hash) and .NET 6 (HMAC) we have static one-shots for computing the hash of bytes, either in the form of an array of a ReadOnlySpan<byte>.

This one shot I think is a good direction for .NET. To re-cap what we have today in terms of APIs:

  1. HASHALG.HashData - one shot of fixed-length buffers, array or span
  2. IncrementalHash - updatable hash
  3. HASHALG.Create - .NET Framework 1.0 design

My personal belief is that number three should be mostly de-emphasized for common use cases. One shots should use option one, and incremental should use two. Create should be there for cases where polymorphic behavior is desired, if ever.

There is one case remaining (I believe) where option one and two don't handle as well as three: hashing streams.

It's not uncommon to need to hash a stream all in one go (thus still a one-shot). I would propose that these should also be one shots to further reduce the need for option three.

As a matter of implementation, the goal is not to necessarily use platform APIs to produce the most optimal hashing API (though we can certainly do our best to optimize some certain situations). Rather the goal is to provide an API surface that does not require developers to manage an instance of a hash object.

API Proposal

namespace System.Security.Cryptography {
    public abstract partial class MD5 {
        public static byte[] HashData(Stream source);
        public static int TryHashData(Stream source, Span<byte> destination);

        public static Task<byte[]> HashDataAsync(Stream source, CancellationToken cancellationToken = default);
        public static Task<int> TryHashDataAsync(Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class SHA1 {
        public static byte[] HashData(Stream source);
        public static int HashData(Stream source, Span<byte> destination);

        public static Task<byte[]> HashDataAsync(Stream source, CancellationToken cancellationToken = default);
        public static Task<int> TryHashDataAsync(Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class SHA256 {
        public static byte[] HashData(Stream source);
        public static int TryHashData(Stream source, Span<byte> destination);

        public static Task<byte[]> HashDataAsync(Stream source, CancellationToken cancellationToken = default);
        public static Task<int> TryHashDataAsync(Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class SHA384 {
        public static byte[] HashData(Stream source);
        public static int TryHashData(Stream source, Span<byte> destination);

        public static Task<byte[]> HashDataAsync(Stream source, CancellationToken cancellationToken = default);
        public static Task<int> TryHashDataAsync(Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class SHA512 {
        public static byte[] HashData(Stream source);
        public static int TryHashData(Stream source, Span<byte> destination);

        public static Task<byte[]> HashDataAsync(Stream source, CancellationToken cancellationToken = default);
        public static Task<int> TryHashDataAsync(Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class HMACMD5 {
        public static byte[] HashData(ReadOnlySpan<byte> key, Stream source);
        public static int TryHashData(ReadOnlySpan<byte> key, Stream source, Span<byte> destination);

        public static Task<byte[]> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, CancellationToken cancellationToken = default);
        public static Task<int> TryHashDataAsync(ReadOnlyMemory<byte> key, Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class HMACSHA1 {
        public static byte[] HashData(ReadOnlySpan<byte> key, Stream source);
        public static int TryHashData(ReadOnlySpan<byte> key, Stream source, Span<byte> destination);

        public static Task<byte[]> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, CancellationToken cancellationToken = default);
        public static Task<int> TryHashDataAsync(ReadOnlyMemory<byte> key, Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class HMACSHA256 {
        public static byte[] HashData(ReadOnlySpan<byte> key, Stream source);
        public static int TryHashData(ReadOnlySpan<byte> key, Stream source, Span<byte> destination);

        public static Task<byte[]> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, CancellationToken cancellationToken = default);
        public static Task<int> TryHashDataAsync(ReadOnlyMemory<byte> key, Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class HMACSHA384 {
        public static byte[] HashData(ReadOnlySpan<byte> key, Stream source);
        public static int TryHashData(ReadOnlySpan<byte> key, Stream source, Span<byte> destination);

        public static Task<byte[]> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, CancellationToken cancellationToken = default);
        public static Task<int> TryHashDataAsync(ReadOnlyMemory<byte> key, Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }

    public abstract partial class HMACSHA512 {
        public static byte[] HashData(ReadOnlySpan<byte> key, Stream source);
        public static int TryHashData(ReadOnlySpan<byte> key, Stream source, Span<byte> destination);

        public static Task<byte[]> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, CancellationToken cancellationToken = default);
        public static Task<int> TryHashDataAsync(ReadOnlyMemory<byte> key, Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }
}

API Usage

static void Example1() {
    using (FileStream fs = File.Open("/please/hash/me", FileMode.Open)) {
        byte[] fileHash = SHA256.HashData(fs);
    }
}

static void Example2() {
    using (FileStream fs = File.Open("/please/hash/me", FileMode.Open)) {
        Span<byte> buffer = stackalloc byte[32];
        int written = SHA256.TryHashData(fs, buffer);
    }
}

static async Task Example3() {
    using (FileStream fs = File.Open("/please/hash/me", FileMode.Open)) {
        byte[] fileHash = await SHA256.HashDataAsync(fs);
    }
}

static async Task Example4() {
    using (FileStream fs = File.Open("/please/hash/me", FileMode.Open)) {
        Memory<byte> existingBuffer = default; // Something reasonable here.
        int written = await SHA256.TryHashDataAsync(fs, existingBuffer);
    }
}

Alternative Designs

"Do nothing" is likely the best alternative to this. It's possible to do this today using IncrementalHash (read from the stream, update, done) or HASHALG.ComputeHash{Async}. Both of these options however require managing an instance of the hash object.

Risks

No response

Author: vcsjones
Assignees: -
Labels:

api-suggestion, area-System.Security

Milestone: -

@vcsjones vcsjones added the untriaged New issue has not been triaged by the area owner label Dec 7, 2021
@Tornhoof
Copy link
Contributor

Tornhoof commented Dec 7, 2021

Suggestion: ValueTask, the ReadAsync overload with Memory on Stream is ValueTask too.

@vcsjones
Copy link
Member Author

vcsjones commented Dec 7, 2021

Suggestion: ValueTask, the ReadAsync overload with Memory on Stream is ValueTask too.

That seems sensible. Updated (I suppose this will be discussed in The API Review, assuming this gets that far).

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Dec 7, 2021

+1 to this proposal! Some initial feedback:

  • The key should be taken as a ROS<byte> across all overloads for HMAC. There's no need for the implementation to preserve the key material across the async boundary.
  • I almost want to get rid of the Try* routines, as there could be ambiguity as to the failure conditions. So that would mean that there are overloads of HashData that take a destination buffer as a parameter. All of these algorithms have a fixed digest size, so we could say that the burden is on the caller to pass an appropriately sized buffer. The method could still return int if desired to make slicing easier, but it's not necessary.

I know the int vs. void thing has been the subject of API review debate. I expect Jeremy has more coherent thoughts on this. IMO this is a minor enough detail that it shouldn't delay review.

@vcsjones
Copy link
Member Author

vcsjones commented Dec 7, 2021

The key should be taken as a ROS across all overloads for HMAC. There's no need for the implementation to preserve the key material across the async boundary.

Presumably these APIs are going to actually be async. You can't have ref struct parameters on an async method. You will be greeted with:

error CS4012: Parameters or locals of type 'ReadOnlySpan<byte>' cannot be declared in async methods or async lambda expressions.

All of these algorithms have a fixed digest size, so we could say that the burden is on the caller to pass an appropriately sized buffer. The method could still return int if desired to make slicing easier, but it's not necessary.

The existing APIs already return int even though the length of the hash is fixed. I wasn't a huge fan of that, I would have rather have had void. However, these are overloads of other HashData APIs that also return an int. I would favor being consistent.

Looking at my proposal though, the int returning ones should not be Try. They don't return a bool. Try doesn't work well for async since we can't have an out parameter in an async method. So I got rid of the Try cases. If we wanted Trys, it would have to look something like:

public static async Task<(bool Success, int Written)> TryHashDataAsync(
    ReadOnlyMemory<byte> key,
    Stream source) => throw null;

@svick
Copy link
Contributor

svick commented Dec 7, 2021

@vcsjones

Presumably these APIs are going to actually be async.

I think the method itself does not have to be async. The implementation could look something like this (keep in mind that I don't know anything about cryptography):

ValueTask<byte[]> HashDataAsync(ReadOnlySpan<byte> key, Stream source)
{
    var incrementalHash = IncrementalHash.CreateHMAC(HashAlgorithmName.SHA256, key);

    return HashDataCoreAsync();

    async ValueTask<byte[]> HashDataCoreAsync()
    {
        using (incrementalHash)
        {
            var buffer = new byte[4096];

            while (await source.ReadAsync(buffer) is int bytesRead and not 0)
            {
                incrementalHash.AppendData(buffer.AsSpan(0, bytesRead));
            }

            return incrementalHash.GetHashAndReset();
        }
    }
}

@vcsjones
Copy link
Member Author

vcsjones commented Dec 7, 2021

@svick

I think the method itself does not have to be async.

Ah. That might work. However, I would say there is one other case where a Memory<byte> might work better, and that is the caller. While we can avoid the async, callers might not be able to. For example:

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

class Program {
    static async Task Main() {
        ReadOnlySpan<byte> key = new byte[32];
        byte[] destination = new byte[64];
        _ = await HMACSHA512.HashDataAsync(key, Stream.Null, destination);
        await File.WriteAllBytesAsync("digest.bin", destination);
    }
}

public abstract partial class HMACSHA512 {
    public static ValueTask<int> HashDataAsync(ReadOnlySpan<byte> key, Stream source, Memory<byte> destination, CancellationToken cancellationToken = default) => await null;
}

That won't compile because Main is async. So, accepting a span seems tricky because async callers won't be able to declare / create spans easily themselves.

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Dec 8, 2021

Related (re: span and async): dotnet/csharplang#1331

If in practice we expect callers to have a byte[] for the key, then there's no harm to making the parameter ROS<byte>. The implicit conversion will succeed, same as it would have succeeded for ROM<byte>. Basically, since span is the universal receiver, it should be preferred over memory if practical. Would need to ponder this a bit more.

Of course, I'm waiting until everybody goes on vacation to send the Secret<T> PR, then I can request one-shot overloads that take Secret<byte> key as the first param. 😈

@vcsjones
Copy link
Member Author

vcsjones commented Dec 8, 2021

in practice we expect callers to have a byte[] for the key

Well in that case I would just say take a byte[] then. The ROM will at least give the opportunity for accepting slices and similar.

then there's no harm to making the parameter ROS. The implicit conversion will succeed

The API shape is.. strange in my opinion. "We accept a ReadOnlySpan<byte> here, but the only way to do it is by implicit conversion". Developers will look at the documentation and left wondering, "how do I even create a span here?"

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Dec 8, 2021

The API shape is.. strange in my opinion.

No stranger than an implicit conversion from byte[] to ROM<byte>. Memory is much less prevalent in the ecosystem than Span. But yeah, not seeing a clean way around this, unfortunately. May as well leave it as proposed and see what falls out.

@GrabYourPitchforks GrabYourPitchforks added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation untriaged New issue has not been triaged by the area owner labels Dec 8, 2021
@FiniteReality
Copy link

The API shape is.. strange in my opinion.

No stranger than an implicit conversion from byte[] to ROM<byte>. Memory is much less prevalent in the ecosystem than Span. But yeah, not seeing a clean way around this, unfortunately. May as well leave it as proposed and see what falls out.

I disagree - using ReadOnlyMemory<byte> would be more consistent with other async APIs which take a buffer, like Socket.SendAsync and ReceiveAsync, ClientWebSocket.SendAsync and ReceiveAsync, and even Stream.WriteAsync and ReadAsync. I also imagine "new" code that's likely to be using this would probably also be using MemoryPool, which itself uses Memory<T>.

@teo-tsirpanis
Copy link
Contributor

@FiniteReality an API accepting spans is always better than an API accepting memories because memories can be converted to spans, but not the opposite. Almost all asynchronous methods such as those in Socket and Stream cannot accept spans because they would have to store them on the heap (or if they don't, force inheritors to), which is not allowed for spans.

But for HMAC, we don't have to pass the key as a memory because we only use it in the beginning, before any asynchronous operation is performed. Since the algorithm allows it, not accepting spans would be a missed opportunity. And accepting spans is better than accepting memories.

@FiniteReality
Copy link

Almost all asynchronous methods such as those in Socket and Stream cannot accept spans because they would have to store them on the heap (or if they don't, force inheritors to), which is not allowed for spans.

A consumer of your asynchronous method can't pass a span, because they are asynchronous. Just because the implementation doesn't need the data when it goes asynchronous doesn't mean you should accept a span, because you have to think about how people will use the method.

For example, Socket.SendAsync doesn't actually need the data when it goes async (it just copies the data to a kernel buffer and then returns a handle for when that send completes, or for TCP, when the ACK for that send is received) but it still takes a Memory<byte> because a consumer is likely to be writing an async method, where ReadOnlySpan<byte> is an inconvenient type to use. Here's some sample code to demonstrate my point:

static ValueTask<bool> M(ReadOnlySpan<byte> x) {
    return new(true);
}


static async ValueTask N()
{
    // CS4012
    ReadOnlySpan<byte> data = /* memory from somewhere: */ new byte[10];
    // would work but we're async
    await M(data);

    ReadOnlyMemory<byte> data2 = new byte[10];

    // CS1503
    await M(data2);
    // requires an extra .Span
    await M(data2.Span);
}

I believe using a Span here effectively lulls people into a false sense of security, because while you are discarding the key material when you go async, a consumer is likely already in an async context, which means that they're keeping the key material referenced in memory for the entire execution of the asynchronous part of the code, simply as a side effect of how the C# compiler disallows the construction of span variables in async methods.

@GSPP
Copy link

GSPP commented Dec 15, 2021

I'm linking to the original proposal for the now completed one-shot hashing methods for span-like data: #17590 This proposal has reasoning and discussion that applies to this issue as well.

@Joe4evr
Copy link
Contributor

Joe4evr commented Dec 17, 2021

@FiniteReality Just a reminder that since the data in your async method has to be heapable anyway, a conversion directly at calling the span-taking method does work today.

@FiniteReality
Copy link

@FiniteReality Just a reminder that since the data in your async method has to be heapable anyway, a conversion directly at calling the span-taking method does work today.

Yeah, it works if you .AsSpan() or .Span, but IMO that seems like a lot of unnecessary overhead when you could just accept ReadOnlyMemory<byte>, particularly when the data has to be heapable anyway.

@GrabYourPitchforks
Copy link
Member

Yeah, it works if you .AsSpan() or .Span, but IMO that seems like a lot of unnecessary overhead when you could just accept ReadOnlyMemory<byte>, particularly when the data has to be heapable anyway.

The key doesn't have to be heapable. In theory somebody who's security-paranoid would want to stackalloc it rather than store it in the heap. (And in fact the asp.net DataProtection implementation does just this for the majority of its key material.) This was also why I half-seriously suggested that this would be an appropriate place to implement a Secret<byte>-consuming overload for the key rather than using ROM<byte>. That way the caller could keep the key within a data structure that's purpose-built for key storage rather than needing to materialize a byte[].

@terrajobst
Copy link
Member

terrajobst commented Jan 4, 2022

Video

  • Looks good as proposed
    • The APIs returning byte[] should have overloads that also accept byte[]
namespace System.Security.Cryptography
{
    public abstract partial class MD5
    {
        public static byte[] HashData(Stream source);
        public static int HashData(Stream source, Span<byte> destination);

        public static ValueTask<byte[]> HashDataAsync(Stream source, CancellationToken cancellationToken = default);
        public static ValueTask<int> HashDataAsync(Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }
    public abstract partial class SHA1
    {
        public static byte[] HashData(Stream source);
        public static int HashData(Stream source, Span<byte> destination);

        public static ValueTask<byte[]> HashDataAsync(Stream source, CancellationToken cancellationToken = default);
        public static ValueTask<int> HashDataAsync(Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }
    public abstract partial class SHA256
    {
        public static byte[] HashData(Stream source);
        public static int HashData(Stream source, Span<byte> destination);

        public static ValueTask<byte[]> HashDataAsync(Stream source, CancellationToken cancellationToken = default);
        public static ValueTask<int> HashDataAsync(Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }
    public abstract partial class SHA384
    {
        public static byte[] HashData(Stream source);
        public static int HashData(Stream source, Span<byte> destination);

        public static ValueTask<byte[]> HashDataAsync(Stream source, CancellationToken cancellationToken = default);
        public static ValueTask<int> HashDataAsync(Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }
    public abstract partial class SHA512
    {
        public static byte[] HashData(Stream source);
        public static int HashData(Stream source, Span<byte> destination);

        public static ValueTask<byte[]> HashDataAsync(Stream source, CancellationToken cancellationToken = default);
        public static ValueTask<int> HashDataAsync(Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }
    public abstract partial class HMACMD5
    {
        public static byte[] HashData(ReadOnlySpan<byte> key, Stream source);
        public static byte[] HashData(byte[] key, Stream source);
        public static int HashData(ReadOnlySpan<byte> key, Stream source, Span<byte> destination);

        public static ValueTask<byte[]> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, CancellationToken cancellationToken = default);
        public static ValueTask<byte[]> HashDataAsync(byte[] key, Stream source, CancellationToken cancellationToken = default);
        public static ValueTask<int> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }
    public abstract partial class HMACSHA1
    {
        public static byte[] HashData(ReadOnlySpan<byte> key, Stream source);
        public static byte[] HashData(byte[] key, Stream source);
        public static int HashData(ReadOnlySpan<byte> key, Stream source, Span<byte> destination);

        public static ValueTask<byte[]> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, CancellationToken cancellationToken = default);
        public static ValueTask<byte[]> HashDataAsync(byte[] key, Stream source, CancellationToken cancellationToken = default);
        public static ValueTask<int> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }
    public abstract partial class HMACSHA256
    {
        public static byte[] HashData(ReadOnlySpan<byte> key, Stream source);
        public static byte[] HashData(byte[] key, Stream source);
        public static int HashData(ReadOnlySpan<byte> key, Stream source, Span<byte> destination);

        public static ValueTask<byte[]> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, CancellationToken cancellationToken = default);
        public static ValueTask<byte[]> HashDataAsync(byte[] key, Stream source, CancellationToken cancellationToken = default);
        public static ValueTask<int> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }
    public abstract partial class HMACSHA384
    {
        public static byte[] HashData(ReadOnlySpan<byte> key, Stream source);
        public static byte[] HashData(byte[] key, Stream source);
        public static int HashData(ReadOnlySpan<byte> key, Stream source, Span<byte> destination);

        public static ValueTask<byte[]> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, CancellationToken cancellationToken = default);
        public static ValueTask<byte[]> HashDataAsync(byte[] key, Stream source, CancellationToken cancellationToken = default);
        public static ValueTask<int> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }
    public abstract partial class HMACSHA512
    {
        public static byte[] HashData(ReadOnlySpan<byte> key, Stream source);
        public static byte[] HashData(byte[] key, Stream source);
        public static int HashData(ReadOnlySpan<byte> key, Stream source, Span<byte> destination);

        public static ValueTask<byte[]> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, CancellationToken cancellationToken = default);
        public static ValueTask<byte[]> HashDataAsync(byte[] key, Stream source, CancellationToken cancellationToken = default);
        public static ValueTask<int> HashDataAsync(ReadOnlyMemory<byte> key, Stream source, Memory<byte> destination, CancellationToken cancellationToken = default);
    }
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jan 4, 2022
@vcsjones vcsjones self-assigned this Jan 4, 2022
@vcsjones
Copy link
Member Author

@GrabYourPitchforks @bartonjs Feedback for implementation:

We can use MemoryStream.TryGetBuffer(out ArraySegment<byte> segment) and feed that in to the existing one shots, which will be quicker and allocation-less in the right environments. This has some issues:

  1. TryGetBuffer does not account for position. This isn't actually a big deal, we can account for this with Position, e.g.

    if (source is MemoryStream ms && ms.TryGetBuffer(out ArraySegment<byte> segment))
    {
        // TryGetBuffer does not account for Position, so move where the Position is.
        Span<byte> data = ((ReadOnlySpan<byte>)segment).Slice(ms.Position);
        // Hash data
    }
  2. The next is that TryGetBuffer does not advance the stream. Do we want to try to make the behavior consistent? e.g. ms.Position = ms.Length; Or should we leave the Position alone?

  3. MemoryStream is not sealed. Do want want to make this shortcut with ours-and-ours-only MemoryStream? Does the answer to number 2 affect that decision?

  4. Does any of this make us believe that we shouldn't give MemoryStream special treatment?

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 13, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 23, 2022
@GSPP
Copy link

GSPP commented Jan 25, 2022

@vcsjones I can see that the implementation that was merged does not special-case MemoryStream. Was this optimization rejected?

@vcsjones
Copy link
Member Author

@GSPP it was split off in to #64173, where we came to the conclusion it was not worth it.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Security
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants