Fix DeflateStream's handling of partial reads #53644
Conversation
Tagging subscribers to this area: @carlossanlop
I take that back. Here's an extreme case that does show a regression:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Threading;
using System.Threading.Tasks;

[MemoryDiagnoser]
public class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssemblies(new[] { typeof(Program).Assembly }).Run(args);

    private Stream _source;
    private byte[] _buffer = new byte[100_000];

    [GlobalSetup]
    public void Setup()
    {
        var ms = new MemoryStream();
        using (var ds = new DeflateStream(ms, CompressionMode.Compress))
        {
            ds.Write(RandomNumberGenerator.GetBytes(100_000));
        }
        _source = new TrickleMemoryStream(ms.ToArray());
    }

    [Benchmark]
    public async Task ReadAsync()
    {
        _source.Position = 0;
        using var ds = new DeflateStream(_source, CompressionMode.Decompress, leaveOpen: true);
        int bytesRead;
        while ((bytesRead = await ds.ReadAsync(_buffer, default)) != 0) ;
    }

    private sealed class TrickleMemoryStream : MemoryStream
    {
        public TrickleMemoryStream(byte[] data) : base(data) { }

        public override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default)
        {
            return base.ReadAsync(buffer.Slice(0, Math.Min(buffer.Length, 10)), cancellationToken);
        }
    }
}

This is an extreme case, though, with an underlying stream that only returns at most 10 bytes per read, and reading into a 100K user buffer.
Turns out BrotliStream and CryptoStream suffer the same issues. I've fixed BrotliStream, and I'll do so for CryptoStream as well.
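For reference, a minimal sketch (not code from this PR) of the same repro shape pointed at BrotliStream; it assumes the TrickleMemoryStream helper from the benchmark above is made accessible:

using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Threading.Tasks;

public static class BrotliPartialReadRepro
{
    public static async Task Main()
    {
        // Compress 100K of random data, then serve the compressed bytes back
        // at most 10 at a time via the TrickleMemoryStream from the benchmark above.
        var ms = new MemoryStream();
        using (var compress = new BrotliStream(ms, CompressionMode.Compress))
        {
            compress.Write(RandomNumberGenerator.GetBytes(100_000));
        }
        Stream source = new TrickleMemoryStream(ms.ToArray());

        using var decompress = new BrotliStream(source, CompressionMode.Decompress, leaveOpen: true);
        var buffer = new byte[100_000];
        int bytesRead;
        while ((bytesRead = await decompress.ReadAsync(buffer)) != 0)
        {
            // Before the fix, each ReadAsync call loops internally until the 100K buffer
            // is filled or the stream ends; after it, partial reads return here promptly.
            Console.WriteLine($"Read {bytesRead} bytes");
        }
    }
}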
@bartonjs, mind reviewing the CryptoStream changes? @adamsitnik, @carlossanlop, @jozkee, mind reviewing the DeflateStream and BrotliStream changes?
CryptoStream changes LGTM as long as coverage says all the blocks in the rewrite got hit.
The core logic in CryptoStream is not great for perf (or memory utilization) and could be improved significantly -- see #45080. This is even more true after this PR, since we are no longer filling the user's buffer. Some of the weirdness of the existing logic comes from attempting to do that. But I assume that's beyond the scope of this PR.
Stream.Read{Async} is supposed to return once at least a byte of data is available, and in particular, if there's any data already available, it shouldn't block. But Read{Async} on DeflateStream (and thus also GZipStream and ZLibStream), BrotliStream, and CryptoStream won't return until either it hits the end of the stream or the caller's buffer is filled. This makes it behave very unexpectedly when used in a context where the app is using a large read buffer but expects to be able to process data as it's available, e.g. in networked streaming scenarios where messages are being sent as part of bidirectional communication.

This fixes that by stopping looping once any data is consumed. Just doing that, though, caused problems for zero-byte reads. Zero-byte reads are typically used by code that's trying to delay-allocate a buffer for the read data until data will be available to read. At present, however, zero-byte reads return immediately regardless of whether data is available to be consumed. I've changed the flow to make it so that zero-byte reads don't return until there's at least some data available as input to the inflater/transform (this, though, doesn't 100% guarantee the inflater/transform will be able to produce output data).

Note that both of these changes have the potential to introduce breaks into an app that erroneously depended on these implementation details:
- If an app passing in a buffer of size N to Read{Async} depended on that call always producing the requested number of bytes (rather than what the Stream contract defines), it might experience behavioral changes.
- If an app passing in a zero-byte buffer expected it to return immediately, it might instead end up waiting until data was actually available.
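For illustration (not code from this PR), the kind of delay-allocation pattern the zero-byte-read change is meant to support might look roughly like this, using ArrayPool for the deferred buffer:

using System;
using System.Buffers;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

public static class ZeroByteReadExample
{
    // Reads one chunk from the stream, deferring the buffer rental until the
    // zero-byte read completes, i.e. until the stream has some input to work with.
    public static async Task<int> ReadChunkAsync(DeflateStream stream, Func<ReadOnlyMemory<byte>, Task> process)
    {
        // With this change, the zero-byte read doesn't complete until there's at
        // least some data available as input to the inflater (though that still
        // doesn't guarantee decompressed output can be produced).
        await stream.ReadAsync(Memory<byte>.Empty);

        byte[] buffer = ArrayPool<byte>.Shared.Rent(16 * 1024);
        try
        {
            int bytesRead = await stream.ReadAsync(buffer);
            if (bytesRead > 0)
            {
                await process(buffer.AsMemory(0, bytesRead));
            }
            return bytesRead;
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}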
As part of this change, I also separated out some duplicated code, including a throw, and fixed the style of other throw helpers nearby to match the new one I was adding. Finally, I changed the invalid-concurrency checking to use Exchange rather than CompareExchange to make it a bit cheaper.
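To illustrate the Exchange-based check, a minimal sketch with made-up names (the actual DeflateStream fields and messages differ):

using System;
using System.Threading;

public sealed class SingleAsyncOperationGuard
{
    private int _activeOperation; // 0 = idle, 1 = an async operation is in flight

    public void Begin()
    {
        // Interlocked.Exchange unconditionally stores 1 and returns the previous value,
        // so detecting a concurrent caller only needs a check of that result, which is
        // a bit cheaper than a CompareExchange that must compare before swapping.
        if (Interlocked.Exchange(ref _activeOperation, 1) != 0)
        {
            throw new InvalidOperationException("Only one asynchronous operation may be in progress at a time.");
        }
    }

    public void End() => Volatile.Write(ref _activeOperation, 0);
}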
Fixes #53502
cc: @geoffkizer, @ayende, @adamsitnik, @carlossanlop, @jozkee
I did some performance validation and don't see any meaningful negative impact. The cost that could in theory cause us to regress is more entering/exiting of the ReadAsync async method, since we'll now exit out of it when we have any data vs staying inside it looping to fill the user buffer. The flip side of that is if partial data is available, the consumer can act on it more quickly rather than waiting around for more data to arrive.
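As a usage-level illustration of that flip side (hypothetical names, not code from the PR), a consumer with a large buffer can now act on each decompressed chunk as soon as it arrives:

using System;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

public static class StreamingConsumer
{
    // 'networkStream' stands in for any stream where compressed data trickles in over time.
    public static async Task ConsumeAsync(Stream networkStream)
    {
        using var deflate = new DeflateStream(networkStream, CompressionMode.Decompress);

        var buffer = new byte[1024 * 1024]; // large buffer; reads still return as soon as any data is decompressed
        int bytesRead;
        while ((bytesRead = await deflate.ReadAsync(buffer)) != 0)
        {
            // Process the partial data immediately instead of waiting for the buffer to fill.
            await HandleMessageBytesAsync(buffer.AsMemory(0, bytesRead));
        }
    }

    private static Task HandleMessageBytesAsync(ReadOnlyMemory<byte> data)
    {
        // Placeholder for application-specific message processing.
        Console.WriteLine($"Processed {data.Length} bytes");
        return Task.CompletedTask;
    }
}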