Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[QUIC] Byte mixing and/or native access violation failures #52047

Closed
CarnaViire opened this issue Apr 29, 2021 · 10 comments
Closed

[QUIC] Byte mixing and/or native access violation failures #52047

CarnaViire opened this issue Apr 29, 2021 · 10 comments

Comments

@CarnaViire
Copy link
Member

CarnaViire commented Apr 29, 2021

Repro:

        [Fact]
        public async Task ByteMixingOrNativeAVE_MinimalFailingTest()
        {
            const int writeSize = 64 * 1024;
            const int NumberOfWrites = 512;
            byte[] data1 = new byte[writeSize * NumberOfWrites];
            byte[] data2 = new byte[writeSize * NumberOfWrites];
            Array.Fill(data1, (byte)1);
            Array.Fill(data2, (byte)2);

            Task t1 = RunTest(data1);
            Task t2 = RunTest(data2);

            async Task RunTest(byte[] data)
            {
                await RunClientServer(
                    iterations: 20,
                    serverFunction: async connection =>
                    {
                        await using QuicStream stream = await connection.AcceptStreamAsync();

                        byte[] buffer = new byte[data.Length];
                        int bytesRead = await ReadAll(stream, buffer);
                        Assert.Equal(data.Length, bytesRead);
                        AssertArrayEqual(data, buffer);

                        for (int pos = 0; pos < data.Length; pos += writeSize)
                        {
                            await stream.WriteAsync(data[pos..(pos + writeSize)]);
                        }
                        await stream.WriteAsync(Memory<byte>.Empty, endStream: true);

                        await stream.ShutdownCompleted();
                    },
                    clientFunction: async connection =>
                    {
                        await using QuicStream stream = connection.OpenBidirectionalStream();

                        for (int pos = 0; pos < data.Length; pos += writeSize)
                        {
                            await stream.WriteAsync(data[pos..(pos + writeSize)]);
                        }
                        await stream.WriteAsync(Memory<byte>.Empty, endStream: true);

                        byte[] buffer = new byte[data.Length];
                        int bytesRead = await ReadAll(stream, buffer);
                        Assert.Equal(data.Length, bytesRead);
                        AssertArrayEqual(data, buffer);

                        await stream.ShutdownCompleted();
                    }
                );
            }

            await (new[] { t1, t2 }).WhenAllOrAnyFailed(millisecondsTimeout: 1000000);
        }

For me it now fails on every run with either dotnet.exe crashing with exit code -1073741819 = 0xc0000005 (Access Violation), or Assert failing on receiving wrong bytes -- either random bytes

Wrong data starting from idx=131072
Expected: [..., 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
Actual:   [..., 1, 1, 1, 80, 113, 124, 123, 182, 1, 0, 0, ...]
Wrong data starting from idx=17825792
Expected: [..., 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...]
Actual:   [..., 2, 2, 2, 208, 44, 146, 244, 155, 1, 0, 0, ...]

or bytes from parallel connection

Wrong data starting from idx=131072
Expected: [..., 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...]
Actual:   [..., 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...]

For native Access Violation, crash dump shows it happening on memcpy on sending data

Unhandled exception at 0x00007FFE0A90237C (msquic.dll) in dotnet.exe.20252.dmp: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.	

>	msquic.dll!memcpy() Line 299	Unknown
 	msquic.dll!QuicStreamSendBufferRequest(QUIC_STREAM * Stream, QUIC_SEND_REQUEST * Req) Line 447	C
 	msquic.dll!QuicSendBufferFill(QUIC_CONNECTION * Connection) Line 181	C
 	[Inline Frame] msquic.dll!QuicStreamCompleteSendRequest(QUIC_STREAM *) Line 414	C
 	msquic.dll!QuicStreamOnAck(QUIC_STREAM * Stream, QUIC_SEND_PACKET_FLAGS PacketFlags, QUIC_SENT_FRAME_METADATA * FrameMetadata) Line 1403	C
 	msquic.dll!QuicLossDetectionOnPacketAcknowledged(QUIC_LOSS_DETECTION * LossDetection, QUIC_ENCRYPT_LEVEL EncryptLevel, QUIC_SENT_PACKET_METADATA * Packet) Line 502	C
 	msquic.dll!QuicLossDetectionProcessAckBlocks(QUIC_LOSS_DETECTION * LossDetection, QUIC_PATH * Path, QUIC_ENCRYPT_LEVEL EncryptLevel, unsigned __int64 AckDelay, QUIC_RANGE * AckBlocks, unsigned char * InvalidAckBlock) Line 1357	C
 	[Inline Frame] msquic.dll!QuicLossDetectionProcessAckFrame(QUIC_LOSS_DETECTION * LossDetection, QUIC_PATH *) Line 1462	C
 	msquic.dll!QuicConnRecvFrames(QUIC_CONNECTION * Connection, QUIC_PATH * Path, CXPLAT_RECV_PACKET * Packet, CXPLAT_ECN_TYPE ECN) Line 4090	C
 	msquic.dll!QuicConnRecvDatagramBatch(QUIC_CONNECTION * Connection, QUIC_PATH * Path, unsigned char BatchCount, CXPLAT_RECV_DATA * * Datagrams, const unsigned char * Cipher, QUIC_RECEIVE_PROCESSING_STATE * RecvState) Line 4994	C
 	msquic.dll!QuicConnRecvDatagrams(QUIC_CONNECTION * Connection, CXPLAT_RECV_DATA * DatagramChain, unsigned int DatagramChainCount, unsigned char IsDeferred) Line 5248	C
 	[Inline Frame] msquic.dll!QuicConnFlushRecv(QUIC_CONNECTION *) Line 5322	C
 	msquic.dll!QuicConnDrainOperations(QUIC_CONNECTION * Connection) Line 6665	C
 	msquic.dll!QuicWorkerProcessConnection(QUIC_WORKER * Worker, QUIC_CONNECTION * Connection) Line 487	C
 	msquic.dll!QuicWorkerThread(void * Context) Line 579	C
 	kernel32.dll!BaseThreadInitThunk�()	Unknown
 	ntdll.dll!RtlUserThreadStart�()	Unknown

I would suspect that we treat our send buffers in some wrong way. We do pin them, but maybe we don't pin them long enough. From msquic docs: "As long as there is room to buffer the data, MsQuic will copy the data locally and then immediately complete the send back to the app, via the QUIC_STREAM_EVENT_SEND_COMPLETE event. If there is no room to copy the data, then MsQuic will hold onto the buffer until there is room."

Affected tests:

  • System.Net.Quic.Tests.MsQuicTests.WriteTests
  • System.Net.Quic.Tests.QuicStreamTests_MsQuicProvider.LargeDataSentAndReceived
  • System.Net.Quic.Tests.QuicStreamTests_MsQuicProvider.ReadWrite_Random_Success
@ghost
Copy link

ghost commented Apr 29, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Repro:

        [Fact]
        public async Task ByteMixingOrNativeAVE_MinimalFailingTest()
        {
            const int writeSize = 64 * 1024;
            const int NumberOfWrites = 512;
            byte[] data1 = new byte[writeSize * NumberOfWrites];
            byte[] data2 = new byte[writeSize * NumberOfWrites];
            Array.Fill(data1, (byte)1);
            Array.Fill(data2, (byte)2);

            Task t1 = RunTest(data1);
            Task t2 = RunTest(data2);

            async Task RunTest(byte[] data)
            {
                await RunClientServer(
                    iterations: 20,
                    serverFunction: async connection =>
                    {
                        await using QuicStream stream = await connection.AcceptStreamAsync();

                        byte[] buffer = new byte[data.Length];
                        int bytesRead = await ReadAll(stream, buffer);
                        Assert.Equal(data.Length, bytesRead);
                        AssertArrayEqual(data, buffer);

                        for (int pos = 0; pos < data.Length; pos += writeSize)
                        {
                            await stream.WriteAsync(data[pos..(pos + writeSize)]);
                        }
                        await stream.WriteAsync(Memory<byte>.Empty, endStream: true);

                        await stream.ShutdownCompleted();
                    },
                    clientFunction: async connection =>
                    {
                        await using QuicStream stream = connection.OpenBidirectionalStream();

                        for (int pos = 0; pos < data.Length; pos += writeSize)
                        {
                            await stream.WriteAsync(data[pos..(pos + writeSize)]);
                        }
                        await stream.WriteAsync(Memory<byte>.Empty, endStream: true);

                        byte[] buffer = new byte[data.Length];
                        int bytesRead = await ReadAll(stream, buffer);
                        Assert.Equal(data.Length, bytesRead);
                        AssertArrayEqual(data, buffer);

                        await stream.ShutdownCompleted();
                    }
                );
            }

            await (new[] { t1, t2 }).WhenAllOrAnyFailed(millisecondsTimeout: 1000000);
        }

For me it now fails on every run with either dotnet.exe crashing with exit code -1073741819 = 0xc0000005 (Access Violation), or Assert failing on receiving wrong bytes -- either random bytes

Wrong data starting from idx=131072
Expected: [..., 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
Actual:   [..., 1, 1, 1, 80, 113, 124, 123, 182, 1, 0, 0, ...]
Wrong data starting from idx=17825792
Expected: [..., 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...]
Actual:   [..., 2, 2, 2, 208, 44, 146, 244, 155, 1, 0, 0, ...]

or bytes from parallel connection

Wrong data starting from idx=131072
Expected: [..., 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...]
Actual:   [..., 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...]

For native Access Violation, crash dump shows it happening on memcpy on sending data

Unhandled exception at 0x00007FFE0A90237C (msquic.dll) in dotnet.exe.20252.dmp: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.	

>	msquic.dll!memcpy() Line 299	Unknown
 	msquic.dll!QuicStreamSendBufferRequest(QUIC_STREAM * Stream, QUIC_SEND_REQUEST * Req) Line 447	C
 	msquic.dll!QuicSendBufferFill(QUIC_CONNECTION * Connection) Line 181	C
 	[Inline Frame] msquic.dll!QuicStreamCompleteSendRequest(QUIC_STREAM *) Line 414	C
 	msquic.dll!QuicStreamOnAck(QUIC_STREAM * Stream, QUIC_SEND_PACKET_FLAGS PacketFlags, QUIC_SENT_FRAME_METADATA * FrameMetadata) Line 1403	C
 	msquic.dll!QuicLossDetectionOnPacketAcknowledged(QUIC_LOSS_DETECTION * LossDetection, QUIC_ENCRYPT_LEVEL EncryptLevel, QUIC_SENT_PACKET_METADATA * Packet) Line 502	C
 	msquic.dll!QuicLossDetectionProcessAckBlocks(QUIC_LOSS_DETECTION * LossDetection, QUIC_PATH * Path, QUIC_ENCRYPT_LEVEL EncryptLevel, unsigned __int64 AckDelay, QUIC_RANGE * AckBlocks, unsigned char * InvalidAckBlock) Line 1357	C
 	[Inline Frame] msquic.dll!QuicLossDetectionProcessAckFrame(QUIC_LOSS_DETECTION * LossDetection, QUIC_PATH *) Line 1462	C
 	msquic.dll!QuicConnRecvFrames(QUIC_CONNECTION * Connection, QUIC_PATH * Path, CXPLAT_RECV_PACKET * Packet, CXPLAT_ECN_TYPE ECN) Line 4090	C
 	msquic.dll!QuicConnRecvDatagramBatch(QUIC_CONNECTION * Connection, QUIC_PATH * Path, unsigned char BatchCount, CXPLAT_RECV_DATA * * Datagrams, const unsigned char * Cipher, QUIC_RECEIVE_PROCESSING_STATE * RecvState) Line 4994	C
 	msquic.dll!QuicConnRecvDatagrams(QUIC_CONNECTION * Connection, CXPLAT_RECV_DATA * DatagramChain, unsigned int DatagramChainCount, unsigned char IsDeferred) Line 5248	C
 	[Inline Frame] msquic.dll!QuicConnFlushRecv(QUIC_CONNECTION *) Line 5322	C
 	msquic.dll!QuicConnDrainOperations(QUIC_CONNECTION * Connection) Line 6665	C
 	msquic.dll!QuicWorkerProcessConnection(QUIC_WORKER * Worker, QUIC_CONNECTION * Connection) Line 487	C
 	msquic.dll!QuicWorkerThread(void * Context) Line 579	C
 	kernel32.dll!BaseThreadInitThunk�()	Unknown
 	ntdll.dll!RtlUserThreadStart�()	Unknown

I would suspect that we treat our send buffers in some wrong way. We do pin them, but maybe we don't pin them long enough. From msquic docs: "As long as there is room to buffer the data, MsQuic will copy the data locally and then immediately complete the send back to the app, via the QUIC_STREAM_EVENT_SEND_COMPLETE event. If there is no room to copy the data, then MsQuic will hold onto the buffer until there is room."

Author: CarnaViire
Assignees: -
Labels:

area-System.Net.Quic

Milestone: -

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Apr 29, 2021
@CarnaViire CarnaViire removed the untriaged New issue has not been triaged by the area owner label Apr 29, 2021
@CarnaViire CarnaViire mentioned this issue Apr 29, 2021
9 tasks
@CarnaViire CarnaViire added this to the 6.0.0 milestone Apr 29, 2021
@CarnaViire
Copy link
Member Author

@nibanks Have you ever encountered an error like this? Is there something that we should be aware of when passing buffers to msquic? Will QUIC_STREAM_EVENT_SEND_COMPLETE always mean that msquic is done with our buffer?

@ThadHouse
Copy link
Contributor

When do you unpin the data? QUIC_STREAM_EVENT_SEND_COMPLETE is the last time MsQuic uses the buffer, and its safe to unpin only after that is received for a specific buffer.

@wfurt
Copy link
Member

wfurt commented Apr 29, 2021

It would be nice IMHO if the send function tells you directly if the buffer was consumed. Always waiting for event and going through the wait state seems to add lot of processing for normal case.
Seems like the 0xFFFFFFFFFFFFFFFF location is completely bogus.

@CarnaViire
Copy link
Member Author

@ThadHouse from what I see, now we always wait for QUIC_STREAM_EVENT_SEND_COMPLETE before unpinning

@nibanks
Copy link

nibanks commented Apr 29, 2021

Then it's likely we'll need logs to make any more progress.

CarnaViire added a commit that referenced this issue May 4, 2021
#52050)

Removed MsQuicTestBase class as it was essentially the same as QuicTestBase. Updated ActiveIssue comments with newly created issues #52047 and #52048 and added repros from them as new tests.
@wfurt
Copy link
Member

wfurt commented May 13, 2021

I think #52368 should address this. can you please check @CarnaViire and close this if everything looks ok?

@CarnaViire
Copy link
Member Author

I haven't encountered these problems anymore on Windows, so it seems fixed. @ManickaP could you please check that everything is ok on Linux side too?

@wfurt
Copy link
Member

wfurt commented May 14, 2021

Confirmation on Linux would be good. However, I think we should keep it open to track the suggestions from #52368. (unless we want to open new issue for cases when Stream is disposed in middle of IO)

@ManickaP
Copy link
Member

Triage: we haven't seen this in a very long time, it should be fixed, closing

@ghost ghost locked as resolved and limited conversation to collaborators Aug 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants