Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve efficiency and correctness of SSL handshake #1949

Closed
wants to merge 20 commits into from

Conversation

wfurt
Copy link
Member

@wfurt wfurt commented Jan 21, 2020

This is replacement for #1350. Instead of using shared buffer to hold data, it turns ProtocolToken to readOnly structure. Aside from allocation count there was no significant benefit of using shared buffer so I abandoned that part.

As part of #453 we start using _internalBuffer to read data for handshake. That was not done quite right as if SslStream is disposed while doing handshake, we may cause NRE. (#1291)
I could add locking like the other places but that complicates renegotiation - and that is common case for TLS1.3. So address feedback and improve stability I reused ArrayBuffers from HTTP2.
It pretty much does what I wanted, it can also optionally use shared buffers and it can grow buffer as needed. It also provides nice Span and Memory view on data.

To take advantage of it I added more Spanification to where #453 left it off. Instead of using ArraySegment I pushed ReadOnlySpan through Windows PAL in way similar to other platforms.

While this change is focused on Async path, I plan to come back and cleanup and unify with synchronous path. This also does not address locking issues discovered while testing TLS1.3.
Keeping negotiation buffers separate should hopefully help.

try
{
Span<Interop.SspiCli.SecBuffer> inUnmanagedBuffer = stackalloc Interop.SspiCli.SecBuffer[inSecurityBufferDescriptor.cBuffers];
Span<Interop.SspiCli.SecBuffer> inUnmanagedBuffer = stackalloc Interop.SspiCli.SecBuffer[inSecBuffers.Count];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the intention is for the stackalloc to be 1..3 entries long but there's no assertion or check that i can see and since count is unconditionally increased in SetNextBuffer it could in theory any value if someone found a way to abuse it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you share an example of where/when it wouldn't be a small value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing obvious, as i said the intention is clearly that it should be 1..3 which causes no issue. I just thought it might be worth an assert or check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Sure, adding an assert would be fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is Assert in SetNextBuffer so it won't go more than 3. I can also make it consistent and always allocate 3 since the rest of the logic is fixed anyway. I did not find a good way how to make the equivalent of fixed() in a more flexible way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if hot path but having a constant-sized stack via (stackalloc Interop.SspiCli.SecBuffer[3])[0..inSecBuffers.Count] may use fewer instructions, if you do have this upper bound.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, if the max is actually 3, just use 3. The JIT is better at optimizing const-sized stackallocs. There's also no reason to slice it beyond that unless you actually use the length later. If you do use the length later, just stick with the variable-sized stackalloc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated code to use const-sized allocation since the difference between 0 and 3 is really small. (and will be 2 in most cases)
I was wondering if '3' should be some named constant instead of magic number but I not sure what the name should be and where it should live.

@wfurt
Copy link
Member Author

wfurt commented Jan 22, 2020

I pushed an update to fix another regression caused by #453. Renegotiation can happen in the synchronous path and in that case current code would hit assert. To fix that I plummed adapter through the new code path and updated the existing test to cover that case.

@wfurt
Copy link
Member Author

wfurt commented Jan 22, 2020

contributes to #1720

}
}

[StructLayout(LayoutKind.Auto)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why Auto? (It's unusual to see this attribute, so I'm curious)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copy that from the original SecurityBuffer as the intention was the same.

[StructLayout(LayoutKind.Auto)]
internal struct SecurityBuffer		     internal struct SecurityBuffer

Copy link
Member

@stephentoub stephentoub Feb 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally use it when we may not have chosen the optimal layout for fields and want to allow the runtime to do so for us. That said, it's most beneficial when the type is going to end up on the heap, either on its own or as part of an array, such that you want to minimize the overall allocation size. But this is a ref struct... it'll never be on the heap.

@wfurt wfurt changed the title WIP: improve efficiency and correctness of SSL handshake improve efficiency and correctness of SSL handshake Jan 24, 2020
@wfurt
Copy link
Member Author

wfurt commented Jan 24, 2020

all failing tests have no console output. This can be a Helix issue.
cc: @MattGal

@MattGal
Copy link
Member

MattGal commented Jan 27, 2020

all failing tests have no console output. This can be a Helix issue.
cc: @MattGal

When I'm OOF it's good to tag @dotnet/dnceng . Looking at this now, hopefully logs are still there...

@MattGal
Copy link
Member

MattGal commented Jan 27, 2020

@wfurt these are crashes where the file permissions of the core dumps being produced are somehow incorrect, resulting in "[Errno 18] Invalid cross-device link".

I filed https://github.com/dotnet/core-eng/issues/8662 to track this. Unfortunately the core dumps here are likely lost as the harness crashed trying to move them around on disk.

@wfurt
Copy link
Member Author

wfurt commented Feb 4, 2020

All your concerns should be resolved @stephentoub. This is ready for another review round. The last updates bring the use of the IO Adapter as well as a fix for another regression caused by #453.

@@ -431,34 +431,38 @@ internal abstract partial class SafeDeleteContext : SafeHandle
SafeFreeContextBuffer outFreeContextBuffer = null;
try
{
Span<Interop.SspiCli.SecBuffer> inUnmanagedBuffer = stackalloc Interop.SspiCli.SecBuffer[inSecurityBufferDescriptor.cBuffers];
inUnmanagedBuffer.Clear();
// Allocate always maximum to allow better code optimization.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: comment isn't necessary

inUnmanagedBuffer.Clear();
// Allocate always maximum to allow better code optimization.
Span<Interop.SspiCli.SecBuffer> inUnmanagedBuffer = stackalloc Interop.SspiCli.SecBuffer[3];
for (int index = 0; index < inSecBuffers.Count; ++index)
Copy link
Member

@stephentoub stephentoub Feb 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assert anything about the relationship between inSecBuffers.Count and 3?


if (inSecBuffers.Count > 0 && inUnmanagedBuffer[0].pvBuffer == IntPtr.Zero)
{
inUnmanagedBuffer[0].pvBuffer = (IntPtr)pinnedToken0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be easier to move the earlier block that created and filled in the span to here? They wouldn't need to first assign IntPtr.Zero and then check for IntPtr.Zero, you'd just assign the right value to begin with (either the UnmanagedToken's handle or the pinned pointer).

internal InputSecurityBuffer GetBuffer(int index)
{
Debug.Assert(index < 3);
return index == 0 ? _item0 : index == 1 ? _item1 : _item2;
Copy link
Member

@stephentoub stephentoub Feb 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might help perf-wise if GetBuffer returned ref InputSecurityBuffer rather than InputSecurityBuffer. Could also be investigated subsequently.

if (result == null || result.IsCompleted)
{
// Operation has completed.
_handshakeBuffer.Dispose();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If result completes in failure/cancellation, this could end up running concurrently with the _handshakeBuffer.Dispose; _nestedAuth = 0; in the ContinueWith callback. That seems wrong, no?

else
{
await InnerStream.WriteAsync(message.Payload, readAdapter.CancellationToken).ConfigureAwait(false);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this if/else block is necessary. Why can't you just use the adapter to do the write? Is the issue that you have a read adapter but you need a write adapter? If so, you can't pass a write adapter in here?

Maybe the read and write adapters should be combined, so that we just have an SslSyncAdapter and an SslAsyncAdapter, each of which has both the read and write methods on it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, combing them together would do the trick.

int bytesRead = await task.ConfigureAwait(false);
if (bytesRead == 0)
{
throw new IOException(SR.net_io_eof);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this throw for EOF but the fast-path earlier successfully returns 0 for it? That means there's a race condition, where it may return 0 or may throw for EOF depending on how quickly the EOF is received.

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this has been in the works for a while, but the change is making me nervous, in large part because I'm having trouble following what's necessary for correctness and what's being done for perf optimizations. Can we split this into two pieces, one that's purely about fixing the assert we've been seeing in CI and making the implementation correct, and a second that follows that up that tries to improve perf / reduce allocation?

@wfurt
Copy link
Member Author

wfurt commented Feb 9, 2020

yes, I can do that. It increased over time but I think it can be separated into smaller chunks.

@stephentoub
Copy link
Member

Thanks.

@wfurt
Copy link
Member Author

wfurt commented Feb 13, 2020

will be replaced by smaller changes.

@wfurt wfurt closed this Feb 13, 2020
@karelz karelz added this to the 5.0.0 milestone Aug 18, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants