-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve efficiency and correctness of SSL handshake #1949
Conversation
try | ||
{ | ||
Span<Interop.SspiCli.SecBuffer> inUnmanagedBuffer = stackalloc Interop.SspiCli.SecBuffer[inSecurityBufferDescriptor.cBuffers]; | ||
Span<Interop.SspiCli.SecBuffer> inUnmanagedBuffer = stackalloc Interop.SspiCli.SecBuffer[inSecBuffers.Count]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the intention is for the stackalloc to be 1..3 entries long but there's no assertion or check that i can see and since count is unconditionally increased in SetNextBuffer it could in theory any value if someone found a way to abuse it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share an example of where/when it wouldn't be a small value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing obvious, as i said the intention is clearly that it should be 1..3 which causes no issue. I just thought it might be worth an assert or check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Sure, adding an assert would be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is Assert in SetNextBuffer so it won't go more than 3. I can also make it consistent and always allocate 3 since the rest of the logic is fixed anyway. I did not find a good way how to make the equivalent of fixed()
in a more flexible way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if hot path but having a constant-sized stack via (stackalloc Interop.SspiCli.SecBuffer[3])[0..inSecBuffers.Count]
may use fewer instructions, if you do have this upper bound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, if the max is actually 3, just use 3. The JIT is better at optimizing const-sized stackallocs. There's also no reason to slice it beyond that unless you actually use the length later. If you do use the length later, just stick with the variable-sized stackalloc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated code to use const-sized allocation since the difference between 0 and 3 is really small. (and will be 2 in most cases)
I was wondering if '3' should be some named constant instead of magic number but I not sure what the name should be and where it should live.
I pushed an update to fix another regression caused by #453. Renegotiation can happen in the synchronous path and in that case current code would hit assert. To fix that I plummed |
contributes to #1720 |
src/libraries/Common/src/Interop/Windows/SspiCli/SecuritySafeHandles.cs
Outdated
Show resolved
Hide resolved
} | ||
} | ||
|
||
[StructLayout(LayoutKind.Auto)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why Auto? (It's unusual to see this attribute, so I'm curious)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copy that from the original SecurityBuffer
as the intention was the same.
[StructLayout(LayoutKind.Auto)]
internal struct SecurityBuffer internal struct SecurityBuffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally use it when we may not have chosen the optimal layout for fields and want to allow the runtime to do so for us. That said, it's most beneficial when the type is going to end up on the heap, either on its own or as part of an array, such that you want to minimize the overall allocation size. But this is a ref struct... it'll never be on the heap.
src/libraries/Common/src/Interop/Windows/SspiCli/SecuritySafeHandles.cs
Outdated
Show resolved
Hide resolved
all failing tests have no console output. This can be a Helix issue. |
When I'm OOF it's good to tag @dotnet/dnceng . Looking at this now, hopefully logs are still there... |
@wfurt these are crashes where the file permissions of the core dumps being produced are somehow incorrect, resulting in "[Errno 18] Invalid cross-device link". I filed https://github.com/dotnet/core-eng/issues/8662 to track this. Unfortunately the core dumps here are likely lost as the harness crashed trying to move them around on disk. |
src/libraries/Common/src/Interop/Windows/SspiCli/SecuritySafeHandles.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/Interop/Windows/SspiCli/SecuritySafeHandles.cs
Outdated
Show resolved
Hide resolved
All your concerns should be resolved @stephentoub. This is ready for another review round. The last updates bring the use of the IO Adapter as well as a fix for another regression caused by #453. |
src/libraries/Common/src/Interop/Windows/SspiCli/SecuritySafeHandles.cs
Outdated
Show resolved
Hide resolved
@@ -431,34 +431,38 @@ internal abstract partial class SafeDeleteContext : SafeHandle | |||
SafeFreeContextBuffer outFreeContextBuffer = null; | |||
try | |||
{ | |||
Span<Interop.SspiCli.SecBuffer> inUnmanagedBuffer = stackalloc Interop.SspiCli.SecBuffer[inSecurityBufferDescriptor.cBuffers]; | |||
inUnmanagedBuffer.Clear(); | |||
// Allocate always maximum to allow better code optimization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: comment isn't necessary
inUnmanagedBuffer.Clear(); | ||
// Allocate always maximum to allow better code optimization. | ||
Span<Interop.SspiCli.SecBuffer> inUnmanagedBuffer = stackalloc Interop.SspiCli.SecBuffer[3]; | ||
for (int index = 0; index < inSecBuffers.Count; ++index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assert anything about the relationship between inSecBuffers.Count and 3?
|
||
if (inSecBuffers.Count > 0 && inUnmanagedBuffer[0].pvBuffer == IntPtr.Zero) | ||
{ | ||
inUnmanagedBuffer[0].pvBuffer = (IntPtr)pinnedToken0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be easier to move the earlier block that created and filled in the span to here? They wouldn't need to first assign IntPtr.Zero and then check for IntPtr.Zero, you'd just assign the right value to begin with (either the UnmanagedToken's handle or the pinned pointer).
internal InputSecurityBuffer GetBuffer(int index) | ||
{ | ||
Debug.Assert(index < 3); | ||
return index == 0 ? _item0 : index == 1 ? _item1 : _item2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might help perf-wise if GetBuffer returned ref InputSecurityBuffer
rather than InputSecurityBuffer
. Could also be investigated subsequently.
if (result == null || result.IsCompleted) | ||
{ | ||
// Operation has completed. | ||
_handshakeBuffer.Dispose(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If result completes in failure/cancellation, this could end up running concurrently with the _handshakeBuffer.Dispose; _nestedAuth = 0; in the ContinueWith callback. That seems wrong, no?
else | ||
{ | ||
await InnerStream.WriteAsync(message.Payload, readAdapter.CancellationToken).ConfigureAwait(false); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this if/else block is necessary. Why can't you just use the adapter to do the write? Is the issue that you have a read adapter but you need a write adapter? If so, you can't pass a write adapter in here?
Maybe the read and write adapters should be combined, so that we just have an SslSyncAdapter and an SslAsyncAdapter, each of which has both the read and write methods on it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, combing them together would do the trick.
int bytesRead = await task.ConfigureAwait(false); | ||
if (bytesRead == 0) | ||
{ | ||
throw new IOException(SR.net_io_eof); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this throw for EOF but the fast-path earlier successfully returns 0 for it? That means there's a race condition, where it may return 0 or may throw for EOF depending on how quickly the EOF is received.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize this has been in the works for a while, but the change is making me nervous, in large part because I'm having trouble following what's necessary for correctness and what's being done for perf optimizations. Can we split this into two pieces, one that's purely about fixing the assert we've been seeing in CI and making the implementation correct, and a second that follows that up that tries to improve perf / reduce allocation?
yes, I can do that. It increased over time but I think it can be separated into smaller chunks. |
Thanks. |
will be replaced by smaller changes. |
This is replacement for #1350. Instead of using shared buffer to hold data, it turns ProtocolToken to readOnly structure. Aside from allocation count there was no significant benefit of using shared buffer so I abandoned that part.
As part of #453 we start using _internalBuffer to read data for handshake. That was not done quite right as if SslStream is disposed while doing handshake, we may cause NRE. (#1291)
I could add locking like the other places but that complicates renegotiation - and that is common case for TLS1.3. So address feedback and improve stability I reused ArrayBuffers from HTTP2.
It pretty much does what I wanted, it can also optionally use shared buffers and it can grow buffer as needed. It also provides nice Span and Memory view on data.
To take advantage of it I added more Spanification to where #453 left it off. Instead of using ArraySegment I pushed ReadOnlySpan through Windows PAL in way similar to other platforms.
While this change is focused on Async path, I plan to come back and cleanup and unify with synchronous path. This also does not address locking issues discovered while testing TLS1.3.
Keeping negotiation buffers separate should hopefully help.