-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of re-used symmetric one shots #93124
Conversation
Tagging subscribers to this area: @dotnet/area-system-security, @bartonjs, @vcsjones Issue DetailsToday, when you use the symmetric one-shots to decrypt or decrypt, it creates a new lite cipher, uses it, then clean it up every time. (Context: "lite cipher" is the internal facade over the platform's native encryption components) Creating a lite cipher is expensive from various things like creating handles, validating keys and inputs, and simply the natural of the algorithm itself from activities like key expansion. This change allows caching the lite cipher for the duration of the key, so long as the key doesn't change. If the key changes, then the lite cipher is invalidated. The scenario this improves is this: using Aes aes = aes.Create();
aes.Key = GetTheKey();
byte[] iv1 = RandomNumberGenerator.GetBytes(16);
byte[] iv2 = RandomNumberGenerator.GetBytes(16);
byte[] ciphertext1 = aes.EncryptCbc(plaintext1, iv1);
byte[] ciphertext2 = aes.EncryptCbc(plaintext2, iv1);
// N or more calls to EncryptCbc / DecryptCbc, etc. Because the one-shots were previously more-or-less fully contained, they were thread safe - such that calling the one shots from a single instance of SymmetricAlgorithm would work and produce correct results. This is trait I would like to keep and I am not comfortable breaking. Because of this then, the caching uses interlocked exchanges of handles. Since we are only caching one instance at a given time, using an instance from multiple threads may not benefit from the cache. I think this is reasonable to keep the caching mechanism simple, and there is an easy developer workaround: pool an instance of the SymmetricAlgorithm. This does mean though that there is a performance penalty, albeit a small one, for when the one-shot is used exactly once. The extra bookkeeping for the caching is wasted. However, it is far outweighed if you use it more than once. Full data is below, but some examples: Aes.EncryptCbc with a single call when from 579.0ns to 606.2ns. TripleDES.EncryptCbc with a single call when from 11,311.6ns to 11,294.2ns (this "improvement" is within error) The performance improvements can be substantial when it's used multiple times. For single uses, the performance loss is two This pull request addresses ECB and CBC only. Absent is CFB. CFB, because of feedback size, could be addressed differently, which is why I am leaving it out of this pull request. ECB and CBC are the far more common modes as well. CBC Benchmarks
ECB Benchmarks
CBC benchmark source
|
...libraries/System.Security.Cryptography/src/System/Security/Cryptography/AesImplementation.cs
Outdated
Show resolved
Hide resolved
...libraries/System.Security.Cryptography/src/System/Security/Cryptography/AesImplementation.cs
Show resolved
Hide resolved
base.Key = value; | ||
InvalidateOneShotAlgorithms(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This instance has been used, so there's a cached lite-cipher for the mode we care about.
- Thread A calls into set_Key, hasn't yet made it to updating the field
- Thread B starts encrypting (or decrypting, whatever), something long. Gigabyteish. Still has k0
- Thread A makes it out to InvalidateOneShotAlgorithms. get_Key will now return k1
- Thread B finishes. The field is null, so it puts back the instance it used at k0.
- As long as parallel ops never happen again, the cache stays at k0 from one-borrow one-return.
Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, yep.
Okay, going to close this for now, as the threading part needs a bit of work. Will re-open when that's in better shape, with new benchmarks. |
Today, when you use the symmetric one-shots to decrypt or decrypt, it creates a new lite cipher, uses it, then clean it up every time.
(Context: "lite cipher" is the internal facade over the platform's native encryption components)
Creating a lite cipher is expensive from various things like creating handles, validating keys and inputs, and simply the natural of the algorithm itself from activities like key expansion.
This change allows caching the lite cipher for the duration of the key, so long as the key doesn't change. If the key changes, then the lite cipher is invalidated.
The scenario this improves is this:
Because the one-shots were previously more-or-less fully contained, they were thread safe - such that calling the one shots from a single instance of SymmetricAlgorithm would work and produce correct results. This is trait I would like to keep and I am not comfortable breaking. Because of this then, the caching uses interlocked exchanges of handles. Since we are only caching one instance at a given time, using an instance from multiple threads may not benefit from the cache. I think this is reasonable to keep the caching mechanism simple, and there is an easy developer workaround: pool an instance of the SymmetricAlgorithm.
This does mean though that there is a performance penalty, albeit a small one, for when the one-shot is used exactly once. The extra bookkeeping for the caching is wasted. However, it is far outweighed if you use it more than once.
Full data is below, but some examples:
Aes.EncryptCbc with a single call when from 579.0ns to 606.2ns.
Aes.EncryptCbc with three calls when from 1,395.5ns to 757.2ns.
TripleDES.EncryptCbc with a single call when from 11,311.6ns to 11,294.2ns (this "improvement" is within error)
TripleDES.EncryptCbc with three calls when from 33,826.3ns to 13,004.6ns.
The performance improvements can be substantial when it's used multiple times. For single uses, the performance loss is two
Interlocked.Exchange
calls.This pull request addresses ECB and CBC only. Absent is CFB. CFB, because of feedback size, could be addressed differently, which is why I am leaving it out of this pull request. ECB and CBC are the far more common modes as well.
Benchmarks are in this Gist so they don't wrap funny in this comment. https://gist.github.com/vcsjones/b79889c4c6196bec50d783ff6e9aba0d
CBC benchmark source
ECB benchmark source