Add sse2 version of inverse transform #1819
Codecov Report
```diff
@@          Coverage Diff          @@
##           master    #1819   +/-   ##
=======================================
  Coverage      87%      87%
=======================================
  Files         936      936
  Lines       48193    48320    +127
  Branches     6037     6038      +1
=======================================
+ Hits        42099    42221    +122
- Misses       5094     5097      +3
- Partials     1000     1002      +2
```
```csharp
// a01 a11 a21 a31 x x x x
// a02 a12 a22 a32 x x x x
// a03 a13 a23 a33 x x x x
if (doTwo)
```
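The layout comments in the snippet above are the result of a 4x4 transpose of 16-bit values, where the `x` lanes hold the second block and are simply ignored when only one transform is done. A minimal sketch of such a transpose with .NET's `Sse2` intrinsics, modeled on libwebp's `VP8Transpose_2_4x4_16b`; the class and method names are hypothetical, the PR's code may be organized differently, and callers are assumed to check `Sse2.IsSupported` first:

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class TransposeSketch
{
    // Hypothetical helper. Transposes two 4x4 blocks of 16-bit values stored side by side:
    //   inR = aR0 aR1 aR2 aR3 | bR0 bR1 bR2 bR3   (R = 0..3)
    // into
    //   outC = a0C a1C a2C a3C | b0C b1C b2C b3C  (C = 0..3)
    // Requires SSE2; check Sse2.IsSupported before calling.
    public static void Transpose2x4x4(
        Vector128<short> in0, Vector128<short> in1,
        Vector128<short> in2, Vector128<short> in3,
        out Vector128<short> out0, out Vector128<short> out1,
        out Vector128<short> out2, out Vector128<short> out3)
    {
        // Step 1: interleave 16-bit lanes of row pairs.
        Vector128<short> t00 = Sse2.UnpackLow(in0, in1);  // a00 a10 a01 a11 a02 a12 a03 a13
        Vector128<short> t01 = Sse2.UnpackLow(in2, in3);  // a20 a30 a21 a31 a22 a32 a23 a33
        Vector128<short> t02 = Sse2.UnpackHigh(in0, in1); // b00 b10 b01 b11 b02 b12 b03 b13
        Vector128<short> t03 = Sse2.UnpackHigh(in2, in3); // b20 b30 b21 b31 b22 b32 b23 b33

        // Step 2: interleave 32-bit pairs, completing each column of a block.
        Vector128<int> t10 = Sse2.UnpackLow(t00.AsInt32(), t01.AsInt32());  // a00 a10 a20 a30 a01 a11 a21 a31
        Vector128<int> t11 = Sse2.UnpackLow(t02.AsInt32(), t03.AsInt32());  // b00 b10 b20 b30 b01 b11 b21 b31
        Vector128<int> t12 = Sse2.UnpackHigh(t00.AsInt32(), t01.AsInt32()); // a02 a12 a22 a32 a03 a13 a23 a33
        Vector128<int> t13 = Sse2.UnpackHigh(t02.AsInt32(), t03.AsInt32()); // b02 b12 b22 b32 b03 b13 b23 b33

        // Step 3: pair the 64-bit halves so block a sits in the low half, block b in the high half.
        out0 = Sse2.UnpackLow(t10.AsInt64(), t11.AsInt64()).AsInt16();
        out1 = Sse2.UnpackHigh(t10.AsInt64(), t11.AsInt64()).AsInt16();
        out2 = Sse2.UnpackLow(t12.AsInt64(), t13.AsInt64()).AsInt16();
        out3 = Sse2.UnpackHigh(t12.AsInt64(), t13.AsInt64()).AsInt16();
    }
}
```

When `doTwo` is false the upper halves of the inputs are don't-care values, and the upper four lanes of each output (the `x` entries) are just carried along unused.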
It's better to avoid giving the branch predictor unnecessary extra work, and to refactor ITransform into two separate methods.
One hidden cost is that machine code branching on a runtime constant is longer and occupies more cache lines, which are expensive to load.
An alternative "fun" solution, a sort of template metaprogramming in C#, would eliminate the branches at JIT time:
```csharp
ITransform<TDoTwo>(...) {
    ...
    if (typeof(TDoTwo) == typeof(DoTwo_True)) {
        ...
    }
    else {
        ...
    }
}
```
I can't decide whether it's worth it, or whether it's better to just duplicate the code and perhaps extract reusable helper methods for the shared bits.
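A runnable version of the generic-marker idea, as a sketch only: `DoTwoTrue`, `DoTwoFalse` and the stand-in `TransformOne` are hypothetical names, not code from the PR. The trick relies on the JIT compiling a separate instantiation for each value-type argument, so the `typeof` comparison becomes a constant it can fold. With reference-type arguments the instantiations share code and the check is generally not folded, which is why the markers are structs here.

```csharp
using System;

// Hypothetical marker types: empty structs used only as generic arguments.
public struct DoTwoTrue { }
public struct DoTwoFalse { }

public static class GenericSpecializationSketch
{
    // Each struct instantiation gets its own JIT-compiled body, in which
    // typeof(TDoTwo) == typeof(DoTwoTrue) is a constant, so the untaken branch
    // can be dropped. This is a JIT optimization, not a C# language guarantee.
    public static int ITransform<TDoTwo>(Span<int> dst)
        where TDoTwo : struct
    {
        int blocks = TransformOne(dst.Slice(0, 4));
        if (typeof(TDoTwo) == typeof(DoTwoTrue))
        {
            blocks += TransformOne(dst.Slice(4, 4));
        }

        return blocks;
    }

    // Hypothetical stand-in for one 4x4 inverse transform: it only increments
    // four values so the effect of each instantiation is observable.
    private static int TransformOne(Span<int> block)
    {
        for (int i = 0; i < block.Length; i++)
        {
            block[i]++;
        }

        return 1;
    }
}
```

Call sites then pick the variant statically, e.g. `ITransform<DoTwoTrue>(buffer)` or `ITransform<DoTwoFalse>(buffer)`, instead of passing a `bool`.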
```csharp
Unsafe.As<byte, Vector64<byte>>(ref outputRef) = ref0.GetLower();
Unsafe.As<byte, Vector64<byte>>(ref Unsafe.Add(ref outputRef, WebpConstants.Bps)) = ref1.GetLower();
Unsafe.As<byte, Vector64<byte>>(ref Unsafe.Add(ref outputRef, WebpConstants.Bps * 2)) = ref2.GetLower();
```
In the first three calls it's possible, and better, to avoid GetLower. The second store will overwrite the upper 8 bytes written by the first store, and so on:
```diff
- Unsafe.As<byte, Vector64<byte>>(ref outputRef) = ref0.GetLower();
- Unsafe.As<byte, Vector64<byte>>(ref Unsafe.Add(ref outputRef, WebpConstants.Bps)) = ref1.GetLower();
- Unsafe.As<byte, Vector64<byte>>(ref Unsafe.Add(ref outputRef, WebpConstants.Bps * 2)) = ref2.GetLower();
+ Unsafe.As<byte, Vector128<byte>>(ref outputRef) = ref0;
+ Unsafe.As<byte, Vector128<byte>>(ref Unsafe.Add(ref outputRef, WebpConstants.Bps)) = ref1;
+ Unsafe.As<byte, Vector128<byte>>(ref Unsafe.Add(ref outputRef, WebpConstants.Bps * 2)) = ref2;
```
If you guarantee at the call site that dst has 8 bytes of padding, you can also avoid it in the last call.
I don't think that will work. I only want to write 8 bytes from each ref vector.
The first write is at offset 0, the second at 32 (note the Unsafe.Add() with WebpConstants.Bps; WebpConstants.Bps is 32), the third at 64 and the last at 96.
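To make the offsets concrete, here is a sketch of both store patterns against a buffer with a 32-byte row stride. `Bps` stands in for `WebpConstants.Bps`, the class and method names are hypothetical, and it uses `Unsafe.WriteUnaligned` rather than the PR's `Unsafe.As` assignment so that alignment is explicitly not assumed. Because consecutive rows are 32 bytes apart, a 16-byte store clobbers bytes 8 to 15 of every row, and no later store rewrites them:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static class StrideStoreSketch
{
    // Stand-in for WebpConstants.Bps: destination row stride in bytes.
    public const int Bps = 32;

    // What the PR does: write only the lower 8 bytes of each row vector,
    // at offsets 0, Bps, 2 * Bps and 3 * Bps.
    public static void StoreLower8(Span<byte> dst, ReadOnlySpan<Vector128<byte>> rows)
    {
        if (rows.Length > 0 && dst.Length < ((rows.Length - 1) * Bps) + 8)
        {
            throw new ArgumentException("Destination is too small.", nameof(dst));
        }

        ref byte outputRef = ref MemoryMarshal.GetReference(dst);
        for (int i = 0; i < rows.Length; i++)
        {
            Unsafe.WriteUnaligned(ref Unsafe.Add(ref outputRef, i * Bps), rows[i].GetLower());
        }
    }

    // The suggested alternative: full 16-byte stores. The next store starts
    // Bps = 32 bytes later, so bytes 8..15 of each row stay overwritten.
    public static void StoreFull16(Span<byte> dst, ReadOnlySpan<Vector128<byte>> rows)
    {
        if (rows.Length > 0 && dst.Length < ((rows.Length - 1) * Bps) + 16)
        {
            throw new ArgumentException("Destination is too small.", nameof(dst));
        }

        ref byte outputRef = ref MemoryMarshal.GetReference(dst);
        for (int i = 0; i < rows.Length; i++)
        {
            Unsafe.WriteUnaligned(ref Unsafe.Add(ref outputRef, i * Bps), rows[i]);
        }
    }
}
```

The overlapping-store trick from the suggestion only pays off when the row stride is 8 bytes, so that each store's upper half is rewritten by the next one.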
Ah, I hadn't noticed that Bps is 32. Never mind, then.
Looks good!
Description
This PR introduces an SSE2 version of the method ITransform, which is used during lossy WebP encoding. Related to #1786
The profiling results look good:
master branch: [profiler screenshot]
PR: [profiler screenshot]
Note: the call counts differ because the SSE2 version performs two transforms at once.
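For reference, a scalar sketch of what `ITransform` computes, ported from libwebp's encoder (`ITransformOne` / `ITransform_C`). The class name and span-based signature are hypothetical, and ImageSharp's own scalar port may differ in detail. With `doTwo` set it processes a second 4x4 block four pixels to the right using the next 16 coefficients, which is the pair of transforms the SSE2 version handles in one call:

```csharp
using System;

public static class ITransformScalarSketch
{
    // Stand-in for WebpConstants.Bps: row stride of the reference and output buffers.
    public const int Bps = 32;

    // libwebp's fixed-point constants (16.16): kC1 = 20091 + (1 << 16), kC2 = 35468.
    private const int KC1 = 20091 + (1 << 16);
    private const int KC2 = 35468;

    private static int Mul(int a, int b) => (a * b) >> 16;

    private static byte Clip8b(int v) => (byte)(v < 0 ? 0 : v > 255 ? 255 : v);

    // One 4x4 block: dst = clip(reference + ((idct(input) + 4) >> 3)).
    private static void ITransformOne(ReadOnlySpan<byte> reference, ReadOnlySpan<short> input, Span<byte> dst)
    {
        Span<int> tmp = stackalloc int[16];

        // Vertical pass: iteration i reads coefficient column i, writes tmp[4i..4i+3].
        for (int i = 0; i < 4; i++)
        {
            int a = input[i] + input[8 + i];
            int b = input[i] - input[8 + i];
            int c = Mul(input[4 + i], KC2) - Mul(input[12 + i], KC1);
            int d = Mul(input[4 + i], KC1) + Mul(input[12 + i], KC2);
            tmp[(4 * i) + 0] = a + d;
            tmp[(4 * i) + 1] = b + c;
            tmp[(4 * i) + 2] = b - c;
            tmp[(4 * i) + 3] = a - d;
        }

        // Horizontal pass: iteration i produces output row i.
        for (int i = 0; i < 4; i++)
        {
            int dc = tmp[i] + 4; // +4 rounds the final >> 3
            int a = dc + tmp[8 + i];
            int b = dc - tmp[8 + i];
            int c = Mul(tmp[4 + i], KC2) - Mul(tmp[12 + i], KC1);
            int d = Mul(tmp[4 + i], KC1) + Mul(tmp[12 + i], KC2);
            int row = i * Bps;
            dst[row + 0] = Clip8b(reference[row + 0] + ((a + d) >> 3));
            dst[row + 1] = Clip8b(reference[row + 1] + ((b + c) >> 3));
            dst[row + 2] = Clip8b(reference[row + 2] + ((b - c) >> 3));
            dst[row + 3] = Clip8b(reference[row + 3] + ((a - d) >> 3));
        }
    }

    // Mirrors libwebp's ITransform_C: one block, or two side-by-side blocks when doTwo is true.
    public static void ITransform(ReadOnlySpan<byte> reference, ReadOnlySpan<short> input, Span<byte> dst, bool doTwo)
    {
        ITransformOne(reference, input, dst);
        if (doTwo)
        {
            // Second block: 4 pixels to the right, next 16 coefficients.
            ITransformOne(reference.Slice(4), input.Slice(16), dst.Slice(4));
        }
    }
}
```

A DC-only block shows the rounding: a DC coefficient of 80 adds (80 + 4) >> 3 = 10 to every pixel of the reference block.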