-
-
Notifications
You must be signed in to change notification settings - Fork 852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SSE2 version of Vp8Sse4X4 #1817
Conversation
# Conflicts: # src/ImageSharp/Formats/Webp/Lossy/LossyUtils.cs
Codecov Report
@@ Coverage Diff @@
## master #1817 +/- ##
==========================================
- Coverage 87.34% 87.19% -0.15%
==========================================
Files 936 936
Lines 48154 48164 +10
Branches 6038 6037 -1
==========================================
- Hits 42058 41998 -60
- Misses 5095 5162 +67
- Partials 1001 1004 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Vector128<byte> a0 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a)); | ||
Vector128<byte> a1 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps, 8))); | ||
Vector128<byte> a2 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps * 2, 8))); | ||
Vector128<byte> a3 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps * 3, 8))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slicing has some unnecessary extra costs, if input is safe we can do everything with pointer Unsafe
arithmetics:
Vector128<byte> a0 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a)); | |
Vector128<byte> a1 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps, 8))); | |
Vector128<byte> a2 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps * 2, 8))); | |
Vector128<byte> a3 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps * 3, 8))); | |
ref byte aRef = ref MemoryMarshal.GetReference(a); | |
Vector128<byte> a0 = Unsafe.As<byte, Vector128<byte>>(ref aRef); | |
Vector128<byte> a1 = Unsafe.As<byte, Vector128<byte>>(ref Unsafe.Add(ref aRef, WebpConstants.Bps)); | |
Vector128<byte> a2 = Unsafe.As<byte, Vector128<byte>>(ref Unsafe.Add(ref aRef, WebpConstants.Bps * 2)); | |
Vector128<byte> a3 = Unsafe.As<byte, Vector128<byte>>(ref Unsafe.Add(ref aRef, WebpConstants.Bps * 3)); |
Same for b.
# Conflicts: # tests/ImageSharp.Tests/Formats/WebP/LossyUtilsTests.cs
{ | ||
return GetSse(a, b, 4, 4); | ||
} | ||
} | ||
|
||
[MethodImpl(InliningOptions.ShortMethod)] | ||
public static int GetSse(Span<byte> a, Span<byte> b, int w, int h) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method name confuses me shouldn't it be Vp84X4
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vp84X4
is confusing because of 84
. Parent method can be Vp8_4x4
while this one can be Vp8_4X4Scalar
. Or maybe Vp8_4x4Scalar
could be inlined if it's not used anywhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the name is not ideal, but its called VP8SSE4x4
in the original libwebp source and I wanted to keep the method names to be as close to the original as possible to make it easier to follow/commpare the original code.
A better name might have been CalculateDistortion4x4
, but again I am still in favor of keeping the names similar to the original.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method name confuses me shouldn't it be
Vp84X4
?
GetSse
is not called Vp84X4
because of the variable parameters w and h.
I tried to rename the methods: 7e20c5d
Is that better (or worse)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better yes 👍
{ | ||
return GetSse(a, b, 4, 4); | ||
} | ||
} | ||
|
||
[MethodImpl(InliningOptions.ShortMethod)] | ||
public static int GetSse(Span<byte> a, Span<byte> b, int w, int h) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vp84X4
is confusing because of 84
. Parent method can be Vp8_4x4
while this one can be Vp8_4X4Scalar
. Or maybe Vp8_4x4Scalar
could be inlined if it's not used anywhere else?
Array.Clear(this.YDcLevels, 0, this.YDcLevels.Length); | ||
Array.Clear(this.YAcLevels, 0, this.YAcLevels.Length); | ||
Array.Clear(this.UvLevels, 0, this.UvLevels.Length); | ||
Array.Clear(this.ModesI4, 0, this.ModesI4.Length); | ||
Array.Clear(this.Derr, 0, this.Derr.Length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to the main thing in the PR, but Vp8ModeScore
is also a good candidate to be made a value type with fixed buffers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes seems reasonable, Vp8ModeScore
makes up the most of the allocations during encoding, but I might try that in a different PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely worth a look after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM so far.
Prerequisites
Description
This adds a SSE2 version of the method
Vp8Sse4X4
, which is used during lossy encoding of webp images.Related to #1786
Profiling results:
Master Branch:
PR:
P.S. im not sure again why the call count is different, it was the same image