Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SSE2 version of Vp8Sse4X4 #1817

Merged
merged 13 commits into from
Nov 10, 2021
Merged

Add SSE2 version of Vp8Sse4X4 #1817

merged 13 commits into from
Nov 10, 2021

Conversation

brianpopow
Copy link
Collaborator

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

This adds a SSE2 version of the method Vp8Sse4X4, which is used during lossy encoding of webp images.
Related to #1786

Profiling results:

Master Branch:

Vp8Sse4X4_master

PR:

Vp8Sse4X4_sse

P.S. im not sure again why the call count is different, it was the same image

@codecov
Copy link

codecov bot commented Nov 9, 2021

Codecov Report

Merging #1817 (1997d59) into master (255226b) will decrease coverage by 0.14%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1817      +/-   ##
==========================================
- Coverage   87.34%   87.19%   -0.15%     
==========================================
  Files         936      936              
  Lines       48154    48164      +10     
  Branches     6038     6037       -1     
==========================================
- Hits        42058    41998      -60     
- Misses       5095     5162      +67     
- Partials     1001     1004       +3     
Flag Coverage Δ
unittests 87.19% <100.00%> (-0.15%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/ImageSharp/Formats/Webp/Lossy/Vp8Encoding.cs 100.00% <ø> (ø)
src/ImageSharp/Formats/Webp/Lossy/Vp8Histogram.cs 100.00% <ø> (ø)
src/ImageSharp/Formats/Webp/Lossy/LossyUtils.cs 100.00% <100.00%> (ø)
src/ImageSharp/Formats/Webp/Lossy/QuantEnc.cs 96.92% <100.00%> (-0.04%) ⬇️
src/ImageSharp/Formats/Webp/Lossy/Vp8ModeScore.cs 100.00% <100.00%> (ø)
.../ImageSharp/Formats/Webp/Lossless/LosslessUtils.cs 90.98% <0.00%> (-6.56%) ⬇️
...ageSharp/Formats/Webp/Lossless/PredictorEncoder.cs 92.81% <0.00%> (-5.49%) ⬇️
...ImageSharp/Formats/Webp/Lossless/Vp8LBitEntropy.cs 98.80% <0.00%> (-1.20%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 255226b...1997d59. Read the comment docs.

Comment on lines 31 to 34
Vector128<byte> a0 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a));
Vector128<byte> a1 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps, 8)));
Vector128<byte> a2 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps * 2, 8)));
Vector128<byte> a3 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps * 3, 8)));
Copy link
Member

@antonfirsov antonfirsov Nov 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slicing has some unnecessary extra costs, if input is safe we can do everything with pointer Unsafe arithmetics:

Suggested change
Vector128<byte> a0 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a));
Vector128<byte> a1 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps, 8)));
Vector128<byte> a2 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps * 2, 8)));
Vector128<byte> a3 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps * 3, 8)));
ref byte aRef = ref MemoryMarshal.GetReference(a);
Vector128<byte> a0 = Unsafe.As<byte, Vector128<byte>>(ref aRef);
Vector128<byte> a1 = Unsafe.As<byte, Vector128<byte>>(ref Unsafe.Add(ref aRef, WebpConstants.Bps));
Vector128<byte> a2 = Unsafe.As<byte, Vector128<byte>>(ref Unsafe.Add(ref aRef, WebpConstants.Bps * 2));
Vector128<byte> a3 = Unsafe.As<byte, Vector128<byte>>(ref Unsafe.Add(ref aRef, WebpConstants.Bps * 3));

Same for b.

{
return GetSse(a, b, 4, 4);
}
}

[MethodImpl(InliningOptions.ShortMethod)]
public static int GetSse(Span<byte> a, Span<byte> b, int w, int h)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method name confuses me shouldn't it be Vp84X4?

Copy link
Member

@antonfirsov antonfirsov Nov 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vp84X4 is confusing because of 84. Parent method can be Vp8_4x4 while this one can be Vp8_4X4Scalar. Or maybe Vp8_4x4Scalar could be inlined if it's not used anywhere else?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the name is not ideal, but its called VP8SSE4x4 in the original libwebp source and I wanted to keep the method names to be as close to the original as possible to make it easier to follow/commpare the original code.
A better name might have been CalculateDistortion4x4, but again I am still in favor of keeping the names similar to the original.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method name confuses me shouldn't it be Vp84X4?

GetSse is not called Vp84X4 because of the variable parameters w and h.

I tried to rename the methods: 7e20c5d
Is that better (or worse)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better yes 👍

{
return GetSse(a, b, 4, 4);
}
}

[MethodImpl(InliningOptions.ShortMethod)]
public static int GetSse(Span<byte> a, Span<byte> b, int w, int h)
Copy link
Member

@antonfirsov antonfirsov Nov 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vp84X4 is confusing because of 84. Parent method can be Vp8_4x4 while this one can be Vp8_4X4Scalar. Or maybe Vp8_4x4Scalar could be inlined if it's not used anywhere else?

Comment on lines +100 to +104
Array.Clear(this.YDcLevels, 0, this.YDcLevels.Length);
Array.Clear(this.YAcLevels, 0, this.YAcLevels.Length);
Array.Clear(this.UvLevels, 0, this.UvLevels.Length);
Array.Clear(this.ModesI4, 0, this.ModesI4.Length);
Array.Clear(this.Derr, 0, this.Derr.Length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to the main thing in the PR, but Vp8ModeScore is also a good candidate to be made a value type with fixed buffers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes seems reasonable, Vp8ModeScore makes up the most of the allocations during encoding, but I might try that in a different PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely worth a look after.

Copy link
Member

@JimBobSquarePants JimBobSquarePants left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM so far.

@JimBobSquarePants JimBobSquarePants merged commit 7d74c4c into master Nov 10, 2021
@JimBobSquarePants JimBobSquarePants deleted the bp/sse4X4 branch November 10, 2021 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants