
Vectorize Png encoder filters #1630

Merged


@TechPizzaDev (Contributor) commented May 15, 2021

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict StyleCop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

@CLAassistant commented May 15, 2021

CLA assistant check
All committers have signed the CLA.

@TechPizzaDev (Contributor Author) commented May 15, 2021

And here are some excessively detailed benchmarks on random RGB data (with alpha fixed at 255).

Environment
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-4720HQ CPU 2.60GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.202
  [Host] : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

Job=MediumRun  EvaluateOverhead=True  Toolchain=InProcessEmitToolchain  
IterationCount=15  LaunchCount=1  WarmupCount=10  
Average
| Type | Method | Size | Mean | Error | StdDev | Ratio |
|---|---|---:|---:|---:|---:|---:|
| Average | Average | 64 | 38.834 μs | 0.1914 μs | 0.1494 μs | 1.000 |
| Average | Ssse3Average | 64 | 5.135 μs | 0.0210 μs | 0.0186 μs | 0.132 |
| Average | Sse2Average | 64 | 6.175 μs | 0.3104 μs | 0.2903 μs | 0.159 |
| Average | Avx2Average | 64 | 3.599 μs | 0.0302 μs | 0.0267 μs | 0.093 |
| Average | Average | 256 | 672.200 μs | 25.7805 μs | 22.8537 μs | 1.000 |
| Average | Ssse3Average | 256 | 85.646 μs | 1.5865 μs | 1.3248 μs | 0.127 |
| Average | Sse2Average | 256 | 92.483 μs | 1.2481 μs | 1.0422 μs | 0.138 |
| Average | Avx2Average | 256 | 57.248 μs | 1.3243 μs | 1.2388 μs | 0.085 |
| Average | Average | 1024 | 9,968.715 μs | 59.5136 μs | 55.6691 μs | 1.000 |
| Average | Ssse3Average | 1024 | 1,546.775 μs | 8.0654 μs | 7.1497 μs | 0.155 |
| Average | Sse2Average | 1024 | 1,652.449 μs | 7.1568 μs | 5.9763 μs | 0.166 |
| Average | Avx2Average | 1024 | 1,268.928 μs | 18.3060 μs | 17.1234 μs | 0.127 |
| Average | Average | 4096 | 177,359.313 μs | 1,886.8009 μs | 1,764.9148 μs | 1.000 |
| Average | Ssse3Average | 4096 | 27,926.817 μs | 226.7537 μs | 189.3496 μs | 0.157 |
| Average | Sse2Average | 4096 | 29,114.548 μs | 176.3376 μs | 164.9463 μs | 0.164 |
| Average | Avx2Average | 4096 | 19,337.158 μs | 206.2467 μs | 182.8324 μs | 0.109 |
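For reference while reading these numbers: the PNG specification defines the Average filter (type 3) as Average(x) = Raw(x) - floor((Raw(x - bpp) + Prior(x)) / 2), taken modulo 256, where bytes before the start of the row count as 0. The Python sketch below is a scalar illustration of that definition, not ImageSharp's C# code; the function name `average_filter` and its parameters are hypothetical. Note that the sum can reach 510, so a byte-wise SIMD version has to avoid 8-bit overflow when averaging.

```python
def average_filter(raw: bytes, prior: bytes, bpp: int) -> bytes:
    """Scalar PNG Average filter (filter type 3) for one scanline.

    raw   -- the current, unfiltered scanline
    prior -- the unfiltered scanline above it (all zeros for the first row)
    bpp   -- bytes per complete pixel, at least 1 (4 for 8-bit RGBA)
    """
    out = bytearray(len(raw))
    for x in range(len(raw)):
        # Bytes to the left of the row start are treated as 0.
        left = raw[x - bpp] if x >= bpp else 0
        # Python ints do not overflow, so left + prior[x] (up to 510) is exact
        # and // floors as the spec requires; & 0xFF wraps the result to a byte.
        out[x] = (raw[x] - (left + prior[x]) // 2) & 0xFF
    return bytes(out)


# Example: two 4-byte RGBA pixels, each below two other pixels.
print(list(average_filter(bytes([10, 20, 30, 40, 50, 60, 70, 80]),
                          bytes([2, 4, 6, 8, 10, 12, 14, 16]), 4)))
# prints [9, 18, 27, 36, 40, 44, 48, 52]
```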
Paeth
| Type | Method | Size | Mean | Error | StdDev | Ratio |
|---|---|---:|---:|---:|---:|---:|
| Paeth | Paeth | 64 | 147.512 μs | 1.0109 μs | 0.8961 μs | 1.000 |
| Paeth | VectorPaeth | 64 | 17.381 μs | 0.0630 μs | 0.0492 μs | 0.118 |
| Paeth | UnsafeVectorPaeth | 64 | 16.269 μs | 0.0782 μs | 0.0653 μs | 0.110 |
| Paeth | Paeth | 256 | 2,541.548 μs | 12.2208 μs | 10.8334 μs | 1.000 |
| Paeth | VectorPaeth | 256 | 271.942 μs | 1.6199 μs | 1.4360 μs | 0.107 |
| Paeth | UnsafeVectorPaeth | 256 | 255.983 μs | 0.8942 μs | 0.7927 μs | 0.101 |
| Paeth | Paeth | 1024 | 40,888.074 μs | 142.4349 μs | 133.2337 μs | 1.000 |
| Paeth | VectorPaeth | 1024 | 4,403.322 μs | 11.5166 μs | 10.2092 μs | 0.108 |
| Paeth | UnsafeVectorPaeth | 1024 | 4,241.534 μs | 76.9621 μs | 68.2249 μs | 0.104 |
| Paeth | Paeth | 4096 | 664,667.564 μs | 2,478.8314 μs | 2,197.4195 μs | 1.000 |
| Paeth | VectorPaeth | 4096 | 73,365.707 μs | 360.9557 μs | 319.9778 μs | 0.110 |
| Paeth | UnsafeVectorPaeth | 4096 | 67,785.956 μs | 1,067.5039 μs | 946.3144 μs | 0.102 |
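The Paeth filter (type 4) subtracts a predictor chosen from the left (a), above (b) and upper-left (c) bytes: whichever is closest to p = a + b - c, with ties broken in the order a, b, c. A minimal scalar sketch in Python, assuming the hypothetical names `paeth_predictor` and `paeth_filter` (this is not the PR's implementation):

```python
def paeth_predictor(a: int, b: int, c: int) -> int:
    """Paeth predictor: a = left, b = above, c = upper-left byte."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    # Ties are broken in the order a, b, c.
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def paeth_filter(raw: bytes, prior: bytes, bpp: int) -> bytes:
    """Scalar PNG Paeth filter (filter type 4) for one scanline."""
    out = bytearray(len(raw))
    for x in range(len(raw)):
        a = raw[x - bpp] if x >= bpp else 0    # left
        b = prior[x]                           # above
        c = prior[x - bpp] if x >= bpp else 0  # upper-left
        out[x] = (raw[x] - paeth_predictor(a, b, c)) & 0xFF
    return bytes(out)
```

On the first row, where Prior is all zeros, the predictor always returns a, so Paeth reduces to Sub there.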
Up
| Type | Method | Size | Mean | Error | StdDev | Ratio |
|---|---|---:|---:|---:|---:|---:|
| Up | Up | 64 | 28.109 μs | 0.0926 μs | 0.0774 μs | 1.000 |
| Up | VectorUp | 64 | 3.464 μs | 0.0113 μs | 0.0106 μs | 0.123 |
| Up | UnsafeVectorUp | 64 | 3.208 μs | 0.0186 μs | 0.0165 μs | 0.114 |
| Up | Up | 256 | 449.780 μs | 2.2934 μs | 2.1453 μs | 1.000 |
| Up | VectorUp | 256 | 59.635 μs | 0.2284 μs | 0.2025 μs | 0.133 |
| Up | UnsafeVectorUp | 256 | 52.384 μs | 0.2513 μs | 0.2098 μs | 0.116 |
| Up | Up | 1024 | 7,226.976 μs | 33.3938 μs | 29.6027 μs | 1.000 |
| Up | VectorUp | 1024 | 1,174.367 μs | 14.2758 μs | 12.6551 μs | 0.162 |
| Up | UnsafeVectorUp | 1024 | 1,179.356 μs | 9.2726 μs | 8.2199 μs | 0.163 |
| Up | Up | 4096 | 137,018.553 μs | 696.9427 μs | 651.9207 μs | 1.000 |
| Up | VectorUp | 4096 | 18,564.646 μs | 164.0998 μs | 128.1183 μs | 0.135 |
| Up | UnsafeVectorUp | 4096 | 18,764.654 μs | 153.3329 μs | 135.9256 μs | 0.137 |
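Up (type 2) is the simplest of the four filters: each byte minus the byte directly above it, modulo 256. In this hypothetical Python sketch, no output byte depends on an earlier output byte, which is why the operation maps so directly onto SIMD subtraction.

```python
def up_filter(raw: bytes, prior: bytes) -> bytes:
    """Scalar PNG Up filter (filter type 2): Up(x) = Raw(x) - Prior(x) mod 256."""
    # Each output byte depends only on the two input bytes at the same index,
    # so there is no loop-carried dependency to serialize the work.
    return bytes((r - p) & 0xFF for r, p in zip(raw, prior))


print(list(up_filter(bytes([5, 0, 255]), bytes([3, 1, 255]))))  # prints [2, 255, 0]
```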
Sub
| Type | Method | Size | Mean | Error | StdDev | Ratio |
|---|---|---:|---:|---:|---:|---:|
| Sub | Sub | 64 | 31.704 μs | 0.2713 μs | 0.2405 μs | 1.000 |
| Sub | VectorSub | 64 | 3.358 μs | 0.0158 μs | 0.0132 μs | 0.106 |
| Sub | UnsafeVectorSub | 64 | 3.166 μs | 0.0057 μs | 0.0051 μs | 0.100 |
| Sub | Sub | 256 | 504.137 μs | 1.5142 μs | 1.4164 μs | 1.000 |
| Sub | VectorSub | 256 | 56.196 μs | 0.2222 μs | 0.1855 μs | 0.111 |
| Sub | UnsafeVectorSub | 256 | 50.632 μs | 0.6273 μs | 0.5561 μs | 0.100 |
| Sub | Sub | 1024 | 8,162.503 μs | 26.4349 μs | 22.0743 μs | 1.000 |
| Sub | VectorSub | 1024 | 923.093 μs | 4.9075 μs | 4.5905 μs | 0.113 |
| Sub | UnsafeVectorSub | 1024 | 922.966 μs | 13.2451 μs | 11.7415 μs | 0.113 |
| Sub | Sub | 4096 | 155,133.122 μs | 971.6967 μs | 908.9257 μs | 1.000 |
| Sub | VectorSub | 4096 | 14,224.630 μs | 119.9329 μs | 93.6357 μs | 0.092 |
| Sub | UnsafeVectorSub | 4096 | 14,952.815 μs | 82.6097 μs | 77.2732 μs | 0.096 |
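Sub (type 1) subtracts the corresponding byte of the previous pixel, bpp bytes to the left. Because the encoder only reads unfiltered input, this amounts to the scanline minus a copy of itself shifted right by bpp bytes, with zeros shifted in. The Python sketch below, using a hypothetical `sub_filter` function, expresses it in that shifted form; it is an illustration, not ImageSharp's C# code.

```python
def sub_filter(raw: bytes, bpp: int) -> bytes:
    """Scalar PNG Sub filter (filter type 1): Sub(x) = Raw(x) - Raw(x - bpp) mod 256."""
    # Prepend bpp zero bytes so shifted[x] == Raw(x - bpp), with 0 before the row start.
    shifted = bytes(bpp) + raw
    # zip() stops at the shorter input, so the result has len(raw) bytes.
    return bytes((r - left) & 0xFF for r, left in zip(raw, shifted))


print(list(sub_filter(bytes([10, 20, 30, 40, 50, 60, 70, 80]), 4)))
# prints [10, 20, 30, 40, 40, 40, 40, 40]
```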

@codecov bot commented May 15, 2021

Codecov Report

Merging #1630 (4cfc701) into master (a8cb711) will increase coverage by 0.08%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #1630      +/-   ##
==========================================
+ Coverage   83.67%   83.75%   +0.08%     
==========================================
  Files         749      749              
  Lines       33111    33275     +164     
  Branches     3714     3736      +22     
==========================================
+ Hits        27707    27871     +164     
  Misses       4682     4682              
  Partials      722      722              
| Flag | Coverage Δ |
|---|---|
| unittests | 83.75% <100.00%> (+0.08%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| src/ImageSharp/Common/Helpers/Numerics.cs | 97.23% <100.00%> (+0.23%) ⬆️ |
| ...rc/ImageSharp/Formats/Png/Filters/AverageFilter.cs | 100.00% <100.00%> (ø) |
| src/ImageSharp/Formats/Png/Filters/PaethFilter.cs | 100.00% <100.00%> (ø) |
| src/ImageSharp/Formats/Png/Filters/SubFilter.cs | 100.00% <100.00%> (ø) |
| src/ImageSharp/Formats/Png/Filters/UpFilter.cs | 100.00% <100.00%> (ø) |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 993f96e...4cfc701.

@brianpopow (Collaborator)

@TechnologicalPizza, thank you for the contribution; the performance gains look very promising 👍

Unit tests for this could be written with the help of the FeatureTestRunner.

See the TransposeInto tests in Block8x8FTests.cs for an example.

@JimBobSquarePants (Member)

I cannot adequately describe how much joy it brings me to see such fantastic collaboration here! Really great to see everyone pitching in with ideas. 😄

@TechPizzaDev (Contributor Author)

Environment
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-4720HQ CPU 2.60GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.202
  [Host] : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

Job=MediumRun  EvaluateOverhead=True  Toolchain=InProcessEmitToolchain  
IterationCount=15  LaunchCount=1  WarmupCount=10
Paeth 🥳
| Method | Size | Mean | Error | StdDev | Ratio | Code Size |
|---|---:|---:|---:|---:|---:|---:|
| Paeth | 256 | 2,533.20 μs | 5.760 μs | 4.810 μs | 1.00 | 386 B |
| VectorPaeth | 256 | 256.20 μs | 0.683 μs | 0.570 μs | 0.10 | 386 B |
| Avx2Paeth | 256 | 65.25 μs | 0.238 μs | 0.223 μs | 0.03 | 386 B |
| Paeth | 1024 | 40,717.35 μs | 62.354 μs | 52.068 μs | 1.00 | 386 B |
| VectorPaeth | 1024 | 3,999.67 μs | 18.097 μs | 16.928 μs | 0.10 | 386 B |
| Avx2Paeth | 1024 | 1,335.79 μs | 60.784 μs | 53.884 μs | 0.03 | 386 B |
| Paeth | 4096 | 660,246.26 μs | 1,574.296 μs | 1,472.598 μs | 1.00 | 386 B |
| VectorPaeth | 4096 | 68,048.32 μs | 206.054 μs | 192.743 μs | 0.10 | 386 B |
| Avx2Paeth | 4096 | 19,532.50 μs | 84.610 μs | 79.145 μs | 0.03 | 386 B |
Average
| Method | Size | Mean | Error | StdDev | Ratio |
|---|---:|---:|---:|---:|---:|
| Average | 256 | 620.15 μs | 5.117 μs | 4.786 μs | 1.00 |
| Ssse3Average | 256 | 64.73 μs | 0.177 μs | 0.165 μs | 0.10 |
| Avx2Average | 256 | 33.87 μs | 0.072 μs | 0.060 μs | 0.05 |
| Average | 1024 | 9,897.43 μs | 46.592 μs | 41.303 μs | 1.00 |
| Ssse3Average | 1024 | 1,325.84 μs | 7.321 μs | 6.848 μs | 0.13 |
| Avx2Average | 1024 | 1,055.26 μs | 35.832 μs | 33.517 μs | 0.11 |
| Average | 4096 | 176,043.52 μs | 400.279 μs | 354.837 μs | 1.00 |
| Ssse3Average | 4096 | 23,177.05 μs | 162.236 μs | 135.474 μs | 0.13 |
| Avx2Average | 4096 | 17,104.39 μs | 188.302 μs | 166.925 μs | 0.10 |
Up
| Method | Size | Mean | Error | StdDev | Ratio |
|---|---:|---:|---:|---:|---:|
| Up | 256 | 447.54 μs | 0.267 μs | 0.208 μs | 1.00 |
| VectorUp | 256 | 52.68 μs | 0.184 μs | 0.163 μs | 0.12 |
| Avx2Up | 256 | 28.81 μs | 0.117 μs | 0.097 μs | 0.06 |
| Up | 1024 | 7,239.53 μs | 34.194 μs | 31.985 μs | 1.00 |
| VectorUp | 1024 | 1,225.86 μs | 12.761 μs | 11.312 μs | 0.17 |
| Avx2Up | 1024 | 1,001.42 μs | 13.296 μs | 11.103 μs | 0.14 |
| Up | 4096 | 136,730.78 μs | 495.719 μs | 439.442 μs | 1.00 |
| VectorUp | 4096 | 18,613.86 μs | 127.692 μs | 99.693 μs | 0.14 |
| Avx2Up | 4096 | 16,700.44 μs | 177.679 μs | 166.201 μs | 0.12 |
Sub
| Method | Size | Mean | Error | StdDev | Ratio |
|---|---:|---:|---:|---:|---:|
| Sub | 256 | 502.37 μs | 1.479 μs | 1.311 μs | 1.00 |
| VectorSub | 256 | 50.32 μs | 0.315 μs | 0.294 μs | 0.10 |
| Avx2Sub | 256 | 24.05 μs | 0.117 μs | 0.098 μs | 0.05 |
| Sub | 1024 | 8,061.44 μs | 29.533 μs | 27.625 μs | 1.00 |
| VectorSub | 1024 | 982.06 μs | 15.100 μs | 13.386 μs | 0.12 |
| Avx2Sub | 1024 | 595.18 μs | 25.069 μs | 23.450 μs | 0.07 |
| Sub | 4096 | 153,740.68 μs | 385.423 μs | 341.667 μs | 1.00 |
| VectorSub | 4096 | 14,872.55 μs | 111.238 μs | 104.052 μs | 0.10 |
| Avx2Sub | 4096 | 11,971.57 μs | 34.938 μs | 29.175 μs | 0.08 |
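The Avx2Paeth gains above come from vectorizing a predictor that is full of branches when written per byte. A common way to vectorize it is to replace the branches with compare masks and bitwise blends. The Python sketch below emulates that idea one byte at a time to show the selection logic matches the branchy reference; all names are hypothetical and it is not a transcription of the PR's AVX2 code.

```python
def paeth_predictor(a: int, b: int, c: int) -> int:
    """Reference Paeth predictor with branches (a = left, b = above, c = upper-left)."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def mask(condition: bool) -> int:
    """Emulate a SIMD compare: all ones (0xFF) when true, all zeros otherwise."""
    return 0xFF if condition else 0x00


def paeth_predictor_branchless(a: int, b: int, c: int) -> int:
    """Same result, expressed as compare masks plus bitwise blends."""
    # With p = a + b - c these equal |p - a|, |p - b| and |p - c|. pc can reach
    # 510, which is why vector code typically widens to 16-bit lanes.
    pa = abs(b - c)
    pb = abs(a - c)
    pc = abs(a + b - 2 * c)
    use_a = mask(pa <= pb) & mask(pa <= pc)
    use_b = mask(pb <= pc)
    # Blend b/c first, then let a win wherever use_a is set.
    b_or_c = (b & use_b) | (c & ~use_b & 0xFF)
    return (a & use_a) | (b_or_c & ~use_a & 0xFF)
```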

@JimBobSquarePants (Member)

Excellent stuff, let's get it merged! 🚀

@JimBobSquarePants JimBobSquarePants merged commit af51960 into SixLabors:master May 18, 2021

6 participants