Home
For test results, look here.
For comparison we used a Win32 assembly-based optimized version (`Asm`), a Win32 C-based optimized version (`C`), a Win32 DLL-based version (`DLL`), and a Win32 C-based version built from the original sources (`Orig`). The DLL-based version uses prebuilt zlib 1.2.3 binaries obtained from the Winimage zlib page. These DLL files were built from zlib with the official assembly patches.
A performance-oriented zlib fork, zlib-ng, was also tested (it is a collection of zlib optimization patches borrowed from different sources, most of them very platform-specific).
Tests were built using Visual Studio 2019 with speed optimization. The test system is Windows 8.1 64-bit on an Intel Core i7-4702MQ @ 2.2 GHz.
For testing we used 3 data sets.
- Unreal Engine 4 source code (Engine/Source)
- Unreal Engine 4 binaries (Engine/Binaries/Win64)
- Graphical files containing geometry and textures in tif and png formats.
All data sets contain up to 256 Mb of data (the Unreal Engine 4 source code set is smaller, 137 Mb).
For the 64-bit target, the original C implementation (`Orig`) is 6-10% slower than the 32-bit build of `Orig`. The same holds for the `DLL` build. However, the optimized (fast_zlib) C version (`C`) is faster in 64 bits than the optimized 32-bit code. In other words, the original code runs slower in 64 bits than in 32 bits, but the optimized code runs faster in 64 bits than in 32 bits. The optimized code therefore gets an additional performance boost (compared to the `Orig` version) just from being 64-bit.
For the 32-bit target, the optimized assembly version is ~5% faster than the optimized C version, so the performance of the 64-bit `C` version is somewhere between 32-bit `Asm` and 32-bit `C`. This is why I didn't provide a 64-bit assembly version of the algorithm.
I've tested the library on 32-bit Ubuntu, compiled with GCC 5.4.0. The optimized code performs 1-4% slower than the Win32 version. The original zlib implementation performs 7-8% slower.
Test mode is the name of the compared version, as listed in the Compared versions paragraph. The number that follows is the compression level. Each table cell contains data in the following format: `<elapsed time> / <compressed size> / <compression speed>`.
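As a sanity check on this format, a cell can be split back into its three numbers. The sketch below is not part of the test application: `parse_cell` and `CELL_RE` are hypothetical names, and the assumption that the speed column equals source size divided by elapsed time is inferred from the published figures (202.1 Mb source, 9.7 s, 20.84 Mb/s), not stated anywhere on this page.

```python
import re

# Hypothetical helper pattern: "<elapsed>s / <size> b / <speed> Mb/s".
CELL_RE = re.compile(r"([\d.]+)s\s*/\s*(\d+)\s*b\s*/\s*([\d.]+)\s*Mb/s")

def parse_cell(cell):
    """Return (elapsed seconds, compressed bytes, reported Mb/s) for one table cell."""
    match = CELL_RE.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    seconds, size, speed = match.groups()
    return float(seconds), int(size), float(speed)

# One cell taken from the results on this page (Asm, level 9, 202.1 Mb data set).
seconds, size, speed = parse_cell("9.7s / 67634024 b / 20.84 Mb/s")

# Assumption: speed = uncompressed source size / elapsed time.
derived_speed = 202.1 / seconds
print(f"reported {speed:.2f} Mb/s, derived {derived_speed:.2f} Mb/s")
```

Small mismatches between the reported and derived speed are expected, because the elapsed time in the table is rounded to one decimal place.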
As you can see, the Asm version is only slightly faster than the C version. The optimized version performs nearly 2.5-10 times faster than the original C version. The slower the compression, the greater the performance improvement.
The test application was designed to exclude file access times as much as possible. Both compression and decompression are performed in memory, and file reading operations are excluded from the timing. So the table below shows "clean" compression results. The test application source code can be found here.
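The same in-memory measurement idea can be sketched with Python's standard `zlib` module. This is an illustration under assumptions, not the actual test application: `time_compression` and `format_cell` are hypothetical names, the sample buffer is synthetic, and "Mb" is assumed to mean 1024 * 1024 bytes. Python uses whichever zlib it was built against, so it will not reproduce the figures on this page.

```python
import time
import zlib

def time_compression(data, level):
    """Compress an in-memory buffer and time only the compression call."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Round-trip check happens outside the timed region.
    if zlib.decompress(compressed) != data:
        raise RuntimeError("round trip failed")
    megabytes = len(data) / (1024 * 1024)  # assumption: Mb = 2**20 bytes
    speed = megabytes / elapsed if elapsed > 0 else float("inf")
    return elapsed, len(compressed), speed

def format_cell(elapsed, size, speed):
    """Render a result in the '<time> / <size> / <speed>' cell layout."""
    return f"{elapsed:.1f}s / {size} b / {speed:.2f} Mb/s"

# Synthetic data already in memory, so no file I/O is measured.
sample = b"fast_zlib in-memory benchmark sample. " * 50000

for level in (1, 5, 9):
    print(level, format_cell(*time_compression(sample, level)))
```

Loading the data before starting the timer is the key design choice: only the compression call sits between the two clock reads.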
Current tests are for zlib version 1.2.11.
I've re-tested zlib with the Visual C++ 2019 compiler (previous tests were made with VS 2013). The optimized C code works 7-12% faster. The "Original" implementation works at nearly the same speed.
When tested with compression level 1 (the lowest one), optimized zlib generates exactly the same archive as original zlib, with nearly the same performance. This was expected: the new algorithm is intended for higher level settings, so at level 1 it behaves like the original code. Zlib-ng behaves differently. The 32-bit Windows version simply crashes on all test data. The 64-bit version doesn't crash and works 2-3 times faster than the others, but it generates archives that are ~30% larger than those produced by original zlib. For poorly compressible data the difference is smaller than 30%, but only because that test data compresses with a ratio of 1.17 with original zlib and 1.02 with zlib-ng (there's almost no way to compress it worse).
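The relation between these percentages and compression ratios can be checked with a little arithmetic. The sketch below uses the level 1 archive sizes reported in the results table for the 202.1 Mb data set, plus the two ratios quoted above; `size_overhead` is a hypothetical helper name. For the same source size S, an archive compressed with ratio r has size S / r, so S cancels out when comparing two ratios.

```python
def size_overhead(candidate_size, baseline_size):
    """Fraction by which the candidate archive is larger than the baseline."""
    return candidate_size / baseline_size - 1.0

# Level 1 archive sizes in bytes: zlib-ng versus original zlib.
level1_overhead = size_overhead(100862999, 77259877)

# Poorly compressible data: ratios 1.02 (zlib-ng) and 1.17 (original zlib).
# Sizes are S / 1.02 and S / 1.17; S is set to 1 because it cancels out.
poor_overhead = size_overhead(1 / 1.02, 1 / 1.17)

print(f"level 1 data set: {level1_overhead:.1%} larger")
print(f"poorly compressible data: {poor_overhead:.1%} larger")
```

The second figure comes out well below 30% even though zlib-ng barely compresses that data at all, which is the effect the paragraph above describes.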
This release performs 15-35% faster than Release 1. The slower the compression, the bigger the performance boost.
Zlib-ng compiled for Win64 performs nearly 10-15% faster than zlib-ng for Win32. This zlib version has some gcc-specific optimizations, so it would probably work faster if built with gcc. The Visual C build didn't show any significant performance improvement, and sometimes even works slower than the original zlib code.
Test mode | Source code | Binaries | Geometry data |
---|---|---|---|
Asm -9 | 5.2s / 28858611 b / 26.49 Mb/s | 12.3s / 54473779 b / 20.74 Mb/s | 17.2s / 114915875 b / 14.85 Mb/s |
C -9 | 5.1s / 28858611 b / 27.08 Mb/s | 12.6s / 54473779 b / 20.29 Mb/s | 18.2s / 114915875 b / 14.07 Mb/s |
zlib-ng -9 | 9.5s / 28859066 b / 14.53 Mb/s | 45.3s / 54485028 b / 5.65 Mb/s | 128.1s / 114934697 b / 2.00 Mb/s |
Dll -9 | 9.6s / 28859103 b / 14.37 Mb/s | 41.7s / 54485836 b / 6.14 Mb/s | 145.9s / 114942962 b / 1.75 Mb/s |
Orig -9 | 10.7s / 28859103 b / 12.84 Mb/s | 49.1s / 54485836 b / 5.21 Mb/s | 180.0s / 114942962 b / 1.42 Mb/s |
Note: used Visual Studio 2013 compiler
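To read the table above as speedup factors, the elapsed times can be divided by the `Orig` times per data set. This is only a convenience sketch over the figures already listed: `times`, `DATA_SETS` and `speedups` are hypothetical names, and the inputs are rounded to 0.1 s, so the factors are approximate.

```python
# Elapsed seconds copied from the table above, in column order.
DATA_SETS = ("Source code", "Binaries", "Geometry data")
times = {
    "Asm": (5.2, 12.3, 17.2),
    "C": (5.1, 12.6, 18.2),
    "zlib-ng": (9.5, 45.3, 128.1),
    "Dll": (9.6, 41.7, 145.9),
    "Orig": (10.7, 49.1, 180.0),
}

def speedups(baseline="Orig"):
    """Return baseline_time / mode_time for every mode except the baseline."""
    base = times[baseline]
    return {
        mode: tuple(b / t for b, t in zip(base, row))
        for mode, row in times.items()
        if mode != baseline
    }

for mode, factors in speedups().items():
    cells = ", ".join(f"{name}: {f:.1f}x" for name, f in zip(DATA_SETS, factors))
    print(f"{mode:8s} {cells}")
```

The factors grow from the source code set to the geometry set, in line with the observation that slower compression leaves more room for improvement.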
Source data size is 202.1 Mb. Note: win32 zlib-ng crashed when compressing with level 1, so this test was performed with win64 (therefore no `Asm` test there). Be careful when reading the compression speed for zlib-ng at level 1: look at the size of the compressed archive as well.
Test mode | Level 9 | Level 5 | Level 1 |
---|---|---|---|
Asm | 9.7s / 67634024 b / 20.84 Mb/s | 6.9s / 68858828 b / 29.26 Mb/s | N/A |
C | 9.7s / 67634024 b / 20.90 Mb/s | 6.8s / 68858828 b / 29.82 Mb/s | 3.2s / 77259877 b / 63.52 Mb/s |
zlib-ng | 23.5s / 67641873 b / 8.61 Mb/s | 6.7s / 69102052 b / 30.06 Mb/s | 1.4s / 100862999 b / 147.43 Mb/s |
Dll | 21.8s / 67643258 b / 9.29 Mb/s | 7.3s / 69162276 b / 27.78 Mb/s | 3.7s / 77259877 b / 55.30 Mb/s |
Orig | 23.7s / 67643258 b / 8.51 Mb/s | 6.9s / 69162276 b / 29.23 Mb/s | 3.3s / 77259877 b / 62.10 Mb/s |
Note: used Visual Studio 2019 compiler; `zlib-ng` and `Dll` were compiled with an older compiler
Test mode | Source code | Binaries | Geometry data |
---|---|---|---|
Asm -9 | 11.8s / 51136013 b / 21.65 Mb/s | 15.4s / 54456154 b / 16.64 Mb/s | 25.4s / 114917505 b / 10.07 Mb/s |
C -9 | 12.3s / 51136013 b / 20.81 Mb/s | 16.1s / 54456154 b / 15.88 Mb/s | 26.6s / 114917505 b / 9.63 Mb/s |
Dll -9 | 22.6s / 51145811 b / 11.35 Mb/s | 42.0s / 54485836 b / 6.10 Mb/s | 150.0s / 114942962 b / 1.71 Mb/s |
Orig -9 | 26.3s / 51145811 b / 9.75 Mb/s | 52.6s / 54485836 b / 4.87 Mb/s | 184.4s / 114942962 b / 1.39 Mb/s |
Note: used Visual Studio 2010 compiler