Enable SSE2 compression path to work on MSVC #2653

TrianglesPCT · 2021-05-14T22:42:45Z

Compile-time detection uses the predefined __SSE2__ macro, but MSVC does not define it, so that compiler always takes the scalar path.

On MSVC you can detect SSE2 support by checking for _M_AMD64, because x64 targets require SSE2 as a baseline. This patch adds that check.
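
A minimal sketch of the detection described above, for illustration only. The names EXAMPLE_HAS_SSE2 and example_has_sse2 are hypothetical and are not taken from the patch; the actual condition in lib/compress/zstd_lazy.c may be written differently.

```c
/* EXAMPLE_HAS_SSE2 and example_has_sse2() are hypothetical names for this sketch. */
#if defined(__SSE2__)
   /* GCC and Clang define __SSE2__ whenever SSE2 code generation is enabled. */
#  define EXAMPLE_HAS_SSE2 1
#elif defined(_M_AMD64)
   /* MSVC does not define __SSE2__, but its x64 targets always include SSE2. */
#  define EXAMPLE_HAS_SSE2 1
#else
   /* Anything else falls back to the scalar path in this sketch. */
#  define EXAMPLE_HAS_SSE2 0
#endif

/* Returns 1 when the SSE2 path would be compiled in, 0 otherwise. */
static int example_has_sse2(void)
{
    return EXAMPLE_HAS_SSE2;
}
```

Because the check runs in the preprocessor, it costs nothing at run time. It only reflects what the compiler was told to target, not what the CPU running the binary supports.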

facebook-github-bot · 2021-05-14T22:42:49Z

Hi @TrianglesPCT!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

TrianglesPCT

Switch to unaligned loads, since I don't know whether the buffer will always be aligned to 32 bytes, and compilers other than MSVC might actually emit aligned loads.

facebook-github-bot · 2021-05-14T23:03:34Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

TrianglesPCT

Switch to unaligned load
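
The difference between aligned and unaligned loads can be sketched with SSE2 intrinsics. This is an illustration under stated assumptions, not code from the patch: it uses 16-byte SSE2 loads rather than the 32-byte AVX2 loads discussed in this thread, and the helper example_match_mask16 is a hypothetical name.

```c
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_AMD64)
#  include <emmintrin.h>   /* SSE2 intrinsics */
#  define EXAMPLE_HAS_SSE2 1
#else
#  define EXAMPLE_HAS_SSE2 0
#endif

/* Hypothetical helper: returns a 16-bit mask with bit i set when src[i] == tag.
 * src may have any alignment but must point to at least 16 readable bytes. */
static unsigned example_match_mask16(const unsigned char* src, unsigned char tag)
{
#if EXAMPLE_HAS_SSE2
    /* _mm_loadu_si128 accepts any address; _mm_load_si128 would require
     * src to be 16-byte aligned. */
    const __m128i chunk = _mm_loadu_si128((const __m128i*)(const void*)src);
    const __m128i equal = _mm_cmpeq_epi8(chunk, _mm_set1_epi8((char)tag));
    return (unsigned)_mm_movemask_epi8(equal);
#else
    /* Scalar fallback that produces the same mask. */
    unsigned mask = 0;
    size_t i;
    for (i = 0; i < 16; i++) {
        if (src[i] == tag) mask |= 1u << i;
    }
    return mask;
#endif
}
```

Using the unaligned form is the conservative choice when the alignment of the buffer is not guaranteed, which is the concern raised in the comment above.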

Cyan4973 · 2021-05-14T23:07:32Z

Thanks @TrianglesPCT,

Did you attempt to measure the performance difference provided by your PR?

TrianglesPCT · 2021-05-15T01:41:11Z

> Thanks @TrianglesPCT,
>
> did you attempt to measure performance differences provided by your PR?

I removed the AVX2 stuff for a later PR.

Cyan4973 · 2021-05-15T01:53:56Z

Well, even without AVX2, this PR is presumed to help performance, because it's supposed to help SSE2 detection?

TrianglesPCT · 2021-05-15T02:17:58Z

All it is doing is normalizing MSVC to target the same code path as GCC/Clang when SSE2 is set as a requirement for your compile target. It is not adding any new code.

The defined(__SSE2__) check doesn't work on MSVC because that define doesn't exist even if SSE2 is enabled.

If you want a specific performance comparison, I can do that, though it might be a few days until I have time.

terrelln · 2021-05-15T03:23:03Z

> If you want a specific performance comparison, I can do that, though it might be a few days until I have time.

That would be great! But that doesn't need to block merging this PR. You can easily measure performance with the built-in benchmark tool.

zstd -b5e12 silesia.tar

ghost · 2021-05-15T14:38:17Z

Will there be a hot fix for v1.5.0?

wolfpld · 2021-05-15T20:26:58Z

> You can easily measure performance with the built-in benchmark tool.

The difference is rather spectacular. Results for a 5950X:

Before:

 5#silesia.tar       : 211957760 ->  63806593 (3.322), 111.0 MB/s , 977.5 MB/s
 6#silesia.tar       : 211957760 ->  62980544 (3.365), 107.9 MB/s ,1001.6 MB/s
 7#silesia.tar       : 211957760 ->  61482185 (3.447),  73.1 MB/s ,1055.9 MB/s
 8#silesia.tar       : 211957760 ->  60914308 (3.480),  57.2 MB/s ,1076.1 MB/s
 9#silesia.tar       : 211957760 ->  59928587 (3.537),  52.1 MB/s ,1100.2 MB/s
10#silesia.tar       : 211957760 ->  59296205 (3.575),  50.3 MB/s ,1103.6 MB/s
11#silesia.tar       : 211957760 ->  59152928 (3.583),  47.7 MB/s ,1106.8 MB/s
12#silesia.tar       : 211957760 ->  58640204 (3.615),  30.6 MB/s ,1123.8 MB/s

After:

 5#silesia.tar       : 211957760 ->  63806593 (3.322), 160.8 MB/s , 980.0 MB/s
 6#silesia.tar       : 211957760 ->  62980544 (3.365), 153.3 MB/s ,1004.7 MB/s
 7#silesia.tar       : 211957760 ->  61482185 (3.447), 111.9 MB/s ,1056.3 MB/s
 8#silesia.tar       : 211957760 ->  60914308 (3.480),  90.9 MB/s ,1079.5 MB/s
 9#silesia.tar       : 211957760 ->  59928587 (3.537),  74.7 MB/s ,1101.9 MB/s
10#silesia.tar       : 211957760 ->  59296205 (3.575),  71.3 MB/s ,1107.0 MB/s
11#silesia.tar       : 211957760 ->  59152928 (3.583),  65.0 MB/s ,1113.2 MB/s
12#silesia.tar       : 211957760 ->  58640204 (3.615),  51.3 MB/s ,1129.6 MB/s
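
To put a number on the gain, the compression speeds above can be turned into speedup factors. The sketch below copies the MB/s figures from the two result listings; the array and function names are hypothetical.

```c
#include <stddef.h>

/* Compression speeds in MB/s for levels 5..12, copied from the results above. */
static const double example_before[] = {111.0, 107.9,  73.1, 57.2, 52.1, 50.3, 47.7, 30.6};
static const double example_after[]  = {160.8, 153.3, 111.9, 90.9, 74.7, 71.3, 65.0, 51.3};

/* Hypothetical helper: speedup factor for a compression level in the range 5..12. */
static double example_speedup(int level)
{
    const size_t idx = (size_t)(level - 5);
    return example_after[idx] / example_before[idx];
}
```

With these figures the compression speedup falls between roughly 1.36x (level 11) and 1.68x (level 12), while the compressed sizes are identical and the decompression speeds are essentially unchanged.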

senhuang42

This is a great fix, thanks for the contribution!

TrianglesPCT added 3 commits May 14, 2021 16:32

Add files via upload

25bda90

msvc suport avx2 path

Add files via upload

52f44bb

msvc

Add files via upload

77d54eb

Update zstd_lazy.c

0b9f4bb

use 8bit

TrianglesPCT added 2 commits May 14, 2021 16:53

Update zstd_lazy.c

69ac124

Update zstd_lazy.c

0e07121

switch to unaligned load as I don't know if buffer will always be aligned to 32 bytes, and compilers aside from MSVC might actually use aligned loads

facebook-github-bot added the CLA Signed label May 14, 2021

terrelln reviewed May 15, 2021

TrianglesPCT added 3 commits May 14, 2021 19:02

Update zstd_lazy.c

8f7ea1a

Switch to other comment style

Update zstd_lazy.c

a62856b

Remove the AVX2 part

Update zstd_lazy.c

bb1cdd8

add space

TrianglesPCT changed the title from "Enable SSE2 compression path to work on MSVC, and add AVX2 match find" to "Enable SSE2 compression path to work on MSVC" May 15, 2021

TrianglesPCT added 2 commits May 14, 2021 19:18

Add files via upload

d688ab1

AVX2

Update zstd_lazy.c

bee0ef5

It put the changes back when I tried to make a separate pull request, i don't understand githubs interface at all.

Cyan4973 approved these changes May 15, 2021

senhuang42 approved these changes May 16, 2021

Cyan4973 merged commit 02ece5d into facebook:dev May 17, 2021

wolfpld added a commit to wolfpld/tracy that referenced this pull request May 18, 2021

Cherry-pick facebook/zstd#2653

b7832a2

cwoffenden mentioned this pull request May 21, 2021

SSE/Neon path for MSVC x86 and ARM #2680

Closed
