Use new paramSwitch enum for row matchfinder and block splitter #2788

senhuang42 · 2021-09-20T16:00:14Z

This PR is a minor refactor. It adds a new parameter type that generalizes the idea of a boolean parameter that can be user-enabled, user-disabled, or left for the library to decide at runtime.
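For illustration, here is a minimal sketch of such a three-state switch and how it might be resolved to a plain on/off decision. The names `param_switch_e`, `ps_*`, and `resolve_switch` are made up for this example and are not necessarily the identifiers used in the patch. The runtime heuristic is reduced to a flag supplied by the caller.

```c
#include <assert.h>

/* Hypothetical three-state switch; names are illustrative only. */
typedef enum {
    ps_auto = 0,   /* library decides at runtime */
    ps_enable,     /* user forces the feature on */
    ps_disable     /* user forces the feature off */
} param_switch_e;

/* Resolve a switch to a definite 0/1 decision.
 * heuristic_says_on stands in for whatever runtime test the library
 * would apply for this feature when the user did not choose. */
static int resolve_switch(param_switch_e mode, int heuristic_says_on)
{
    assert(mode == ps_auto || mode == ps_enable || mode == ps_disable);
    if (mode == ps_auto) return heuristic_says_on != 0;
    return mode == ps_enable;
}
```

An explicit user choice always wins; the heuristic is consulted only when the switch is left at auto.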

The two discussion points are:
Should the unstable ZSTD_literalCompressionMode_e get moved to this as well, with the associated advanced param renamed to ZSTD_c_useLiteralCompression (like *_useRowMatchFinder and *_useBlockSplitter)?

LDM falls into this category, but the advanced parameter ZSTD_c_enableLongDistanceMatching is already stable, and therefore the type can't be changed to the new enum type. Internally, we could still implement the selection logic as the new enum type, but then the advanced param and the internals would diverge, unlike any of the other paramSwitch-type features.

Note: #2780 will be rebased on top of this

lib/compress/zstd_compress.c

lib/zstd.h

Cyan4973 · 2021-09-20T16:39:27Z

LDM falls into this category, but the advanced parameter ZSTD_c_enableLongDistanceMatching is already stable, and therefore the type can't be changed to the new enum type. Internally, we could still implement the selection logic as the new enum type, but then the advanced param and the internals would diverge, unlike any of the other paramSwitch-type features.

I believe that's nonetheless a good way forward, and likely a better situation than the one we are currently in.
We are effectively already treating ldm as an auto mode, even though it's "officially" set to 0==off.

The way I see it, we could employ this strategy:

- Internally, the flag does respect the new paramSwitch model, with 0 == auto as the starting value.
- At interface level, auto doesn't exist. The API only allows selecting "force disable" or "force enable". Hence 0 -> 1 and 1 -> 2.
- This way, it respects the current interface contract, which remains "stable".
- And it still allows automatic ldm decision, as we effectively do today (ldm is automatically enabled when windowLog >= 27).
- The downside is that there is no way to "set" ldm to auto. The only way to have auto is to not set ldm at all, or to reset the context.

If it was a problem, we could introduce a new advanced parameter just to set ldm to auto. That being said, I don't see such a need for the time being.
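
A minimal sketch of the mapping described above, assuming an internal numbering of 0 == auto, 1 == force disable, 2 == force enable (inferred from "0 -> 1 and 1 -> 2"; the enum that actually gets merged may order its values differently). `ldm_switch_e`, `ldm_from_api`, and `ldm_is_enabled` are hypothetical names, and the auto rule uses only the windowLog >= 27 condition mentioned here.

```c
#include <assert.h>

/* Hypothetical internal three-state value; 0 == auto is the starting value. */
typedef enum {
    ldm_auto = 0,
    ldm_force_disable = 1,
    ldm_force_enable = 2
} ldm_switch_e;

/* Translate the stable public boolean (0 = off, 1 = on) into the
 * internal switch: 0 -> 1 and 1 -> 2. Auto is unreachable from here. */
static ldm_switch_e ldm_from_api(int api_value)
{
    assert(api_value == 0 || api_value == 1);
    return (ldm_switch_e)(api_value + 1);
}

/* Final on/off decision. In auto mode, apply the windowLog threshold. */
static int ldm_is_enabled(ldm_switch_e mode, unsigned window_log)
{
    if (mode == ldm_auto) return window_log >= 27;
    return mode == ldm_force_enable;
}
```

A context that never sets the parameter stays at `ldm_auto`, which is how the automatic decision is preserved without changing the public contract.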

terrelln · 2021-09-20T16:58:56Z

If it was a problem, we could introduce a new advanced parameter just to set ldm to auto. That being said, I don't see such a need for the time being.

Yeah, if we need this functionality, we can just add a new parameter like ZSTD_useLongDistanceMatching (or whatever naming scheme we choose for these 3-state parameters), and eventually deprecate ZSTD_enableLongDistanceMatching.

Cyan4973 · 2021-09-20T17:30:39Z

I did not mention it earlier, but just to confirm:

Should the unstable ZSTD_literalCompressionMode_e get moved to this as well, with the associated advanced param renamed to ZSTD_c_useLiteralCompression

Yes, we should. It will be much cleaner.

terrelln · 2021-09-20T21:57:31Z

tests/regression/results.csv

@@ -236,17 +236,17 @@ silesia,                            level 1,                            advanced
 silesia,                            level 3,                            advanced one pass,                  4849553
 silesia,                            level 4,                            advanced one pass,                  4786968
 silesia,                            level 5 row 1,                      advanced one pass,                  4640752
-silesia,                            level 5 row 2,                      advanced one pass,                  4638961


This PR introduces a lot of insignificant changes (which is fine). Can you point out exactly which lines introduce functional changes?

Even better would be to separate into two commits (or two PRs). One to switch to ZSTD_ParamSwitch_e, and one to make the functional change.

senhuang42 · 2021-09-21

Actually, there should only be insignificant changes and no real functional changes. These differences appear to come from a bug where I missed one of the params in regressiontest.

Turns out I introduced a bug: I was incorrectly passing the value of the useRowMatchFinder param into the function resolveBlockSplitterMode, causing the block splitter to always get enabled whenever row hash is. The perils of copy-paste.
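
To make the failure mode concrete, here is a reduced sketch of that kind of copy-paste mistake. Everything in it is a hypothetical stand-in rather than the actual zstd code: `switch_e`, `params_t`, and `resolve_block_splitter` only mimic the shape of a resolver that returns an explicit user choice unchanged and falls back to a heuristic in auto mode.

```c
#include <assert.h>

/* Hypothetical three-state switch and parameter struct. */
typedef enum { sw_auto = 0, sw_enable, sw_disable } switch_e;

typedef struct {
    switch_e use_row_match_finder;
    switch_e use_block_splitter;
} params_t;

/* Stand-in resolver: an explicit choice wins, auto defers to a heuristic. */
static switch_e resolve_block_splitter(switch_e mode, int heuristic_says_on)
{
    if (mode != sw_auto) return mode;
    return heuristic_says_on ? sw_enable : sw_disable;
}

/* Buggy call site: the row match finder's switch is passed by mistake,
 * so an enabled row match finder forces the block splitter on. */
static switch_e splitter_mode_buggy(const params_t* p, int heuristic_says_on)
{
    assert(p != 0);
    return resolve_block_splitter(p->use_row_match_finder, heuristic_says_on);
}

/* Fixed call site: the block splitter's own switch is resolved. */
static switch_e splitter_mode_fixed(const params_t* p, int heuristic_says_on)
{
    assert(p != 0);
    return resolve_block_splitter(p->use_block_splitter, heuristic_says_on);
}
```

Because both fields share the same enum type, the compiler accepts either call, which is why this kind of slip tends to surface only in regression results.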

I split this PR into two commits, one for ldm, since it's a bit different, and one for the rest. Basically there are no real changes, and every change that isn't a direct name replacement is a small adjustment to accommodate the naming change or the int->enum changes. All the results.csv changes are just switching between 1 and 2 in row match finder, since the meaning of 1 and 2 changed for rowhash's enum. Let me know if there is a better scheme to divide this up for the sake of making review easier.

facebook-github-bot added the CLA Signed label Sep 20, 2021

Cyan4973 reviewed Sep 20, 2021

lib/compress/zstd_compress.c

senhuang42 force-pushed the param_switch branch from ef95f6c to 208c114 Compare September 20, 2021 16:27

Cyan4973 reviewed Sep 20, 2021

lib/zstd.h

senhuang42 force-pushed the param_switch branch from 208c114 to 4e4ad30 Compare September 20, 2021 21:21

senhuang42 marked this pull request as draft September 20, 2021 21:23

terrelln reviewed Sep 20, 2021

senhuang42 force-pushed the param_switch branch 3 times, most recently from acb3905 to 716926d Compare September 21, 2021 18:14

senhuang42 marked this pull request as ready for review September 21, 2021 18:19

senhuang42 added 2 commits September 21, 2021 14:22

Use new paramSwitch enum for LCM, row matchfinder, and block splitter

b5c35d7

Use new paramSwitch enum for LDM

06f42c3

senhuang42 force-pushed the param_switch branch from 716926d to 06f42c3 Compare September 21, 2021 18:22

senhuang42 mentioned this pull request Sep 22, 2021

Reduce stack usage of block splitter #2780

Merged

Cyan4973 approved these changes Sep 22, 2021

terrelln approved these changes Sep 22, 2021

senhuang42 merged commit 1e99d36 into facebook:dev Sep 22, 2021
