Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use new paramSwitch enum for row matchfinder and block splitter #2788

Merged
merged 2 commits into from
Sep 22, 2021

Conversation

senhuang42
Copy link
Contributor

@senhuang42 senhuang42 commented Sep 20, 2021

This PR introduces a minor refactor that adds a new param that generalizes the idea of a boolean parameter that can be user-enabled, user-disabled, or determined at runtime by the library what the value should be.

The two discussion points are:
Should the unstable ZSTD_literalCompressionMode_e get moved to this as well, with the associated advanced param renamed to ZSTD_c_useLiteralCompression (like *_useRowMatchFinder and *_useBlockSplitter)?

LDM falls into this category, but the advanced parameter ZSTD_c_enableLongDistanceMatching is already stable, and therefore the type can't be changed to the new enum type. Internally, we could still implement the selection logic as the new enum type, but then the advanced param and the internals would diverge, unlike any of the other paramSwitch-type features.

Note: #2780 will be rebased on top of this

lib/zstd.h Show resolved Hide resolved
@Cyan4973
Copy link
Contributor

Cyan4973 commented Sep 20, 2021

LDM falls into this category, but the advanced parameter ZSTD_c_enableLongDistanceMatching is already stable, and therefore the type can't be changed to the new enum type. Internally, we could still implement the selection logic as the new enum type, but then the advanced param and the internals would diverge, unlike any of the other paramSwitch-type features.

I believe that's nonetheless a good way forward, and likely a better situation than the one we are currently in.
We are effectively already enabling ldm as an auto mode already, even though it's "officially" set to 0==off.

The way I see it, we could employ this strategy :

  • Internally, the flag does respect the new paramSwitch model, with 0 == auto as the starting value.
  • At interface level, auto doesn't exist. The API only allows selecting "force disable" or "force enable". Hence 0 -> 1 and 1 -> 2.
    • This way, it respects the current interface contract, which remains "stable".
  • And it still allows automatic ldm decision, as we effectively do today (ldm is automatically enabled when windowLog >= 27).
  • The downside is that there is no way to "set" ldm to auto. The only way to have auto is to not set ldm at all, or to reset the context.
  • If it was a problem, we could introduce a new advanced parameter just to set ldm to auto. That being said, I don't see such as need for the time being.

@terrelln
Copy link
Contributor

If it was a problem, we could introduce a new advanced parameter just to set ldm to auto. That being said, I don't see such as need for the time being.

Yeah, if we need this functionality, we can just add a new parameter like ZSTD_useLongDistanceMatching (or whatever naming scheme we choose for these 3-state parameters), and eventually deprecate ZSTD_enableLongDistanceMatching.

@Cyan4973
Copy link
Contributor

I did not mention it earlier, but just as a confirmation :

Should the unstable ZSTD_literalCompressionMode_e get moved to this as well, with the associated advanced param renamed to ZSTD_c_useLiteralCompression

Yes, we should. It will be much cleaner.

@@ -236,17 +236,17 @@ silesia, level 1, advanced
silesia, level 3, advanced one pass, 4849553
silesia, level 4, advanced one pass, 4786968
silesia, level 5 row 1, advanced one pass, 4640752
silesia, level 5 row 2, advanced one pass, 4638961
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces a lot of insignificant changes (which is fine). Can you point out exactly which lines introduce functional changes?

Even better would be to separate into two commits (or two PRs). One to switch to ZSTD_ParamSwitch_e, and one to make the functional change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually there should only be insignificant changes and no real functional changes - these changes appear to be a bug where I missed one of the params in regressiontest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out I introduced a bug where I was incorrectly passing in the value of the useRowMatchFinder param into the function resolveBlockSplitterMode, causing block splitter to always get enabled when row hash is. Perils of copy-paste..

I split this PR into two commits, one for ldm, since it's a bit different, and one for the rest. Basically there are no real changes, and every change that isn't a direct name replacement is a small adjustment in order to accommodate the naming change or int->enum changes. All the results.csv changes are just switching between 1 and 2 in row match finder, since the meaning of 1 and 2 changed for rowhash's enum. Lmk if there's is a better scheme to divide this up for the sake of making review easier.

@senhuang42 senhuang42 force-pushed the param_switch branch 3 times, most recently from acb3905 to 716926d Compare September 21, 2021 18:14
@senhuang42 senhuang42 marked this pull request as ready for review September 21, 2021 18:19
@senhuang42 senhuang42 merged commit 1e99d36 into facebook:dev Sep 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants