Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve branch misses on FSE symbol spreading #2750

Merged
merged 1 commit into from
Aug 20, 2021

Conversation

senhuang42
Copy link
Contributor

This PR simply adds the same optimization as @terrelln's small block FSE decoding which reduces branch misses.

./zstd -b1 -B4K silesia.tar

Before:
2.9% cycles in FSE_buildCTable
7.3% branch misses in FSE_buildCTable
195 MB/s e2e

After:
2.0% cycles in FSE_buildCTable
3.37% branch misses in FSE_buildCTable
200 MB/s e2e

@Cyan4973
Copy link
Contributor

Some minor conversion warnings to take care off,
but otherwise, a fairly straightforward optimization.

@senhuang42 senhuang42 merged commit ae99854 into facebook:dev Aug 20, 2021
int const n = normalizedCounter[s];
MEM_write64(spread + pos, sv);
for (i = 8; i < n; i += 8) {
MEM_write64(spread + pos + i, sv);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we some form of guarantee that this won't overwrite beyond spread ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can over-write up to 8 bytes beyond spread. So we need to make sure we have 1 << tableLog + 8 bytes available.

int const n = normalizedCounter[s];
MEM_write64(spread + pos, sv);
for (i = 8; i < n; i += 8) {
MEM_write64(spread + pos + i, sv);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can over-write up to 8 bytes beyond spread. So we need to make sure we have 1 << tableLog + 8 bytes available.

FSE_FUNCTION_TYPE* tableSymbol = (FSE_FUNCTION_TYPE*)(cumul + (maxSymbolValue + 2));
U16* cumul = (U16*)workSpace; /* size = maxSV1 */
FSE_FUNCTION_TYPE* tableSymbol = (FSE_FUNCTION_TYPE*)(cumul + (maxSV1+1)); /* size = tableSize */
BYTE* spread = tableSymbol + tableSize; /* size = tableSize */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size should be tableSize + 8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants