Improve branch misses on FSE symbol spreading #2750
Some minor conversion warnings to take care of.
15a4b5e to da095ed
int const n = normalizedCounter[s];
MEM_write64(spread + pos, sv);
for (i = 8; i < n; i += 8) {
    MEM_write64(spread + pos + i, sv);
Do we have some form of guarantee that this won't write beyond the end of `spread`?
This can overwrite up to 8 bytes beyond the end of `spread`, so we need to make sure we have `(1 << tableLog) + 8` bytes available.
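To illustrate the overrun, here is a minimal standalone sketch of the spreading loop. It is not the code from this PR: `write64`, `spread_symbols`, and `SPREAD_PADDING` are hypothetical names, `write64` stands in for `MEM_write64`, and the sketch assumes all normalized counts are non-negative and sum to at most `tableSize`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPREAD_PADDING 8  /* assumed slack for the last 8-byte store */

/* Stand-in for MEM_write64: unaligned 64-bit store via memcpy. */
static void write64(void* dst, uint64_t v) { memcpy(dst, &v, sizeof v); }

/* Lays symbols out consecutively in `spread`, 8 bytes per store.
 * counts[s] is the non-negative normalized count of symbol s.
 * `spread` must hold tableSize + SPREAD_PADDING bytes, because every
 * symbol gets at least one full 8-byte store, even when its count is
 * smaller than 8 (or zero).
 * Returns the number of meaningful bytes laid out. */
size_t spread_symbols(uint8_t* spread, const short* counts,
                      unsigned nbSymbols, size_t tableSize)
{
    uint64_t const add = 0x0101010101010101ULL;  /* +1 in every byte lane */
    uint64_t sv = 0;                             /* symbol replicated 8 times */
    size_t pos = 0;
    unsigned s;
    for (s = 0; s < nbSymbols; ++s, sv += add) {
        int const n = counts[s];
        int i;
        assert(n >= 0 && pos + (size_t)n <= tableSize);
        write64(spread + pos, sv);               /* unconditional first store */
        for (i = 8; i < n; i += 8) {
            write64(spread + pos + i, sv);       /* may run past pos + n */
        }
        pos += (size_t)n;
    }
    return pos;
}
```

The worst case is a zero-count symbol processed when `pos == tableSize`: its unconditional store touches bytes `tableSize` through `tableSize + 7`, which is where the 8 bytes of padding come from.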
FSE_FUNCTION_TYPE* tableSymbol = (FSE_FUNCTION_TYPE*)(cumul + (maxSymbolValue + 2));
U16* cumul = (U16*)workSpace; /* size = maxSV1 */
FSE_FUNCTION_TYPE* tableSymbol = (FSE_FUNCTION_TYPE*)(cumul + (maxSV1+1)); /* size = tableSize */
BYTE* spread = tableSymbol + tableSize; /* size = tableSize */
The size of `spread` should be `tableSize + 8`.
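For reference, the layout in the snippet above implies a workspace budget along the following lines. This is only a sketch: `build_ctable_workspace_bytes` is a hypothetical helper, not a zstd function, and it assumes `maxSV1 == maxSymbolValue + 1`, `tableSize == 1 << tableLog`, and a one-byte `FSE_FUNCTION_TYPE`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed for cumul, tableSymbol, and spread
 * when they are carved consecutively out of one workspace. */
size_t build_ctable_workspace_bytes(unsigned maxSymbolValue, unsigned tableLog)
{
    assert(tableLog < 31);  /* keep the shift well-defined in this sketch */
    {
        size_t const maxSV1 = (size_t)maxSymbolValue + 1;
        size_t const tableSize = (size_t)1 << tableLog;
        /* tableSymbol starts at cumul + (maxSV1 + 1), so reserve that many U16 */
        size_t const cumulBytes = (maxSV1 + 1) * sizeof(uint16_t);
        size_t const tableSymbolBytes = tableSize;  /* assumes 1-byte symbols */
        size_t const spreadBytes = tableSize + 8;   /* 8 bytes of store slack */
        return cumulBytes + tableSymbolBytes + spreadBytes;
    }
}
```

With `maxSymbolValue = 255` and `tableLog = 12`, this gives 514 + 4096 + 4104 bytes. Any workspace-size check in the caller would need to account for the extra 8 bytes as well.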
This PR simply applies the same optimization as @terrelln's small-block FSE decoding change, which reduces branch misses.