-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable fusion of unsafePackLen{Bytes,Chars} #494
base: master
Are you sure you want to change the base?
Conversation
@noughtmare could you please fix warnings? |
Please submit benchmarks as well. |
I've added these benchmarks: , bgroup "pack"
[ bench "not fused" $ nf S.pack (replicate nRepl 0)
, bench "fused" $ nf (S.pack . replicate nRepl) 0
]
, bgroup "unsafePackLenBytes"
[ bench "not fused" $ nf (SI.unsafePackLenBytes nRepl) (replicate nRepl 0)
, bench "fused" $ nf (SI.unsafePackLenBytes nRepl . replicate nRepl) 0
]
, bgroup "unsafePackLenChar"
[ bench "not fused" $ nf (SI.unsafePackLenChars nRepl) (replicate nRepl 'A')
, bench "fused" $ nf (SI.unsafePackLenChars nRepl . replicate nRepl) 'A'
] Before
After
If the fusion works out there is a 20x speedup, but otherwise it is slightly slower. Update: It seems that it is just the inlining which makes things a tiny bit worse in this case. Also, note that this function is only exposed via |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to do the same for ShortByteString
?
bytestring/Data/ByteString/Short/Internal.hs
Lines 558 to 566 in 1256378
packLenBytes :: Int -> [Word8] -> ShortByteString | |
packLenBytes len ws0 = | |
create len (\mba -> go mba 0 ws0) | |
where | |
go :: MBA s -> Int -> [Word8] -> ST s () | |
go !_ !_ [] = return () | |
go !mba !i (w:ws) = do | |
writeWord8Array mba i w | |
go mba (i+1) ws |
unsafeCreate len . foldr | ||
(\x go p -> poke p x >> go (p `plusPtr` 1)) | ||
(\_ -> return ()) | ||
{-# INLINE unsafePackLenBytes #-} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why these functions have not been inlined before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the benchmarks show, if no fusion happens then there is no benefit in inlining (it even costs a few percents of performance).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now think this is actually quite a big problem, because the only way this function is exposed in the safe API is through pack
which can never fuse anyway, so is this patch really still desirable? We would need a safe packLen
function to really get the benefits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a version of pack
that is fusable (by using dynamic arrays doubling in size):
https://gist.github.com/noughtmare/8902ddec65fedae9fb24d3826dbe11a0
This again has the problem that it is slightly slower for short inputs if not fused (around x1.5), but a lot faster if it is fused.
@noughtmare what's the status of this? Could you please rebase? |
I was kind of waiting for a response on this thread:
So the options are:
I'd prefer 2 or 3 |
The existing |
I'd rather avoid rules with phase control, they are such a nightmare to work with. |
Marking as draft for now. |
Here's a benchmark:
Before:
After: