-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve {Set,Map}.fromDistinct{Asc,Desc}List #950
Conversation
A faster and fusion-friendly implemention of the current strategy. On GHC 9.2.5: For Set this takes 56% less time when there is fusion and 30% when not. For Map this takes 55% less time when there is fusion and 16% when not.
Hi @treeowl, do you mind taking a look at this? |
I haven't forgotten (yet)! I will look at it as soon as I can. |
Hi, did you get a chance to take a look? |
@@ -3410,8 +3413,7 @@ instance (Ord k) => GHCExts.IsList (Map k v) where | |||
-- If the list contains more than one value for the same key, the last value | |||
-- for the key is retained. | |||
-- | |||
-- If the keys of the list are ordered, linear-time implementation is used, | |||
-- with the performance equal to 'fromDistinctAscList'. | |||
-- If the keys of the list are ordered, a linear-time implementation is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use the same approach (and share code) here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a rough attempt but it performed worse for the not-sorted case. I'll try to see if it can be improved, but it'd probably be best in a separate PR.
containers/src/Data/Set/Internal.hs
Outdated
linkAll (State0 stk) = foldl'Stack (\l x r -> link x l r) Tip stk | ||
linkAll (State1 l0 stk) = foldl'Stack (\l x r -> link x l r) l0 stk | ||
|
||
{-# INLINE fromDistinctDescList #-} -- INLINE for fusion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you break this up to reduce the amount of code that needs to be inlined at each call site? I expect the pieces just need to be INLINABLE
to specialize, and the top level bit should probably do with that as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next
surely needs to be inlined but maybe we can make linkTop
and linkAll
top level inlinable and let GHC decide. I'll see if that affects bechmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Things become worse if linkTop
is not INLINABLE
, but with it I don't see any changes in benchmark results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@treeowl does the change look good?
I'm sorry I've been so slow. I finally added some review comments. |
And leave further optimization to GHC.
Let's do it. Thanks! |
As discussed in #949, this is a different implementation of the current strategy that is faster and is a good consumer of the input list. However there is an increase in allocations in the no-fusion case.
Benchmarks
On GHC 9.2.5:
Before
After
I also decided to try the latest GHC 9.6.2 and it turns out to be better there.
Before
After:
Closes #949.