Fusible Set.fromDistinctAscList definition #949
Comments
I just realized it is safe to use
But I suspect this can be adapted to the current definition too, so I would not consider this in favor of the new definition.
Your description reminds me of the way we used to do it for
How my implementation above works is very similar to counting up in binary, but a little different because perfect binary trees have size 2^n - 1.

Also note that my implementation constructs exactly the same tree as the current algorithm, linking all the same trees; it's just done in a different, and I would say slightly simpler, way. Here's a cleaned up version:

    data SetPart a
      = PartL !(Set a)      -- (PartL l) invariant: l is perfect
      | PartLM !(Set a) !a  -- (PartLM l x) invariant: l is perfect and maximum l < x

    fromDistinctAscList :: forall a. [a] -> Set a
    fromDistinctAscList = List.foldl' mergePart Tip . List.foldl' next []
      where
        next :: [SetPart a] -> a -> [SetPart a]
        next (PartL l : parts) !x = PartLM l x : parts
        next parts0 x0 = mergeInto (Bin 1 x0 Tip Tip) parts0
          where
            mergeInto !r (PartLM l x : parts)
              | sz r == sz l = mergeInto (bin x l r) parts
            mergeInto l parts = PartL l : parts

        mergePart :: Set a -> SetPart a -> Set a
        mergePart r (PartL l) = merge l r
        mergePart r (PartLM l x) = link x l r

        sz :: Set a -> Int
        sz (Bin s _ _ _) = s
        sz Tip = error "impossible"
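The counting-in-binary analogy can be checked with a small size-only model. This is only a sketch: `PartSize`, `step`, `stackAfter`, `held` and `setBits` are hypothetical names introduced here, and the model tracks just the sizes of the trees on the stack, not the trees themselves.

```haskell
import Data.Bits (finiteBitSize, testBit)
import Data.List (foldl')

-- Size-only model of one stack entry (hypothetical, for illustration).
data PartSize
  = L Int   -- a perfect tree of this size, still waiting for its separator
  | LM Int  -- a perfect tree of this size together with its separator element
  deriving (Eq, Show)

-- Process one more input element, mirroring `next` on sizes only.
step :: [PartSize] -> [PartSize]
step (L l : parts) = LM l : parts
step parts0 = mergeInto 1 parts0
  where
    -- Equal-sized neighbours combine, like a carry in binary addition.
    mergeInto r (LM l : parts)
      | r == l = mergeInto (l + 1 + r) parts
    mergeInto l parts = L l : parts

-- The stack after consuming n elements.
stackAfter :: Int -> [PartSize]
stackAfter n = foldl' (\s _ -> step s) [] [1 .. n]

-- How many input elements a stack entry accounts for.
held :: PartSize -> Int
held (L l) = l
held (LM l) = l + 1

-- The powers of two present in n's binary representation, lowest first.
setBits :: Int -> [Int]
setBits n = [2 ^ i | i <- [0 .. finiteBitSize n - 2], testBit n i]
```

For even `n`, `map held (stackAfter n)` should agree with `setBits n`; for instance `stackAfter 6` is `[LM 1, LM 3]`, holding 2 and 4 elements, which matches 6 = 110 in binary. For odd `n` the top of the stack is an `L` entry of size 2^k - 1 that is still waiting for its separator, which is where the scheme departs from a plain binary counter.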
There is a property that PartL can only appear at the head of the list, so the state could instead be encoded as:

    data SetPart a = PartLM !(Set a) !a

    data SetBuildState a
      = StateEven [SetPart a]
      | StateOdd [SetPart a] !(Set a)

    fromDistinctAscList :: [a] -> Set a
    fromDistinctAscList = mergeParts . List.foldl' next (StateEven [])
      where
        next (StateOdd parts l) !x = StateEven (PartLM l x : parts)
        next (StateEven parts0) x0 = mergeInto (Bin 1 x0 Tip Tip) parts0
          where
            mergeInto !r (PartLM l x : parts)
              | sz r == sz l = mergeInto (bin x l r) parts
            mergeInto l parts = StateOdd parts l

        mergeParts (StateOdd parts r) = List.foldl' mergePart r parts
        mergeParts (StateEven parts) = List.foldl' mergePart Tip parts

        mergePart r (PartLM l x) = link x l r

        sz (Bin s _ _ _) = s
        sz Tip = error "impossible"

But this performs slightly worse.
About rewrite rules, I'll test it out but I'm not sure it's worth the complexity. It would also be atypical, because I've only seen rewrite-back rules for functions that are both good consumers and good producers, but here we're only working with a good consumer.
Don't worry about "atypical". Such rules are justified whenever another implementation is faster, thriftier, or smaller when fusion doesn't occur.
I was able to get the rewrite rules working as

    "Set.fromDistinctAscList" [~1]
      forall xs. fromDistinctAscList xs = List.foldr foo bar xs baz
    "Set.fromDistinctAscList back" [1]
      forall xs. List.foldr foo bar xs baz = fromDistinctAscList xs

But I was wrong about this and I can't find a way to make the current definition as fast as the new one. So I say let's go with only the new definition.
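For readers unfamiliar with the rewrite-and-rewrite-back shape discussed here, the following is a minimal sketch on a toy consumer. Everything in it is an assumption made for illustration: `countElems`, `countFB` and `countDone` are hypothetical names, and the list-specific `foldr` is taken from `GHC.Base`. It is not the rule set that was tried in this thread. The first rule exposes the function as a `foldr` so that it can fuse with a producer; the second rule restores the hand-written loop in a later phase if no fusion happened.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified GHC.Base as B (foldr)

-- Hypothetical consumer with a hand-written loop,
-- the definition we want to keep when nothing fuses.
countElems :: [a] -> Int
countElems xs = go xs 0
  where
    go [] !n = n
    go (_ : ys) !n = go ys (n + 1)
{-# NOINLINE [1] countElems #-}

-- The same loop expressed as a foldr step that threads an accumulator.
countFB :: a -> (Int -> Int) -> Int -> Int
countFB _ k = \ !n -> k (n + 1)
{-# INLINE [0] countFB #-}

countDone :: Int -> Int
countDone = id
{-# INLINE [0] countDone #-}

-- Before phase 1: rewrite to foldr so a producer can fuse with it.
-- From phase 1 on: if the foldr is still intact, rewrite back.
{-# RULES
"countElems"      [~1] forall xs. countElems xs = B.foldr countFB countDone xs 0
"countElems back" [1]  forall xs. B.foldr countFB countDone xs 0 = countElems xs
 #-}
```

The rules only change which code GHC generates under optimization; `countElems` returns the same result either way, so the trade-off is purely between the complexity of maintaining two definitions and the performance of the unfused case.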
Other things being roughly equal, a substantial increase in allocation is bad, particularly for concurrent programs. So I would prefer to get the version with rewrite rules.
I wouldn't say it's roughly equal. Current:
Current, that I tried to optimize to use
The new implementation (in my comment above):
Another version of the new implementation where I tried to adopt the
The new version takes at least 25% less time compared to current.
Oh, it looks like I misread the timings. Impressive! Could you open a PR?
I see that
Thanks for the merge 🎉
I was curious if Set.fromDistinctAscList could be written to fuse with the input list, so I gave it a shot. Currently it looks like:
containers/containers/src/Data/Set/Internal.hs
Lines 1175 to 1190 in 67752b2
And here's what I got:
The idea is that we keep a stack of partially constructed sets as we go along the list, and merge them whenever we get the chance.
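The stack idea can be tried in isolation with a standalone sketch. This is not the library code: `Tree`, `Part`, `fromAscSketch` and `toAscList` are hypothetical stand-ins for the internals of Data.Set, the input is assumed to be strictly ascending without being checked, and the final collapse uses a plain node constructor rather than a rebalancing join.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Hypothetical stand-in for the internal tree; each node caches its size.
data Tree a = Tip | Bin !Int !a !(Tree a) !(Tree a)

size :: Tree a -> Int
size Tip = 0
size (Bin s _ _ _) = s

-- Smart constructor that fills in the cached size.
bin :: a -> Tree a -> Tree a -> Tree a
bin x l r = Bin (size l + 1 + size r) x l r

-- One stack entry: a finished left subtree, optionally with the
-- element that follows it in the input.
data Part a
  = PartL !(Tree a)
  | PartLM !(Tree a) !a

-- Build a tree from a strictly ascending list (assumption, not checked).
fromAscSketch :: [a] -> Tree a
fromAscSketch = foldl' finish Tip . foldl' next []
  where
    -- A lone subtree on top of the stack takes the next element as separator.
    next (PartL l : parts) !x = PartLM l x : parts
    -- Otherwise start a singleton and merge it with equal-sized neighbours.
    next parts0 !x0 = mergeInto (bin x0 Tip Tip) parts0

    mergeInto !r (PartLM l x : parts)
      | size r == size l = mergeInto (bin x l r) parts
    mergeInto l parts = PartL l : parts

    -- Collapse the leftover stack from the top (largest elements) down.
    -- PartL only ever sits on top, where the accumulator is still Tip.
    finish _ (PartL l) = l
    finish r (PartLM l x) = bin x l r

-- In-order traversal, used to check the result.
toAscList :: Tree a -> [a]
toAscList t = go t []
  where
    go Tip acc = acc
    go (Bin _ x l r) acc = go l (x : go r acc)
```

Because the whole construction is a left fold over the input, the list is consumed one element at a time, which is what gives the compiler a chance to fuse it with a producer such as an enumeration.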
We can do a similar thing for Map too.
Now how does it compare to the original definition? Let's benchmark.
With GHC 9.2.5:
Current:
New:
It's a lot better in the second case because it doesn't construct the list. In the first case, the time doesn't change but it does allocate more, so it's not a clear win.
So, what do you think about this definition? Is it worth changing?
I would guess fromDistinctAscList [a..b] is a common usage and would benefit from this change.
As an aside, I want to try the same thing with fromList, but this seemed simpler to try first.