Define partitionKeys: fused version of restrictKeys and withoutKeys #975
@treeowl any chance to look at this please?
Yeah, I'll take a look. Sorry for the delay.
If this is added, it should also be added for IntMaps.
That being said, I feel like this function is too specialized (to be fair, I feel the same about restrictKeys and withoutKeys). There are a lot of operations that could be fused together to be more efficient, but I don't think that alone warrants adding special functions for them. Another alternative is partitionKeys m s = partitionWithKey (\k _ -> k `member` s) m, which is equally clear IMO, albeit maybe a bit slower.
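To make the reviewer's alternative concrete, here is a minimal sketch that puts the one-liner next to the two-call version it is meant to replace. The names partitionKeysViaMember and partitionKeysViaTwoCalls are made up for illustration; only partitionWithKey, restrictKeys, withoutKeys and Set.member are existing containers functions.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical name: the alternative from the review comment.
-- One pass over the map, one set membership test per key.
partitionKeysViaMember :: Ord k => Map.Map k a -> Set.Set k -> (Map.Map k a, Map.Map k a)
partitionKeysViaMember m s = Map.partitionWithKey (\k _ -> k `Set.member` s) m

-- Hypothetical name: the unfused version, which processes the map twice.
partitionKeysViaTwoCalls :: Ord k => Map.Map k a -> Set.Set k -> (Map.Map k a, Map.Map k a)
partitionKeysViaTwoCalls m s = (Map.restrictKeys m s, Map.withoutKeys m s)
```

Both functions should return the same pair for any map and set; they differ only in how much work they do, which is the point under discussion in this PR.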
containers/src/Data/Map/Internal.hs
Outdated
-- | \(O\bigl(m \log\bigl(\frac{n}{m}+1\bigr)\bigr), \; 0 < m \leq n\). Restrict a 'Map' to only those keys
-- found in a 'Set' Remove all keys in a 'Set' from a 'Map'.
- -- | \(O\bigl(m \log\bigl(\frac{n}{m}+1\bigr)\bigr), \; 0 < m \leq n\). Restrict a 'Map' to only those keys
- -- found in a 'Set' Remove all keys in a 'Set' from a 'Map'.
+ -- | \(O\bigl(m \log\bigl(\frac{n}{m}+1\bigr)\bigr), \; 0 < m \leq n\). Partition the map according to a set.
+ -- The first map contains the input 'Map' restricted to those keys found in the 'Set',
+ -- the second map contains the input 'Map' without all keys in the 'Set'.
+ -- This is more efficient than using 'restrictKeys' and 'withoutKeys' together.
containers/src/Data/Map/Internal.hs
Outdated
-- m \`partitionKeys\` s = (m ``restrictKeys`` s, m ``withoutKeys`` s)
-- @
--
-- @since 0.7
0.7 has already been released, so this needs to be updated.
I've updated the docs.
I agree that it would be nice to add If it's a hard requirement to have the same functions in
I see the attached benchmark as a clue that it's worthwhile to extend the API with partitionKeys. One of the synthetic benchmarks shows a 40% speedup, which looks pretty good. Another (arguably equivalent) benchmark shows a 20% speedup. These numbers motivated me to do the PR, since I want to get those speedups in my programs. API growth is unfortunate, but what's the cost of using the slower version? It affects all the users, and the runtime cost is paid every time their programs run.
Regarding the many other operations that could be fused together, I don't think it's realistic to foresee them all and add them beforehand. There are many of those, and it's not clear whether anyone actually needs them. I'd advocate for a reactive approach like this PR: when someone finds a use case for fusing some operations and is motivated enough to implement it, then it could be considered for inclusion.
It depends on your application, I use There is a general API pattern |
I did benchmark
So far it doesn't look like |
@treeowl just a gentle reminder to review.
I suppose you meant @treeowl I know that you are exceedingly busy, so I feel bad for being annoying, but I could benefit from a faster |
Yes, I meant |
LGTM, now it's up to @treeowl
Bump please
Hi, sorry for the delay. Though I reviewed the implementation previously and the code seems alright, as a maintainer now I am considering whether the function should be added to the library API.
There have been similar requests for performing multiple set operations at once (#162, #944). Like we have the flexible Map-Map merge API, and a proposed Set-Set API, maybe a Map-Set API is the better way forward here. The user will be free to write however complex an operation they need. What do you think?
As an example, one can use

    import qualified Data.Map as M
    import qualified Data.Map.Merge.Lazy as M
    import qualified Data.Map.Internal as MI -- shame that we need internal but that's a different story

    data Pair a = Pair !a !a

    instance Functor Pair where
      fmap f (Pair x1 x2) = Pair (f x1) (f x2)

    instance Applicative Pair where
      pure x = Pair x x
      liftA2 f (Pair x1 x2) (Pair y1 y2) = Pair (f x1 y1) (f x2 y2)

    partitionKeys :: Ord k => M.Map k a -> M.Map k b -> (M.Map k a, M.Map k a)
    partitionKeys m1 m2 = (\(Pair x1 x2) -> (x1,x2)) $ MI.mergeA
      (MI.WhenMissing (\t -> Pair M.empty t) (\_ x -> Pair Nothing (Just x)))
      M.dropMissing
      (M.zipWithMaybeAMatched (\_ x _ -> Pair (Just x) Nothing))
      m1
      m2

    ghci> partitionKeys (M.fromList $ join zip [0,2..10]) (M.fromList $ join zip [0,3..10])
    (fromList [(0,0),(6,6)],fromList [(2,2),(4,4),(8,8),(10,10)])
I like the idea of having a Map-Set API. Probably could be a bit of a challenge to make sure it optimizes the same as the concrete function but it feels like that should be attainable.
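For readers who want to try the merge-based idea without importing Data.Map.Internal, the following is a rough sketch using only the public Data.Map.Merge.Lazy API. It is not the maintainer's code: the name partitionKeysMerge and the conversion of the Set to a Map via fromSet are assumptions made for illustration, and traverseMaybeMissing visits missing keys one by one, so it gives up the whole-subtree shortcut that the internal WhenMissing constructor allows.

```haskell
import qualified Data.Map as Map
import qualified Data.Map.Merge.Lazy as Merge
import qualified Data.Set as Set

-- A pair of results built in lockstep: the first slot collects keys found
-- in the set, the second slot collects the remaining keys.
data Pair a = Pair a a

instance Functor Pair where
  fmap f (Pair x y) = Pair (f x) (f y)

instance Applicative Pair where
  pure x = Pair x x
  Pair f g <*> Pair x y = Pair (f x) (g y)

-- Hypothetical name; public-API-only variant of the merge-based partition.
partitionKeysMerge :: Ord k => Map.Map k a -> Set.Set k -> (Map.Map k a, Map.Map k a)
partitionKeysMerge m s =
  case Merge.mergeA onlyInMap Merge.dropMissing inBoth m (Map.fromSet (const ()) s) of
    Pair inside outside -> (inside, outside)
  where
    -- Key is in the map but not in the set: keep it in the second result only.
    onlyInMap = Merge.traverseMaybeMissing (\_ x -> Pair Nothing (Just x))
    -- Key is in both: keep it in the first result only.
    inBoth = Merge.zipWithMaybeAMatched (\_ x _ -> Pair (Just x) Nothing)
```

Whether a variant like this optimizes as well as a dedicated function is exactly the open question raised in the reply above; this sketch makes no performance claim.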
As mentioned in #158, sometimes we'd like to get results from both restrictKeys and withoutKeys for the same map and set. It can be done more efficiently by fusing traversals.

I named the new function partitionKeys instead of partitionSet because the originals it's fusing end in *Keys, so I believe this is more consistent.

Benchmarks show that the new version is around 20-40% faster, depending on inputs. Here's a run with a locally modified containers benchmarking suite that measures with even and odd keys floating around (for some reason odd keys show more speedup, hence that's what I committed):

In the process of checking the generated core I noticed that splitMember gets called with an explicit Ord dictionary, so I changed it a bit so that it would specialize. I've only checked core on 9.6.2 though.
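
The idea of fusing the two traversals can be illustrated with a small divide-and-conquer sketch written against the public API. This is not the implementation from this PR, which lives in Data.Map.Internal and works on the tree structure directly; the name partitionKeysSketch is hypothetical, and no claim is made that it matches the documented complexity or the benchmark numbers.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical sketch: split the map and the set around a pivot key taken
-- from the set, recurse on both halves, and build both results in one descent.
partitionKeysSketch :: Ord k => Map.Map k a -> Set.Set k -> (Map.Map k a, Map.Map k a)
partitionKeysSketch m s
  | Map.null m = (Map.empty, Map.empty)
  | Set.null s = (Map.empty, m)        -- nothing to restrict to: everything is "without"
  | otherwise  =
      let pivot           = Set.elemAt (Set.size s `div` 2) s
          (sLo, sHi)      = Set.split pivot s          -- set elements below / above the pivot
          (mLo, hit, mHi) = Map.splitLookup pivot m    -- map entries below / at / above the pivot
          (inLo, outLo)   = partitionKeysSketch mLo sLo
          (inHi, outHi)   = partitionKeysSketch mHi sHi
          inMid           = maybe Map.empty (Map.singleton pivot) hit
      in (Map.unions [inLo, inMid, inHi], Map.union outLo outHi)
```

Each recursive step removes the pivot from the set, so the recursion terminates, and every map entry ends up in exactly one of the two results.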