Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define partitionKeys: fused version of restrictKeys and withoutKeys #975

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

sergv
Copy link

@sergv sergv commented Oct 15, 2023

As mentioned in #158, sometimes we'd like to get results from both restrictKeys and withoutKeys for the same map and set. It can be done more efficiently by fusing traversals.

I named new function partitionKeys instead of partitionSet because the originals it's fusing end in *Keys so I believe this is more consistent.

Benchmarks show that new version is around 20-40% faster, depending on inputs. Here's a run with locally modified containers benchmarking suite that measures with even and odd keys floating around (for some reason odd keys show more speedup hence that's what I committed):

$ cabal run map-benchmarks -- -p '/restrictKeys+withoutKeys/ || /partitionKeys/'
All
  even
    restrictKeys+withoutKeys: OK
      75.5 μs ± 5.3 μs
    partitionKeys:            OK
      62.4 μs ± 5.6 μs, 0.83x
  odd
    restrictKeys+withoutKeys: OK
      194  μs ± 6.1 μs
    partitionKeys:            OK
      118  μs ±  12 μs, 0.61x

In the process of checking generated core I noticed that splitMember gets called with explicit Ord dictionary, so I changed it a bit so that it would specialize. I've only checked core on 9.6.2 though.

@sergv sergv changed the title Define partitionKEys: fused version of restrictKeys and withoutKeys Define partitionKeys: fused version of restrictKeys and withoutKeys Oct 15, 2023
@Bodigrim
Copy link
Contributor

Bodigrim commented Nov 1, 2023

@treeowl any chance to look at this please?

@treeowl
Copy link
Contributor

treeowl commented Nov 2, 2023

Yeah, I'll take a look. Sorry for the delay.

Copy link
Contributor

@konsumlamm konsumlamm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is added, it should also be added for IntMaps.

That being said, I feel like this function is too specialized (to be fair, I feel the same about restrictKeys and withoutKeys). There are a lot of operations that could be fused together to be more efficient, but I don't think that alone warrants adding special functions for them. Another alternative is partitionKeys m s = partitionWithKey (\k _ -> k `member` s) m, which is equally clear IMO, albeit maybe a bit slower.

Comment on lines 1949 to 1950
-- | \(O\bigl(m \log\bigl(\frac{n}{m}+1\bigr)\bigr), \; 0 < m \leq n\). Restrict a 'Map' to only those keys
-- found in a 'Set' Remove all keys in a 'Set' from a 'Map'.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-- | \(O\bigl(m \log\bigl(\frac{n}{m}+1\bigr)\bigr), \; 0 < m \leq n\). Restrict a 'Map' to only those keys
-- found in a 'Set' Remove all keys in a 'Set' from a 'Map'.
-- | \(O\bigl(m \log\bigl(\frac{n}{m}+1\bigr)\bigr), \; 0 < m \leq n\). Partition the map according to a set.
-- The first map contains the input 'Map' restricted to those keys found in the 'Set',
-- the second map contains the input 'Map' without all keys in the 'Set'.
-- This is more efficient than using ' restrictKeys' and 'withoutKeys' together.

-- m \`partitionKeys\` s = (m ``restrictKeys`` s, m ``withoutKeys`` s)
-- @
--
-- @since 0.7
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.7 has already been released, so this needs to be updated.

@sergv
Copy link
Author

sergv commented Jan 1, 2024

I've updated the docs.

If this is added, it should also be added for IntMaps.

I agree that it would be nice to add partitionKeys for IntMap too. However thanks to different map structure the implementation's probably going to be more convoluted that for regular map. Given that I didn't find a usecase for this function on IntMaps I didn't implement it. I'd prefer to get partitionKeys for regular maps first - maybe someone else will be motivated to add one for int maps, who knows.

If it's a hard requirement to have the same functions in Data.Map and Data.IntMap I guess I can add a simple partitionKeys f xs = (withoutKeys f xs, restrictKeys f xs). More efficient version could come later, from me in case I'll need this function on IntMaps myself. Please let me know your perferences whether to do it.

That being said, I feel like this function is too specialized (to be fair, I feel the same about restrictKeys and withoutKeys). There are a lot of operations that could be fused together to be more efficient, but I don't think that alone warrants adding special functions for them. Another alternative is partitionKeys m s = partitionWithKey (\k _ -> k member s) m, which is equally clear IMO, albeit maybe a bit slower.

I see the attached benchmark as a clue that it's worthwhile to extend API with partitionKeys. It does seem that fusion could be of benefit so library seems like a natural place to have it in.

One of the synthetic benchmarks shows 40% speedup - looks like pretty good speedup. Another (arguably equivalent) benchmarks shows 20% speedup - these numbers motivated me to do the PR since I want to get those speedups in my programs.

API growth is unfortunate, but what's the cost of using slower version? It affects all the users and the runtime cost is paid every time their programs run.

Regarding many other operations that can be fused together I don't think it's realistic to foresee them all and add them beforehand. There're many of those and it's not clear whether anyone actually needs it. I'd advocate for reactive approach like this PR - when someone finds a usecase for fusing some operations and is motivated enough to implement it then it could be considered for inclusion.

@Bodigrim
Copy link
Contributor

Bodigrim commented Jan 1, 2024

That being said, I feel like this function is too specialized (to be fair, I feel the same about restrictKeys and withoutKeys).

It depends on your application, I use restrictKeys and withoutKeys a lot, they are crisp and idiomatic. In my line of work partitionKeys would be helpful as well.

There is a general API pattern take / drop / spanAt, takeWhile / dropWhile / span, filter / partition, and I think partitionKeys fits it nicely.

@sergv
Copy link
Author

sergv commented Jan 2, 2024

Another alternative is partitionKeys m s = partitionWithKey (\k _ -> k member s) m, which is equally clear IMO, albeit maybe a bit slower.

I did benchmark partitionWithKey. My benchmark is at sergv@416888c. The results are:

$ TERM=dumb cabal run map-benchmarks --builddir /tmp/dist -- -p '/All.even/ || /All.odd/'
Created semaphore called cabal_semaphore_6 with 32 slots.
All
  even
    restrictKeys+withoutKeys: OK
      102  μs ± 6.0 μs
    partitionKeys:            OK
      83.3 μs ± 5.7 μs, 0.82x
    partitionWithKey:         OK
      284  μs ±  25 μs, 2.80x
  odd
    restrictKeys+withoutKeys: OK
      224  μs ±  21 μs
    partitionKeys:            OK
      140  μs ±  11 μs, 0.63x
    partitionWithKey:         OK
      298  μs ±  25 μs, 1.33x

All 6 tests passed (1.11s)

So far it doesn't look like partitionKeys is competitive with restrictKeys + withoutKeys pair - it is slower. In my particular use case I want faster alternative if it can be achieved with reasonable effort. I could have made a mistake in the benchmark though - please correct me if that's the case.

@Bodigrim
Copy link
Contributor

@treeowl just a gentle reminder to review.

@Bodigrim
Copy link
Contributor

So far it doesn't look like partitionKeys is competitive with restrictKeys + withoutKeys pair - it is slower.

I suppose you meant partitionWithKeys instead of partitionKeys?..


@treeowl I know that you are exceedingly busy, so I feel bad for being annoying, but I could benefit from a faster partitionKeys indeed. Could we possibly ask someone else to review (@meooow25?) to speed things up?

@sergv
Copy link
Author

sergv commented Mar 31, 2024

So far it doesn't look like partitionKeys is competitive with restrictKeys + withoutKeys pair - it is slower.

Yes, I meant partitionWithKeys (defined as \ks -> M.partitionWithKey (\k _ -> S.member k ks) m).

containers/src/Data/Set/Internal.hs Outdated Show resolved Hide resolved
containers/src/Utils/Containers/Internal/StrictTriple.hs Outdated Show resolved Hide resolved
containers/src/Data/Map/Internal.hs Outdated Show resolved Hide resolved
containers/src/Data/Map/Internal.hs Outdated Show resolved Hide resolved
containers/src/Data/Map/Internal.hs Outdated Show resolved Hide resolved
containers/src/Data/Map/Internal.hs Outdated Show resolved Hide resolved
containers/src/Data/Map/Internal.hs Outdated Show resolved Hide resolved
containers-tests/benchmarks/Map.hs Outdated Show resolved Hide resolved
containers-tests/benchmarks/Map.hs Outdated Show resolved Hide resolved
containers-tests/benchmarks/Map.hs Outdated Show resolved Hide resolved
containers/src/Data/Map/Internal.hs Outdated Show resolved Hide resolved
Copy link
Contributor

@meooow25 meooow25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, now it's up to @treeowl

@sergv
Copy link
Author

sergv commented Oct 16, 2024

Bump please

@meooow25
Copy link
Contributor

meooow25 commented Oct 16, 2024

Hi, sorry for the delay. Though I reviewed the implementation previously and the code seems alright, as a maintainer now I am considering whether the function should be added to the library API.

There have been similar requests for performing multiple set operations at once (#162, #944). Like we have the flexible Map-Map merge API, and a proposed Set-Set API, maybe a Map-Set API is the better way forward here. The user will be free to however complex an operation they need. What do you think?


As an example, one can use Map merge to define paritionKeys on Maps.

import qualified Data.Map as M
import qualified Data.Map.Merge.Lazy as M
import qualified Data.Map.Internal as MI -- shame that we need internal but that's a different story

data Pair a = Pair !a !a
instance Functor Pair where
  fmap f (Pair x1 x2) = Pair (f x1) (f x2)
instance Applicative Pair where
  pure x = Pair x x
  liftA2 f (Pair x1 x2) (Pair y1 y2) = Pair (f x1 y1) (f x2 y2)

partitionKeys :: Ord k => M.Map k a -> M.Map k b -> (M.Map k a, M.Map k a)
partitionKeys m1 m2 = (\(Pair x1 x2) -> (x1,x2)) $ MI.mergeA
  (MI.WhenMissing (\t -> Pair M.empty t) (\_ x -> Pair Nothing (Just x)))
  M.dropMissing
  (M.zipWithMaybeAMatched (\_ x _ -> Pair (Just x) Nothing))
  m1
  m2
ghci> partitionKeys (M.fromList $ join zip [0,2..10]) (M.fromList $ join zip [0,3..10])
(fromList [(0,0),(6,6)],fromList [(2,2),(4,4),(8,8),(10,10)])

@sergv
Copy link
Author

sergv commented Oct 17, 2024

I like the idea of having a Map-Set API. Probably could be a bit of a challenge to make sure it optimizes the same as the concrete function but it feels like that should be attainable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants