Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can we de-duplicate Set/Map by leveraging common structure? #851

Open
alexfmpe opened this issue Aug 31, 2022 · 10 comments
Open

Can we de-duplicate Set/Map by leveraging common structure? #851

alexfmpe opened this issue Aug 31, 2022 · 10 comments

Comments

@alexfmpe
Copy link
Contributor

Looking at #817 's

argSet :: Map k a -> Set.Set (Arg k a)
argSet Tip = Set.Tip
argSet (Bin sz kx x l r) = Set.Bin sz (Arg kx x) (argSet l) (argSet r)

it seems the two structures are the same "up to parenthesization"
I wonder if one could exploit this to de-duplicate a good deal of code.

It's not as simple as

newtype Map k a = Map (Set (Arg k a))

since

type role Map nominal representational
type role Set nominal

causes

src/Data/Map/Internal.hs:465:1-38: error:
    • Role mismatch on variable a:
        Annotation says representational but role nominal is required
    • while checking a role annotation for ‘Map’
    |
465 | type role Map nominal representational
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

but maybe we could have the binary tree structure be its own type and then

newtype Set a = Set (BinTree a)
newtype Map k v = Map (BinTree (Arg k v))

allowing a good deal of functions to simply delegate

null :: Set a -> Bool
null (Set t) = BinTree.null t

null :: Map k a -> Bool
null (Map t) = BinTree.null t

There are obvious examples like null, size, member but Set and Map.Lazy may have more similarities - insert seems pretty much the same "up to parenthesization".

It's not obvious to me whether Map can be decomposed like this, since the extra indirection (in this case data Arg a b) interferes with lazyness and performance.

@treeowl
Copy link
Contributor

treeowl commented Aug 31, 2022

I believe it would be possible with Backpack, though I don't know how to use that. Unfortunately, stack didn't support Backpack last I checked, and I don't know if anyone's really developing it at the moment.

@alexfmpe
Copy link
Contributor Author

alexfmpe commented Aug 31, 2022

Currently we have

data Set a
  = Bin {-# UNPACK #-} !Size !a !(Set a) !(Set a)
  | Tip

data Map k a
  = Bin {-# UNPACK #-} !Size !k a !(Map k a) !(Map k a)
  | Tip

which seems to be the same structure we'd get with

{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE UnboxedTuples #-}
{-# LANGUAGE UnliftedNewtypes #-}

import GHC.Exts

data BinTree (a :: TYPE r)
  = Bin {-# UNPACK #-} !Size !a !(BinTree a) !(BinTree a)
  | Tip

newtype Set a = Set (BinTree a)
newtype Map k v = Map (BinTree (Arg' k v))
newtype Arg' a b = Arg' (# !a, b #)

unfortunately UnliftedNewtypes requires GHC 8.10

@treeowl
Copy link
Contributor

treeowl commented Aug 31, 2022

I see no newtypes there. As far as I know, we still don't have a story for functions that actually work with that kind of RuntimeRep polymorphism. Even if you can define the type, you can't (AFAIK) define functions to share code across the representations.

@alexfmpe
Copy link
Contributor Author

I see no newtypes there.

Last 3 lines?

Even if you can define the type, you can't (AFAIK) define functions to share code across the representations.

Actually looks like I can't even define it. Something in my setup was making ghc extra lenient somehow - running ghc directly fails on two places

Main.hs:7:5: error:
    • A levity-polymorphic type is not allowed here:
        Type: a
        Kind: TYPE r
    • In the definition of data constructor ‘Bin’
      In the data type declaration for ‘BinTree’
  |
7 |   = Bin {-# UNPACK #-} !Size !a !(BinTree a) !(BinTree a)

Main.hs:13:28: error:
    • Unexpected strictness annotation: !a
      strictness annotation cannot appear nested inside a type
    • In the type ‘(# !a, b #)’
      In the definition of data constructor ‘Arg'’
      In the newtype declaration for ‘Arg'’
   |
13 | newtype Arg' a b = Arg' (# !a, b #)

Deadend it seems

@alexfmpe
Copy link
Contributor Author

If/when levity polymorphism supports the above, BinTree can probably also be re-used for Data.Dependent.Map.
Though to avoid performance regression from introducing some Data.Some equivalents it might also require first-class existentials to be approved/merged. Something like

newtype DMap k v = DMap (BinTree (exists x. Arg (k x) (v x)))

@AndreasPK
Copy link
Contributor

I believe it would be possible with Backpack, though I don't know how to use that. Unfortunately, stack didn't support Backpack last I checked, and I don't know if anyone's really developing it at the moment.

Nothing much has changed. See commercialhaskell/stack#2540 and commercialhaskell/stack#4745

@AndreasPK
Copy link
Contributor

Actually looks like I can't even define it. Something in my setup was making ghc extra lenient somehow - running ghc directly fails on two places

newtype Arg' a b = Arg' (# !a, b #)

newtypes can't change the semantics of the types they wrap in this way. Here you try to say "make a newtype of a unboxed tuple, but also make the tuple strict in it's first field.

Sadly there is currently also no type that can express the semantics you want here. You would either need strict unboxed tuples: https://gitlab.haskell.org/ghc/ghc/-/issues/17001 (seems unlikely to happen) or one of the things discussed in https://gitlab.haskell.org/ghc/ghc/-/issues/21617 (which I'm working on in my spare time)


In theory GHC could one day support a type like:

data BinTree (a :: TYPE (BoxedRep r))
    = Bin {-# UNPACK #-} !Int !a !(BinTree a) !(BinTree a)

But supporting functions which take levity polymorphic arguments is unlikely to happen as GHC often needs/wants to generate different code depending on the liftedness of the argument.

@Qqwy
Copy link

Qqwy commented Sep 28, 2023

What about doing it the other way?

-- Just like today:
data Map k a
  = Bin {-# UNPACK #-} !Size !k a !(Map k a) !(Map k a)
  | Tip

-- New: Re-use Map, but with unit as value type
-- a has the role `nominal` as required :-)
newtype Set a = Set (Map a ()) 

This approach is used in many other programming languages, exactly to improve code-reusage between the two datastructures. Essentially the whole interface of Set can now directly be implemented in terms of Map.

I believe this alternate definition of Set does not differ in strictness of its contents from the current one (though I might have missed something of course), which makes it nicely backwards-compatible.

@AndreasPK
Copy link
Contributor

newtype Set a = Set (Map a ())

This would increase the size of the Set constructor by an additional pointer. Which is probably not worthwhile when the Set code is already written.

@alexfmpe
Copy link
Contributor Author

What about doing it the other way?

That's how it's done for HashSet / HashMap
https://github.com/haskell-unordered-containers/unordered-containers/blob/259dc9e4191d988a7f84c3d417764a32e7c8e29f/Data/HashSet/Internal.hs#L112-L114

It has the issue pointed above but I also wonder if it's morally "backwards". It's defining the more general thing first then specializing it, whereas newtype Map k v = Map (BinTree (Arg k v)) decomposes into two simpler constructs.

It's easier to obtain multiple shapes (e.g. Map, DMap as defined above) by recombining simple pieces than finding a sufficiently general one that can be specialized to all the shapes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants