Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Length in unboxed instances #393

Open
Boarders opened this issue Jul 20, 2021 · 12 comments
Open

Length in unboxed instances #393

Boarders opened this issue Jul 20, 2021 · 12 comments

Comments

@Boarders
Copy link
Contributor

Unbox instances for tuples (defined in unbox-tuple-instances) are backed by vectors of the form:

data instance Vector (a, b)
    = V_2 {-# UNPACK #-} !Int !(Vector a) !(Vector b)

This stores the length of each vector. Is there a fundamental reason why we need this length? It seems, naively, like it wastes an extra word as each vector already contains the length and an invariant of the representation should be that both lengths remain identical.

@gksato
Copy link
Contributor

gksato commented Jul 20, 2021

We could define

instance (Generic.Vector Unboxed.Vector a, Generic.Vector Unboxed.Vector b) Generic.Vector Unboxed.Vector (a,b) where
   basicLength (V_2 {- length field is gone -} vfst vsnd) = basicLength vfst

However what if we had Unboxed.Vector ((...((a10000,a9999),a9998),....,a1),a0)? It would be a BIG performance penalty for merely asking for a length.
The reason for this weird iterated nested pair may be:

{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, DerivingStrategies #-}
data Nat = Zero | Succ !Nat
data family SnocLenList (a::Type) :: Nat -> Type
newtype instance SnocLenList a Zero = SLLNil ()
  deriving newtype (Generic.Mutable.MVector Unboxed.Mutable.MVector, Generic.Vector Unboxed.Vector, Unboxed.Unbox)
newtype instance SnocLenList a (Succ n) = SLLSnoc (SnocLenList a n,a)
  deriving newtype (Generic.Mutable.MVector Unboxed.Mutable.MVector, Generic.Vector Unboxed.Vector, Unboxed.Unbox)

@Boarders
Copy link
Contributor Author

Boarders commented Jul 20, 2021

Is it a good trade off to optimize for type-level lisp in the library?

@Daniel-Diaz
Copy link

Good point. With many vectors around, those bytes might add up to a noticeable waste. How many nested vectors are needed for these extra bytes to be worth it?

The example with the type-level naturals could be rewritten using the type-level naturals from GHC.TypeLits, using natVal to get the length.

@gksato
Copy link
Contributor

gksato commented Jul 20, 2021

My example is surely an extreme one and I neither think the library should take utmost care for this kind of extremity. However pointer indirection is always a small but non-ignorable time performance penalty, and the most frequently invoked operation would be length if a user doesn't love unsafe functions and C/C++-like nasal demons, i.e. undefined behavior of the whole program. On the other hand, extra one word for length is a space performance penalty, and we can't simply compare these two types of penalty, while they are connected through GC.

@Shimuuar
Copy link
Contributor

I agree that it's problem but trying to get rid of length field in V_2 and friends is wrong approach IMO. Let take for example 3-tuple (A,B,C). Representation of corresponding unboxed vector is roughly:

data V_3 A B C = V_3 !Length
  (Length, Buffer A)
  (Length, Buffer B)
  (Length, Buffer C)

We have 4 copies of length! And removing length field from V_3 does little^ it only removes 1 out of 3. What we really want is to separate buffer from length so we can represent product as

data V_3 A B C = V_3 !Length
  (Buffer A)
  (Buffer B)
  (Buffer C)

@gksato
Copy link
Contributor

gksato commented Jul 20, 2021

What kind of structure would that be....

data family UnboxedBufWithOff a
data instance UnboxedBufWithOff Int = UBOInt {-# UNPACK #-} !Int {-#UNPACK #-} !ByteArray
data instance UnboxedBufWithOff Word32 = UBOWord32 {-# UNPACK #-} !Int {-# UNPACK #-} !ByteArray
data instance UnboxedBufWithOff (a,b,c) = UBOTup3 !(UnboxedBufWithOff a) !(UnboxedBufWithOff b) !(UnboxedBufWithOff c)

data Unboxed.Vector a = Unboxed {-# UNPACK #-} !Int !(UnboxedBufWithOff a)

This is an extra pointer indirection. Maybe we would want:

data family Unboxed.Vector a
data instance Unboxed.Vector Int = UnboxedInt {-# UNPACK #-} !Int {-# UNPACK #-} !(UnboxedBufWithOff Int)
data instance Unboxed.Vector Word32 = UnboxedWord32 {-# UNPACK #-} !Int {-# UNPACK #-} !(UnboxedBufWithOff Word32)
data instance Unboxed.Vector (a,b,c) = UnboxedTup3 {-# UNPACK #-} !Int {-# UNPACK #-} !(UnboxedBufWithOff (a,b,c))

This is a big incompatibility.

@gksato
Copy link
Contributor

gksato commented Jul 20, 2021

Ah, I forgot to say this. Good point.

@Shimuuar
Copy link
Contributor

Yes something along these lines. It is incompatible and it's not clear how it will interact with GND/DerivingVia but thankfully this sort of changes could be experimented with outside of vector

Event less compatible change I though about is to abandon data families in favor of something like:

data Vector a = Vector !Int (Buffer a)
type family Buffer a 

We only have few variants for representation of elements: primitive vectors, products, and represent me as another type.

@Boarders
Copy link
Contributor Author

Boarders commented Jul 20, 2021

@Shimuuar Ah yes, that would be way better than removing the one length, thanks for the insight. I think it is definitely worth thinking about how it could be done better since currently it seems like a waste. If one were to change things to make this more transparent then I think products could already be done with generics which would probably clear up many issues with GND anyway.

@Daniel-Diaz
Copy link

@Shimuuar Yes, indeed removing the duplication of length for each sub-vector is the right optimization.

@gksato
Copy link
Contributor

gksato commented Jul 22, 2021

@Shimuuar
The construction is beautiful and simple. However can Buffer a be UNPACKed in Vector a that way? I mean, if we can't, it's one two word extra for non-product Vector: 4 words of

[Unboxed Vector tag] [Length Int#] [Offset Int#] [Pointer to ByteArray#]

becomes 5=2+3 6=3+3 words of

Vector: [Unboxed Vector tag] [Length Int#] [Pointer to Buffer]
Buffer: [Buffer tag] [Offset Int#] [Pointer to ByteArray#]

We were talking about wasting a word in memory, right? Or did I misunderstand and were we talking about having duplicate storage of the same logical information?

@Shimuuar
Copy link
Contributor

That's very good point! I think that unboxing is possible when buffer type is monomorphic although that should be checked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants