Length in unboxed instances #393

Boarders · 2021-07-20T00:15:05Z

Unbox instances for tuples (defined in unbox-tuple-instances) are backed by vectors of the form:

data instance Vector (a, b)
    = V_2 {-# UNPACK #-} !Int !(Vector a) !(Vector b)

This stores the length of each vector. Is there a fundamental reason why we need this length? It seems, naively, like it wastes an extra word as each vector already contains the length and an invariant of the representation should be that both lengths remain identical.

The text was updated successfully, but these errors were encountered:

gksato · 2021-07-20T00:34:35Z

We could define

instance (Generic.Vector Unboxed.Vector a, Generic.Vector Unboxed.Vector b) Generic.Vector Unboxed.Vector (a,b) where
   basicLength (V_2 {- length field is gone -} vfst vsnd) = basicLength vfst

However what if we had Unboxed.Vector ((...((a10000,a9999),a9998),....,a1),a0)? It would be a BIG performance penalty for merely asking for a length.
The reason for this weird iterated nested pair may be:

{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, DerivingStrategies #-}
data Nat = Zero | Succ !Nat
data family SnocLenList (a::Type) :: Nat -> Type
newtype instance SnocLenList a Zero = SLLNil ()
  deriving newtype (Generic.Mutable.MVector Unboxed.Mutable.MVector, Generic.Vector Unboxed.Vector, Unboxed.Unbox)
newtype instance SnocLenList a (Succ n) = SLLSnoc (SnocLenList a n,a)
  deriving newtype (Generic.Mutable.MVector Unboxed.Mutable.MVector, Generic.Vector Unboxed.Vector, Unboxed.Unbox)

Boarders · 2021-07-20T00:42:11Z

Is it a good trade off to optimize for type-level lisp in the library?

Daniel-Diaz · 2021-07-20T00:59:40Z

Good point. With many vectors around, those bytes might add up to a noticeable waste. How many nested vectors are needed for these extra bytes to be worth it?

The example with the type-level naturals could be rewritten using the type-level naturals from GHC.TypeLits, using natVal to get the length.

gksato · 2021-07-20T02:08:42Z

My example is surely an extreme one and I neither think the library should take utmost care for this kind of extremity. However pointer indirection is always a small but non-ignorable time performance penalty, and the most frequently invoked operation would be length if a user doesn't love unsafe functions and C/C++-like nasal demons, i.e. undefined behavior of the whole program. On the other hand, extra one word for length is a space performance penalty, and we can't simply compare these two types of penalty, while they are connected through GC.

Shimuuar · 2021-07-20T08:59:13Z

I agree that it's problem but trying to get rid of length field in V_2 and friends is wrong approach IMO. Let take for example 3-tuple (A,B,C). Representation of corresponding unboxed vector is roughly:

data V_3 A B C = V_3 !Length
  (Length, Buffer A)
  (Length, Buffer B)
  (Length, Buffer C)

We have 4 copies of length! And removing length field from V_3 does little^ it only removes 1 out of 3. What we really want is to separate buffer from length so we can represent product as

data V_3 A B C = V_3 !Length
  (Buffer A)
  (Buffer B)
  (Buffer C)

gksato · 2021-07-20T09:25:41Z

What kind of structure would that be....

data family UnboxedBufWithOff a
data instance UnboxedBufWithOff Int = UBOInt {-# UNPACK #-} !Int {-#UNPACK #-} !ByteArray
data instance UnboxedBufWithOff Word32 = UBOWord32 {-# UNPACK #-} !Int {-# UNPACK #-} !ByteArray
data instance UnboxedBufWithOff (a,b,c) = UBOTup3 !(UnboxedBufWithOff a) !(UnboxedBufWithOff b) !(UnboxedBufWithOff c)

data Unboxed.Vector a = Unboxed {-# UNPACK #-} !Int !(UnboxedBufWithOff a)

This is an extra pointer indirection. Maybe we would want:

data family Unboxed.Vector a
data instance Unboxed.Vector Int = UnboxedInt {-# UNPACK #-} !Int {-# UNPACK #-} !(UnboxedBufWithOff Int)
data instance Unboxed.Vector Word32 = UnboxedWord32 {-# UNPACK #-} !Int {-# UNPACK #-} !(UnboxedBufWithOff Word32)
data instance Unboxed.Vector (a,b,c) = UnboxedTup3 {-# UNPACK #-} !Int {-# UNPACK #-} !(UnboxedBufWithOff (a,b,c))

This is a big incompatibility.

gksato · 2021-07-20T09:27:31Z

Ah, I forgot to say this. Good point.

Shimuuar · 2021-07-20T09:56:52Z

Yes something along these lines. It is incompatible and it's not clear how it will interact with GND/DerivingVia but thankfully this sort of changes could be experimented with outside of vector

Event less compatible change I though about is to abandon data families in favor of something like:

data Vector a = Vector !Int (Buffer a)
type family Buffer a

We only have few variants for representation of elements: primitive vectors, products, and represent me as another type.

Boarders · 2021-07-20T13:28:39Z

@Shimuuar Ah yes, that would be way better than removing the one length, thanks for the insight. I think it is definitely worth thinking about how it could be done better since currently it seems like a waste. If one were to change things to make this more transparent then I think products could already be done with generics which would probably clear up many issues with GND anyway.

Daniel-Diaz · 2021-07-20T13:58:20Z

@Shimuuar Yes, indeed removing the duplication of length for each sub-vector is the right optimization.

gksato · 2021-07-22T02:11:02Z

@Shimuuar
The construction is beautiful and simple. However can Buffer a be UNPACKed in Vector a that way? I mean, if we can't, it's ~~one~~ two word extra for non-product Vector: 4 words of

[Unboxed Vector tag] [Length Int#] [Offset Int#] [Pointer to ByteArray#]

becomes ~~5=2+3~~ 6=3+3 words of

Vector: [Unboxed Vector tag] [Length Int#] [Pointer to Buffer]
Buffer: [Buffer tag] [Offset Int#] [Pointer to ByteArray#]

We were talking about wasting a word in memory, right? Or did I misunderstand and were we talking about having duplicate storage of the same logical information?

Shimuuar · 2021-07-22T07:39:27Z

That's very good point! I think that unboxing is possible when buffer type is monomorphic although that should be checked.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Length in unboxed instances #393

Length in unboxed instances #393

Boarders commented Jul 20, 2021

gksato commented Jul 20, 2021 •

edited

Loading

Boarders commented Jul 20, 2021 •

edited

Loading

Daniel-Diaz commented Jul 20, 2021

gksato commented Jul 20, 2021

Shimuuar commented Jul 20, 2021

gksato commented Jul 20, 2021 •

edited

Loading

gksato commented Jul 20, 2021

Shimuuar commented Jul 20, 2021

Boarders commented Jul 20, 2021 •

edited

Loading

Daniel-Diaz commented Jul 20, 2021

gksato commented Jul 22, 2021 •

edited

Loading

Shimuuar commented Jul 22, 2021

Length in unboxed instances #393

Length in unboxed instances #393

Comments

Boarders commented Jul 20, 2021

gksato commented Jul 20, 2021 • edited Loading

Boarders commented Jul 20, 2021 • edited Loading

Daniel-Diaz commented Jul 20, 2021

gksato commented Jul 20, 2021

Shimuuar commented Jul 20, 2021

gksato commented Jul 20, 2021 • edited Loading

gksato commented Jul 20, 2021

Shimuuar commented Jul 20, 2021

Boarders commented Jul 20, 2021 • edited Loading

Daniel-Diaz commented Jul 20, 2021

gksato commented Jul 22, 2021 • edited Loading

Shimuuar commented Jul 22, 2021

gksato commented Jul 20, 2021 •

edited

Loading

Boarders commented Jul 20, 2021 •

edited

Loading

gksato commented Jul 20, 2021 •

edited

Loading

Boarders commented Jul 20, 2021 •

edited

Loading

gksato commented Jul 22, 2021 •

edited

Loading