-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Length in unboxed instances #393
Comments
We could define instance (Generic.Vector Unboxed.Vector a, Generic.Vector Unboxed.Vector b) Generic.Vector Unboxed.Vector (a,b) where
basicLength (V_2 {- length field is gone -} vfst vsnd) = basicLength vfst However what if we had {-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, DerivingStrategies #-}
data Nat = Zero | Succ !Nat
data family SnocLenList (a::Type) :: Nat -> Type
newtype instance SnocLenList a Zero = SLLNil ()
deriving newtype (Generic.Mutable.MVector Unboxed.Mutable.MVector, Generic.Vector Unboxed.Vector, Unboxed.Unbox)
newtype instance SnocLenList a (Succ n) = SLLSnoc (SnocLenList a n,a)
deriving newtype (Generic.Mutable.MVector Unboxed.Mutable.MVector, Generic.Vector Unboxed.Vector, Unboxed.Unbox) |
Is it a good trade off to optimize for type-level lisp in the library? |
Good point. With many vectors around, those bytes might add up to a noticeable waste. How many nested vectors are needed for these extra bytes to be worth it? The example with the type-level naturals could be rewritten using the type-level naturals from GHC.TypeLits, using |
My example is surely an extreme one and I neither think the library should take utmost care for this kind of extremity. However pointer indirection is always a small but non-ignorable time performance penalty, and the most frequently invoked operation would be |
I agree that it's problem but trying to get rid of length field in data V_3 A B C = V_3 !Length
(Length, Buffer A)
(Length, Buffer B)
(Length, Buffer C) We have 4 copies of length! And removing length field from data V_3 A B C = V_3 !Length
(Buffer A)
(Buffer B)
(Buffer C) |
What kind of structure would that be.... data family UnboxedBufWithOff a
data instance UnboxedBufWithOff Int = UBOInt {-# UNPACK #-} !Int {-#UNPACK #-} !ByteArray
data instance UnboxedBufWithOff Word32 = UBOWord32 {-# UNPACK #-} !Int {-# UNPACK #-} !ByteArray
data instance UnboxedBufWithOff (a,b,c) = UBOTup3 !(UnboxedBufWithOff a) !(UnboxedBufWithOff b) !(UnboxedBufWithOff c)
data Unboxed.Vector a = Unboxed {-# UNPACK #-} !Int !(UnboxedBufWithOff a) This is an extra pointer indirection. Maybe we would want: data family Unboxed.Vector a
data instance Unboxed.Vector Int = UnboxedInt {-# UNPACK #-} !Int {-# UNPACK #-} !(UnboxedBufWithOff Int)
data instance Unboxed.Vector Word32 = UnboxedWord32 {-# UNPACK #-} !Int {-# UNPACK #-} !(UnboxedBufWithOff Word32)
data instance Unboxed.Vector (a,b,c) = UnboxedTup3 {-# UNPACK #-} !Int {-# UNPACK #-} !(UnboxedBufWithOff (a,b,c)) This is a big incompatibility. |
Ah, I forgot to say this. Good point. |
Yes something along these lines. It is incompatible and it's not clear how it will interact with GND/DerivingVia but thankfully this sort of changes could be experimented with outside of vector Event less compatible change I though about is to abandon data families in favor of something like: data Vector a = Vector !Int (Buffer a)
type family Buffer a We only have few variants for representation of elements: primitive vectors, products, and represent me as another type. |
@Shimuuar Ah yes, that would be way better than removing the one length, thanks for the insight. I think it is definitely worth thinking about how it could be done better since currently it seems like a waste. If one were to change things to make this more transparent then I think products could already be done with generics which would probably clear up many issues with GND anyway. |
@Shimuuar Yes, indeed removing the duplication of length for each sub-vector is the right optimization. |
@Shimuuar
becomes
We were talking about wasting a word in memory, right? Or did I misunderstand and were we talking about having duplicate storage of the same logical information? |
That's very good point! I think that unboxing is possible when buffer type is monomorphic although that should be checked. |
Unbox
instances for tuples (defined inunbox-tuple-instances
) are backed by vectors of the form:This stores the length of each vector. Is there a fundamental reason why we need this length? It seems, naively, like it wastes an extra word as each vector already contains the length and an invariant of the representation should be that both lengths remain identical.
The text was updated successfully, but these errors were encountered: