Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using algebraic-graphs #11

Closed
thisiswhereitype opened this issue Jun 20, 2017 · 33 comments
Closed

Using algebraic-graphs #11

thisiswhereitype opened this issue Jun 20, 2017 · 33 comments
Assignees

Comments

@thisiswhereitype
Copy link
Collaborator

Seen as pangraph is concerned with parsing and serialisation of graphs. It seems wise to use another library to represent and manipulate the graphs. At the recommendation of @snowleopard Algebraic graphs.

I have implemented the API to produce these graph types with help from @geo2a's parser. However using alga makes Pangraph and Edge types obsolete. They only add boiler plate. Is possible they could be re implemented as a minimal representation for conversion but not manipulation of exceptionally large graph files?

@thisiswhereitype thisiswhereitype changed the title Using algabraic-graphs Using algebraic-graphs Jun 20, 2017
@thisiswhereitype
Copy link
Collaborator Author

Furthermore I also think the testing code should be converted to alga use alga construction primitives, from the paper found in its library it is not possible to create malformed graphs - which will allow the parser to be tested more rigorously as it reduces the chance of human error in the testing.

The current approach relies on humans parsing the graphs and embedding the graphs with the constructors which are then compared to the parsed result for equality.

@snowleopard
Copy link
Member

snowleopard commented Jun 20, 2017

It may still be useful to represent graphs abstractly in pangraph. That is, you can wrap the Graph datatype from the Alga library simply as newtype Pangraph = Pangraph (Graph ByteString). In this way you get access to all transformations in Alga, and at the same time you can still change/tweak the representation later without breaking the code of Pangraph users.

@snowleopard
Copy link
Member

One possible issue when using Alga for graph representation is edge properties. At the moment, Alga does not provide any support for this, so one may need to do something like:

data Pangraph = Pangraph
    { graph :: Graph ByteString
    , vertexProperties :: Map ByteString [ByteString]
    , edgeProperties :: Map (ByteString, ByteString) [ByteString] }

@thisiswhereitype thisiswhereitype self-assigned this Jun 20, 2017
@thisiswhereitype thisiswhereitype modified the milestone: pangraph-0.1 Jun 26, 2017
@thisiswhereitype
Copy link
Collaborator Author

My theoretical solution to combining alga and extra properties relies on abstracting a part of the alga interface into one which encompasses extra properties. But this to me feels out of the scope of pangraph (as a parser and serialiser) and also alga (as an algebra of graphs). I think I will give it a try despite my doubts, I will take the methods from Graph.hs in alga

@snowleopard
Copy link
Member

snowleopard commented Jul 4, 2017

@thisiswhereitype I think I agree with your doubts. I am starting to think that perhaps we should do the following.

Pangraph needs to store lists of vertices and lists of edges, along with their properties, as they come from the input file. So, the internal representation should match this, and does not need to rely on Alga. For example:

data Pangraph = Pangraph
    { vertexProperties :: Map ByteString VertexProperties
    , edgeProperties   :: Map (ByteString, ByteString) EdgeProperties }

So, it's just a map from vertex names to vertex properties, and the same for edges (pairs of vertex names).

Then, in order to provide a clean interface to the Alga library, we also need a couple functions like:

-- Extract the underlying graph by parsing each vertex and taking into account its properties
-- Note that the function may fail for a number of reasons, e.g. if Pangraph datatype contains
-- an edge referring to a non-existent vertex or if the parsing function has failed
toGraph :: (ByteString -> VertexProperties -> Maybe a) -> Pangraph -> Maybe (Graph a)

-- Turn an algebraic graph into a Pangraph representation, filling out the vertex properties 
-- using the provided function
fromGraph :: (a -> (ByteString, VertexProperties)) -> Graph a -> Pangraph

This is just a draft idea, so we can keep discussing this further in case of any doubts.

@thisiswhereitype
Copy link
Collaborator Author

thisiswhereitype commented Jul 6, 2017

I think this will work best to parse as many formats. Flexibility and performance should be the biggest concerns. Currently I am working with this implementation:

data Pangraph = Pangraph { unNodes :: Map Key Node
                         , unEdges :: Map Key Edge
                         } deriving (Eq)
newtype Node = Node { unNodeAttributes :: [Attribute]
                    } deriving (Eq)
newtype Edge = Edge { unEdgeAttributes :: [Attribute]
                    } deriving (Eq)
newtype Key = Key { unKey :: BS.ByteString
                  } deriving (Eq, Ord)
newtype Value = Value { unValue :: BS.ByteString
                      } deriving (Eq, Ord)
type Attribute = (Key, Value)

I think a list is most appropriate now as many graphs only have a few attributes. But this can be confirmed with later profiling. Another consideration is the possibility for metadata fields within the graph for example whether the graph is directed or not.

I think it will be necessary to enforce an unique key onto the Edges. Otherwise how the pair of nodes uniquely define an edge considering other properties?

Thinking about your suggestion for interface to alga, I think Pangraph should be an entirely independent intermediate type. I think a toAlga and fromAlga interface could be a good general model for converting graphs between libraries.

@snowleopard
Copy link
Member

Let's focus on the smallest/simplest API first, without going into implementation details.

The following is what, I think, a user should see, which is not very far from the current implementation. Note that no type classes are required.

-- Abstract data type (i.e. the constructor is not exported)
-- Do not provide any instances, such as Eq, Show, etc. yet -- we'll add them in future
data Pangraph

-- These type synonyms are visible to the user, they are added only for readability
type Key = ByteString
type Value = ByteString
type Identifier = ByteString

-- Abstract data types for attributes, vertices and edges
data Attribute
data Vertex
data Edge

-- Accessing the identifier and attributes of a vertex
vertexIdentifier :: Vertex -> Identifier
vertexAttributes :: Vertex -> [Attribute]

-- Similarly, accessing the source/target identifiers and attributes of an edge
edgeSource :: Edge -> Identifier
edgeTarget :: Edge -> Identifier
edgeAttributes :: Edge -> [Attribute]

-- Accessing key/value pairs of attributes
key :: Attribute -> Key
value :: Attribute -> Value

-- Extract the vertices/edges from the graph
vertexList :: Pangraph -> [Vertex]
edgeList :: Pangraph -> [Edge]

-- Also provide constructors for Pangraph, Vertex, Edge and Attribute
pangraph :: [Vertex] -> [Edge] -> Pangraph
vertex :: Identifier -> [Attribute] -> Vertex
edge :: Identifier -> Identifier -> [Attribute] -> Edge
attribute :: Key -> Value -> Attribute

The rest of the complexity (maps, algebraic graphs, etc.) can be easily built on top of this simple API.

Note: I prefer to use "vertex" instead of "node" to avoid confusion with Alga.

@thisiswhereitype Can you write an implementation and submit a PR?

@snowleopard
Copy link
Member

After further discussion with @thisiswhereitype, here is the current design (no more identifiers):

-- Abstract data type (i.e. the constructor is not exported)
-- Do not provide any instances, such as Eq, Show, etc. yet -- we'll add them in future
data Pangraph

-- These type synonyms are visible to the user, they are added only for readability
type Key = ByteString
type Value = ByteString

-- Abstract data types for attributes, vertices and edges
data Attribute
data Vertex
data Edge

-- Instances useful for putting vertices and edges in containers
instance Ord Vertex
instance Ord Edge

-- Accessing the identifier and attributes of a vertex
vertexAttributes :: Vertex -> [Attribute]

-- Similarly, accessing the source/target identifiers and attributes of an edge
edgeSource :: Edge -> Vertex
edgeTarget :: Edge -> Vertex
edgeAttributes :: Edge -> [Attribute]

-- Accessing key/value pairs of attributes
key :: Attribute -> Key
value :: Attribute -> Value

-- Extract the vertices/edges from the graph
vertexList :: Pangraph -> [Vertex]
edgeList :: Pangraph -> [Edge]

-- Also provide constructors for Pangraph, Vertex, Edge and Attribute
pangraph :: [Vertex] -> [Edge] -> Pangraph
vertex :: [Attribute] -> Vertex
edge :: Vertex -> Vertex -> [Attribute] -> Edge
attribute :: Key -> Value -> Attribute

@thisiswhereitype
Copy link
Collaborator Author

thisiswhereitype commented Jul 13, 2017

Here is my interface and data types implementation, I have made some comments below.

module Pangraph.Internal.Graph (
-- Abstract Types
Pangraph,
Vertex,
Edge,
Attribute, -- A type alias for (Key, Value)
Key,
Value,

-- Constructors
makePangraph,
makeVertex,
makeEdge,
makeKey,
makeValue,
-- Getters
vertices,
edges,
vertexAttributes,
edgeAttributes,
edgeEndpoints,
key,
value,
) where

data Pangraph = Pangraph
  { vertices' :: Map Identifier Vertex
  , edges' :: Map Identifier Edge
  } deriving (Eq)
data Vertex = Vertex
  { nodeID' :: Maybe Identifier
  , vertexAttributes' :: [Attribute]
  } deriving (Eq)
data Edge = Edge
  { edgeID' :: Maybe Identifier
  , edgeAttributes' :: [Attribute]
  , source :: Vertex
  , target :: Vertex
  } deriving (Eq)

type Identifier = Word
type Attribute = (Key, Value)

newtype Key = Key BS.ByteString deriving (Eq, Ord, Show)
newtype Value = Value BS.ByteString deriving (Eq, Ord, Show)

I think Attribute is best as an alias - it is natural to work with key, value pairs in the map. The constructor for Edge requiring endpoints means malformed graphs cannot be constructed. I think the problems are arising with deciding on an interface (wrt Identifier) are because we are not enforcing an unique field on the Vertex. I propose the constructor to be:
makeVertex :: Identifier -> [Attributes] -> Vertex
This gives us more assumptions about working with vertices, we can tell for example if a Vertex is being updated or added and provide functions accordingly. Similar to Data.Map's insert. Or two functions can be provided updateVertex and addVertex to protect the graph as the onus is now on the user to verify vertices.

insertVertex :: Pangraph -> Vertex -> Pangraph
-- or
addVertex :: Pangraph -> Vertex -> Maybe Pangraph
updateVertex :: Pangraph -> Vertex -> Maybe Pangraph

@thisiswhereitype
Copy link
Collaborator Author

Also discussing your idea that typeclasses were not ideal for implementing Attribute access. You primary reason iirc was because classes are exported and the user can extend them. But I have found that by not exporting the class and only its functions you cannot create any more instances but at the same time the functions are still polymorphic.

module Interfaces (
Circle,
Square,
makeCircle,
makeSquare,
volume
) where

data Circle = Circle Float deriving (Show,Eq)
data Square = Square Float Float Float deriving (Show,Eq)

makeCircle :: Float -> Circle
makeCircle = Circle

makeSquare :: Float -> Float -> Float -> Square
makeSquare = Square

class Shape a where
  volume :: a -> Float

instance Shape Square where
  volume (Square x y z) = x * y * z

instance Shape Circle where
  volume (Circle r) = r * r * 3.14
module Lib(
test
) where

import qualified Interfaces as I

data Triangle = Equilateral Int Int Int deriving (Eq, Show)

--  error: Not in scope: type constructor or class ‘Shape’

-- instance Shape Triangle where
--   volume (Equilateral hyp adj opp) = (adj + opp) / 2

test :: (Float, Float)
test = (I.volume square, I.volume circle)
  where
    square = I.makeSquare 6 3 5
    circle = I.makeCircle 100

I think that by doing this the API is cleaner as there is no duplication with functions like vertexAttributes and edgeAttributes

@snowleopard
Copy link
Member

@thisiswhereitype Please submit a pull request, I'll do a review and we'll go from there.

I suggest not to use type classes at first -- we can consider such API improvements in future. Let's implement the simplest thing that works first.

@thisiswhereitype
Copy link
Collaborator Author

#14

@thisiswhereitype
Copy link
Collaborator Author

thisiswhereitype commented Jul 14, 2017

However, in extension to my last comment, using type classes means we are unable to hide a particular types getter without hiding them all. As access is now controlled through one function.

@snowleopard
Copy link
Member

There are also other reasons for not using type classes: for example, you get more obscure error messages. If you apply attributes to a value of type Pangraph, you'll get something like "No instance of Attributes for Pangraph", which may be quite confusing, especially for a newcomer.

@thisiswhereitype
Copy link
Collaborator Author

Given the context of this issue changing in #14 and the interface now relatively settled upon.
My current thinking is that toAlga and fromAlga will be in a module Pangraph.Algebraic-Graphs
But alga is only capturing the structure of the graphs so I think the first iteration of the API should capture a very simple API. These are easy to implement with both libraries API's

toAlga :: Pangraph -> Graph Vertex
fromAlga :: Graph Vertex -> ([Vertex], [Edge])

@snowleopard
Copy link
Member

snowleopard commented Aug 21, 2017

My current thinking is that toAlga and fromAlga will be in a module Pangraph.Algebraic-Graphs

Instead of toAlga I suggest you implement the interface ToGraph, which is essentially just a polymorphic version of toAlga:

http://hackage.haskell.org/package/algebraic-graphs-0.0.5/docs/Algebra-Graph-Class.html#t:ToGraph.

This allows Pangraph to be used anywhere where a ToGraph data type is accepted. For example, it can be passed to DOT export functions without any intermediate conversions:

http://hackage.haskell.org/package/algebraic-graphs-0.0.5/docs/Algebra-Graph-Export-Dot.html

In this way you can add an export to DOT format very easily (you can try the simplest exportAsIs first).

This ToGraph instance should live in the module defining Pangraph as otherwise it would be an orphan instance, so no need to add a new module just yet.

Let's come back to fromAlga a bit later.

@thisiswhereitype
Copy link
Collaborator Author

thisiswhereitype commented Aug 21, 2017

After implementing this, I discovered alga's vertexList has an Ord typeclass restriction. I have explored orphan instances through Pangraph.Instances however the compiler warns on this and I see why. The solution it offers is move the instance into the module or wrap in a newtype.
I think the simplest solution is to enforce Ord as the VertexID by moving it into the module. Then the user can use sortBy to sort their own lists of attributes and order them accordingly.
I will provide a commit shortly.

@snowleopard
Copy link
Member

@thisiswhereitype I don't understand. Are you trying to implement ToGraph Pangraph? Then where is the Ord constraint comes from?

@thisiswhereitype
Copy link
Collaborator Author

thisiswhereitype commented Aug 22, 2017

After clarifying in person vertexList has type
vertexList :: Ord a => Graph a -> [a]
Where a is a vertex.
@snowleopard Suggests that an interface be provided and a newtype wrapped Vertex be added to allow for interfaces to be added, iirc?
So the file would look like this:

module Alga (
WrappedVertex(..),
...
) where
...
class Alga where
    toAlga :: [WrappedVertex] -> [(WrappedVertex, WrappedVertex)] -> Graph WrappedVertex
    toPan :: Graph WrappedVertex -> ([Vertex], [Edge])
newtype WrappedVertex = Vertex deriving (Eq, Show)
instance Ord WrappedVertex where
    ...

Is there a way users can define the instance?

@snowleopard
Copy link
Member

I don't think I suggested anything like this! :-)

Can you clarify what problem you are solving? Why do you need vertexList?

@thisiswhereitype
Copy link
Collaborator Author

To extract the list of Vertices and then edges from an algebraic graph and recreate a list of Edges and Vertices?

@snowleopard
Copy link
Member

Ah, I think you are jumping ahead a little. Let's do the ToGraph Pangraph instance first in a separate PR -- as far as I understand it doesn't require vertexList or any Ord constraints.

@thisiswhereitype
Copy link
Collaborator Author

thisiswhereitype commented Aug 22, 2017

I am afraid I am now confused.
This is my current implementation:

module Pangraph.AlgebraicGraphs
    ( toAlga
    ) where

import qualified Algebra.Graph as G
import Pangraph

toAlga :: Pangraph -> G.Graph Vertex
toAlga p = G.graph (vertices p) (map edgeEndpoints $ edges p)

I am not sure what behavior is intended for the ToGraph a Interface to capture?

@snowleopard
Copy link
Member

snowleopard commented Aug 22, 2017

@thisiswhereitype Well, that's almost exactly what you need. Let me tweak it:

{-# LANGUAGE TypeFamilies #-}
module Pangraph -- should be defined here as otherwise it will be an orphan instance

import qualified Algebra.Graph.Class as Alga

instance Alga.ToGraph Pangraph where
    type ToVertex = Vertex
    toGraph p = Alga.graph (vertices p) (map edgeEndpoints $ edges p)

@snowleopard
Copy link
Member

I believe that with the above instance definition you can apply DOT exportAsIs from Alga to a Pangraph value and it will just work.

@thisiswhereitype
Copy link
Collaborator Author

thisiswhereitype commented Aug 22, 2017

Given the definition form Alga:

class ToGraph t where
    type ToVertex t
    toGraph :: (Graph g, Vertex g ~ ToVertex t) => t -> g

Is the instance not:

instance Alga.ToGraph Pangraph where
    type ToVertex Pangraph = Vertex
    toGraph p = Alga.graph (vertices p) (map edgeEndpoints $ edges p)

@snowleopard
Copy link
Member

Ah, yes, you are right: I missed Pangraph in type ToVertex Pangraph = Vertex.

@thisiswhereitype
Copy link
Collaborator Author

@snowleopard
Copy link
Member

@thisiswhereitype Looks good -- please send a PR with it.

@thisiswhereitype
Copy link
Collaborator Author

thisiswhereitype commented Aug 31, 2017

I have concerns regarding size the binary* may grow too if all Graph libraries can be imported into Pangraph.hs. I think a newtype wrapper for the instance may be the way to go in this case. I have tried it out and this is what I have produced. Let my know what you think.

*Only imported modules are linked from my testing.

{-# LANGUAGE TypeFamilies #-}

module Pangraph.AlgebraicGraphs
    ( toAlga, toPangraph
    ) where

import Pangraph
import qualified Algebra.Graph.Class as AlgaClass

newtype Wrapper = Wrapper
  { unwrap' :: Pangraph
  } deriving (Eq)

-- I used ":t toAlga" to get this but can it be simplified?
toAlga :: (AlgaClass.Vertex g ~ Vertex, AlgaClass.Graph g) => Pangraph -> g
toAlga p = AlgaClass.toGraph (Wrapper p)

toPangraph = undefined

instance AlgaClass.ToGraph Wrapper where
  type ToVertex Wrapper = Vertex
  toGraph (Wrapper p) = AlgaClass.graph (vertexList p) (map edgeEndpoints $ edgeList p)

@snowleopard
Copy link
Member

snowleopard commented Aug 31, 2017

I don't understand what exactly you achieved by wrapping and unwrapping Pangraph.

Are you concerned that depending/importing algebraic-graphs increases the size of the pangraph binary? I don't see how the above changes anything. You still depend on the library, you still import Alga modules.

Furthermore, I don't see how a function which is not used by a binary can increase its size. In any case, if the size of the binary is a concern it should be discussed in a separate issue.

@thisiswhereitype
Copy link
Collaborator Author

I was mistaken in my interpretation of my binary experiments. The newtype allows for the interface instance to not be orphaned but that is now irrelevant.
Are you happy to close this thread, toPangraph can have another issue?

@snowleopard
Copy link
Member

Yes, sure, let me close this. We can discuss toPangraph in a separate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants