Skip to content
This repository has been archived by the owner on Oct 8, 2021. It is now read-only.

Edges are not necessarily Arcs #401

Closed
IssamT opened this issue Aug 4, 2016 · 20 comments
Closed

Edges are not necessarily Arcs #401

IssamT opened this issue Aug 4, 2016 · 20 comments

Comments

@IssamT
Copy link
Contributor

IssamT commented Aug 4, 2016

In simple graphs :

An arc (directed edge) is an ordered pair of vertices u-->v
A line (undirected edge) is a set of two vertices u---v

Directed graphs have arcs, while undirected graphs have lines(often called edges).

https://en.wikipedia.org/wiki/Glossary_of_graph_theory#edge

These two objects are not equal u-->v , v-->u
But these two are equal u---v , v---u

In both Graphs.jl and LightGraphs.jl there is only the notion of Arcs that are called Edges and are used for both directed and undirected graphs with a confusing equality test of edges in undirected graphs.

At least in Graphs.jl it is possible to do a work around by redefining the operator == of Edges in parts of the code where only undirected edges are used. But it becomes more problematic in LightGraphs since the edges are just pairs and thus the operator to redefine is this one
==(p::Pair, q::Pair) = (p.first==q.first) & (p.second==q.second)

@IssamT IssamT changed the title Edges are not arcs Edges are not necessarily Arcs Aug 4, 2016
@IssamT
Copy link
Contributor Author

IssamT commented Aug 4, 2016

Inconsistency example :

The following code

g = Graph(2)
add_edge!(g,Edge(1,2))
has_edge(g,Edge(2,1))

returns true

but this one

for e in edges(g)
    e == Edge(2,1) && print("Edge found!")
end

prints nothing

@CarloLucibello
Copy link
Contributor

for undirected graps edges are always returned such that x=>y has always x<y. This could be of some help

I'm in favor of having a type

type Edge
  first::Int
  second::Int
end

also because I don't like for example having Base.show(p::Pair) overwritten and displaying "Edge 1 2" for hany pair.

I fail to see though how can this solve the problem of Arc/Edge comparison

@IssamT
Copy link
Contributor Author

IssamT commented Aug 4, 2016

Well i don't mind having two types that are exactly the same from a structural point of view.

type Arc
  first::Int
  second::Int
end

type Edge
  first::Int
  second::Int
end

As far as they are different types we can have different function specialization for both. For example

 ==(p::Arc, q::Arc) = (p.first==q.first) & (p.second==q.second)

 ==(p::Edge, q::Edge) = ((p.first==q.first) & (p.second==q.second)) | (p.first==q.second) & (p.second==q.first)

@CarloLucibello
Copy link
Contributor

we should check what impact this would have thoughout the library, in particular regarding code duplication and type instability

@IssamT
Copy link
Contributor Author

IssamT commented Aug 4, 2016

Sure and it would be better to keep all methods that do not make a difference between directed and undirected edges unique :

abstract Edge

type Arc <: Edge
  first::Int
  second::Int
end

type UEdge <: Edge
  first::Int
  second::Int
end

@sbromberger
Copy link
Owner

sbromberger commented Aug 4, 2016

It was an explicit design decision to keep the structure of the graph independent of the structure of the edges. That is, edges do not contain information relative to their graph. There are a few reasons for this, but the main one is that it really doesn't matter. We have an accepted interface for checking edge membership, and e in edges is not it. (This used to work, incidentally, before we moved to edge iterators, but it is not the accepted interface.)

Sometimes the lack of directedness on edges makes things more complex, but in general, it simplifies things a great deal.

That is: I'm not in favor of creating two different edge types based on directedness. We have a history with this issue where basic operations required two separate functions depending on directedness of edges, and it was unnecessarily complex.

Also: Edges used to just be a type with two Ints, as you're suggesting. It made sense to move to a primitive structure just for convenience, but if there's a compelling reason to move back, we can do it.

@IssamT
Copy link
Contributor Author

IssamT commented Aug 4, 2016

@sbromberger There are a few reasons for this, but the main one is that it really doesn't matter

but for me it really does matter.

I wanted to write a mip that finds the shortest cycle in a graph using JuMP.jl

function shortest_cycle(g::Graph)
    m = Model(solver=CbcSolver())   

    @variable(m, b[v in vertices(g)], Bin)
    @variable(m, c[e in edges(g)], Bin)
    @constraint(m, sum{b[v] , v in vertices(g)} >= 1)
    @constraint(m , adjacency[v in vertices(g)],  sum{c[e] , e in out_edges(g,v)} == 2b[v])
    @objective(m, Min, sum{b[v] , v in vertices(g)})

    solve(m)   
    println("The shortest cycle has $(getobjectivevalue(m)) elements")
    getvalue(b)
end

In JuMP when you declare your variable, you also declare the set of its indexes then you can do mathematic expressions over subsets in this set.

Since out_edges(v,g) is not a subset of edges(g) there are a lot of mathematical programs that can not be written in a simple way anymore.

@sbromberger
Copy link
Owner

sbromberger commented Aug 4, 2016

the easiest ways to do this for undirected graphs are 1) to just ignore edges that are not in vertex order (use the unexported LightGraphs.is_ordered() to do this), or 2) explicitly "or" with reverse(e).

@IssamT
Copy link
Contributor Author

IssamT commented Aug 4, 2016

Yes I saw noticed these solutions. but they have three drawbacks.

  1. they aren't intuitive and require the user to know about the implementation details.
  2. they can lead to bad performance.
  3. they aren't useful to write concise code.

After thinking half an hour one way I found to fix my model is to rewrite this

@constraint(m , adjacency[v in vertices(g)], sum{c[e] , e in out_edges(g,v)} == 2b[v])

into that

    @constraintref adjacency[vertices(g)]    
    for v in vertices(g)
        out_edges_subset_of_edges = []       
        for(e in out_edges(g,v))            
            e in edges(g) && push!(out_edges_subset_of_edges, e)
            reverse(e) in edges(g) && push!(out_edges_subset_of_edges, reverse(e))
        end               
        adjacency[v] = @constraint(m, sum{c[e] , e in out_edges_subset_of_edges} == 2b[v] )
    end

I think this illustrate the 3 points I mentioned.

Another way would be to transform pairs into sets that contain two elements and let the keys be Sets of vertices instead of Edges. but this would be even uglier in my opinion

I am pretty sure you have reasons behind your decision of making Edges always Pairs, and I would be grateful if you could share some of them.

@sbromberger
Copy link
Owner

sbromberger commented Aug 4, 2016

Collecting / set-testing edges like that is expensive, though it may be unavoidable given what you need to pass in to JuMP. If that's the case, then just union the edgeset and its reverse:

vec(hcat([[e,reverse(e)] for e in edges(g)]...))

~~(there's an equivalent for out_edges but I'll leave that as an exercise for the reader.)~~See below for out_edges.

As far as the reasons for making Edge a typealias of Pair, you can check the commit history for core.jl for some of it, but again - the tendency is to prefer primitives over other data structures wherever possible. But I'm sensing that's not the question you want to ask - my guess is that you want to ask why we're not differentiating between arcs and edges. The answer there is as I said above: this was a conscious decision to keep LG simple and memory-light; having two different edge types depending on the directedness of the graph leads to complications of its own. You might even go back into some of the commit history to see the machinations we had to perform when we kept full edge sets for undirected graphs. I don't really have the time to do any more right now except to say "this proposal was considered previously and we [I, since I was the primary contributor at the time] decided that it wasn't worth the complexity".

Bottom line: yes, you've run into a case where you have to make something more complex due to the "light" nature of LightGraphs, and ultimately, this package may not be well-suited for your application. The alternative, to reengineer LG so that your case is simple, will have the effect of making other stuff more complex. It's not (even) a net-zero tradeoff. The fact that you've got a few reasonable workarounds and we don't have to upend / reengineer all the native data structures to accommodate this request means that, absent any demonstration that this proposed change improves LG with respect to 1) correctness, 2) memory utilization, 3) code simplicity, and/or 4) performance, it's probably not going to happen.

If you want to submit a PR with appropriate benchmarks showing (at a minimum) no regressions in terms of those four goals, then you would then be presenting a more compelling reason to switch.

I hope that helps.

@sbromberger
Copy link
Owner

sbromberger commented Aug 4, 2016

Another option is to use in_edges as well as out_edges instead of reverse:

union(in_edges(g,1),out_edges(g,1))

in_edges and out_edges are very efficient.

@IssamT
Copy link
Contributor Author

IssamT commented Aug 4, 2016

Thank you for providing such a long explanation for something that was already discussed in the past (I m sorry i didn't know about that ). I actually did check all the issues posted even the closed ones before submitting this issue but i didn't check the commit history. I will go over it before eventually trying to submit a PR.

@sbromberger
Copy link
Owner

To be fair, you wouldn't have known where to search in the commit history, nor what to look for, so no worries.

I look forward to your PR. When you get closer to finalizing it, reach out to us to make sure you've got the proper benchmarking information included - this will represent a MAJOR change to the codebase and will require a ton of scrutiny.

@CarloLucibello
Copy link
Contributor

I changed the return type of in_edges in #403 , let me know what you think about it.

Another option is to define and export an order function in LG so that

@constraint(m , adjacency[v in vertices(g)], sum{c[order(e)] , e in out_edges(g,v)} == 2b[v])

would work correctly

@IssamT
Copy link
Contributor Author

IssamT commented Aug 5, 2016

@CarloLucibello I thought also about the same modification as what you suggested in #403 , but I didn't want to request it because, although it solves my issues, it could break some algorithms if they were relying on the fact that when looping thought e in out_edges(g,v), dest(e) is a neighbor of v.

However I really like your idea of adding the function order. It is pretty safe and unlikely to have a negative impact on any code using LightGraphs.jl. Also, out of the 3 drawbacks I have mentioned in a previous comment concerning the current implementation of edges, only one (arguably the least important) would partially persist:

  1. they aren't intuitive and require the user to know about the implementation details.

@IssamT
Copy link
Contributor Author

IssamT commented Aug 5, 2016

What do you think about leaving out_edges as it is and addingincident_edges(g,v) that would return edges as ordered Pairs ?

@sbromberger
Copy link
Owner

sbromberger commented Aug 5, 2016

wait. What does order() do? Does it rewrite an edge so that the lowest vid is first? If so, I am totally against this since there's no way to prevent it being used with directed graphs.

If you need this, then use is_ordered() and do it yourself so that it's at least explicit for your use case. It doesn't belong in LG.

@CarloLucibello
Copy link
Contributor

CarloLucibello commented Aug 5, 2016

I don't see any problem with order, why preventing to use it on directed graphs since we can't prevent also to use reverse?
The naming is bad though, a better one would be sort. Consider that reverse and issorted are already defined in Base for Pair, hence for Edge. These side effects advocate for Edge being its own type

@IssamT
Copy link
Contributor Author

IssamT commented Aug 5, 2016

@sbromberger,

You have asked about order() because the name was not appropriate.
This is the same problem a user of LightGraphs.jl faces since objects and functions do not have an intuitive behavior (respecting mathematical definitions) and require the user to refer to the doc/implementation continuously.

By the way is_ordered() is another exemple, it should be named is_sorted() in my opinion. All pairs are ordered (they have a "first" and "second" element)

Additionally I don't understand why you want to prevent users from sorting Edges when they are coming from directed graphs since you don t make a difference between edges of directed and undirected graphs. These are just "Pairs" so what you are saying is that you want to prevent a user from sorting a Pair.
Example for why sorting Pairs of vertices in directed graph can be useful:
In telecommunication networks, sometimes you can have different directed graphs (virtual networks) running over a physical network. One may want to get the underlying undirected subgraph on which all the directed graphs rely. An easy way to do that is to get all edges in directed graphs, sort them, and push them into a set (avoiding hence the addition of "similar" unordered Pairs)

I am just trying to help improve the library in which you guys have done really a great job. You probably know better about it than me and I will find it normal if my suggestions get refused.
I will have sort() and incident_edges() defined in my code only and that wouldn't be a big deal for me :)

@sbromberger
Copy link
Owner

sbromberger commented Aug 5, 2016

I don't see any problem with order, why preventing to use it on directed graphs since we can't prevent also to use reverse?

For the extremely simple reason that order has nondeterministic behavior on directed graphs only, while reverse has defined, consistent behavior on either type.

Additionally I don't understand why you want to prevent users from sorting Edges when they are coming from directed graphs since you don t make a difference between edges of directed and undirected graphs. These are just "Pairs" so what you are saying is that you want to prevent a user from sorting a Pair.

NOTHING is preventing you from doing this yourself. I have a stated objection to including this functionality in the public LG API because IT WILL CONFUSE PEOPLE.

Bottom line: you can continue to debate the merits (or demerits) of having arcs, edges, other functionality - but in the absence of a PR of the type I described above, I will not take further part.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants