-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pyos] The representaton of self-loops in undirected graphs #398
Comments
Yeah, this is worth attention, thanks @szhorvat. Right now we defer to So, at the very minimum, we should document how self-edges are handled in different cases. We should also consider adding a keyword argument to control behavior and changing the default behavior. See NetworkX discussion about this here: networkx/networkx#1078 As for how we handle self-edges in |
Does the GraphBLAS standard say anything about the interpretation of symmetric matrices as undirected graphs? Or does it sidestep the problem by effectively treating everything as directed? I'm new to GraphBLAS, but I guess the latter? Be careful about consistency, and with adding converters to/from other network packages (or other network packages implementing converters themselves). igraph and graph-tool both use 2. Delegating to the adjacency matrix function of each package would be problematic. NetworkX uses 1 but one may argue that this is inconsistent and if you do so, you should also define "degree" as the number of incident edges. So this graph would have degrees (1, 3, 1): But that's not how NetworkX handles degrees, it counts edge endpoints, as is typical in most textbooks. So the degrees it returns are (1, 4, 1). Then it would make sense for the adjacency matrix to also contain the number of edge endpoints, not number of edges and maintain the relation that row (or column) sums give the degrees. Or consider a random walk on an undirected graph. What is the probability of the random walker being in a given vertex? You'll find in many textbooks that this is proportional to the degree. This scenario is basically PageRank with a damping factor of 1. Do the calculation with |
I agree with your points @szhorvat, and thanks for the clear explanations. It's helpful to have a "paper trail", and I'm looking forward on your perspective on more things. We'll discuss this issue in our weekly meeting tomorrow, but here are my current thoughts:
Despite having "Graph" in the name, GraphBLAS says very little about the interpretation of matrices. The same matrix can be interpreted differently whether it's directed or undirected, and the same graph can have different associated matrices: adjacency, incidence, Laplacian, etc. So, I would say the GraphBLAS approach is to allow users to adhere to well-established theory relating matrices, linear algebra, and graphs. If you know you have an adjecency matrix for an undirected graph, you can treat it as such or wrap it in a So, we should follow the spirit of adhering to well-established theory, which I think you've convinced me that in this case means changing how we convert self-edges for undirected graphs from networkx. |
Perhaps the easiest thing for us to do here (at least for now) is to double the diagonals ourselves when converting from NetworkX, such as: if not G.is_directed():
A += gb.select.diag(A) |
If we double the diagonals by default when converting from NetworkX, then we ought to halve the diagonals when converting to a NetworkX undirected Graph by default. What are good keyword arguments to control this behavior? |
I have to make a correction here. The Python interface of igraph has not fully transitioned yet, and currently uses 1 by default. graph-tool does use 2.
What we are heading towards in igraph is using a https://python.igraph.org/en/stable/api/igraph.GraphBase.html#Adjacency |
I notice that For directed graphs, self-loops should not be counted twice. For simple undirected graphs, the weight simply needs to be halved, if you choose to go with taking self-loops twice. This brings up the question:
It seems to me that the GraphBLAS approach of using adjacency matrices is not very suitable for multigraphs. One may certainly represent multiple paralell edges with integers higher than 1 in the adjacency matrix, but then weights cannot be handled, and several parallel edges cannot be distinguished from each other. |
I wanted to give you a heads up that I won't be reachable until the week after next. I will think more carefully about these issues when I'm back and will try to finish the review by April 2nd. |
Thanks for the suggestions, questions, and heads up @szhorvat. I expect to be especially busy the next two weeks, so this is good timing for a pause. I do appreciate your scrutiny and involvement with your careful review--you're helping to improve things and push things forward :) We would like to redo
Right, adjacency matrices aren't the way to go for weighted multigraphs. For unweighted, using the multiplicity for the edge weight can sometimes work as you indicated. We'll need to use incidence matrices in general though. Now I wonder: how are self-edges handled by incidence matrices? Instead of a single row (or column) for an edge with two values to indicate the two end points, I suppose you'd need two rows (or columns) for an edge with one value each to indicate the self-loop. |
@jim22k, regarding the shape of the incidence matrix, since we assume row-oriented-ness, I think it makes the most sense to have |
@SultanOrazbayev @jim22k for the next meeting or two, let's try to work out a proposal for going to/from networkx and adjacency matrix. If we have the time, I think it would be helpful if we each come up with an API and behaviors that we think make the most sense, then we can compare. I think we can separate and postpone incidence matrix conversions until after we do adjacency matrix conversions, but of course we can consider incidence matrices too if anyone gets inspired! |
Wouldn't this be handled by a simple one-valued row? (so the column with the only value is the self-edge) |
This is not directly relevant to the discussion here, but since incidence matrices were brought up, there is a relationship between incidence and adjacency matrix given here:
I don't have the book referenced there, but at least for the special case of $ L=2I $, the corresponding value for incidence matrix is $ C=2I $.
|
The relationship I'm more familiar with the relationship connecting the incidence matrix For unweighted, undirected graph:
For unweighted, directed graph:
So, for these to hold, I think the single value in a column for a self-edge should be |
Hi @eriknw, I am unable to find an email address for you. Could you please contact me at szhorvat at gmail.com, to discuss a proposal using python-graphblas? |
I noticed that when converting an undirected and unweighted graph from NetworkX, each self loop is represented by 1 (and not 2) on the diagonal. Example:
The result is this matrix:
I would have expected:
Adding self-loops once to the diagonal is not incorrect per se, but it tends to be mathematically inconvenient as many identities that people expect to hold will be broken. For example, do we consider an undirected graph to be equivalent to a directed one where each undirected edge was replaced by two reciprocal directed ones? That would no longer be the case. Are the degrees obtained as row or column sums? That does not work. Is the number of edges in an undirected graph half the sum of the matrix elements? No. Consider a random walk process on the graph, which is the underlying model for many network analysis concepts such as PageRank. Can the walker traverse a self-loop in both directions? Adding one to the diagonal implies no.
One can certainly build a consistent system while considering self-loops only once in undirected graphs, but the outcome tends to be a bit less intuitive, and a bit harder to understand for people who just want to do network analysis. Several of the network analysis libraries are unfortunately inconsistent: some analysis functions implicitly consider self-loops once, some others twice. Thus be very careful with following others' example.
I would suggest considering this choice very carefully, and making sure that the systems you design (e.g. graphblas-algorithms) behave consistently and intuitively.
pyOpenSci/software-submission#81
The text was updated successfully, but these errors were encountered: