Test solver against random graphs #165
Comments
One way to construct "worst case DAGs" for version control operations is to generate a graph in the form of an N-dimensional grid or lattice structure. This effectively makes every "non-edge" commit in the DAG a merge commit. Here is a simple example: the numbers are the commits, and the edges indicate the parent relationships between them.
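As a purely illustrative aside, here is a small sketch of what generating such a lattice might look like in Go - the `gridDAG` helper below is hypothetical, not code from this repository:

```go
package main

import "fmt"

// gridDAG builds an n x m grid of "commits" where each node has an edge to the
// node to its right and to the node below it. Read in reverse, every interior
// node has two parents - i.e. it is a merge commit. Hypothetical helper only.
func gridDAG(n, m int) map[int][]int {
	edges := make(map[int][]int)
	id := func(i, j int) int { return i*m + j }
	for i := 0; i < n; i++ {
		for j := 0; j < m; j++ {
			if j+1 < m {
				edges[id(i, j)] = append(edges[id(i, j)], id(i, j+1))
			}
			if i+1 < n {
				edges[id(i, j)] = append(edges[id(i, j)], id(i+1, j))
			}
		}
	}
	return edges
}

func main() {
	// 3x3 grid: 9 commits, and the far corner is reachable along many paths.
	fmt.Println(gridDAG(3, 3))
}
```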
VCS systems often have problems that involve selecting ranges of commits, and constructing these "worst-case" lattices can be used to test the performance of those operations. A similar structure could be used for the dependency resolution problem; however, there are some wrinkles.
However, we do not necessarily need to generate only hard instances; it may be reasonable to simply generate large ones. From reading the code, it looks like the current testing infrastructure could be used to generate random graphs, with some caveats. First, we may not know what the correct solution is, but we could ensure that only solvable graphs are generated. A secondary test could do the opposite and generate only unsolvable graphs.
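A minimal sketch of that "only solvable graphs" idea, assuming we plant a concrete selection of versions first and then only emit constraints the planted selection satisfies - all names and types below are hypothetical:

```go
package main

import (
	"fmt"
	"math/rand"
)

type dep struct {
	from, to   string
	constraint string // a constraint that the planted version of `to` satisfies
}

// plantedGraph picks one version per project (the planted solution), then only
// generates dependency constraints that those versions satisfy. The solver may
// find a different answer, but at least one valid solution is known to exist.
func plantedGraph(projects []string, versions map[string][]string, edgeProb float64) (map[string]string, []dep) {
	planted := make(map[string]string)
	for _, p := range projects {
		vs := versions[p]
		planted[p] = vs[rand.Intn(len(vs))]
	}
	var deps []dep
	for _, from := range projects {
		for _, to := range projects {
			if from == to || rand.Float64() > edgeProb {
				continue
			}
			// Pin `to` to its planted version; looser ranges that still include
			// planted[to] would also preserve solvability.
			deps = append(deps, dep{from: from, to: to, constraint: planted[to]})
		}
	}
	return planted, deps
}

func main() {
	projects := []string{"a", "b", "c"}
	versions := map[string][]string{"a": {"1.0.0", "1.1.0"}, "b": {"0.9.0"}, "c": {"2.0.0", "2.1.0"}}
	planted, deps := plantedGraph(projects, versions, 0.5)
	fmt.Println(planted, deps)
}
```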
Thanks, I do think this helps frame the problem.
Definitely - defining the structure of the random graphs strikes me as what will probably take the most work. It also demands attention because it will need some design for flexibility: we are likely to add new dimensions to the solver over time, which the random graph generator would need to reflect if it is to retain its utility. The worst-case lattice you describe for VCS operations, by contrast, covers only a single dimension.

Packages can't vary versions independently of their project, though - we enforce the requirement that all packages from a given project must use the same version. (I think the motivation for maintaining that invariant should be clear, given the reduction in the size of the search space it induces.) The solver itself more or less treats the projects as vertices and the packages as properties of those vertices, but that doesn't necessarily mean that's the right model for a random generator.

That's the base shape of the graph. Then there's the question of how to model the satisfiability constraints themselves.
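A rough, hypothetical sketch of how a generator might model that project/package relationship - projects as vertices carrying one selected version, packages as properties of the project. These are not gps's actual types:

```go
// Hypothetical data model for a random graph generator, not gps's real types.
// The key invariant: a version is chosen per project, and every package within
// that project is pinned to it - packages never vary independently.
package depgen

type Project struct {
	Root     string   // e.g. "github.com/foo/bar"
	Versions []string // candidate versions the generator can pick from
	Packages []Package
}

type Package struct {
	ImportPath string   // e.g. "github.com/foo/bar/baz"
	Imports    []string // import paths into other projects' packages
}

// Selection maps project root -> the single version chosen for that project.
type Selection map[string]string
```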
Hah, interesting! I've got your paper bookmarked, I'll read through it when I get a chance. But...
This is crucial, and why I was inclined towards a random generator in the first place: I'd go further to say that we expressly don't care about generating hard instances, because I don't think that hard instances have much to do with the shape of real depgraphs. I suspect - but desperately want evidence - that if we were to analyze the universal Go import graph, we would find it is extremely sparse, with a relatively small number of densely-imported hotspots (e.g., github.com/sirupsen/logrus). I think an ideal approach to this problem would work from aggregate data about the global depgraph, including information like:
(by no means an exhaustive list). With this kind of information in hand, we might be able to guide the random graph generator towards creating more realistic graphs.
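For instance, if the aggregate data confirmed a sparse graph with a few heavily-imported hotspots, the generator could sample each project's in-degree from a heavy-tailed distribution rather than uniformly. A sketch under that assumption - the Zipf parameters here are made up, purely for illustration:

```go
package main

import (
	"fmt"
	"math/rand"
)

// hotspotInDegrees samples how many dependents each project gets from a
// heavy-tailed (Zipf-like) distribution, so most projects are rarely imported
// and a handful become logrus-style hotspots. Parameters are illustrative.
func hotspotInDegrees(numProjects int) []int {
	z := rand.NewZipf(rand.New(rand.NewSource(42)), 1.5, 1.0, 50)
	degrees := make([]int, numProjects)
	for i := range degrees {
		degrees[i] = int(z.Uint64())
	}
	return degrees
}

func main() {
	fmt.Println(hotspotInDegrees(20))
}
```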
Great! Though note that the "bimodal" fixtures are probably the better ones to go on, as they incorporate the crucial project/package relationship I described above. The basic ones collapse that distinction - they pretend that all projects have exactly one package.
If that's a guarantee we can make, it'd be amazing. 👍 👍 👍
@sdboyer That paper is really more about cryptography; Connamacher and Molloy's paper is a better reference. Working some examples out by hand, the lattice approach looks promising, as you can set it up to be:
However, that might actually be easy to solve. The hardest instances will be those where there are many "promising" partial solutions but only one actual solution, which is difficult to find by any heuristic. I will think more about how to model this.
Great. It's the weird cases in the middle that'll get us.
After giving this some more thought: previously we were thinking of dependency resolution as constraint solving and trying to generate conflicts. However, this is not the only way to look at it. We can also look at it in terms of a slightly weird form of the subgraph isomorphism problem. This quick post will just go through an example; I am going to work on an actual algorithm, maybe tomorrow. I came up with this idea over lunch and thought I would put it here before I lost the scrap of paper I wrote it on.

Example: dependencies for project
This leads to the following graph of the dependencies. To find a solution, we need to match (that is, find a subgraph isomorphism mapping for) the following "grammar." The grammar describes the shape of the possible solutions:
This leads to 4 candidate graphs to match against the graph of project dependencies. To determine the candidate solutions, match the graphs and find the subgraph isomorphism mappings. The first middle candidate solution is invalid because of a version mismatch.

EDIT: there is a bug in the example above. I did this by hand and forgot to put an edge in the graph needed to get the desired result; will fix later.
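To make the candidate-graph idea concrete, here is a rough, purely illustrative sketch of the enumeration step. Rather than literal subgraph matching, it brute-forces the cartesian product of each project's versions and checks every combination against the declared constraint edges - exponential, and only meant to mirror the hand-worked example:

```go
package main

import "fmt"

// constraint: if project `from` is at `fromVer`, it requires `to` at `toVer`.
type constraint struct {
	from, fromVer string
	to, toVer     string
}

// candidates enumerates every combination of one version per project.
func candidates(projects []string, versions map[string][]string) [][]string {
	if len(projects) == 0 {
		return [][]string{{}}
	}
	var out [][]string
	for _, v := range versions[projects[0]] {
		for _, rest := range candidates(projects[1:], versions) {
			out = append(out, append([]string{v}, rest...))
		}
	}
	return out
}

// satisfies reports whether a candidate selection violates any constraint.
func satisfies(projects, chosen []string, cons []constraint) bool {
	at := make(map[string]string)
	for i, p := range projects {
		at[p] = chosen[i]
	}
	for _, c := range cons {
		if at[c.from] == c.fromVer && at[c.to] != c.toVer {
			return false
		}
	}
	return true
}

func main() {
	projects := []string{"a", "b"}
	versions := map[string][]string{"a": {"1", "2"}, "b": {"1", "2"}}
	cons := []constraint{{from: "a", fromVer: "2", to: "b", toVer: "2"}}
	for _, c := range candidates(projects, versions) {
		fmt.Println(c, satisfies(projects, c, cons))
	}
}
```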
Just to chime in: I think it's important to generate invalid graphs as well. Those attempts will fail, and it'd be best if they completed in a sane amount of time.
It certainly makes sense that there would be a way to express the problem here as a weird version of subgraph isomorphism - both are NP-complete. The jump I'm not quite clear on is why that's helpful for the random input generation problem; as-is, this reads to me more like a thought on changing the core algorithm itself. Which is a discussion we can have, sure, but not what this issue is for 😄 But let's assume I've just missed a connection there, which is fine. I've got a few thoughts on the basic model you seem to be headed for. First:
Honestly, the last image that really makes sense to me is the second one. That makes it seem like you're trying for a sort of first-pass check that judges, by shape alone, whether a graph even could be a solution, without having to look at versions at all. But...well, yeah, I'm just not sure where it's going, or how it helps us generate random graphs 😄 Looking forward to seeing more when you have it!
@sdboyer I will do a longer response later but I wanted to address one thing:
The selector (eg. |
Sure, ofc, the actual problem we're solving will always be finite. Sounds like this probably isn't significant to our current concerns, then :) Thanks!
bump :)
This issue was moved to golang/dep#422
Almost all of the tests I've written for gps' solver thus far are geared towards checking some aspect of correctness of the solver. Testing its performance has not been a priority. But the solver is tasked with constraint solving - an NP-hard search problem. Its performance (against real graphs) is something we have to care very much about.
Of course, there's a chicken-or-egg problem there - getting real graphs is difficult when the ecosystem doesn't exist yet for us to test against. Even when it does, though, the shape of the ecosystem is bound to change over time. It would be ideal if we had a way to get out ahead of solver performance issues, rather than having to wait for them to arise.
It seems to me that a potentially good approach to "getting ahead" in this way is to invest in a system for testing the solver against random dependency graphs. It's not trivial to do so - the generator would need to work across several dimensions of variables - but it shouldn't be crazy hard to do, either. The crucial bit will be exporting the generated graphs into valid Go code, probably in the same form used to declare the handcrafted correctness-checking fixtures that exist today.
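A hedged sketch of what that export step might look like - the fixture field and helper names used below (`basicFixture`, `depspec`, `mkDepspec`) are modeled on the existing hand-written fixtures, but the exact shapes may differ:

```go
package main

import (
	"fmt"
	"strings"
)

// genProject is a hypothetical stand-in for one generated project.
type genProject struct {
	Root    string
	Version string
	Deps    []string // "root constraint" pairs, e.g. "b ^1.0.0"
}

// exportFixture renders a generated graph as Go source roughly in the shape of
// today's handcrafted fixtures, so a problematic graph can be checked in and
// replayed as a regression test.
func exportFixture(name string, projects []genProject) string {
	var b strings.Builder
	fmt.Fprintf(&b, "var %s = basicFixture{\n\tn: %q,\n\tds: []depspec{\n", name, name)
	for _, p := range projects {
		fmt.Fprintf(&b, "\t\tmkDepspec(%q", p.Root+" "+p.Version)
		for _, d := range p.Deps {
			fmt.Fprintf(&b, ", %q", d)
		}
		fmt.Fprintf(&b, "),\n")
	}
	b.WriteString("\t},\n}\n")
	return b.String()
}

func main() {
	fmt.Print(exportFixture("randomFixture1", []genProject{
		{Root: "root", Version: "1.0.0", Deps: []string{"a 1.0.0"}},
		{Root: "a", Version: "1.0.0"},
	}))
}
```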
The basic loop would be something like this:
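(The issue's original list of steps is not preserved in this copy; the following is an assumed reconstruction of what such a loop could look like, with the solver call stubbed out since it depends on gps's real API.)

```go
package main

import (
	"fmt"
	"time"
)

// graph is a placeholder for a generated set of projects, versions, and constraints.
type graph struct{}

func generateRandomGraph(seed int64) graph { return graph{} } // placeholder generator

func solve(g graph) (ok bool, err error) { return true, nil } // stand-in for the real solver

func main() {
	for seed := int64(0); seed < 100; seed++ {
		g := generateRandomGraph(seed)
		start := time.Now()
		ok, err := solve(g)
		elapsed := time.Since(start)
		if !ok || err != nil || elapsed > 5*time.Second {
			// Export the offending graph as a fixture so it can be replayed
			// and investigated (see the export sketch above).
			fmt.Printf("seed %d: ok=%v err=%v took %s - export as fixture\n", seed, ok, err, elapsed)
		}
	}
}
```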
This would be a great tool in our arsenal, though I'm not sure when I'd have time to get to it myself. Help would be awesome. I'm happy to add more info here, and generally provide guidance as needed to anyone interested.