Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A faster Installed-First policy #55

Closed
wants to merge 35 commits into from

Conversation

johntyree
Copy link
Contributor

This PR introduces a Policy that provides "installed first" behavior to the dependency solver, as well as a few helper classes.

Benchmarks are in issue #52.

It is implemented as a giant priority queue where the priorities split into three bands, in order from highest to lowest:

  1. Installed packages
    • Packages which are already installed on the system.
  2. Required packages
    • Packages which have been explicitly requested.
    • This is lower priority than the installed packages because we don't want to unnecessarily upgrade things.
  3. Everything else.

Within each group, packages are ordered descending by version, such that newer packages are tried first.

I'm currently using the notion that version objects are always comparable to define an ordering over all packages and assign priorities based on that, rather than grouping packages by name. In addition to being fast, easy to write, and easy to understand, it leads to a few interesting properties:

Things it does

  • it suggests packages with large version numbers earlier, i.e. curl-7.3.0 comes well before Biggus-0.7.0.
  • This was very easy to implement, but there's no good reason to do it this way. At best it doesn't hurt us. There are plenty of ways to make this smarter.
  • it suggests packages interleaved with each other, i.e. pep8 1.4.6 then scipy 0.15.0 then pep8 0.6.1,
  • high priority packages are suggested repeatedly until all options are exhausted,
  • not crash on Crash with simple >= request #42,
  • it can totally ignore the clause literals and still be quite fast.

To see this play out, there's a PolicyLogger class that is currently wrapping InstalledFirstPolicy and keeping track of things.

Things it doesn't do

  • Anything smart by examining clause instances themselves,
  • Anything clever with package metadata (i.e. should we suggest packages with few or many dependencies first?)

It currently suffers from the unrelated backtracking bug that motivated #54, and passes all tox tests with that patch applied.

If this is suitable, I would think that this closes #42 and #52.

@johntyree
Copy link
Contributor Author

TravisCI is expected to fail until #54 is merged.

@@ -52,12 +52,7 @@ def _create_rules(self, request):
pool.package_id(package)
for package in pool.what_provides(requirement)
]
self._policy.add_packages_by_id(requirement_ids)

# Add installed packages.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add_requirements now implies hard requirements, not just packages we should prefer if possible. The Policy gets a reference to installed_repository and can figure out on its own which packages are currently installed if it wants to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from an API POV, I wonder if it would not be simpler to remove add_packages_by_id altogether, and make it an implementation detail (called at creation time from a passed installed_repository).

@johntyree
Copy link
Contributor Author

I marked the iris test as an expected failure for now.

@johntyree
Copy link
Contributor Author

Failures are due to AssignmentSet having an unspecified iteration order. Having looked at all the places where it is used, it's not clear to me why we need that.

Surely the Policy should not be relying on whatever order MiniSatSolver has decided to put things into the assignment dict (otherwise it's really the MiniSatSolver that is doing the deciding) and it appears that everyone else only uses it for lookups.


import six

from collections import OrderedDict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: let's put stdlib import first

@cournape
Copy link
Contributor

cournape commented Nov 5, 2015

The PR is a bit big, can we split into at least 2 ?

  • the assignment optimisation (that one should be easy to review/accept now)
  • the new policy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash with simple >= request
3 participants