Ticket #500 (new enhancement)

Opened 4 years ago

Last modified 4 years ago

develop a tool to find the maximal consistent set of hackage packages

Reported by: duncan Owned by:
Priority: normal Milestone:
Component: miscellaneous Version:
Severity: normal Keywords:
Cc: dbueno Difficulty: project(> week)
GHC Version: Platform:

Description

To a great extent the point of libraries is code re-use. It follows that is it useful to be able to install and use two packages simultaneously. The extension of this is that we would like to know the maximal set of packages that we can install consistently. Them being consistent implies that we can use any of them in another package and have consistent dependencies.

If we knew a package was within the maximal subset then we can give it a positive mark on hackage. We may also be able to encourage package authors to update their dependencies to as to get their packages within this set.

Calculating the maximal consistent subset is certainly an NP-complete problem. It is at least as hard as working out if any single package is installable and that is already NP-complete. However we do not necessarily have to get an exact maximal set, approximation would probably do. We would also like to influence the choice of the set by the importance of packages, especially when it comes to resolving choices about divisive packages (see #499).

We would also like to influence the choices by manually supplied preferences.

This can probably be implemented as an extension of the standard package dependency resolution algorithm and using the code developed for ticket #499.

Change History

in reply to: ↑ description ; follow-up: ↓ 5   Changed 4 years ago by michiexile

If we assume that we can have "installable with..." as a primitive operation (and package THAT instance of NP in there), then it should be easy to build a graph with vertices the packages and edges given by an edge package1 -- package2 precisely when there is some set of dependency fulfillment that is fulfilled by all of them.

We can label each edge with the dependency intervals under which it exists.

This graph can be extended to a simplicial complex by inserting a simplex (you can view it as a distinguished clique) whenever all the edges of it have a nontrivial intersection of their labels.

A dependency list, here, will end up being a sequence of intervals - one for each package in the dependency list. Simultaneous dependencies are given as intersections of these intervals; with an absent interval being replaced, when needed, by a full (i.e. [-∞,∞]) interval.

As a worked example, consider packages p1 through p5, and with the following dependencies:

p1: none
p2: 1 in [1.0,1.5]
p3: 1 in [1.0,1.0], 2 in [1.0,2.0]
p4: 1 in [1.5,1.5]
p5: 1 in [1.5,1.5], 4 in [1.0,2.0]

We find all edges between these packages: 12: 1 in [1.0,1.5]
13: 1 in [1.0,1.0], 2 in [1.0,2.0]
14: 1 in [1.5,1.5]
15: 1 in [1.5,1.5], 4 in [1.0,2.0]
23: 1 in [1.0,1.0], 2 in [1.0,2.0]
24: 1 in [1.5,1.5]
25: 1 in [1.5,1.5], 4 in [1.0,2.0]
34: conflicting dependencies for 1 - edge absent
35: conflicting dependencies for 1 - edge absent
45: 1 in [1.5,1.5], 4 in [1.0,2.0]

Now, we can look at 3-cliques in the resulting graph: 123: 1 in [1.0,1.0], 2 in [1.0,2.0]
124: 1 in [1.5,1.5]
125: 1 in [1.5,1.5], 4 in [1.0,2.0]
145: 1 in [1.5,1.5], 4 in [1.0,2.0]
245: 1 in [1.5,1.5], 4 in [1.0,2.0]

And 4-cliques - all of which should avoid any already disqualified edges: 1245: 1 in [1.5,1.5], 4 in [1.0,2.0]

and that's it for this example.

Now, there is a polynomial time clique-finding graph; and intersections of intervals is reasonably easy to do as well - so this looks like a possible way to structure something like this.

  Changed 4 years ago by dbueno

This would fit in as a standard query if one had a pseudo-boolean based package dependency framework, like what I began to develop on my own branch of cabal-install. The idea is to really use a SAT solver (or pseudo-boolean solver if you want to solve optimisation problems like this one). I have that branch on my local machine; there are still some bugs and only SAT support -- but I would like to continue development. If anyone would like to help, even just to discuss how to encode the package constraints as a SAT problem, please mail me. My email is my trac username at gmail.com.

  Changed 4 years ago by dbueno

  • cc dbueno@… added

  Changed 4 years ago by dbueno

  • cc dbueno added; dbueno@… removed

in reply to: ↑ 1   Changed 4 years ago by duncan

Replying to michiexile:

If we assume that we can have "installable with..." as a primitive operation (and package THAT instance of NP in there), then it should be easy to build a graph with vertices the packages and edges given by an edge package1 -- package2 precisely when there is some set of dependency fulfillment that is fulfilled by all of them.

Hmm, I'm not sure about this. Solutions do not combine easily here. Just because we can find a way to install package P1 and a way to install package P2 does not mean we can combine those solutions to make a solution for package P1 and P2. If we could then the problem would be polynomial.

Perhaps I don't understand the algorithm you're presenting but it looks like it could also be used to solve the normal problem of package planning in polynomial time. That's a bit suspicious because we know that problem is NP-complete.

Perhaps you can present the idea in more detail somewhere.

Note: See TracTickets for help on using tickets.