The COINST tool: extracting and visualizing coinstallability kernels for GNU/Linux distributions

From this page you can access a selection of results of the analysis of GNU/Linux distributions computed using the COINST tool. This tool has been written by Jérôme Vouillon.

The theory

COINST embodies the transformations described in article "On Software Component Co-Installability". You can also view an extended version of this paper with full proofs. These proofs have been machine-checked in Coq.

The practice

COINST reads the metadata of a GNU/Linux distribution (both RPM and DEB formats are supported), and computes the co-installability kernel, which is orders of magnitudes smaller than the original component graph, and yet contains all the information necessary to identify components which are not co-installable. The table below, extracted from the article, enumerate the number of packages, dependencies and conflicts in the input and the output of the tool.

Debian Ubuntu Mandriva
beforeafter beforeafter beforeafter
Packages 289191038 7277100 760184
Dependencies 124246619 3106929 385998
Conflicts 1146985 8260 7862

You can see in the following list the output graph produced for mainstream GNU/Linux distributions; by clicking on the slider bar on the left of each graph (or by using the mouse wheel), you can zoom in and out at leisure. Each node in the graph represent a set of packages which behave the same as far as co-installability. For instance, a node labelled maven-scm-test (x 59) stands for 59 packages, including package maven-scm-test. Dependencies between packages are indicated by blue arrows. A disjunctive dependency is represented by using an intermediate circular node with label . A red edge indicates a conflict between two packages. Clique of conflicts (that is, set of packages mutually in conflict) are displayed by connecting the conflicting packages to a circular node with label #.

Mandriva 2010.1
This graph corresponds to the Mandriva distribution. You can see (at the top) that 59 packages cannot be installed, presumably because they depend on contributed packages, not included in the core of the distribution. On the other hand, 7347 packages can always be installed (isolated node just below). One can see many cliques corresponding to different versions of a library, which are mutually in conflict.
Ubuntu 10.10 alpha 2
This graph corresponds to a beta release of the Ubuntu distribution. Several issues are immediately visible. For instance, the KDE packages libtaskmanager4 and libplasmaclock4 are both in conflict with the important package kde-minimal. Other packages such as libgl1-mesa-swx11 or libeina-svn-05 are similarly problematic. Near the bottom of the graph, one can see that package openoffice.org-thesaurus-fr has incorrect dependencies: it depends on either of two unrelated packages. Package openoffice.org-voikko also appears unusable, as it conflicts with openoffice.org-core.
Debian testing - 2010-08-22
This graph corresponds to the Debian distribution. The distribution is several time larger than the previous ones, and this is reflected in the output graph. Another point is that while few alternatives are left in other distributions (for instance, Ubuntu provides only two mail servers), the Debian distribution is very inclusive. Thus, it includes packages with a large number of conflicts. The most visible one is liboss-salsa-asound2, which is in direct conflict with 1350 other packages. Our tool makes it possible to ignore (manually) some of these packages in order to make the graph more readable.
Debian testing - 2010-08-22 (pruned)
This is the same Debian distribution as above, but with the following packages removed: liboss-salsa-asound2, libjpeg8-dev, heimdal-dev, libqt4-phonon, gdb-minimal, inetutils-ping. By just removing these few packages, one gets a much clearer graph, which can be inspected visually for problems.
Focusing on libhdf5-openmpi-1.8.4
The tool also makes it possible to focus on a package, showing only packages that may prevent its installation. Here, one can see that it is not possible to simultaneously install the three packages libhdf5-openmpi-1.8.4, libopal-dev, and octave-pkg-dev. However, any two of them can be installed together.

Installing the tool

The source code for this tool is available on Github. In order to compile it, you need the ocaml compiler and the Findlib library. If you want to build our graph viewers, you will need the Lablgtk2 and the Cairo bindings for the GTK viewer, and Js_of_ocaml for the web-based viewer.

Running the tool on your choice of metadata

You can perform the following steps to compute and vizualize a coinstallability kernel.

  1. Open a terminal window.

  2. Run the following command to compute a coinstallability kernel.

    zcat ~/snapshots/maverick-beta/* | coinst -all -o g.dot

    The coinst tool takes package information from standard input. It writes on standard output:

    The coinstallability kernel is written as a dot graph to file g.dot (-o option). The -all option tells coinst to include all packages in the output (packages with simple conflict configurations are omitted by default).

  3. Lay out the graph, using the Graphviz dot tool.

    dot g.dot -o graph.dot
  4. Display the graph. We use a custom graph viewer, which can deal with large graphs.

    coinst_viewer graph.dot

You need to use the -rpm option to parse Mandriva hdlist.cz files.

zcat ~/snapshots/Mandriva_2010.1/* | coinst -rpm -all -o g.dot

The -ignore option can be used to ignore a package.

zcat ~/snapshots/testing-2010-08-22/* | coinst -ignore liboss-salsa-asound2 -o g.dot

To focus on a particular package, use the -root option.

zcat ~/snapshots/testing-2010-08-22/* | coinst -root libhdf5-openmpi-1.8.4 -o g.dot

Finally, the following command generates a scene.json file suitable to use with the Web version of our viewer.

coinst_converter graph.dot > scene.json