Ticket #288 (new defect)

Opened 6 months ago

Last modified 4 months ago

the package indexes are very slow

Reported by: duncan Assigned to:
Priority: normal Milestone:
Component: cabal-install tool Version: HEAD
Severity: normal Keywords:
Cc: Difficulty: easy (<4 hours)
GHC Version: 6.8.2 Platform:

Description

In a large run, e.g. trying to make a plan to install 560 packages from hackage:

$ cabal install --dry-run $(cat pkgs)

it turns out that, according to the ghc profile, 91% of the time is spent reading the indexes of installed and available packages.

The ghc package index is a couple of massive text files in Read/Show format, so it takes forever to read. The available package index is the tarball of .cabal files, and our .cabal file parser is really slow.

For smaller runs it's not so bad:

$ cabal install --dry-run xmonad

since we only have to inspect the subset of the available package index that makes up xmonad's transitive deps (any versions thereof), which lets us avoid forcing most of the index.

http://hackage.haskell.org/trac/ghc/ticket/2089

might help us, but then again maybe not if we still have to parse the result of calling ghc-pkg since that will give us another text format.

For our own package index, perhaps we should be generating a cache in some other format when we download the package index.

Change History

06/07/08 07:43:17 changed by duncan

Partially fixed for common cases with:

Sat Jun  7 15:39:13 BST 2008  Duncan Coutts <duncan@haskell.org>
  * Only inspect the needed parts of the installed and available indexes
  The available package index is loaded lazily so if we can avoid
  forcing all the packages then we can save a huge amount of slow text
  parsing. So select out the maximal subset of the index that we could
  ever need based on the names of the packages we want to install. For
  the common case of installing just one or two packages this cuts
  down the number of packages we look at by a couple orders of
  magnitude. This does not help with the installed index which is read
  strictly, though most people do not (yet) have hundreds of installed
  packages, so that's less of an immediate problem.
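
To illustrate the selection step described in the patch above, here is a minimal sketch, not the actual cabal-install code. It assumes a simplified, hypothetical index type (package name mapped to a list of version and dependency-name pairs) and computes the set of names reachable from the install targets, taking every available version into account. The names Index, neededNames and selectNeeded are made up for this example.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical simplified index: package name -> entries of
-- (version, dependency names). In the real tool the entries would be
-- lazily parsed .cabal files.
type PkgName = String
type Index   = Map.Map PkgName [(String, [PkgName])]

-- | All package names reachable from the targets, following the
-- dependencies of every available version of each package.
neededNames :: Index -> [PkgName] -> Set.Set PkgName
neededNames index = go Set.empty
  where
    go seen [] = seen
    go seen (n : ns)
      | n `Set.member` seen = go seen ns
      | otherwise =
          let deps = concatMap snd (Map.findWithDefault [] n index)
          in  go (Set.insert n seen) (deps ++ ns)

-- | Restrict the index to the needed names. The predicate looks only at
-- keys, so the entries of unrelated packages are never forced.
selectNeeded :: Index -> [PkgName] -> Index
selectNeeded index targets =
  Map.filterWithKey (\n _ -> n `Set.member` needed) index
  where
    needed = neededNames index targets
```

Because Data.Map is lazy in its values, only the entries of packages that are actually visited get parsed. Everything else in the index stays as an unevaluated thunk.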

08/12/08 11:35:50 changed by duncan

  • difficulty changed from hard (< 1 day) to easy (<4 hours).
  • version changed from 1.2.3.0 to HEAD.

For the installed package index, we should parse the output of ghc-pkg dump lazily. See #311.

All it needs is to split on "\n---" as we do now, but then instead of directly parsing, we should extract only the name and version fields and then parse the rest lazily.
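
A rough sketch of that idea, under some assumptions: the dump is taken to consist of "key: value" fields with indented continuation lines, and records are taken to be separated by a line containing only "---" (equivalent to splitting on "\n---" for well-formed input). The Entry type and the function names are hypothetical, and parseFields is only a stand-in for the real, slow parser.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- Hypothetical record: name and version are found by a cheap line scan,
-- while the full field list is left as an unevaluated thunk.
data Entry = Entry
  { entryName    :: String
  , entryVersion :: String
  , entryFields  :: [(String, String)]  -- parsed only on demand
  }

trim :: String -> String
trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- | Split the dump into per-package chunks at "---" separator lines.
splitRecords :: String -> [[String]]
splitRecords = filter (not . null) . go . filter (not . all isSpace) . lines
  where
    go ls = case break (== "---") ls of
      (chunk, [])       -> [chunk]
      (chunk, _ : rest) -> chunk : go rest

-- | Find one top-level field without parsing anything else.
field :: String -> [String] -> String
field key ls =
  case [trim (drop (length prefix) l) | l <- ls, prefix `isPrefixOf` l] of
    (v : _) -> v
    []      -> ""
  where
    prefix = key ++ ":"

-- | Stand-in for the full parser: indented lines continue the
-- previous field.
parseFields :: [String] -> [(String, String)]
parseFields [] = []
parseFields (l : ls) =
  (key, trim (unwords (drop 1 val : map trim cont))) : parseFields rest
  where
    (cont, rest) = span indented ls
    (key, val)   = break (== ':') l
    indented (c : _) = isSpace c
    indented []      = False

toEntry :: [String] -> Entry
toEntry ls = Entry (field "name" ls) (field "version" ls) (parseFields ls)

readDump :: String -> [Entry]
readDump = map toEntry . splitRecords
```

Since the constructor fields are non-strict, looking up packages by name and version never runs parseFields. The full parse happens only for entries whose entryFields is actually demanded.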