Ticket #1839 (closed merge: fixed)

Opened 6 years ago

Last modified 5 years ago

need ghc-pkg dump feature

Reported by: duncan
Owned by: igloo
Priority: normal
Milestone: 6.8.3
Component: Compiler
Version: 6.8.1
Keywords:
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure:
Difficulty: Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

Cabal already needs quite a bit of information from ghc-pkg about the package database. In the future we would like even more information. It would be more efficient for Cabal to ask ghc-pkg once for all the information rather than to ask lots of little questions. (ghc-pkg is quite slow when lots of packages are registered.)

For each package we want to know its name, version, what packages it depends on, any C flags or include dirs, the exposed modules, and the haddock html & interface dir.

We also need to be able to ask for this information for a specific package database and get results for only that db. This is because Cabal needs to distinguish global and user packages. In future when we want to build several packages inplace we'll also want to use specific inplace package dbs.

So a dump command should probably just describe every package in a specified db. We already know how to parse an InstalledPackageDescription, which is the format that ghc-pkg describe produces. One potential problem is distinguishing package boundaries if all the package descriptions are concatenated. Perhaps something simple like a blank line would suffice.

So here's a concrete suggestion:

ghc-pkg dump --global

should produce the concatenation (with blank line separators) of ghc-pkg describe for each package in the global package db. Note that it should list only those packages in the global db, not those from the user db.

ghc-pkg dump --user
ghc-pkg dump --package-conf=foo.package.conf

These should behave similarly, but act only upon the specified package dbs.
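As a rough illustration of the consumer side, here is a minimal sketch of how a tool might split the output of the proposed dump command into per-package chunks. It assumes blank lines appear only between package descriptions, never inside one; the function name splitOnBlankLines is hypothetical and not part of ghc-pkg or Cabal.

```haskell
-- Hypothetical helper: split the text of a proposed `ghc-pkg dump`
-- into one list of lines per package, using blank lines as separators.
-- Assumes no blank lines occur inside a single package description.
splitOnBlankLines :: String -> [[String]]
splitOnBlankLines = go . lines
  where
    go ls = case break isBlank (dropWhile isBlank ls) of
      ([], _)        -> []                -- nothing left but blank lines
      (record, rest) -> record : go rest  -- one package, then continue
    -- A line counts as blank if it is empty or only whitespace.
    isBlank = all (`elem` " \t\r")
```

Each resulting chunk could then be handed to whatever parser is already used for the output of ghc-pkg describe.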

Attachments

Main.hs.diff (10.7 KB) - added by guest 6 years ago.
diff -u patch to utils/ghc-pkg/Main.hs, implementing --bulk

Change History

follow-up: ↓ 2   Changed 6 years ago by igloo

  • milestone set to 6.10 branch

I'd expect a dump command to give data in a lower-level format, e.g. Show/Read.

I think the ghc-pkg flags have grown a bit crufty; perhaps we should rethink them for 6.10?

in reply to: ↑ 1   Changed 6 years ago by duncan

Replying to igloo:

I'd expect a dump command to give data in a lower-level format, e.g. Show/Read.

Well I don't see that it makes a great deal of difference. We have parsers and printer functions for both formats anyway. The Read/Show format is more fragile when it comes to adding fields. The Read/Show format is not necessarily faster. The non-Read/Show format is easier to read and debug I think.

I think the ghc-pkg flags have grown a bit crufty; perhaps we should rethink them for 6.10?

Yes. For example it is currently extremely hard to find the packages of a given name in a specific package database. The list and field commands always search all packages (global, user, explicitly specified).

  Changed 6 years ago by guest

instead of seeing this as a dump feature, one might see it as another example of "it would be nice if ghc-pkg could handle bulk queries", something i've often wanted myself (#1463, point 1, is another example).

i've implemented a --bulk option for ghc-pkg, which features the following changes (from usageHeader):

  "  $p list [pkg]\n" ++
  "    List registered packages in the global database, and also the\n" ++
  "    user database if --user is given. If a package name is given\n" ++
  "    all the registered versions will be listed in ascending order.\n" ++
  "    Accepts package patterns if --bulk is given.\n" ++
  "    Accepts the --simple-output flag.\n" ++
  "\n" ++
  "  $p find-module {module}\n" ++
  "    List registered packages exposing module {module} in the global\n" ++
  "    database, and also the user database if --user is given. \n" ++
  "    All the registered versions will be listed in ascending order.\n" ++
  "    Accepts module patterns if --bulk is given.\n" ++
  "    Accepts the --simple-output flag.\n" ++
  "  $p describe {pkg-id}\n" ++
  "    Give the registered description for the specified package. The\n" ++
  "    description is returned in precisely the syntax required by $p\n" ++
  "    register. Accepts package patterns if --bulk is given.\n" ++
  "\n" ++
  "  $p field {pkg-id} {field}\n" ++
  "    Extract the specified field of the package description for the\n" ++
  "    specified package. Accepts package patterns and comma-separated\n" ++
  "    multiple fields if --bulk is given.\n" ++

(without --bulk, all commands behave as they did before). that means we can

-- list all regex packages
$ ./ghc-pkg-inplace list regex --bulk
c:/fptools/ghc/driver/package.conf.inplace:
    regex-base-0.72.0.1, regex-compat-0.71.0.1, regex-posix-0.72.0.2

-- list all packages exposing Data.* modules
$ ./ghc-pkg-inplace find-module ^Data --bulk
c:/fptools/ghc/driver/package.conf.inplace:
    array-0.1, base-3.0, bytestring-0.9, containers-0.1, fgl-5.4.1.1,
    (ghc-6.9.20071106), packedstring-0.1, time-1.1.2.0

-- list all packages exposing Monad modules
$ ./ghc-pkg-inplace find-module Monad --bulk
c:/fptools/ghc/driver/package.conf.inplace:
    base-3.0, cgi-3001.1.5.1, fgl-5.4.1.1, (ghc-6.9.20071106),
    haskell-src-1.0.1.1, haskell98-1.0.1, mtl-1.1.0.0, stm-2.1.1.0

-- list all package maintainers or lack of them
$ ./ghc-pkg-inplace --bulk field . name,maintainer
name: rts
maintainer: glasgow-haskell-users@haskell.org
name: base
maintainer: libraries@haskell.org
name: array
maintainer: libraries@haskell.org
name: packedstring
maintainer: libraries@haskell.org
name: containers
maintainer: libraries@haskell.org
name: bytestring
maintainer: dons@cse.unsw.edu.au, duncan@haskell.org
name: old-locale
maintainer: libraries@haskell.org
name: old-time
maintainer: libraries@haskell.org
name: filepath
maintainer:
..

-- list all haddock-html fields (#1463)
$ ./ghc-pkg-inplace --bulk field . haddock-html
haddock-html:
haddock-html: c:\fptools\ghc\libraries\base\dist\doc\html\base
haddock-html: c:\fptools\ghc\libraries\array\dist\doc\html\array
haddock-html: c:\fptools\ghc\libraries\packedstring\dist\doc\html\packedstring
haddock-html: c:\fptools\ghc\libraries\containers\dist\doc\html\containers
haddock-html: c:\fptools\ghc\libraries\bytestring\dist\doc\html\bytestring
haddock-html: c:\fptools\ghc\libraries\old-locale\dist\doc\html\old-locale
haddock-html: c:\fptools\ghc\libraries\old-time\dist\doc\html\old-time
haddock-html: c:\fptools\ghc\libraries\filepath\dist\doc\html\filepath
haddock-html: c:\fptools\ghc\libraries\directory\dist\doc\html\directory
haddock-html: c:\fptools\ghc\libraries\Win32\dist\doc\html\Win32
haddock-html: c:\fptools\ghc\libraries\process\dist\doc\html\process
haddock-html: c:\fptools\ghc\libraries\pretty\dist\doc\html\pretty
haddock-html: c:\fptools\ghc\libraries\hpc\dist\doc\html\hpc
haddock-html: c:\fptools\ghc\libraries\template-haskell\dist\doc\html\template-haskell
haddock-html: c:\fptools\ghc\libraries\Cabal\dist\doc\html\Cabal
haddock-html: c:\fptools\ghc\libraries\random\dist\doc\html\random
haddock-html: c:\fptools\ghc\libraries\haskell98\dist\doc\html\haskell98
haddock-html: c:\fptools\ghc\libraries\regex-base\dist\doc\html\regex-base
haddock-html: c:\fptools\ghc\libraries\regex-posix\dist\doc\html\regex-posix
haddock-html: c:\fptools\ghc\libraries\regex-compat\dist\doc\html\regex-compat
haddock-html: c:\fptools\ghc\libraries\parsec\dist\doc\html\parsec
haddock-html: c:\fptools\ghc\libraries\haskell-src\dist\doc\html\haskell-src
haddock-html: c:\fptools\ghc\libraries\html\dist\doc\html\html
haddock-html: c:\fptools\ghc\libraries\network\dist\doc\html\network
haddock-html: c:\fptools\ghc\libraries\QuickCheck\dist\doc\html\QuickCheck
haddock-html: c:\fptools\ghc\libraries\HUnit\dist\doc\html\HUnit
haddock-html: c:\fptools\ghc\libraries\mtl\dist\doc\html\mtl
haddock-html: c:\fptools\ghc\libraries\fgl\dist\doc\html\fgl
haddock-html: c:\fptools\ghc\libraries\time\dist\doc\html\time
haddock-html: c:\fptools\ghc\libraries\OpenGL\dist\doc\html\OpenGL
haddock-html: c:\fptools\ghc\libraries\GLUT\dist\doc\html\GLUT
haddock-html: c:\fptools\ghc\libraries\stm\dist\doc\html\stm
haddock-html: c:\fptools\ghc\libraries\xhtml\dist\doc\html\xhtml
haddock-html: c:\fptools\ghc\libraries\cgi\dist\doc\html\cgi
haddock-html: c:\fptools\ghc\libraries\parallel\dist\doc\html\parallel
haddock-html: c:/fptools/ghc/libraries/ghc/html

of course, we can also do ./ghc-pkg-inplace describe . --bulk to get the kind of full dump this ticket asks for (the name: field starts each record).
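For readers consuming this kind of output, a minimal sketch of splitting it into records follows. It relies only on the observation above that the name: field starts each record; the function name recordsByName is hypothetical, and the sketch assumes that no other line begins with "name:" in the first column.

```haskell
import Data.List (isPrefixOf)

-- Group output lines into records, starting a new record at every line
-- that begins with "name:". Lines before the first such line are dropped.
recordsByName :: [String] -> [[String]]
recordsByName ls = case dropWhile (not . startsRecord) ls of
  []           -> []
  (hdr : rest) ->
    let (body, more) = break startsRecord rest
    in (hdr : body) : recordsByName more
  where
    startsRecord = ("name:" `isPrefixOf`)
```

Applied to the lines of the name,maintainer listing above, this would yield one two-line record per package.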

the only problem i have is that this requires Text.Regex, which is in a non-boot package. any suggestions?

i'll attach the output of darcs diff -u Main.hs in utils/ghc-pkg/, in case someone wants to play or help out with the changes.

claus

Changed 6 years ago by guest

diff -u patch to utils/ghc-pkg/Main.hs, implementing --bulk

follow-up: ↓ 6   Changed 6 years ago by simonmar

My only criticism of --bulk is that it is strangely named. How about renaming it to --regex, or just allowing '/regex/' to be given as a package name?

I don't know what to do about the added dependency on Text.Regex, that is indeed a problem.

  Changed 6 years ago by duncan

I'm not clear as to why we need regexp matching. Seems to me we usually want to ask about a specific package, or all packages, or very occasionally about a specific list of packages.

Again, as far as Cabal is concerned I think it's simpler just to get all info and then Cabal can ask whatever questions it wants without having to come back and ask ghc-pkg repeatedly.

in reply to: ↑ 4   Changed 6 years ago by claus

Replying to simonmar:

My only criticism of --bulk is that it is strangely named. How about renaming it to --regex, or just allowing '/regex/' to be given as a package name? I don't know what to do about the added dependency on Text.Regex, that is indeed a problem.

i've emailed an alternative patch that uses substring matching only, no regex dependency. not quite as versatile, but probably sufficient for most uses.

 http://www.haskell.org/pipermail/cvs-ghc/2007-November/039599.html
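To show why substring matching needs nothing outside base, here is a minimal sketch. It is not taken from the patch: the names matchesSubstring and selectPackages are hypothetical, and matching is plain case-sensitive containment, which may differ from what the patch actually does.

```haskell
import Data.List (isInfixOf)

-- Hypothetical matcher: a pattern matches a package id if it occurs
-- anywhere inside it (case-sensitive). Uses only Data.List from base.
matchesSubstring :: String -> String -> Bool
matchesSubstring pat pkgId = pat `isInfixOf` pkgId

-- Keep only the package ids that contain the pattern.
selectPackages :: String -> [String] -> [String]
selectPackages pat = filter (matchesSubstring pat)
```

With a list of package ids like the ones in the earlier listing, selectPackages "regex" would keep the regex-base, regex-compat and regex-posix entries.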

follow-up: ↓ 8   Changed 5 years ago by duncan

I know we've made changes to ghc-pkg in the HEAD branch. I'd still very much like to see a simple dump feature included in ghc-pkg in time for ghc-6.8.3.

Cabal does not need any complex query interface but it does at a minimum need to know what versions of what dependencies each existing package uses. This is necessary to avoid or at least detect problems where we build a package that uses two other packages that were built against different versions of another common package.

The most common example in the wild at the moment is people adding bytestring-0.9.0.4 alongside the bytestring-0.9.0.1 that came with ghc-6.8.2. Half their packages then end up built against one version and half against the other, and when we try to use packages built against different versions together we get a type error about bytestring-0.9.0.1:Data.ByteString.ByteString not being the same as bytestring-0.9.0.4:Data.ByteString.ByteString.

Cabal could and should detect this situation, and cabal-install should plan around it in its dependency resolution algorithm. However in both cases we require the dependency information of the existing installed packages.
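A minimal sketch of the kind of check this would enable, assuming the dependency information has already been read into a simple association list. The types and the names closure and conflicts are hypothetical and do not reflect Cabal's actual data structures or algorithm.

```haskell
import Data.List (nub)
import Data.Maybe (fromMaybe)

type PkgName  = String
type PkgId    = (PkgName, String)    -- (name, version); version kept as text
type DepGraph = [(PkgId, [PkgId])]   -- installed package -> direct dependencies

-- All packages reachable from the given roots (roots included).
-- Tracks visited packages, so it terminates even on cyclic input.
closure :: DepGraph -> [PkgId] -> [PkgId]
closure g = go []
  where
    go seen []       = seen
    go seen (p : ps)
      | p `elem` seen = go seen ps
      | otherwise     = go (p : seen) (fromMaybe [] (lookup p g) ++ ps)

-- Package names that occur with more than one version in the closure.
conflicts :: DepGraph -> [PkgId] -> [(PkgName, [String])]
conflicts g roots =
  [ (n, vs)
  | n <- nub (map fst reach)
  , let vs = nub [ v | (n', v) <- reach, n' == n ]
  , length vs > 1
  ]
  where
    reach = closure g roots
```

For example, if one direct dependency was built against bytestring-0.9.0.1 and another against bytestring-0.9.0.4, conflicts would report bytestring with both versions, which is the situation described above.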

Not that many people are tripping up over this just yet, but I think with more hackage automation we will. Especially with automated hackage QA where we want to build every version of every package to see if it works.

As I say, I think we want this on the 6.8.3 timescale, rather than having to wait for 6.10. If this 'bulk' feature is going to go in for 6.8.3 then fine, otherwise I suggest the simple 'dump' command that I suggested originally.

in reply to: ↑ 7   Changed 5 years ago by claus

Replying to duncan:

I know we've made changes to ghc-pkg in the HEAD branch. I'd still very much like to see a simple dump feature included in ghc-pkg in time for ghc-6.8.3. As I say, I think we want this on the 6.8.3 timescale, rather than having to wait for 6.10. If this 'bulk' feature is going to go in for 6.8.3 then fine, otherwise I suggest the simple 'dump' command that I suggested originally.

we had some more discussion in early december, and i sent my final version of the patch to cvs-ghc on 03/12/,

 http://www.haskell.org/pipermail/cvs-ghc/2007-December/040020.html

simon has since confirmed that the patch looks fine, but could do with some refinements of configuration, for mingw location detection, and perhaps without the provision for restoring globbing.

 http://www.haskell.org/pipermail/cvs-ghc/2008-January/040407.html

as i said in december, i leave those tweaks and decisions to the folks at ghc hq - as far as myself and functionality were concerned, the patch is done!-) there shouldn't be any need for a separate dump feature to get something soon.

  Changed 5 years ago by igloo

  • milestone changed from 6.10 branch to 6.8.3

  Changed 5 years ago by simonmar

It seems that CRT_noglob.o can be generated by compiling

unsigned long _CRT_glob = 0;

so to avoid needing to find the location of MingW or to ship CRT_noglob.o with GHC, I'm going to just add this to utils/ghc-pkg. That seems the easiest solution to me.

  Changed 5 years ago by simonmar

  • owner set to igloo
  • type changed from feature request to merge

To merge:

Mon Jan 21 08:17:44 PST 2008  claus.reinke@talk21.com
  * FIX #1839, #1463, by supporting ghc-pkg bulk queries with substring matching
Tue Jan 22 08:18:11 PST 2008  Simon Marlow <simonmar@microsoft.com>
  * This goes with the patch for #1839, #1463

  Changed 5 years ago by igloo

  • status changed from new to closed
  • resolution set to fixed

Both merged.

  Changed 5 years ago by simonmar

  • architecture changed from Unknown to Unknown/Multiple

  Changed 5 years ago by simonmar

  • os changed from Unknown to Unknown/Multiple