Ticket #447 (assigned enhancement)

Opened 4 years ago

Last modified 13 months ago

build multiple packages in parallel

Reported by: duncan
Owned by: refold
Priority: high
Milestone: cabal-install-0.16
Component: cabal-install tool
Version:
Severity: normal
Keywords:
Cc:
Difficulty: normal
GHC Version: 6.8.3
Platform:

Description

The latest version of Gentoo's Portage tool is rather slick. It can do parallel builds and displays a nice summary on the command line, e.g.:

# emerge -uD system -j --load-average=4.5
Calculating dependencies... done!
>>> Verifying ebuild manifests
>>> Starting parallel fetch
>>> Emerging (1 of 14) dev-libs/expat-2.0.1-r1
>>> Emerging (2 of 14) sys-devel/autoconf-wrapper-6
>>> Emerging (3 of 14) sys-kernel/linux-headers-2.6.27-r2
>>> Installing sys-devel/autoconf-wrapper-6
>>> Jobs: 0 of 14 complete, 1 running  Load avg: 2.99, 1.59, 0.67

Note how they solve the problem of how to display what is going on when there are multiple builds happening. The answer is not to display it at all! This would have to go hand-in-hand with logging all builds so that we can still diagnose failures.

Note the final line, which is updated to display the current number of jobs running, the number completed, etc. It also shows the load average. The job scheduler has two parameters: a maximum number of jobs (or unlimited) and a maximum load average. It will only launch new jobs if the load average is below the given maximum. That allows it to interact reasonably well with builds that use make -j internally. In the example above I set the load average to be just slightly more than the number of CPUs I've got.
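The launch rule described above can be sketched as a single predicate. This is a minimal Python illustration, not Portage or cabal-install code: may_launch, max_jobs, max_load and load_now are hypothetical names, and os.getloadavg() is only available on Unix-like systems.

```python
import os
from typing import Optional

def may_launch(running: int,
               max_jobs: Optional[int],
               max_load: Optional[float],
               load_now: Optional[float] = None) -> bool:
    """Decide whether the scheduler may start one more job.

    max_jobs=None means "unlimited"; max_load=None disables the load check.
    load_now can be injected (e.g. for testing); by default the 1-minute
    load average is read from the operating system.
    """
    # First limit: the cap on simultaneously running jobs.
    if max_jobs is not None and running >= max_jobs:
        return False
    # Second limit: the load-average threshold.
    if max_load is None:
        return True
    if load_now is None:
        load_now = os.getloadavg()[0]  # 1-minute average (Unix only)
    return load_now < max_load
```

With the settings from the emerge example (unlimited jobs, load limit 4.5), may_launch(1, None, 4.5, load_now=2.99) is true. A real scheduler would probably also allow a job to start when nothing is running, so that a busy machine cannot stall the build entirely; that refinement is an assumption, not something stated in this ticket.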

It looks to me like it serialises some steps, such as installing, since saturating the disk with multiple parallel installs is generally of no benefit; indeed, it can be slower. Downloads also seem to be serialised, again because there is probably little benefit in making multiple connections to the same server.

Anyway, the point is, cabal-install ought to be able to do all this. Some bits we can do now. We already have a graph representation of the install plan and we recalculate when a package fails to install.
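To make the graph idea concrete, here is a minimal sketch of how a dependency graph can answer "what can be built now?" and how a failure propagates to dependents. It is written in Python for brevity and does not reflect cabal-install's actual install plan data structure; the function names and the package names a, b, c, d are hypothetical.

```python
from typing import Dict, Set

def ready(deps: Dict[str, Set[str]], done: Set[str],
          failed: Set[str], running: Set[str]) -> Set[str]:
    """Packages that can start now: every dependency is installed and the
    package itself is not finished, failed, or already being built."""
    skip = done | failed | running
    return {p for p, ds in deps.items() if p not in skip and ds <= done}

def mark_failed(deps: Dict[str, Set[str]], failed: Set[str],
                pkg: str) -> Set[str]:
    """Return the failed set after pkg fails, including every package that
    depends on a failed package, directly or transitively."""
    failed = set(failed) | {pkg}
    changed = True
    while changed:  # repeat until no new dependents are found
        changed = False
        for p, ds in deps.items():
            if p not in failed and ds & failed:
                failed.add(p)
                changed = True
    return failed

# Hypothetical plan: c needs a and b; d needs c.
deps = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}
print(sorted(ready(deps, set(), set(), set())))
print(sorted(mark_failed(deps, set(), "a")))
```

In a parallel build, ready would be consulted each time a job finishes, and mark_failed each time one fails, so that independent packages keep building while the dependents of a failed package are skipped.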

We will need an improved download API, probably one that sends requests off to a dedicated download thread (which would serialise them).
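One possible shape for such an API, sketched in Python rather than in cabal-install's own code: callers enqueue a request and get back a reply slot, while a single worker thread performs the fetches one after another. The Downloader class and the fetch callable are hypothetical; no real HTTP library is assumed.

```python
import queue
import threading

class Downloader:
    """Runs all fetches on one worker thread, so requests are served
    strictly one at a time, in the order they were queued."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable: url -> result
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def request(self, url):
        """Queue a download; returns a one-slot queue that will receive
        ("ok", result) or ("error", exception)."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((url, reply))
        return reply

    def _run(self):
        while True:
            url, reply = self._requests.get()
            try:
                reply.put(("ok", self._fetch(url)))
            except Exception as err:  # hand the failure to the requester
                reply.put(("error", err))
```

A build job would call request(url) and then block on reply.get() only when it actually needs the file, so other jobs can keep building while the download is in progress.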

Attachments

parallel.patch (76.5 KB) - added by SamAnklesaria 2 years ago.
partial, hypothetical implementation lacking suppressed output and command line flags
par-install.dpatch.gz (124.9 KB) - added by refold 19 months ago.
Implementation
par-install-take2.dpatch.gz (55.2 KB) - added by refold 14 months ago.

Change History

Changed 4 years ago by duncan

  • summary changed from do parallel builds to build multiple packages in parallel

Changed 2 years ago by SamAnklesaria

partial, hypothetical implementation lacking suppressed output and command line flags

Changed 2 years ago by refold

  • owner set to refold
  • status changed from new to assigned

Changed 2 years ago by refold

Changed 19 months ago by refold

Current status (for those interested): building multiple packages in parallel has been implemented, but the patches are not yet merged into the mainline; I'm now working on parallelising 'cabal build'.

Changed 19 months ago by refold

Implementation

Changed 19 months ago by refold

Attached are my patches that parallelise cabal-install's 'install' command.

Sorry for sending them as a single large bundle - ideally I would like to split the patch series, but darcs send makes it hard by ignoring depended-upon patches. Additionally, it's hard to destructively edit history in Darcs, so instead of obliterating two unnecessary patches (changes to README and cabal-install.cabal) I undid those changes with a "merge" patch.

The patch series logically consists of three parts (in chronological order):

1) From the first patch up to the "Parallelise the install command" patch

Implements the basic parallel framework as described here. Changes are a bit more pervasive than expected because Cabal internally assumes that the current working directory is the directory of the package currently being built.

2) From the end of the previous part up to the "Implement output serialisation (client bits)." patch

Implements output serialisation - since we don't want the console output to be garbled, all printing should be done from a single thread. This is done by changing all code called from D.C.I.executeInstallPlan to use callbacks instead of standard output functions (debug/info/...).

3) Bugfixes and polishing (remaining patches)

During this stage I concentrated on testing and fixing bugs and didn't add any new functionality.
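The output serialisation idea from part 2 can be illustrated as follows: build jobs receive a logging callback instead of printing directly, and one thread owns the console. This is a Python sketch of the general technique, not the code in the patches; start_printer, log and stop are hypothetical names.

```python
import queue
import threading

def start_printer(write):
    """Start the single thread that owns the console.

    Returns (log, stop): `log` is the callback handed to build jobs in
    place of direct printing; `stop` lets pending messages drain and then
    joins the printer thread.
    """
    messages = queue.Queue()

    def run():
        while True:
            msg = messages.get()
            if msg is None:  # sentinel: no more output will arrive
                return
            write(msg)       # only this thread ever calls write

    printer = threading.Thread(target=run)
    printer.start()

    def log(msg):
        messages.put(msg)

    def stop():
        messages.put(None)
        printer.join()

    return log, stop
```

For example, log, stop = start_printer(print) gives worker threads a log callback to use in place of print; because each message is written whole by the printer thread, output from different jobs cannot be interleaved mid-line.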

My patches are also available in a separate Darcs repository.

Changed 15 months ago by kosmikus

  • priority changed from normal to high
  • milestone set to cabal-install-0.16

Changed 14 months ago by refold

Changed 14 months ago by refold

I've updated my parallel patches (see attachment). Patches apply cleanly to the current mainline. The parallel code path now always uses the external setup method (via Setup.hs), so the required changes to the Cabal lib are minimised. There are still some traces of output serialisation, though.
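Why an external process sidesteps the working-directory problem: the current directory is a per-process attribute, so threads in one process cannot each change it independently, whereas every child process can be started in its own directory. The Python sketch below shows only that general mechanism; run_in_dir and build_all are hypothetical names, and argv stands in for whatever command actually runs the package's setup program.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_in_dir(pkg_dir, argv):
    """Run argv as a child process whose working directory is pkg_dir.

    The parent's own working directory is never changed, so several of
    these calls can safely run at once from different threads.
    """
    done = subprocess.run(argv, cwd=pkg_dir, capture_output=True, text=True)
    return done.returncode, done.stdout

def build_all(pkg_dirs, argv, jobs):
    """Run the same placeholder command in every package directory,
    with at most `jobs` child processes at a time."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(lambda d: run_in_dir(d, argv), pkg_dirs))
```

The cost of this approach is an extra process (and, for Cabal, a compiled setup program) per package.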

Some numbers:

$ time cabal install -j 1 alex happy
real	1m19.236s
user	1m1.330s
sys	0m10.510s

$ time cabal install -j 4 alex happy
real	0m52.106s
user	1m10.680s
sys	0m15.030s

$ time cabal install -j 1 yesod

real	19m14.913s
user	15m59.420s
sys	1m25.650s

$ time cabal install -j 4 yesod

real	14m8.599s
user	21m36.530s
sys	4m5.650s

I also tested the Nov 2011 version of the code (tries to use the internal setup method, requires pervasive changes to Cabal lib):

$ time cabal install -j 4 alex happy
real	0m45.503s
user	1m4.040s
sys	0m10.100s

$ time cabal install -j 4 yesod
real	10m41.840s
user	17m6.560s
sys	1m33.040s

Compiling and linking all these Setup.hs files does add some noticeable overhead.
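For readers who want the numbers as speedups, the wall-clock ("real") times above can be compared with a few lines of Python. One assumption to flag: no -j 1 run is given for the Nov 2011 code, so its ratios below reuse the -j 1 baseline measured with the current patches.

```python
def seconds(minutes, secs):
    """Convert a `time`-style 'XmY.YYYs' reading to seconds."""
    return 60 * minutes + secs

# Wall-clock ("real") times copied from the measurements above.
real = {
    ("alex happy", "-j 1"): seconds(1, 19.236),
    ("alex happy", "-j 4"): seconds(0, 52.106),
    ("alex happy", "-j 4, Nov 2011"): seconds(0, 45.503),
    ("yesod", "-j 1"): seconds(19, 14.913),
    ("yesod", "-j 4"): seconds(14, 8.599),
    ("yesod", "-j 4, Nov 2011"): seconds(10, 41.840),
}

def speedup(target, variant):
    """Ratio of the -j 1 wall-clock time to the given variant's time."""
    return real[(target, "-j 1")] / real[(target, variant)]

for target in ("alex happy", "yesod"):
    for variant in ("-j 4", "-j 4, Nov 2011"):
        print(f"{target:10} {variant:15} {speedup(target, variant):.2f}x")
```

This works out to roughly 1.5x (alex happy) and 1.4x (yesod) for the current patches, against roughly 1.7x and 1.8x for the Nov 2011 code under the shared-baseline assumption.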

If these patches get accepted, I'll start working on improving the UI.

Changed 13 months ago by refold

Parallel patches were moved to GitHub:

git clone git://github.com/23Skidoo/cabal.git cabal-parallel-install
cd cabal-parallel-install
git checkout parallel-install