This module provides a monad Par
, for speeding up pure
computations using parallel processors. It cannot be used for
speeding up computations that use IO (for that, see
Control.Concurrent
). The result of a given Par
computation is
always the same - ie. it is deterministic, but the computation may
be performed more quickly if there are processors available to
share the work.
For example, the following program fragment computes the values of
(f x)
and (g x)
in parallel, and returns a pair of their results:
runPar $ do fx <- pval (f x) -- start evaluating (f x) gx <- pval (g x) -- start evaluating (g x) a <- get fx -- wait for fx b <- get gx -- wait for gx return (a,b) -- return results
Par
can be used for specifying pure parallel computations in
which the order of the computation is not known beforehand.
The programmer specifies how information flows from one
part of the computation to another, but not the order in which
computations will be evaluated at runtime. Information flow is
described using variables called IVar
s, which support put
and
get
operations. For example, suppose you have a problem that
can be expressed as a network with four nodes, where b
and c
require the value of a
, and d
requires the value of b
and c
:
a / \ b c \ / d
Then you could express this in the Par
monad like this:
runPar $ do [a,b,c,d] <- sequence [new,new,new,new] fork $ do x <- get a; put b (x+1) fork $ do x <- get a; put c (x+2) fork $ do x <- get b; y <- get c; put d (x+y) fork $ do put a (3 :: Int) get d
The result of the above computation is always 9. The get
operation
waits until its input is available; multiple put
s to the same
IVar
are not allowed, and result in a runtime error. Values
stored in IVar
s are usually fully evaluated (although there are
ways provided to pass lazy values if necessary).
In the above example, b
and c
will be evaluated in parallel.
In practice the work involved at each node is too small here to see
the benefits of parallelism though: typically each node should
involve much more work. The granularity is completely under your
control - too small and the overhead of the Par
monad will
outweigh any parallelism benefits, whereas if the nodes are too
large then there might not be enough parallelism to use all the
available processors.
Unlike Control.Parallel
, in Control.Monad.Par
parallelism is
not combined with laziness, so sharing and granulairty are
completely under the control of the programmer. New units of
parallel work are only created by fork
, par
, and a few other
combinators.
The implementation is based on a work-stealing scheduler that divides the work as evenly as possible betwen the available processors at runtime.
- data Par a
- runPar :: Par a -> a
- fork :: Par () -> Par ()
- data IVar a
- new :: Par (IVar a)
- newFull :: NFData a => a -> Par (IVar a)
- newFull_ :: a -> Par (IVar a)
- get :: IVar a -> Par a
- put :: NFData a => IVar a -> a -> Par ()
- put_ :: IVar a -> a -> Par ()
- pval :: NFData a => a -> Par (IVar a)
- spawn :: NFData a => Par a -> Par (IVar a)
- spawn_ :: Par a -> Par (IVar a)
- parMap :: (Traversable t, NFData b) => (a -> b) -> t a -> Par (t b)
- parMapM :: (Traversable t, NFData b) => (a -> Par b) -> t a -> Par (t b)
- parMapReduceRangeThresh :: NFData a => Int -> InclusiveRange -> (Int -> Par a) -> (a -> a -> Par a) -> a -> Par a
- parMapReduceRange :: NFData a => InclusiveRange -> (Int -> Par a) -> (a -> a -> Par a) -> a -> Par a
- data InclusiveRange = InclusiveRange Int Int
- parFor :: InclusiveRange -> (Int -> Par ()) -> Par ()
The Par
monad
fork :: Par () -> Par ()Source
forks a computation to happen in parallel. The forked
computation may exchange values with other computations using
IVar
s.
Communication: IVar
s
read the value in a IVar
. The get
can only return when the
value has been written by a prior or parallel put
to the same
IVar
.
put :: NFData a => IVar a -> a -> Par ()Source
put a value into a IVar
. Multiple put
s to the same IVar
are not allowed, and result in a runtime error.
put
fully evaluates its argument, which therefore must be an
instance of NFData
. The idea is that this forces the work to
happen when we expect it, rather than being passed to the consumer
of the IVar
and performed later, which often results in less
parallelism than expected.
Sometimes partial strictness is more appropriate: see put_
.
Operations
spawn :: NFData a => Par a -> Par (IVar a)Source
Like fork
, but returns a IVar
that can be used to query the
result of the forked computataion.
spawn p = do r <- new fork (p >>= put r) return r
spawn_ :: Par a -> Par (IVar a)Source
Like spawn
, but the result is only head-strict, not fully-strict.
parMap :: (Traversable t, NFData b) => (a -> b) -> t a -> Par (t b)Source
Applies the given function to each element of a data structure in parallel (fully evaluating the results), and returns a new data structure containing the results.
parMap f xs = mapM (pval . f) xs >>= mapM get
parMap
is commonly used for lists, where it has this specialised type:
parMap :: NFData b => (a -> b) -> [a] -> Par [b]
parMapM :: (Traversable t, NFData b) => (a -> Par b) -> t a -> Par (t b)Source
Like parMap
, but the function is a Par
monad operation.
parMapM f xs = mapM (spawn . f) xs >>= mapM get
:: NFData a | |
=> Int | threshold |
-> InclusiveRange | range over which to calculate |
-> (Int -> Par a) | compute one result |
-> (a -> a -> Par a) | combine two results (associative) |
-> a | initial result |
-> Par a |
Computes a binary map/reduce over a finite range. The range is recursively split into two, the result for each half is computed in parallel, and then the two results are combined. When the range reaches the threshold size, the remaining elements of the range are computed sequentially.
For example, the following is a parallel implementation of
foldl (+) 0 (map (^2) [1..10^6])
parMapReduceRangeThresh 100 (InclusiveRange 1 (10^6)) (\x -> return (x^2)) (\x y -> return (x+y)) 0
parMapReduceRange :: NFData a => InclusiveRange -> (Int -> Par a) -> (a -> a -> Par a) -> a -> Par aSource
"Auto-partitioning" version of parMapReduceRangeThresh
that chooses the threshold based on
the size of the range and the number of processors..
parFor :: InclusiveRange -> (Int -> Par ()) -> Par ()Source
Parallel for-loop over an inclusive range. Semantically equivalent to
parFor (InclusiveRange n m) f = forM_ [n..m] f
except that the implementation will split the work into an unspecified number of subtasks in an attempt to gain parallelism. The exact number of subtasks is chosen at runtime, and is probably a small multiple of the available number of processors.
Strictly speaking the semantics of parFor
depends on the
number of processors, and its behaviour is therefore not
deterministic. However, a good rule of thumb is to not have any
interdependencies between the elements; if this rule is followed
then parFor
has deterministic semantics. One easy way to follow
this rule is to only use put
or put_
in f
, never get
.