This module provides a monad `Par`, for speeding up pure
computations using parallel processors. It cannot be used for
speeding up computations that use IO (for that, see
`Control.Concurrent`). The result of a given `Par` computation is
always the same, i.e. it is deterministic, but the computation may
be performed more quickly if there are processors available to
share the work.

For example, the following program fragment computes the values of
`(f x)` and `(g x)` in parallel, and returns a pair of their results:

```haskell
runPar $ do
    fx <- pval (f x)  -- start evaluating (f x)
    gx <- pval (g x)  -- start evaluating (g x)
    a  <- get fx      -- wait for fx
    b  <- get gx      -- wait for gx
    return (a,b)      -- return results
```

`Par` can be used for specifying pure parallel computations in
which the order of the computation is not known beforehand.
The programmer specifies how information flows from one
part of the computation to another, but not the order in which
computations will be evaluated at runtime. Information flow is
described using variables called `IVar`s, which support `put` and
`get` operations. For example, suppose you have a problem that
can be expressed as a network with four nodes, where `b` and `c`
require the value of `a`, and `d` requires the value of `b` and `c`:

```
      a
     / \
    b   c
     \ /
      d
```

Then you could express this in the `Par` monad like this:

```haskell
runPar $ do
    [a,b,c,d] <- sequence [new,new,new,new]
    fork $ do x <- get a; put b (x+1)
    fork $ do x <- get a; put c (x+2)
    fork $ do x <- get b; y <- get c; put d (x+y)
    fork $ do put a (3 :: Int)
    get d
```

The result of the above computation is always 9. The `get` operation
waits until its input is available; multiple `put`s to the same
`IVar` are not allowed, and result in a runtime error. Values
stored in `IVar`s are usually fully evaluated (although there are
ways provided to pass lazy values if necessary).
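Packaged as a complete program, the dataflow network above looks like this (a sketch assuming the `monad-par` package is installed; the `diamond` name is ours, not part of the library):

```haskell
import Control.Monad.Par

-- The four-node network: b and c each read a, and d combines them.
-- runPar extracts the pure result of the whole computation.
diamond :: Int
diamond = runPar $ do
    [a,b,c,d] <- sequence [new,new,new,new]
    fork $ do x <- get a; put b (x+1)
    fork $ do x <- get a; put c (x+2)
    fork $ do x <- get b; y <- get c; put d (x+y)
    fork $ put a (3 :: Int)
    get d

main :: IO ()
main = print diamond  -- (3+1) + (3+2) = 9
```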

In the above example, `b` and `c` will be evaluated in parallel.
In practice, though, the work involved at each node here is too small
to see the benefits of parallelism: typically each node should
involve much more work. The granularity is completely under your
control - too small and the overhead of the `Par` monad will
outweigh any parallelism benefits, whereas if the nodes are too
large then there might not be enough parallelism to use all the
available processors.
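One common way to control granularity is a sequential cutoff: spawn parallel work only above a size threshold. This is an illustrative sketch (the `parfib` name and the threshold value are choices made here, not part of the library):

```haskell
import Control.Monad.Par

-- Naive Fibonacci with a sequential cutoff. Below the threshold
-- the Par overhead would outweigh the parallelism, so we fall back
-- to a plain recursive function for the small subproblems.
parfib :: Int -> Par Int
parfib n
  | n < 20    = return (fib n)        -- coarse-grained sequential leaf
  | otherwise = do
      xf <- spawn_ (parfib (n-1))     -- evaluate (n-1) in a new task
      y  <- parfib (n-2)              -- (n-2) in the current task
      x  <- get xf
      return (x + y)
  where
    fib m = if m < 2 then 1 else fib (m-1) + fib (m-2)

main :: IO ()
main = print (runPar (parfib 30))
```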

Unlike `Control.Parallel`, in `Control.Monad.Par` parallelism is
not combined with laziness, so sharing and granularity are
completely under the control of the programmer. New units of
parallel work are only created by `fork`, `par`, and a few other
combinators.

The implementation is based on a work-stealing scheduler that divides the work as evenly as possible between the available processors at runtime.

- data Par a
- runPar :: Par a -> a
- fork :: Par () -> Par ()
- data IVar a
- new :: Par (IVar a)
- newFull :: NFData a => a -> Par (IVar a)
- newFull_ :: a -> Par (IVar a)
- get :: IVar a -> Par a
- put :: NFData a => IVar a -> a -> Par ()
- put_ :: IVar a -> a -> Par ()
- pval :: NFData a => a -> Par (IVar a)
- spawn :: NFData a => Par a -> Par (IVar a)
- spawn_ :: Par a -> Par (IVar a)
- parMap :: (Traversable t, NFData b) => (a -> b) -> t a -> Par (t b)
- parMapM :: (Traversable t, NFData b) => (a -> Par b) -> t a -> Par (t b)
- parMapReduceRangeThresh :: NFData a => Int -> InclusiveRange -> (Int -> Par a) -> (a -> a -> Par a) -> a -> Par a
- parMapReduceRange :: NFData a => InclusiveRange -> (Int -> Par a) -> (a -> a -> Par a) -> a -> Par a
- data InclusiveRange = InclusiveRange Int Int
- parFor :: InclusiveRange -> (Int -> Par ()) -> Par ()

# The `Par` monad

```haskell
fork :: Par () -> Par ()
```

Forks a computation to happen in parallel. The forked
computation may exchange values with other computations using
`IVar`s.

# Communication: `IVar`s

```haskell
get :: IVar a -> Par a
```

Reads the value in an `IVar`. A `get` can only return when the
value has been written by a prior or parallel `put` to the same
`IVar`.

```haskell
put :: NFData a => IVar a -> a -> Par ()
```

Puts a value into an `IVar`. Multiple `put`s to the same `IVar`
are not allowed, and result in a runtime error.

`put` fully evaluates its argument, which therefore must be an
instance of `NFData`. The idea is that this forces the work to
happen when we expect it, rather than being passed to the consumer
of the `IVar` and performed later, which often results in less
parallelism than expected.

Sometimes partial strictness is more appropriate: see `put_`.
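To illustrate the difference, the sketch below (our own example, assuming the `monad-par` package) uses `put` so that the whole list is built inside the forked producer; replacing it with `put_` would force only the outermost constructor, leaving the rest to be evaluated by whichever task eventually consumes the value:

```haskell
import Control.Monad.Par

-- put requires NFData [Int] and fully evaluates the list, so the
-- map is performed inside the forked task, where we want the work.
doubled :: [Int]
doubled = runPar $ do
    v <- new
    fork $ put v (map (*2) [1..10 :: Int])
    get v

-- With put_ instead of put, only the list's outermost constructor
-- would be forced in the producer; the remaining work could migrate
-- to the consumer, often yielding less parallelism than expected.
```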

# Operations

```haskell
spawn :: NFData a => Par a -> Par (IVar a)
```

Like `fork`, but returns an `IVar` that can be used to query the
result of the forked computation.

```haskell
spawn p = do
    r <- new
    fork (p >>= put r)
    return r
```

```haskell
spawn_ :: Par a -> Par (IVar a)
```

Like `spawn`, but the result is only head-strict, not fully-strict.

```haskell
parMap :: (Traversable t, NFData b) => (a -> b) -> t a -> Par (t b)
```

Applies the given function to each element of a data structure in parallel (fully evaluating the results), and returns a new data structure containing the results.

```haskell
parMap f xs = mapM (pval . f) xs >>= mapM get
```

`parMap` is commonly used for lists, where it has this specialised type:

```haskell
parMap :: NFData b => (a -> b) -> [a] -> Par [b]
```
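A minimal use of the list-specialised type, squaring each element in parallel (a sketch assuming the `monad-par` package; `squares` is our own name):

```haskell
import Control.Monad.Par

-- Each element is squared in a separate unit of parallel work;
-- the result list has the same shape as the input.
squares :: [Int]
squares = runPar (parMap (^2) [1..10])

main :: IO ()
main = print squares  -- [1,4,9,16,25,36,49,64,81,100]
```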

```haskell
parMapM :: (Traversable t, NFData b) => (a -> Par b) -> t a -> Par (t b)
```

Like `parMap`, but the function is a `Par` monad operation.

```haskell
parMapM f xs = mapM (spawn . f) xs >>= mapM get
```

```haskell
parMapReduceRangeThresh
    :: NFData a
    => Int                -- threshold
    -> InclusiveRange     -- range over which to calculate
    -> (Int -> Par a)     -- compute one result
    -> (a -> a -> Par a)  -- combine two results (associative)
    -> a                  -- initial result
    -> Par a
```

Computes a binary map/reduce over a finite range. The range is recursively split into two, the result for each half is computed in parallel, and then the two results are combined. When the range reaches the threshold size, the remaining elements of the range are computed sequentially.

For example, the following is a parallel implementation of

```haskell
foldl (+) 0 (map (^2) [1..10^6])
```

```haskell
parMapReduceRangeThresh 100 (InclusiveRange 1 (10^6))
    (\x -> return (x^2)) (\x y -> return (x+y)) 0
```

```haskell
parMapReduceRange :: NFData a => InclusiveRange -> (Int -> Par a) -> (a -> a -> Par a) -> a -> Par a
```

"Auto-partitioning" version of `parMapReduceRangeThresh` that
chooses the threshold based on the size of the range and the
number of processors.

```haskell
parFor :: InclusiveRange -> (Int -> Par ()) -> Par ()
```

Parallel for-loop over an inclusive range. Semantically equivalent to

```haskell
parFor (InclusiveRange n m) f = forM_ [n..m] f
```

except that the implementation will split the work into an unspecified number of subtasks in an attempt to gain parallelism. The exact number of subtasks is chosen at runtime, and is probably a small multiple of the available number of processors.

Strictly speaking the semantics of `parFor` depends on the
number of processors, and its behaviour is therefore not
deterministic. However, a good rule of thumb is to not have any
interdependencies between the elements; if this rule is followed
then `parFor` has deterministic semantics. One easy way to follow
this rule is to only use `put` or `put_` in `f`, never `get`.
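Following that rule, a typical pattern is to `put` into one `IVar` per index inside the loop and `get` them all afterwards (a sketch assuming the `monad-par` package; `squaresTo` is our own name):

```haskell
import Control.Monad.Par

-- Inside the loop body we only put (never get), so the result is
-- deterministic: each IVar receives exactly one value.
squaresTo :: Int -> [Int]
squaresTo n = runPar $ do
    vs <- sequence (replicate n new)   -- one IVar per index
    parFor (InclusiveRange 1 n) $ \i ->
        put (vs !! (i-1)) (i * i)
    mapM get vs

main :: IO ()
main = print (squaresTo 5)  -- [1,4,9,16,25]
```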