lazyarray-0.1.2: Efficient implementation of lazy monolithic arrays (lazy in indexes).

Portability non-portable (uses Data.Array.IO) experimental fox@ucw.cz

Data.LazyArray

Description

This is an implementation of monolithic lazy array. If you want to know what an monolithic lazy array is good for, you can read for example the paper "Efficient Graph Algorithms Using Lazy Monolithic Arrays" (http://citeseer.ist.psu.edu/95126.html).

Module Data.LazyArray implements the monolithic lazy array using `Array` with temporary buffer. For implementation without the temporary buffer see Data.LazyArray.Lowlevel module.

Don't forget to import some kind of arrays when using this module. Results of small benchmark and a discussion can be found at the bottom of this page.

Synopsis

# Generic lazy arrays

Arguments

 :: Ix i => (i, i) bounds of the array -> [(i, e)] list of associations -> Array i [e] resulting array

This function returns an array, which i-th element is a list, that contains all values associated with i in the list of associations, in that order. In other words, the result of `lArray bnds assoc` is

`lArray bnds assoc = array bnds [(i,[e|(i',e)<-assoc,i'==i] | i<-range bnds]`.

The important difference between the two sides of previous equation is that `lArray` works in linear time, ie. each member in the `assoc` list is accessed exactly once, and `lArray` is lazy in its second argument. It means the `assoc` list is left intact until an i-th element of the resulting array needs to be evaluated. At that time the `assoc` list is examined until the first pair (i,e') is found. Until such pair is found, processed elements of `assoc` are stored at appropriate indexes in the resulting array.

The implementation uses temporary array for such processed elements of `assoc` list, that were not yet 'requested' [or 'evaluated'] in the resulting array. This array can be freed when all lists of the resulting array are fully evaluated.

Here is an example of how to use the monolithic lazy array. Given a graph of type `Array` `Int` [`Int`] we construct the dfs numbering from a given vertex. Obviously, only the first element of the `lArray` is needed - `lArrayFirst`, `lArrayMaybe` or Data.LazyArray.Lowlevel are better choice here.

```dnum::Array Int [Int]->Int->Array Int (Maybe Int)
dnum g s = amap listToMaybe marks where
list = dfs' [s] 0
marks = lArray (bounds g) list
dfs' []     _ = []
dfs' (s:ss) n = (s,n) : if n == head (marks!s) then dfs' ((g!s)++ss) (n+1) else dfs' ss n
```

Arguments

 :: Ix i => ([e] -> e') function to apply to the list of elements belonging to the same index -> (i, i) bounds of the array -> [(i, e)] list of associations -> Array i e' resulting array

It is often needed to apply some function to the list of elements belonging to the same index. Function `lArrayMap` is provided, and could be defined like `lArrayMap f = (amap f) . lArray`. Obviously, `lArray = lArrayMap id`.

# Specialised lazy arrays

Only the first element from the list of elements belonging to the same element is often needed. In that case you can use one of the following methods:

Arguments

 :: Ix i => (i, i) bounds of the array -> [(i, e)] list of associations -> Array i (Maybe e) resulting array

This is equivalent to `lArrayMaybe = lArrayMap listToMaybe` but a bit faster.

Arguments

 :: Ix i => e 'zero' element, -> (i, i) bounds of the array -> [(i, e)] list of associations -> Array i e resulting array

This is equivalent to `lArrayFirst zero = lArrayMap (\x->case x of []->zero; e:_->e)` but a bit faster.

# Benchmark

For comparison, various implementations of DFS numbering have been benchmarked. The benchmark can be found in the lazyarray package, in the file `bench/bench.hs`. The DFS was implemented using Data.LazyArray, Data.LazyArray.Lowlevel, Data.Array.Diff, Data.Map, Data.IntMap and in the Control.Monad.ST monad using `Data.Array.ST.runSTArray` and `Data.Array.ST.runSTUArray`. Three graphs were used, a path, a graph with 20 edges at each vertex, and a complete graph; each graph was used once as connected and once with isolated vertex iv. The DFS number of the isolated vertex was never asked for. Details can be found in the source file. This whole benchmark is written in Haskell, so it is only approximate.

``` DFS implemented using     | path |path+iv|20-path|20path+iv|cmplt gph|cmplt gph+iv
--------------------------|------|-------|-------|---------|---------|-------------
`lArrayFirst`               | 5456 |  5572 |  1956 |   1968  |   1316  |    1296
`lArrayFirst` with `lArrayMap`| 5248 |  5248 |  3456 |   3452  |   3072  |    3040
`lArrayMaybe`               | 5772 |  5724 |  2044 |   2064  |   1308  |    1312
`lArrayMaybe` with `lArrayMap`| 5380 |  5352 |  3560 |   3564  |   3048  |    3024
`Data.LazyArray.Lowlevel.laCreate`                  | 4560 |  4700 |  4984 |   9360  |   5280  |   11140
`Data.LazyArray.Lowlevel.mlaCreate`                 | 5308 |  5824 |  5972 |  11508  |   6328  |   13712
Data.Array.Diff           |29089 | 29173 | 14652 |  14144  |  15072  |   14988
Data.Map                  | 7216 |  7292 |  4892 |   4912  |   3316  |    3300
Data.IntMap               | 3328 |  3280 |  2616 |   2640  |   2492  |    2524
`Data.Array.ST.runSTArray`                | 1112 |  1116 |  1188 |   1384  |   1460  |    1468
`Data.Array.ST.runSTUArray`               |  688 |   692 |  1164 |   1160  |   1444  |    1440
```

The numbers are elapsed time in ms, but use these numbers only to compare implementations in one column. The first two columns are quite inconclusive, because the DFS does not really do much when graph is a path and Data.Map and Data.IntMap implementations don't do exactly the same as the others implementations. The `Data.LazyArray.Lowlevel.laCreate` and `Data.LazyArray.Lowlevel.mlaCreate` perform very badly in the presence of isolated vertex - `Data.LazyArray.Lowlevel.laFreeze` tries hard to find a connection to the isolated vertex even if it is not asked for.