Safe Haskell | None |
---|---|

Language | Haskell2010 |

Vector of (small) words which adapt their representation to make them more compact when the elements are small.

This is data structure engineered to store large amount of small vectors of small elements compactly on memory.

For example the list `[1..14] :: [Int]`

consumes 560 bytes (14x5=70 words) on
a 64 bit machine, while the corresponding `WordVec`

takes only
16 bytes (2 words), and the one corresponding to `[101..115]`

still only
24 bytes (3 words).

Unboxed arrays or unboxed vectors are better, as they only have a constant overhead, but those constants are big: 13 words (104 bytes on 64 bit) for unboxed arrays, and 6 words (48 bytes) for unboxed vectors. And you still have to select the number of bits per element in advance.

Some operations may be a bit slower, but hopefully the cache-friendlyness
will somewhat balance that (a simple microbenchmark with `Map`

-s
indexed by `[Int]`

vs. `WordVec`

showed a 2x improvement in speed and
20x improvement in memory usage). In any case the primary goal
here is optimized memory usage.

This module should be imported qualified (to avoid name clashes with Prelude).

TODO: ability to add user-defined (fixed-length) header, it can be potentially useful for some applications

## Synopsis

- newtype WordVec = WordVec Blob
- data Shape = Shape {}
- vecShape :: WordVec -> Shape
- vecShape' :: WordVec -> (Bool, Shape)
- vecLen :: WordVec -> Int
- vecBits :: WordVec -> Int
- vecIsSmall :: WordVec -> Bool
- showWordVec :: WordVec -> String
- showsPrecWordVec :: Int -> WordVec -> ShowS
- null :: WordVec -> Bool
- empty :: WordVec
- singleton :: Word -> WordVec
- isSingleton :: WordVec -> Maybe Word
- fromList :: [Word] -> WordVec
- fromListN :: Int -> Word -> [Word] -> WordVec
- fromList' :: Shape -> [Word] -> WordVec
- toList :: WordVec -> [Word]
- toRevList :: WordVec -> [Word]
- unsafeIndex :: Int -> WordVec -> Word
- safeIndex :: Int -> WordVec -> Maybe Word
- head :: WordVec -> Word
- tail :: WordVec -> WordVec
- cons :: Word -> WordVec -> WordVec
- uncons :: WordVec -> Maybe (Word, WordVec)
- last :: WordVec -> Word
- snoc :: WordVec -> Word -> WordVec
- concat :: WordVec -> WordVec -> WordVec
- sum :: WordVec -> Word
- maximum :: WordVec -> Word
- eqStrict :: WordVec -> WordVec -> Bool
- eqExtZero :: WordVec -> WordVec -> Bool
- cmpStrict :: WordVec -> WordVec -> Ordering
- cmpExtZero :: WordVec -> WordVec -> Ordering
- lessOrEqual :: WordVec -> WordVec -> Bool
- partialSumsLessOrEqual :: WordVec -> WordVec -> Bool
- add :: WordVec -> WordVec -> WordVec
- subtract :: WordVec -> WordVec -> Maybe WordVec
- scale :: Word -> WordVec -> WordVec
- partialSums :: WordVec -> WordVec
- fold :: (a -> Word -> a) -> a -> WordVec -> a
- naiveMap :: (Word -> Word) -> WordVec -> WordVec
- boundedMap :: Word -> (Word -> Word) -> WordVec -> WordVec
- naiveZipWith :: (Word -> Word -> Word) -> WordVec -> WordVec -> WordVec
- boundedZipWith :: Word -> (Word -> Word -> Word) -> WordVec -> WordVec -> WordVec
- listZipWith :: (Word -> Word -> a) -> WordVec -> WordVec -> [a]
- bitsNeededFor :: Word -> Int
- bitsNeededFor' :: Word -> Int
- roundBits :: Int -> Int

# The dynamic Word vector type

Dynamic word vectors are internally `Blob`

-s, which the first few bits
encoding their shape, and after that their content.

- small vectors has 2 bits of "resolution" and 5 bits of length
- big vectors has 4 bits of "resolution" and 27 bits of length

Resolution encodes the number of bits per element. The latter is always a multiple of 4 (that is: 4 bits per element, or 8, or 12, etc. up to 64 bits per element).

We use the very first bit to decide which of these two encoding we use. (if we would make a sum type instead, it would take 2 extra words...)

About the instances:

- the
`Eq`

instance is strict:`x == y`

iff`toList x == toList y`

. For an equality which disregards trailing zeros, see`eqExtZero`

- the
`Ord`

instance first compares the length, then if the lengths are equal, compares the content lexicographically. For a comparison which disregards the length, and lexicographically compares the sequences extended with zeros, see`cmpExtZero`

## Instances

Eq WordVec Source # | The Eq instance is strict: |

Ord WordVec Source # | The Ord instance first compares the length, then if the lengths are equal,
compares the content lexicographically. For a different ordering, see |

Defined in Data.Vector.Compact.WordVec | |

Show WordVec Source # | |

The "shape" of a dynamic word vector

vecIsSmall :: WordVec -> Bool Source #

`True`

if the internal representation is the "small" one

# Show instance

showWordVec :: WordVec -> String Source #

# Empty vector, singleton

# Conversion to/from lists

This is faster than `fromList`

fromList' :: Shape -> [Word] -> WordVec Source #

If you know the shape in advance, it's faster to use this function

toRevList :: WordVec -> [Word] Source #

`toRevList vec == reverse (toList vec)`

, but should be faster (?)

# Indexing

# Head, tail, etc

# Specialized operations

These are are faster than the generic operations below, and should be preferred to those.

## Specialized "zipping folds"

eqStrict :: WordVec -> WordVec -> Bool Source #

Strict equality of vectors (same length, same content)

cmpStrict :: WordVec -> WordVec -> Ordering Source #

Strict comparison of vectors (first compare the lengths; if the lengths are the same then compare lexicographically)

cmpExtZero :: WordVec -> WordVec -> Ordering Source #

Lexicographic ordering of vectors extended with zeros to infinity

lessOrEqual :: WordVec -> WordVec -> Bool Source #

Pointwise comparison of vectors extended with zeros to infinity

partialSumsLessOrEqual :: WordVec -> WordVec -> Bool Source #

Pointwise comparison of partial sums of vectors extended with zeros to infinity

For example `[x1,x2,x3] <= [y1,y2,y3]`

iff (`x1 <=y1 && x1+x2 <= y1+y2 && x1+x2+x3 <= y1+y2+y3`

).

## Specialized zips

add :: WordVec -> WordVec -> WordVec Source #

Pointwise addition of vectors. The shorter one is extended by zeros.

subtract :: WordVec -> WordVec -> Maybe WordVec Source #

Pointwise subtraction of vectors. The shorter one is extended by zeros. If any element would become negative, we return Nothing

## Specialized maps

## Specialized scans

partialSums :: WordVec -> WordVec Source #

toList (partialSums vec) == tail (scanl (+) 0 $ toList vec)

# Generic operations

boundedMap :: Word -> (Word -> Word) -> WordVec -> WordVec Source #

If you have a (nearly sharp) upper bound to the result of your of function on your vector, mapping can be more efficient

boundedZipWith :: Word -> (Word -> Word -> Word) -> WordVec -> WordVec -> WordVec Source #

If you have a (nearly sharp) upper bound to the result of your of function on your vector, zipping can be more efficient

# Number of bits needed

bitsNeededFor :: Word -> Int Source #

Number of bits needed to encode a given number, rounded up to multiples of four

bitsNeededFor' :: Word -> Int Source #

Number of bits needed to encode a given number