monad-memo
Purpose
This package provides a convenient mechanism for adding memoization to Haskell monadic functions.
Memoization
Memoization is a well known way to speed up function evaluation by caching previously calculated results and reusing them whenever a memoized function is needed to be evaluated with the same arguments again.
It is usually associated with dynamic programming techiques.
Overview
Even though it is possible to manually add memoization to the code which would benefit from it, this ad-hoc approach has usual ad-hoc drawbacks: code pollution, bugs, resistance to changes.
This package however encapsulates the underlying plumbing behind its simple monadic interface MonadMemo
with a single combinator memo
which, when applied to monadic function, turns it into "memoized" one.
The package offers various implementation of MonadMemo
(which differs in terms of performance and requirements) and it is possible to choose/change the implementation without affecting the main function code.
The range of supported implementations "out of box" is limited by the range of containers provided by the standard packages installed by Haskel Platform:
from default pure "fit them all" Data.Map to very fast but limiting vector.
It is also possible to plug-in a custom container (from a third-party library) and run existing monadic code with it.
The default implementation of MonadMemo
is also monad transformer so it can be "mixed" with other monads.
The package also provides the "memoized" versions of most standard monads found in mtl.
Example of usage
A clasic example of function which greatelly benefits from memoization is a recursively defined Fibonacci number function.
A plain version of this function can be written in the following way:
fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
which is very inefficient (impractical for n>40
).
We can rewrite this definition as a monad:
fibm :: Monad m => Integer -> m Integer
fibm 0 = return 0
fibm 1 = return 1
fibm n = do
f1 <- fibm (n-1)
f2 <- fibm (n-2)
return (f1+f2)
and even run it with Identity
monad with identical inefficiency:
evalFibmId :: Integer -> Integer
evalFibmId = runIdentity . fibm
But all we need to do to make this function "computable" for reasonable argument is to add memoization for both recursive branches with memo
combinator:
fibm :: (MonadMemo Integer Integer m) => Integer -> m Integer
fibm 0 = return 0
fibm 1 = return 1
fibm n = do
f1 <- memo fibm (n-1)
f2 <- memo fibm (n-2)
return (f1+f2)
then, to evaluate it with default Data.Map
based memoization cache we use the following "eval*" function:
evalFibm :: Integer -> Integer
evalFibm = startEvalMemo . fibm
Now the range of the arguments it can handle is limited only by Integer
computation complexity and stack memory limit.
More Examples
Slightly more complicated recursive function
Well known Ackerman function is a two arguments function.
To memoize two argument function for2
combinator can be used, giving the following generic code:
ackm :: (Num n, Ord n, MonadMemo (n, n) n m) => n -> n -> m n
ackm 0 n = return (n+1)
ackm m 0 = for2 memo ackm (m-1) 1
ackm m n = do
n1 <- for2 memo ackm m (n-1) -- 'for2' adapts 'memo' for 2-argument 'ackm'
for2 memo ackm (m-1) n1
evalAckm :: (Num n, Ord n) => n -> n -> n
evalAckm n m = startEvalMemo $ ackm n m
Mutually recursive function memoization
This example is taken from paper "Monadic Memoization Mixins" by Daniel Brown and William R. Cook
Given the following mutually recursive function definitions:
-- 'f' depends on 'g'
f :: Int -> (Int,String)
f 0 = (1,"+")
f (n+1) =(g(n,fst(f n)),"-" ++ snd(f n))
-- 'g' depends on 'f'
g :: (Int, Int) -> Int
g (0, m) = m + 1
g (n+1,m) = fst(f n)-g(n,m)
How can we memoize both functions?
Lets try to just add memo for both functions:
-- WRONG: Will NOT compile!
fm 0 = return (1,"+")
fm (n+1) = do
fn <- memo fm n
gn <- memo gm (n , fst fn)
return (gn , "-" ++ snd fn)
gm (0,m) = return (m+1)
gm (n+1,m) = do
fn <- memo fm n
gn <- memo gm (n,m)
return $ fst fn - gn
GHC complains:
"Occurs check: cannot construct the infinite type: t = (t, v)
Expected type: t
Inferred type: (t, v)"
which is understandable since we are trying to use the same cache for storing "key-value" pairs of the functions of different types (fm :: Int -> m (Int,String)
and gm :: (Int, Int) -> m Int
).
Obviously, to cache both function we will need two caches (even if the types of the functions were identical, it's not very good idea to share the same cache).
And this is precisely what we have to do - use two memoization caches! The way to achieve it is to use two MemoT monad transformers one nested in another:
-- Memo-cache for 'fm'
type MemoF = MemoT Int (Int,String)
-- Memo-cache for 'gm'
type MemoG = MemoT (Int,Int) Int
-- | Combined stack of caches (transformers)
-- Stacks two 'MemoT' transformers in one monad to be used in both 'gm' and 'fm' monadic functions
type MemoFG = MemoF (MemoG Identity)
NB As usually with Haskell it isn't necessary to specify types here (or restrict them to MemoT combinations for the given example).
Then, a little bit of complication, since we use two caches now (one from the current monad transformer and another from the next, nested in the current) we need to use memol_X_ set of functions: memol0, memol1 etc. Where X specifies "sequential number" of the transformer in stack for a given cache (starting from the current). Here we use the current (0) and the next (1) for fm
and gm
respectively:
fm :: Int -> MemoFG (Int,String)
fm 0 = return (1,"+")
fm (n+1) = do
fn <- memol0 fm n
gn <- memol1 gm (n , fst fn)
return (gn , "-" ++ snd fn)
gm :: (Int,Int) -> MemoFG Int
gm (0,m) = return (m+1)
gm (n+1,m) = do
fn <- memol0 fm n
gn <- memol1 gm (n,m)
return $ fst fn - gn
evalAll = startEvalMemo . startEvalMemoT
-- | Function to run 'fm' computation
evalFm :: Int -> (Int, String)
evalFm = evalAll . fm
-- | Function to run 'gm' computation
evalGm :: (Int,Int) -> Int
evalGm = evalAll . gm
In fact we can also define 'gm' function in curried form:
fm2 :: Int -> MemoFG (Int,String)
fm2 0 = return (1,"+")
fm2 n = do
fn <- memol0 fm2 (n-1)
gn <- for2 memol1 gm2 (n-1) (fst fn)
return (gn , "-" ++ snd fn)
-- 2-argument function now
gm2 :: Int -> Int -> MemoFG Int
gm2 0 m = return (m+1)
gm2 n m = do
fn <- memol0 fm2 (n-1)
gn <- for2 memol1 gm2 (n-1) m -- 'for2' adapts 'memol1' for 2-argument gm2
return $ fst fn - gn
evalFm2 :: Int -> (Int, String)
evalFm2 = evalAll . fm2
evalGm2 :: Int -> Int -> Int
evalGm2 n m = evalAll $ gm2 n m
Combining MemoT with other monads
Being monad transformer, memoization monad can be combined with most of existing monads.
Here we mix it with MonadWriter:
fibmw :: (Num n, MonadWriter String m, MonadMemo n n m) => n -> m n
fibmw 0 = tell "0" >> return 0
fibmw 1 = tell "1" >> return 1
fibmw n = do
f1 <- memo fibmw (n-1)
f2 <- memo fibmw (n-2)
tell $ show n
return (f1+f2)
-- To run combined monad we need to sequence both 'run' functions:
evalFibmw :: Integer -> (Integer, String)
evalFibmw = startEvalMemo . runWriterT . fibmw
res = evalFibmw 6 -- > produces (8,"1021310241021351021310246")
Custom pure cache container
From monad-memo version 0.3.0 it is possible to replace default Data.Map with another (more efficient?) implementation of internal cache-container
as long as there is an instance of Data.MapLike defined for this container.
The package currently defines these instances for Data.Map and Data.IntMap only.
For instance, should we decide to use unordered-containers all we need to do is to define the following instance for our container:
import Data.Hashable
import qualified Data.HashMap.Strict as H
instance (Eq k, Hashable k) => MapLike (H.HashMap k v) k v where
lookup = H.lookup
add = H.insert
then we just need to use (``evalMemoState``H.empty)
instead of startEvalMemo
and our memoized function will be evaluated using Hashmap
as an internal container hosted in MemoState.
There is usually no need to do any modification to the memoized function itself.
Mutable arrays and vectors as MonadCache
Array-based memoization cache
version 0.4.0 adds ArrayCache: a new MonadCache implementation based on mutable arrays (inside IO or ST s monad). The main benefit of this MonadCache
is its performance: it can be an order of magnitude faser than standard Data.Map
-based cache. This is due to the fact that arrays have O(1)
lookup time and in-place mutable arrays also have O(1)
for updates (i.e. the cache add
operation).
Unfortunatelly you cannot always use this MonadCache
due to array's natural limitations:
- The key must be an instance of Ix typeclass
- The bounds of the array must be known (and specified) beforehand and array cannot be resized
- Array is a continious space of values, so if the key distribution is wide and sparse the memory will be wasted (or array may not even fit into memory)
But if the nature of your memoized function permits the usage of ArrayCache
you can make your code much more faster by simply switching from Map-based MonadCache
to ArrayCache
especially if the value type of your function can be "unboxed" (i.e. it is one of primitive types like Int
or Double
). "Unboxed" values are packed in unboxed arrays UArray
which offer even faster execution and are the most efficient in terms of memory usage.
Normally you don't have to modify your monadic function definition to run ArrayCache
-based memoization: just use appropriate eval*
or run*
function. For instance our canonical fibm
function:
fibm 0 = return 0
fibm 1 = return 1
fibm n = do
n1 <- memo fibm (n-1)
n2 <- memo fibm (n-2)
return (n1+n2)
can be run using ST
array of Integers
with the following function:
evalFibmSTA :: Integer -> Integer
evalFibmSTA n = runST $ evalArrayMemo (fibm n) (0,n)
here the (0,n)
argument defines the bounds of cache array.
Is it equally easy to use unboxed version of the array, but Integer
cannot be unboxed (it isn't primitive type), so lets just use Double
for our function result:
evalFibmSTUA :: Integer -> Double
evalFibmSTUA n = runST $ evalUArrayMemo (fibm n) (0,n)
Instead of ST
you can use IO
monad:
evalFibmIOA :: Integer -> IO Integer
evalFibmIOA n = evalArrayMemo (fibm n) (0,n)
evalFibmIOUA :: Integer -> IO Double
evalFibmIOUA n = evalUArrayMemo (fibm n) (0,n)
Vector-based memoization cache
For even better performance use VectorCache and its flavours (unsafe version and dynamically expandable version) which are all based on very fast vector library.
Note however that this MonadCache
is even more limiting that ArrayCache
since vector
supports only Int
as an index.
The usage is very similar to ArrayCache
, but instead of range we need to specify the length of the vector:
evalFibmSTV :: Int -> Integer
evalFibmSTV n = runST $ evalVectorMemo (fibm n) n
evalFibmIOUV :: Int -> IO Double
evalFibmIOUV n = evalUVectorMemo (fibm n) n
Use "Expandable" version to avoid specifying length parameter:
import qualified Control.Monad.Memo.Vector.Expandable as VE
evalFibmSTVE :: Int -> Integer
evalFibmSTVE n = runST $ VE.startEvalVectorMemo (fibm n)
The difference in performance for different MonadCache
's with Fibonacci function is demonstrated by this criterion test.
The test runs memoized Fibonacci function using the following caches:
- default Map-based
- State-based with Data.IntMap
- array and unboxed array based (Array and UArray)
- vector, unsafe vector and expandable vector (both boxed and unboxed vectors)
Full report can be found here.
Custom mutable cache
It is also possible to use a mutable container as a MonadCache
not defined here.
For example if we wish to use mutable hash-table from hashtables package we can do so with the following code:
{-# LANGUAGE MultiParamTypeClasses, TypeSynonymInstances, FlexibleInstances #-}
import Data.Hashable
import Control.Monad.ST
import Control.Monad.Memo
import Control.Monad.Trans.Memo.ReaderCache
import qualified Data.HashTable.ST.Basic as H
newtype Container s k v = Container { toTable :: H.HashTable s k v }
type Cache s k v = ReaderCache (Container s k v)
instance (Eq k, Hashable k) => MonadMemo k v (Cache s k v (ST s)) where
{-# INLINE memo #-}
memo f k = do
c <- container
e <- lift $ H.lookup (toTable c) k
if isNothing e
then do
v <- f k
lift $ H.insert (toTable c) k v
return v
else return (fromJust e)
{-# INLINE fib1 #-}
fibm 0 = return 0
fibm 1 = return 1
fibm n = do
f1 <- memo fibm (n-1)
f2 <- memo fibm (n-2)
return (f1+f2)
evalFib :: Int -> Int
evalFib n = runST $ do
c <- H.new
evalReaderCache (fibm n) (Container c)
References