% Implementing Pointer Algorithms in Haskell 
% Péter Diviánszky
% CEFP, Budapest & Komárno, 25-30 May 2009

#

The code and the slides can be found as  
`linear-maps` on  
[http://hackage.haskell.org](http://hackage.haskell.org/packages/archive/pkg-list.html).


# Pointers

-   Pointers are well known.
    -   They are called mutable variables in functional languages.
    -   Some algorithms use them heavily.

-   Pointers can be modeled with a global store (heap).
    -   Efficient implementation on CPU and memory.

-   Hard to find a stateless / modular model for them.
    -   This would be the functional way.


# Pointers in C

~~~~~~~~~~~~~~~~~ {.c}
void swap(int *x, int *y) {
    int xv = *x;    // read
    int yv = *y;
    *x = yv;        // write
    *y = xv;
}

int main() {
    int a = 13;
    int b = 14;
    swap(&a, &b);   // references
    printf("%d, %d", a, b);
    return 0;
}
~~~~~~~~~~~~~~~~~


# Pointers in OCAML


~~~~~~~~~~~~~~~~~ {.ocaml}
let swap x y =
    let vx = !x     (* read *)
    and vy = !y in
    x := vy;        (* write *)
    y := vx;;

let a = ref 13;;    (* reference *)
let b = ref 14;;
swap a b;;
~~~~~~~~~~~~~~~~~

Primitives:

~~~~~~~~~~~~~~~~~ {.ocaml}
ref   :  'a          -> 'a ref
(!)   :  'a ref      -> 'a
(:=)  :  'a ref -> a ->  unit
~~~~~~~~~~~~~~~~~

# Pointers in Haskell

~~~~~~~~~~~~~~~~~ {.haskell}
swap :: IORef a -> IORef a -> IO ()
swap x y = do
    vx <- readIORef x
    vy <- readIORef y
    writeIORef x vy
    writeIORef y vx
~~~~~~~~~~~~~~~~~

Primitives:

~~~~~~~~~~~~~~~~~ {.haskell}
newIORef   ::       a      -> IO (IORef a)
readIORef  :: IORef a      -> IO  a
writeIORef :: IORef a -> a -> IO ()
~~~~~~~~~~~~~~~~~

Side effects are properly indicated with `IO` in types.


# ST Pointers in Haskell

`STRef`s are more safe than `IORef`s because they need less privileges.

~~~~~~~~~~~~~~~~~ {.haskell}
swap :: STRef s a -> STRef s a -> ST s ()
swap x y = do
    vx <- readSTRef x
    vy <- readSTRef y
    writeSTRef x vy
    writeSTRef y vx
~~~~~~~~~~~~~~~~~

Primitives:

~~~~~~~~~~~~~~~~~ {.haskell}
newSTRef   ::         a      -> ST s (STRef s a)
readSTRef  :: STRef s a      -> ST s  a
writeSTRef :: STRef s a -> a -> ST s ()
~~~~~~~~~~~~~~~~~

# ST Pointers in Haskell (continued)

Imperative style Fibonacci function:

~~~~~~~~~~~~~~~~~ {.haskell}
fib :: Integer -> ST s Integer
fib n = do
    a <- newSTRef 0
    b <- newSTRef 1

    replicateM_ n $ do
        av <- readSTRef a
        bv <- readSTRef b
        writeSTRef a  bv
        writeSTRef b (av + bv)

    readSTRef a
~~~~~~~~~~~~~~~~~

# ST Pointers in Haskell (continued)

Note that the return type of the `ST` computation does not depend on `s`:

~~~~~~~~~~~~~~~~~ {.haskell}
fib :: Integer -> ST s Integer
~~~~~~~~~~~~~~~~~

In this case the `ST` computation can be turned into a pure value:

~~~~~~~~~~~~~~~~~ {.haskell}
runST :: (forall s. ST s a) -> a
~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~ {.haskell}
fib' :: Integer -> Integer
fib' n = runST (fib n)
~~~~~~~~~~~~~~~~~

In that way pointers can be used in a pure function.  
Still, we need a strictly scheduled computation inside.

# Other Direction: Pointers in Clean 

~~~~~~~~~~~~~~~~~ {.clean}
swap :: (Ptr a) (Ptr a) *Heap -> *Heap
swap x y h1 = h5
where
    (vx, h2) = readPtr x h1
    (vy, h3) = readPtr y h2
    h4       = writePtr x vy h3
    h5       = writePtr y vx h4
~~~~~~~~~~~~~~~~~

Primitives:

~~~~~~~~~~~~~~~~~ {.clean}
newPtr     ::      a    *Heap -> (Ptr a, *Heap)
readPtr    :: (Ptr a)   *Heap -> (a,     *Heap)
writePtr   :: (Ptr a) a *Heap ->         *Heap
~~~~~~~~~~~~~~~~~


# Problems with Explicit Heap

The previous pointer interface is

- Typed.
- Functional.

However, an explicit heap value should be carried through the program
which determines the evaluation order overly.  
The result is an imperative program in a functional guise.


# Improvement: Interchangeable Pointer Reads

Reading a pointer does not alter the heap but it have to be done in time:

~~~~~~~~~~~~~~~~~ {.clean}
swap :: (Ptr a) (Ptr a) *Heap -> *Heap
swap x y h 
    #! vx = sreadPtr x h
       vy = sreadPtr y h
    = writePtr y vx (writePtr x vy h)
~~~~~~~~~~~~~~~~~

New primitive:

~~~~~~~~~~~~~~~~~ {.clean}
sreadPtr :: (Ptr a) Heap ->  a
~~~~~~~~~~~~~~~~~

Note that `Heap` is a subtype of `*Heap`.


# Improvement: Typed Heaps

An `Int`-pointer read and a `Char`-pointer write may be interchanged safely.  
This is modeled with typed heaps.

Primitives (as used in the Clean compiler sources):

~~~~~~~~~~~~~~~~~ {.clean}
newHeap     :: .(Heap a)
newPtr      ::      a    *(Heap a) -> (Ptr a, *(Heap a))
readPtr     :: (Ptr a)   *(Heap a) -> (a,     *(Heap a))
sreadPtr    :: (Ptr a)    (Heap a) ->  a
writePtr    :: (Ptr a) a *(Heap a) ->         *(Heap a)
~~~~~~~~~~~~~~~~~

Still a problem: Reading a `Ptr Char` in a `Heap Char` fails if the pointer was constructed in another `Heap Char`.

# Improvement: Use the ST Pointer Trick

We distinguish between different `Heap Char` values by adding a phantom type variable: `Heap k Char`.

~~~~~~~~~~~~~~~~~ {.haskell}
newPtr      :: a          -> Heap k a -> (Ptr k, Heap k a)
sreadPtr    :: Ptr k      -> Heap k a ->  a
writePtr    :: Ptr k -> a -> Heap k a ->  Heap k a
~~~~~~~~~~~~~~~~~

Note that the interface use `Ptr k` instead of `Ptr k a` because `a` is not needed.

If the result of a heap-consuming computation does not contain the phantom typevar then we get a heap for free:

~~~~~~~~~~~~~~~~~ {.haskell}
runHCC :: (forall k. Heap k a -> b) -> b
~~~~~~~~~~~~~~~~~


# Coming from Another Direction: Finite Maps

Finite maps are functions with finite domain.  
Related phrases: dictionary (Python), hash (Perl), association list.

~~~~~~~~~~~~~~~~~ {.haskell}
empty    ::           Map k a
lookup   :: Ord k =>  k ->      Map k a -> Maybe a
insert   :: Ord k =>  k -> a -> Map k a -> Map k a
delete   :: Ord k =>  k ->      Map k a -> Map k a
~~~~~~~~~~~~~~~~~

We will need an additional function:

~~~~~~~~~~~~~~~~~ {.haskell}
modify :: Ord k =>  k -> Maybe a -> Map k a -> Map k a
modify k Nothing  m = delete k   m
modify k (Just a) m = insert k a m
~~~~~~~~~~~~~~~~~


# Finite Maps vs Heaps

`Heap k (Maybe a)` ~ `Map (Id k) a`

~~~~~~~~~~~~~~~~~ {.haskell}
newtype Id k = Id Int   deriving (Eq, Ord)
~~~~~~~~~~~~~~~~~

We allow only `Maybe`-typed heaps, so we can use an interface similar to finite maps.

# Pointers with Finite Map Interface

`Map` here is the abstract heap (not a finite map):

~~~~~~~~~~~~~~~~~ {.haskell}
lookup   :: Id k ->      Map k a ->  Maybe a
insert   :: Id k -> a -> Map k a ->  Map k a
delete   :: Id k ->      Map k a ->  Map k a
~~~~~~~~~~~~~~~~~

Instead of including `newPtr`, pointers are created with the map (this decison pays back later):

~~~~~~~~~~~~~~~~~ {.haskell}
runICC  :: (forall k. Map k a -> [Id k] -> b) -> b
~~~~~~~~~~~~~~~~~

`runICC` runs an identifier consuming computation, which receives a map (heap) and an infinite list of identifiers (pointers)
allowed to be used with that map.


# Use Case: Doubly Linked Lists

~~~~~~~~~~~~~~~~~ {.haskell}
data DList k a
    = Empty
    | NonEmpty
        { first   :: Id k
        , last    :: Id k
        , nodes   :: Map k (DListNode k a)
        }

data DListNode k a =
    { previous :: Maybe (Id k)
    , next     :: Maybe (Id k)
    , value    :: a
    }

(<|) :: a -> DList k a -> Id k ->  DList k a
(|>) :: DList k a -> a -> Id k ->  DList k a
~~~~~~~~~~~~~~~~~

# Use Case: Doubly Linked Lists (v2)

It is a problem that at insertions free `Id`s are needed.
This new version solves that problem:

~~~~~~~~~~~~~~~~~ {.haskell}
data DList k a
    = Empty
    | NonEmpty
        { first   :: Id k
        , last    :: Id k
        , nodes   :: Map k (DListNode k a)
        , freeIds :: [Id k]     -- stored free Ids
        }

(<|) :: a -> DList k a ->  DList k a
(|>) :: DList k a -> a ->  DList k a
~~~~~~~~~~~~~~~~~

# Use Case: Doubly Linked Lists (v3)

This version simplifies the creation of `DList`s:

~~~~~~~~~~~~~~~~~ {.haskell}
data DList a
    = Empty
    | forall k . NonEmpty    -- encapsulated heap
        { first   :: Id k
        , last    :: Id k
        , nodes   :: Map k (DListNode k a)
        , freeIds :: [Id k]
        }

singleton :: a -> DList a

(<|) :: a -> DList a ->  DList a
(|>) :: DList a -> a ->  DList a
~~~~~~~~~~~~~~~~~


# Use Case: Doubly Linked Lists (v3, continued)

Code for `singleton`:

~~~~~~~~~~~~~~~~~ {.haskell}
singleton :: a -> DList a
singleton x = runICC $ \emptyMap (firstId: otherIds) ->
    NonEmpty
        { first   = firstId
        , last    = firstId
        , nodes   = insert firstId x emptyMap
        , freeIds = otherIds
        }
~~~~~~~~~~~~~~~~~

But `DList`s can not be joined because if we open two `NonEmpty` values, the phantom variables can not be unified by the type system (which is right).


# Improvement: Identifier Subtyping

If `k1` ≠ `k2` then `Id k1` can not be used instead of `Id k2`.
This is right, because this type variables marks "different regions of memory".  
But sometimes memory regions should be joined.

`Id (k1 :|: k2)` is the joined set of `Id k1` and `Id k2`.

`:|:` is an infix type constructor with kind `* -> * -> *`:

~~~~~~~~~~~~~~~~~ {.haskell}
data (a :|: b)
    -- no constructors
~~~~~~~~~~~~~~~~~

# Identifier Subtyping (continued)

`Id (k1 :|: k2)` is the joined set of `Id k1` and `Id k2`.

A value with type `Id k1` is acceptable when a value with type `Id (k1 :|: k2)` is needed.  
In other words, `Id k1` is a subtype of `Id (k1 :|: k2)`.  
There is no subtyping in Haskell so we use explicit conversion functions:

~~~~~~~~~~~~~~~~~ {.haskell}
left  :: Id k1 -> Id (k1 :|: k2)
right :: Id k2 -> Id (k1 :|: k2)
~~~~~~~~~~~~~~~~~

One can join two maps (two heaps or two "memory regions") with `union`:

~~~~~~~~~~~~~~~~~ {.haskell}
union :: Map k1 a -> Map k2 a -> Map (k1 :|: k2) a
~~~~~~~~~~~~~~~~~


# Use Case: Doubly Linked Lists (v4)

A simplification first: the `freeIds` field is not needed because any number of
free `Id`s can be obtained by joining a new "memory region":

~~~~~~~~~~~~~~~~~ {.haskell}
data DList a
    = Empty
    | forall k . NonEmpty
        { first   :: Id k
        , last    :: Id k
        , nodes   :: Map k (DListNode k a)
        -- , freeIds :: [Id k]    -- not needed
        }
~~~~~~~~~~~~~~~~~

# Use Case: Doubly Linked Lists (v4, continued)

~~~~~~~~~~~~~~~~~ {.haskell}
(><) :: DList a -> DList a -> DList a
Empty >< y = y
x >< Empty = x
x >< y = NonEmpty
    { first = left  (first x)
    , last  = right (last  y)
    , nodes = ... (fmap left (nodes x) 
                   `union` fmap right (nodes y))
    }
~~~~~~~~~~~~~~~~~

`...` contains code which redirects  
`next (last x)` to `first y` and  
`previous (first y)` to `(last x)`.

# Improvement: Split Maps

Redirecting `next (last x)` to `first y` is complicated because a `DListNode` record have to be updated.

This could be improved if three different maps were used for `previous`, `next` and `value` values.
But a pointer can only point to one object.

Solution: Maps are tagged with type-level integers. 
A pointer can be a key in several maps with different integers.

We will use  `(Map I0 k a, Map I1 k b, Map I2 k c)`  
instead of   `Map k (a, b, c)`.  

# Improvement: Split Maps (continued)

To understand the implementation:  
The finite map `Map i k a` represents a scattered memory fragment with the following properties:

-   The memory fragment contains an `a`-typed values.
-   The pieces of the memory fragment are some record's `i`th field.
    -   `i` is a type-level integer (`I0`, `I1`, `I2`, ... in the implementation).
    -   The records need not have the same type.
-   `k` is an additional tag (for example, to separate two doubly linked lists)


# Use Case: Doubly Linked Lists (v5)

~~~~~~~~~~~~~~~~~ {.haskell}
data DList a
    = Empty
    | forall k . NonEmpty
        { first    :: Id k
        , last     :: Id k
        , previous :: Map I0 k (Id k)
        , next     :: Map I1 k (Id k)
        , value    :: Map I2 k  a
        }
~~~~~~~~~~~~~~~~~

# Creation of Split Maps

Basic solution: There are a family of functions

~~~~~~~~~~~~~~~~~ {.haskell}
runICC1  :: (forall k. Map I0 k a -> [Id k] -> b) -> b
runICC2  :: (forall k. Map I0 k a -> Map I1 k a -> [Id k] -> b) -> b
runICC3  :: (forall k. Map I0 k a -> Map I1 k a -> Map I2 k a -> [Id k] -> b) -> b
...
~~~~~~~~~~~~~~~~~

Instead of that, the current implementation use a variant of the function

~~~~~~~~~~~~~~~~~ {.haskell}
runICC :: (forall k. Maps i k -> [Id k] -> b) -> b
~~~~~~~~~~~~~~~~~

where `Maps` is a GADT which can be unfolded into `i` maps.


# Conclusion / Efficiency 

The implementation is as efficient as if mutable references were used:

- `Map`s are not present in the generated code (for example, `NonEmpty` has two fields). 
- `Id`s are replaced by pointers to records (arrays actually).

TODOs:

- The implementation has to be reviewed.
- The `Maybe`s still cause some performance loss.


# Conclusion / Safety

Guarantees by the type system:

-   Pointers are typed (by the type of the pointed value).
-   Pointers can not escape their scope.
    - Pointer in "different regions" can not be exchanged by accident.

TODOs:

-   Linear use is checked *only in runtime*.
    - This is a big disadvantage.
    - Should be checked statically, which needs at least annotated types and a strictness analyzer.


# Conclusion / Usability

Pros:

-   Highly functional interface (similar to finite maps).
    - Less strict evaluation order (more possibility to parallel execution).
-   One can virtually join `i`th fields of different records (if the `i`th fields has the same type).

Cons:

-   Linear use should be obeyed.
-   Creation of maps is a bit uncomfortable (maps has to be carried).


# Conclusion / Semantics

The library has a simple semantics.

This is demonstrated by a small pure functional implementation of the interface functions.


# Further Extensions 

Sets can be modeled as maps to unit values. 

-   The current implementation is more efficient than that: 32 sets are packed into 1 integer map.
-   The interface of sets and maps are unified.

Identifiers can refer to static data.  
For example, if a sequence is implemented by a doubly linked map, 
`previous` and `next` are mutable but `value` is static. So two maps are sufficient.


# Related Work

-   [DDC](http://www.haskell.org/haskellwiki/DDC), The Disciplined Disciple Compiler
    -   An explicitly lazy dialect of Haskell.
    -   Supports destructive update, computational effects, type directed field projections.
-   [Monadic Regions](http://okmij.org/ftp/Haskell/regions.html)
    -   A technique for managing resources (memory areas, file handles, database connections).


# Forthcoming Use Cases

- Graph walks.
    - with tagging
    - with pointer reversal
- Strongly connected components computation.
- Linear time type inference algorithm with pointers.


# Thanks

Thanks for your attention!