judy-0.2: Fast, scalable, mutable dynamic arrays, maps and hashes




Very fast, mutable associative data types based on Judy arrays.

A good imperative, mutable replacement for IntMap.

Judy arrays are both speed- and memory-efficient, with no tuning or configuration required, across a wide range of index set types (sequential, periodic, clustered, random). Judy's speed and memory usage are typically better than other data storage models such as skiplists, linked lists, binary, ternary, b-trees, or even hashing, and improves with very large data sets.

The memory used by a Judy array is nearly proportional to the population (number of elements). Note that as Judy is allocated on C language side, GHC's profiling system won't report memory use by Judy arrays.

For further references to the implementation, see:

Building a simple word-index table. About 4x faster than using an IntMap

 import Control.Monad
 import qualified Data.Judy as J

 main = do
    j <- J.new :: IO (J.JudyL Int)
    forM_ [1..10000000] $ \n -> J.insert n (fromIntegral n :: Int) j
    v <- J.lookup 100 j
    print v

Running this:

 $ ghc -O2 --make A.hs
 [1 of 1] Compiling Main             ( A.hs, A.o )
 Linking A ...
 $ time ./A
 Just 100
 ./A  1.95s user 0.08s system 99% cpu 2.028 total


Basic types

data JudyL a Source

A JudyL array is a finite map from Word to Word values. A value is addressed by a key (Key). The array may be sparse, and the key may be any word-sized value. There are no duplicate keys.


Show (JudyL a) 

type Key = WordSource

The type of keys in the JudyL arrays. A word-sized type (64 or 32 bits)


new :: JE a => IO (JudyL a)Source

Allocate a new empty JudyL array. A finalizer is associated with the JudyL array, that will free it automatically once the last reference has been dropped. Note that if you store pointers in the Judy array we have no way of deallocating those. You need to track those yourself (e.g. via StableName or ForeignPtr).

The Haskell GC will track references to the foreign resource, but the foreign resource won't exert any heap pressure on the GC, meaning that finalizers will be run much later than you expect. An explicit performGC can help with this.

insert :: JE a => Key -> a -> JudyL a -> IO ()Source

Insert a key and value pair into the JudyL array. *If the key is already present in the map, the value is not modified*

lookup :: JE a => Key -> JudyL a -> IO (Maybe a)Source

Lookup a value associated with a key in the JudyL array.

delete :: Key -> JudyL a -> IO ()Source

Delete the Index/Value pair from the JudyL array.

Judy-storable types

class JE a whereSource

Class of things that can be stored in the JudyL array. You need to be able to convert the structure to a Word value, or a word-sized pointer.


toWord :: a -> IO WordSource

Convert the Haskell value to a word-sized type that may be stored in a JudyL

fromWord :: Word -> IO aSource

Reconstruct the Haskell value from the word-sized type.