-- Hoogle documentation, generated by Haddock -- See Hoogle, http://www.haskell.org/hoogle/ -- | Atom (or symbol) datatype for fast comparision and sorting. -- -- This module provides an abstract datatype for atoms, such that: -- -- -- -- This module is thread-safe. @package simple-atom @version 0.1.0.1 -- | Symbols without a central symbol table. -- -- Symbols provide the following efficient operations: -- -- -- -- This can be implemented by using a global variable mapping strings to -- symbols and a counter assigning ids to symbols. However, this has two -- problems: -- --
    --
  1. It has a space leak. No symbols can ever be removed from this -- table. For example, if we add the symbol "foo" the first time -- it might get assigned id 1, if we then delete it and insert it again -- it might get assigned id 42. However, there may still be symbols in -- memory which got assigned id 1. Instead, symbols should be garbage -- collected like other data. Using weak pointers has bad effects on -- performance due to garbage collector overhead.
  2. --
  3. It is not reliable to compare symbols created using different -- symbol tables. They would most likely get assigned different ids.
  4. --
-- -- This implementation of symbols allows *optional* use of a symbol -- table. If a symbol table is used, this implementation will tend to use -- less memory and its operations will be a little bit faster at the -- beginning. For longer runs, it won't make a big difference though, -- since the representation is self-optimising. -- -- Inspired by Richard OKeefes message to Erlang's eeps mailing -- list http://www.erlang.org/cgi-bin/ezmlm-cgi/5/057, which in -- turn was inspired by the Logix implementation of Flat Concurrent -- Prolog. -- -- -- -- Each symbol is represented a pointer to the symbol info, which -- consists of: -- -- -- -- Creating the same symbol twice will at first be represented as two -- different entities. -- --
--            .----+-------+-----.
--   A -----> | 42 | foo | nil |
--            ----+-------+-----
--   B --.
--       '--> .----+-------+-----.
--   C -----> | 42 | foo | nil |
--            ----+-------+-----
--   
-- -- (Note that A, B and C are IORefs.) -- -- When comparing A and B we use the following -- properties: -- --
    --
  1. If A and B are identical then they must be -- equal.
  2. --
  3. If they point to the same object, they must equal.
  4. --
  5. If they have different hashes, they are different.
  6. --
-- -- Unless there is a hash collision, we can decide equality and ordering -- for all symbols that have been built with the same hash table. -- -- If the two objects have no parent, have the same hash, and the same -- string, we now make one the first the parent of the other and update -- the pointer of B accordingly. If there are no references to -- the second object left it can now be garbage collected. -- -- If an object already has a parent pointer we follow each object's -- parents to the roots and compare the roots. This process might again -- result in updates to A or B and various parent -- pointers. -- -- In the example above, after A == B we have: -- --
--            .----+-------+-----.
--   A -----> | 42 | foo | nil |
--       .--> ----+-------+-----
--   B --'                    ^
--            .----+-------+--|--.
--   C -----> | 42 | foo |  *  |
--            ----+-------+-----
--   
-- -- After C == A or C == B we have. -- --
--   A -----> .----+-------+-----.
--       .--> | 42 | foo | nil |
--   B --'.-> ----+-------+-----
--        |                   ^
--        |   .----+-------+--|--.
--   C ---'   | 42 | foo |  *  |
--            ----+-------+-----
--   
-- -- The second object will now be garbage collected. -- -- In fact, after the first A == B, the remaining updates could -- use some help from the garbage collector. This could be done by -- somehow forcibly (and unsafely) replacing the second object by an -- update frame and then rely on the GC's indirection shortening feature. -- This is very unsafe, since some code may rely "know" that the -- object is already evaluated. E.g., C's pointer could be tagged (c.f. -- "Faster Laziness Using Dynamic Pointer Tagging"). It might work -- if we can match the physical layout of both structures, but it's -- equally likely that hell freezes over, so I'll leave that as an -- exercise for more braver hackers. -- -- module Data.Atom.UF -- | A symbol. data Symbol -- | Create a new local symbol. For best performance use internInto -- together with a symbol table / map. intern :: String -> Symbol -- | Insert a symbol into an existing table. internInto :: SymTab s => s -> String -> (s, Symbol) class SymTab s lookupSymbol :: SymTab s => s -> String -> Maybe Symbol insertSymbol :: SymTab s => String -> Symbol -> s -> s instance Show Symbol instance Ord Symbol instance Eq Symbol module Data.Atom.Simple -- | A Symbol. This is essentially a String, but with -- different performance characteristics: -- -- -- -- It is currently implemented as follows. -- -- data Symbol -- | Turn a String into a Symbol. -- -- Note, however, that this function contains a space leak. It has -- internal state (the symbol table) but is referentially transparent. -- Unfortunately, there is no way to delete items from the symbol table. -- -- (This function is, of course, thread-safe.) intern :: String -> Symbol instance Ord Symbol instance Eq Symbol instance Show Symbol