-- Hoogle documentation, generated by Haddock
-- See Hoogle, http://www.haskell.org/hoogle/
-- | Atom (or symbol) datatype for fast comparision and sorting.
--
-- This module provides an abstract datatype for atoms, such that:
--
--
-- - Each atom string is only in memory once * O(n) creation
-- time * O(1) equality-comparison * O(1) (in practice)
-- ord-comparison * Ord-comparison results are independent on
-- evaluation order
--
--
-- This module is thread-safe.
@package simple-atom
@version 0.1.0.1
-- | Symbols without a central symbol table.
--
-- Symbols provide the following efficient operations:
--
--
-- - O(1) equality comparison (in practise) - O(1)
-- ordering comparison (in practise) - O(n) creation
--
--
-- This can be implemented by using a global variable mapping strings to
-- symbols and a counter assigning ids to symbols. However, this has two
-- problems:
--
--
-- - It has a space leak. No symbols can ever be removed from this
-- table. For example, if we add the symbol "foo" the first time
-- it might get assigned id 1, if we then delete it and insert it again
-- it might get assigned id 42. However, there may still be symbols in
-- memory which got assigned id 1. Instead, symbols should be garbage
-- collected like other data. Using weak pointers has bad effects on
-- performance due to garbage collector overhead.
-- - It is not reliable to compare symbols created using different
-- symbol tables. They would most likely get assigned different ids.
--
--
-- This implementation of symbols allows *optional* use of a symbol
-- table. If a symbol table is used, this implementation will tend to use
-- less memory and its operations will be a little bit faster at the
-- beginning. For longer runs, it won't make a big difference though,
-- since the representation is self-optimising.
--
-- Inspired by Richard OKeefes message to Erlang's eeps mailing
-- list http://www.erlang.org/cgi-bin/ezmlm-cgi/5/057, which in
-- turn was inspired by the Logix implementation of Flat Concurrent
-- Prolog.
--
--
--
-- Each symbol is represented a pointer to the symbol info, which
-- consists of:
--
--
-- - a String * a Hash * a null-able parent pointer to
-- an equivalent symbol info
--
--
-- Creating the same symbol twice will at first be represented as two
-- different entities.
--
--
-- .----+-------+-----.
-- A -----> | 42 | foo | nil |
-- ----+-------+-----
-- B --.
-- '--> .----+-------+-----.
-- C -----> | 42 | foo | nil |
-- ----+-------+-----
--
--
-- (Note that A, B and C are IORefs.)
--
-- When comparing A and B we use the following
-- properties:
--
--
-- - If A and B are identical then they must be
-- equal.
-- - If they point to the same object, they must equal.
-- - If they have different hashes, they are different.
--
--
-- Unless there is a hash collision, we can decide equality and ordering
-- for all symbols that have been built with the same hash table.
--
-- If the two objects have no parent, have the same hash, and the same
-- string, we now make one the first the parent of the other and update
-- the pointer of B accordingly. If there are no references to
-- the second object left it can now be garbage collected.
--
-- If an object already has a parent pointer we follow each object's
-- parents to the roots and compare the roots. This process might again
-- result in updates to A or B and various parent
-- pointers.
--
-- In the example above, after A == B we have:
--
--
-- .----+-------+-----.
-- A -----> | 42 | foo | nil |
-- .--> ----+-------+-----
-- B --' ^
-- .----+-------+--|--.
-- C -----> | 42 | foo | * |
-- ----+-------+-----
--
--
-- After C == A or C == B we have.
--
--
-- A -----> .----+-------+-----.
-- .--> | 42 | foo | nil |
-- B --'.-> ----+-------+-----
-- | ^
-- | .----+-------+--|--.
-- C ---' | 42 | foo | * |
-- ----+-------+-----
--
--
-- The second object will now be garbage collected.
--
-- In fact, after the first A == B, the remaining updates could
-- use some help from the garbage collector. This could be done by
-- somehow forcibly (and unsafely) replacing the second object by an
-- update frame and then rely on the GC's indirection shortening feature.
-- This is very unsafe, since some code may rely "know" that the
-- object is already evaluated. E.g., C's pointer could be tagged (c.f.
-- "Faster Laziness Using Dynamic Pointer Tagging"). It might work
-- if we can match the physical layout of both structures, but it's
-- equally likely that hell freezes over, so I'll leave that as an
-- exercise for more braver hackers.
--
--
-- - TODO
-- - generalise to arbitrary hashable objects. need not be restricted
-- to String.
-- - make thread-safe. (we only need a lock for the uncommon
-- cases)
-- - make sure the pointer update code is correct and has no bad
-- cases
-- - implement IntMap variant/wrapper that respects that two different
-- objects may have the same key (however unlikely).
--
module Data.Atom.UF
-- | A symbol.
data Symbol
-- | Create a new local symbol. For best performance use internInto
-- together with a symbol table / map.
intern :: String -> Symbol
-- | Insert a symbol into an existing table.
internInto :: SymTab s => s -> String -> (s, Symbol)
class SymTab s
lookupSymbol :: SymTab s => s -> String -> Maybe Symbol
insertSymbol :: SymTab s => String -> Symbol -> s -> s
instance Show Symbol
instance Ord Symbol
instance Eq Symbol
module Data.Atom.Simple
-- | A Symbol. This is essentially a String, but with
-- different performance characteristics:
--
--
-- - O(n) creation time (using insert)
-- - O(1) equality comparison.
-- - O(1) comparison (in practice). The result of
-- compare is independent of evaluation order.
--
--
-- It is currently implemented as follows.
--
--
-- - Each symbol contains a unique integer, which allows O(1)
-- comparison.
-- - Each symbol contains an infinite chain of hashes, these are used
-- for comparison. In practice, it is very rare that more than the first
-- of those hashes is ever evaluated. The first hash is cached, so that
-- most comparisons will not need any indirections.
-- - The String representation of the symbol. Use show to
-- return it. At any time, there will be only one symbol of a given name
-- in memory.
--
data Symbol
-- | Turn a String into a Symbol.
--
-- Note, however, that this function contains a space leak. It has
-- internal state (the symbol table) but is referentially transparent.
-- Unfortunately, there is no way to delete items from the symbol table.
--
-- (This function is, of course, thread-safe.)
intern :: String -> Symbol
instance Ord Symbol
instance Eq Symbol
instance Show Symbol