-- |
-- Module : Data.Edison
-- Copyright : Copyright (c) 2006 Robert Dockins
-- License : MIT; see COPYRIGHT file for terms and conditions
--
-- Maintainer : robdockins AT fastmail DOT fm
-- Stability : stable
-- Portability : GHC, Hugs (MPTC and FD)
--
-- Edison is a library of purely functional data structures written by
-- Chris Okasaki. It is named after Thomas Alva Edison and for the
-- mnemonic value /ED/i/S/on (/E/fficent /D/ata /S/tructures).
--
-- Edison provides several families of abstractions, each with
-- multiple implementations. The main abstractions provided by Edison are:
--
-- * /Sequences/ such as stacks, queues, and dequeues,
--
-- * /Collections/ such as sets, bags and heaps, and
--
-- * /Associative Collections/ such as finite maps and priority queues
-- where the priority and element are distinct.
--
--
--
-- /Conventions:/
--
-- Each data structure is implemented as a separate module. These modules
-- should always be imported @qualified@ to prevent a flood of name clashes,
-- and it is recommended to rename the module using the @as@ keyword to reduce
-- the overhead of qualified names and to make substituting one implementation
-- for another as painless as possible.
--
-- Names have been chosen to match standard usage as much as possible. This
-- means that operations for abstractions frequently share the same name
-- (for example, @empty@, @null@, @size@, etc). It also means that in many
-- cases names have been reused from the Prelude. However, the use of
-- @qualified@ imports will prevent name reuse from becoming name clashes. If
-- for some reason you chose to import an Edison data structure unqualified,
-- you will likely need to import the Prelude @hiding@ the relevant names.
--
-- Edison modules also frequently share type names. For example, most sequence
-- type constructors are named @Seq@. This additionally aids substituting
-- implementations by simply importing a different module.
--
-- Argument orders are selected with the following points in mind:
--
-- * /Partial application:/ arguments more likely to be static usually
-- appear before other arguments in order to facilitate partial
-- application.
--
-- * /Collection appears last:/ in all cases where an operation queries a
-- single collection or modifies an existing collection, the collection
-- argument will appear last. This is something of a de facto standard
-- for Haskell datastructure libraries
-- and lends a degree of consistency to the API.
--
-- * /Most usual order:/ where an operation represents a well-known
-- mathematical function on more than one datastructure, the arguments
-- are chosen to match the most usual argument order for the function.
--
--
-- /Type classes:/
--
-- Each family of abstractions is defined as a set of classes: a main class
-- that every implementation of that abstraction should support and several
-- auxiliary subclasses that an implementation may or may not support. However,
-- not all applications require the power of type classes, so each method
-- is also directly accessible from the implementation module. Thus you can
-- choose to use overloading or not, as appropriate for your particular
-- application.
--
-- Documentation about the behavior of data structure operations is defined
-- in the modules "Data.Edison.Seq", "Data.Edison.Coll" and
-- "Data.Edison.Assoc". Implementations are required to respect
-- the descriptions and axioms found in these modules. In some cases time
-- complexity is also given. Implementations may differ from these time
-- complexities; if so, the differences will be given in the documentation for
-- the individual implementation module.
--
--
--
-- /Notes on Eq and Ord instances:/
--
-- Many Edison data structures require @Eq@ or @Ord@ contexts to define equivalence
-- and total ordering on elements or keys. Edison makes the following assumptions
-- about all such required instances:
--
-- * An @Eq@ instance correctly defines an equivalence relation (but not necessarily
-- structural equality); that is, we assume @(==)@ (considered as a
-- relation) is reflexive, symmetric and transitive, but allow that equivalent
-- items may be distinguishable by other means.
--
-- * An @Ord@ instance correctly defines a total order which is consistent with
-- the @Eq@ instance for that type.
--
-- These assumptions correspond to the usual meanings assigned to these classes. If
-- an Edison data structure is used with an @Eq@ or @Ord@ instance which violates these
-- assumptions, then the behavior of that data structure is undefined.
--
--
--
-- /Notes on Read and Show instances:/
--
-- The usual Haskell convention for @Read@ and @Show@ instances (as championed by the
-- Haskell \"deriving\" mechanism), is that @show@ generates a string which is a
-- valid Haskell expression built up
-- using the data type's data constructors such that, if interpreted as Haskell code, the
-- string would generate an identical data item. Furthermore, the derived @Read@
-- instances are able to parse such strings, such that @(read . show) === id@.
-- So, derived instances of @Read@ and @Show@ exhibit
-- the following useful properties:
--
-- * @read@ and @show@ are complementary; that is, @read@ is a useful inverse for @show@
--
-- * @show@ generates a string which is legal Haskell code representing the data item
--
-- For concrete data types, the deriving mechanism is usually quite sufficient.
-- However, for abstract types the derived @Read@ instance may allow users to create data
-- which violates invariants. Furthermore, the strings resulting from @show@ reference hidden
-- data constructors which violates good software engineering principles and also
-- cannot be compiled because the constructors are not available outside the defining module.
--
-- Edison avoids most of these problems and still maintains the above useful properties by
-- doing conversions to and from lists and inserting explicit calls to the list conversion
-- operations. The corresponding @Read@ instance strips the list conversion call before
-- parsing the list. In this way, private data constructors are not revealed and @show@ strings
-- are still legal, compilable Haskell code. Furthermore, the showed strings gain a degree of
-- independence from the underlying datastructure implementation.
--
-- For example, calling @show@ on an empty Banker's queue will result in the following string:
--
-- > Data.Edison.Seq.BankersQueue.fromList []
--
-- Datatypes which are not native Edison data structures (such as StandardSet and StandardMap)
-- may or may not provide @Read@ or @Show@ instances and, if they exist, they may or may
-- not also provide the properties that Edison native @Read@ and @Show@ instances do.
--
--
-- /Notes on time complexities:/
--
-- Some Edison data structures (only the sequences currently) have detailed time complexity
-- information. Unless otherwise stated, these are amortized time complexities, assuming
-- persistent usage of the datastructure. Much of this data comes from:
--
-- Martin Holters. /Efficent Data Structures in a Lazy Functional Language/. Master's Thesis.
-- Chalmers University of Technology, Sweden. 2003.
--
-- /Notes on unsafe functions:/
--
-- There are a number of different notions of what constitutes an unsafe function.
-- In Haskell, a function is generally called \"unsafe\" if it can subvert
-- type safety or referential integrity, such as @unsafePerformIO@ or @unsafeCoerce#@.
-- In Edison, however, we downgrade the meaning of \"unsafe\" somewhat. An
-- \"unsafe\" Edison function is one which, if misused, can violate the structural
-- invariants of a data structure. Misusing an Edison \"unsafe\" function should
-- never cause your runtime to crash or break referential integrity, but it may cause
-- later uses of a data structure to behave in undefined ways. Almost all unsafe functions
-- in Edison are labeled with the @unsafe@ prefix. An exception to this rule is the
-- @With@ functions in the 'Set' class, which are also unsafe but do not have
-- the prefix. Unsafe functions will have explicit preconditions listed in their
-- documentation.
--
--
--
-- /Notes on ambiguous functions:/
--
-- Edison also contains some functions which are labeled \"ambiguous\". These
-- functions cannot violate the structural invariants of a data structure, but, under
-- some conditions, the result of applying an ambiguous function is not well defined.
-- For ambiguous functions, the result of applying the function may depend on otherwise
-- unobservable internal state of the data structure, such as the actual shape of a
-- balanced tree. For example, the 'AssocX' class contains the @fold@ function, which
-- folds over the elements in the collection in an arbitrary order. If the combining
-- function passed to @fold@ is not fold-commutative (see below), then the result of
-- the fold will depend on the actual order that elements are presented to the
-- combining function, which is not defined.
--
-- To aid programmers, each API function is labeled /ambiguous/ or /unambiguous/ in its
-- documentation. If a function is unambiguous only under some circumstances,
-- that will also be explicitly stated.
--
-- An \"unambiguous\" operation is one where all correct implementations of the operation
-- will return \"indistinguishable\" results. For concrete data types, \"indistinguishable\"
-- means structural equality. An instance of an abstract data type is considered
-- indistinguishable from another if all possible applications of unambiguous
-- operations to both yield indistinguishable results. (Note: this definition is
-- impredicative and rather imprecise. Should it become an issue, I will attempt to
-- develop a better definition. I hope the intent is sufficiently clear).
--
-- A higher-order unambiguous operation may be rendered ambiguous if passed a \"function\" which
-- does not respect referential integrity (one containing @unsafePerformIO@ for example).
-- Only do something like this if you are 110% sure you know what you are doing, and even then
-- think it over two or three times.
--
--
--
-- /How to choose a fold:/
--
-- /Folds/ are an important class of operations on data structures in a functional
-- language; they perform essentially the same role that iterators perform in
-- imperative languages. Edison provides a dizzying array of folds which (hopefully)
-- correspond to all the various ways a programmer might want to fold over a data
-- structure. However, it can be difficult to know which fold to choose for a
-- particular application. In general, you should choose a fold which provides
-- the /fewest/ guarantees necessary for correctness. The folds which have fewer
-- guarantees give data structure implementers more leeway to provide efficient
-- implementations. For example, if you which to fold a commutative, associative
-- function, you should chose @fold@ (which does not guarantee an order) over @foldl@
-- or @foldr@, which specify particular orders.
--
-- Also, if your function is strict in
-- the accumulating argument, you should prefer the strict folds (eg, @fold'@); they will
-- often provide better space behavior. /Be aware/, however, that the \"strict\" folds
-- are not /necessarily/ more strict than the \"non-strict\" folds; they merely give
-- implementers the option to provide additional strictness if it improves performance.
--
-- For associative collections, only use with @WithKey@ folds if you actually need the
-- value of the key.
--
--
-- /Painfully detailed information about ambiguous folds:/
--
-- All of the folds that are listed ambiguous are ambiguous because they do not or cannot
-- guarantee a stable order with which the folding function will be applied. However,
-- some functions are order insensitive, and the result will be unambiguous regardless
-- of the fold order chosen. Here we formalize this property, which we call
-- \"fold commutativity\".
--
-- We say @f :: a -> b -> b@ is /fold-commutative/ iff @f@ is unambiguous and
--
-- > forall w, z :: b; m, n :: a
-- >
-- > w = z ==> f m (f n w) = f n (f m z)
-- >
--
-- where @=@ means indistinguishability.
--
-- This property is sufficient (but not necessary) to ensure that, for any
-- collection of elements to
-- fold over, folds over all permutations of those elements will generate
-- indistinguishable results. In other words, an ambiguous fold applied to a
-- fold-commutative combining function becomes /unambiguous/.
--
-- Some fold combining functions take their arguments in the reverse order. We
-- straightforwardly extend the notion of fold commutativity to such functions
-- by reversing the arguments. More formally, we say @g :: b -> a -> b@ is fold
-- commutative iff @flip g :: a -> b -> b@ is fold commutative.
--
-- For folds which take both a key and an element value, we extend the notion of fold
-- commutativity by considering the key and element to be a single, uncurried argument.
-- More formally, we say @g :: k -> a -> b -> b@ is fold commutative iff
--
-- > \(k,x) z -> g k x z :: (k,a) -> b -> b
--
-- is fold commutative according to the above definition.
--
-- Note that for @g :: a -> a -> a@, if @g@ is unambiguous,
-- commutative, and associative, then @g@ is fold-commutative.
--
-- Proof:
--
-- > let w = z, then
-- > g m (g n w) = g m (g n z) g is unambiguous
-- > = g (g n z) m commutative property of g
-- > = g n (g z m) associative property of g
-- > = g n (g m z) commutative property of g
--
-- Qed.
--
-- Thus, many common numeric combining functions, including @(+)@ and @(*)@ at
-- integral types, are fold commutative and can be safely used with ambiguous
-- folds.
--
-- /Be aware/ however, that @(+)@ and @(*)@ at floating point types are only
-- /approximately/ commutative and associative due to rounding errors; using
-- ambiguous folds with these operations may result in subtle differences in
-- the results. As always, be aware of the limitations and numeric
-- properties of floating point representations.
--
--
--
-- /About this module:/
--
-- This module re-exports the various data structure abstraction classes, but
-- not their methods. This allows you to write type signatures which have
-- contexts that mention Edison type classes without having to import the
-- appropriate modules @qualified@. The class methods are not exported to
-- avoid name clashes. Obviously, to use the methods of these classes, you
-- will have to import the appropriate modules. This module additionally
-- re-exports the entire "Data.Edison.Prelude" module.
--
--
--
-- /Miscellaneous points:/
--
-- Some implementations export a few extra functions beyond those included
-- in the relevant classes. These are typically operations that are
-- particularly efficient for that implementation, but are not general enough
-- to warrant inclusion in a class.
--
-- Since qualified infix symbols are fairly ugly, they have been largely avoided.
-- However, the "Data.Edison.Sym" module defines a number of infix operators
-- which alias the prefix operators; this module is intended to be imported
-- unqualified.
--
-- Most of the operations on most of the data structures are strict. This is
-- inevitable for data structures with non-trivial invariants. Even given
-- that, however, many of the operations are stricter than necessary. In
-- fact, operations are never deliberately made lazy unless the laziness is
-- required by the algorithm, as can happen with amortized data structures.
--
-- Note, however, that the various sequence implementations are always lazy
-- in their elements. Similarly, associative collections are always lazy in
-- their elements (but usually strict in their keys). Non-associative
-- collections are usually strict in their elements.
module Data.Edison (
-- * Sequence class
Sequence
-- * Collection classes
-- ** Non-observable collections
, CollX
, OrdCollX
, SetX
, OrdSetX
-- ** Observable collections
, Coll
, OrdColl
, Set
, OrdSet
-- * Associative collection classes
-- ** Non-observable associative collections
, AssocX
, OrdAssocX
, FiniteMapX
, OrdFiniteMapX
-- ** Observable associative collections
, Assoc
, OrdAssoc
, FiniteMap
, OrdFiniteMap
, module Data.Edison.Prelude
) where
import Data.Edison.Prelude
import Data.Edison.Seq
import Data.Edison.Coll
import Data.Edison.Assoc