-- Hoogle documentation, generated by Haddock -- See Hoogle, http://www.haskell.org/hoogle/ -- | A library for compiling pattern-matching to decision trees -- -- This library implements a transformation from pattern-matching to -- decision trees, for use by compiler writers. It provides support for -- generating useful error messages, as well as efficient trees through -- the use of heuristics. @package pattern-matcher @version 0.1.0.1 -- |

Overview and basic concepts

-- -- This library implements compilation and analysis facilities for -- language compilers supporting ML- or Haskell-style pattern matching. -- It provides a compiler from a matching to a decision tree, an -- efficient representation mapping easily into low level languages. It -- supports most features one would expect, such as variable bindings, -- or- and as-patterns, etc. and is also able to detect anomalies in the -- maching, such as non-exhaustivity or redundantness. It is based on -- Compiling pattern-matching to good decision trees and -- Warnings for pattern matching by Luc Maranget. -- --

Pattern matching

-- -- Patterns are assumed to be linear and matched “from top to bottom”. -- This library adopts a simplified view of patterns, or pattern -- Skeletons, that should be rich enough to accomodate most -- compilers need. It is either a catch-all pattern, eventually binding -- an identifier or a constructor pattern made of a tag and -- subpatterns, filtering only those expression sharing the same -- tag and whose subexpressions are also filtered by the -- subpatterns. -- -- As-patterns and or-patterns are also supported, while as-patterns have -- there own Skeleton, or-patterns must first be decomposed into -- distinct lists of patterns. -- -- In this documentation, a “row” is a list of patterns associated with -- an output, that will be selected if an expression matches all those -- patterns, while a “column” is a list of patterns that are tested -- against an expression from top to bottom. -- --

Decision trees

-- -- Decision trees can be thought of as cascading switches. Each -- Switch checks the constructor of an expression to decide what -- path to take, until it reaches a Leaf (success) or encounters a -- dead-end 'Fail. Consider this Haskell example: -- --
--   case e of
--    ([], 0)    -> 0
--    (_ : _, 1) -> 1
--   
-- -- A possible decision tree corresponding to this expression could be: -- --
--   Switch e +--- (,) ---> Switch e(,).0 +---  [] ----> Switch e(,).1 +---- 0 ----> Leaf 0
--                                        |                            |
--                                        |                            \---- _ ----> Fail [([], 1), ([], 2), ...]
--                                        |
--                                        \--- _:_ ----> Switch e(,).1 +---- 1 ----> Leaf 1
--                                                                     |
--                                                                     \---- _ ----> Fail [(_:_, 0), (_:_, 2), (_:_, 3), ...]
--   
-- -- First, the expression e is checked for the tag (,). -- Since there is no other constructor for (,), this always -- succeeds. Matching on a tuple yields to subexpression that we name -- e(,).0 and e(,).1 (the Select type handles -- subexpression selection), that must be matched to two “columns” of -- patterns: e(,).0 against [] or _:_ and -- e(,).1 against 0 or 1. Note that we have a -- choice which test to perform first. Here we decide to check -- e(,).0 against [] and _:_. Since this is -- the set of all possible constructors for lists, there is no -- possibility for the match to fail here. We are then left with -- e(,).1 to match against 0,in the branch where -- e(,).0 is [] and 1 when e(,).0 is -- _:_. In either case, the matching can fail since 0 -- and 1 do not cover the full range of integers. -- --

Characteristics of decision trees

-- -- A decision tree is only one possible target to compile -- pattern-matching. An alternative is to compile to backtracking -- automata (see, for instance Compiling pattern matching). Unlike -- decision trees, backtracking automata guarantee linear code size, -- however, as the name suggests, they may backtrack, thus testing more -- than once the same expression, which decision trees are guaranteed -- never to do. -- --

Heuristics

-- -- In the example above, we choose to test e(,).0 before -- e(,).1, but we could have made the opposite choice. Also, in -- the _:_ branch we entirely ommited to test -- e(,).0(_:_).0, e(,).0(_:_).1 (i.e. the head and the -- tail of the list introducing by matching on _:_) against the -- two wildcards of _:_. This would of course have been useless, -- since matching against a wildcard always succeeds. The algorithm can -- make similar choices as the one we did through the use of -- Heuristics. The role of Heuristics is to attribute a -- score to a given list of patterns, so that the algorithm will first -- match against the list of patterns with the best score. In this case, -- we attributed a bigger score to the pattern 1 than to the two -- wildcards. A detailed list of how heuristics work, as well as all the -- heuristics studied by Maranget are presented later. -- --

How to use?

-- -- The library is centered around the match and anomalies -- functions. match compiles a matching to a decision tree while -- anomalies simply gathers the anomalies in a matching. Note that -- the anomalies can only be retrieved from the structure of the decision -- tree. -- -- The documentation makes heavy use of polymorphism to accomodate the -- internal representation of most languages. The convention for the -- names of the parameters is: -- -- -- -- To work, these functions need three things from the user (apart from -- the actual matching): -- -- -- --

Complete example

-- -- Consider the following typed language with its own Pattern -- representation: -- --
--   data Typ     = TInt
--                | TList Typ
--   
--   data Pattern = VarPat Typ  String
--                | IntPat      Int
--                | NilPat  Typ                  -- NilPat typ has type TList typ
--                | ConsPat Typ Pattern Pattern  -- ConsPat typ _ _ has type TList typ
--                | OrPat       Pattern Pattern
--                | AsPat       Pattern String
--   
-- -- This language supports variables, integers and lists. It can have or- -- and as-patterns. -- -- This custom representation must first be converted into a -- Skel-based representation. This implies defining the -- tag of constructors: -- --
--   data Tag = NilTag | ConsTag Typ | IntTag Int
--   
-- -- and doing the conversion: -- --
--   toSkel :: Pattern -> [Skel String Tag]
--   toSkel (VarPat typ var)   = [WildSkel (rangeOfTyp typ) (Just var)]
--   toSkel (IntPat i)         = [ConsSkel (cons (IntTag i) [])]
--   toSkel (NilPat _)         = [ConsSkel (cons NilTag [])]
--   toSkel (ConsPat typ p ps) = [ ConsSkel (cons (ConsTag typ) [subp, subps])
--                               | subp  <- toSkel p
--                               , subps <- toSkel ps
--                               ]
--   toSkel (OrPat p1 p2)      = toSkel p1 ++ toSkel p2
--   toSkel (AsPat p i)        = [ AsSkel s i
--                               | s <- toSkel p
--                               ]
--   
-- -- where rangeOfTyp defines the range of Tags patterns -- of a certain type can have: -- --
--   rangeOfTyp :: Typ -> [Tag]
--   rangeOfTyp TInt        = fmap IntTag [minBound .. maxBound]
--   rangeOfTyp (TList typ) = [NilTag, ConsTag typ]
--   
-- -- Finally, Tag must be made an instance of IsTag. -- IsTag has two functions: tagRange :: tag -> -- [tag] that outputs the signature a given tag belongs to -- and subTags :: tag -> [[tag]]. subTags -- t defines the range of tags that can be found in the -- subpatterns of a constructor with tag t. For -- instance, a constructor tagged with ConsTag TInt will have -- two subfields: the first one (the head of the list), can contain any -- integers, the second one (the tail of the list), can be either the -- NilTag or another ConsTag. This gives us the -- following instance: -- --
--   instance IsTag Tag where
--    tagRange NilTag     = [NilTag, ConsTag]
--    tagRange ConsTag    = [NilTag, ConsTag]
--    tagRange (IntTag j) = fmap IntTag [minBound..maxBound]
--   
--    subTags NilTag        = []
--    subTags (ConsTag typ) = [rangeOf typ, rangeOf (TList typ)]
--    subTags (IntTag _)    = []
--   
-- -- and this all one needs to do (apart from choosing a Heuristic) -- to use the compiler. -- --

Preserving sharing

-- -- The presence of or-patterns, like in this example, can cause -- duplication of outputs in leaves of the decision tree. Consider this -- example in OCaml syntax: -- --
--   match e with
--   | 0 | 1 -> e1
--   | _ -> e2
--   
-- -- The resulting decision tree, would be: -- --
--   Switch e +--- 0 ---> e1
--            |
--            \--- 1 ---> e1
--            |
--            \--- _ ---> e2
--   
-- -- with e1 being duplicated, which is undesirable when compiling this -- decision tree further to machine code as it would lead to increased -- code size. As a result, it might be worth to consider using labels for -- outputs and a table linking these labels to expressions. This would -- make the decision tree suitable for compilation using jumps, avoiding -- duplication. module Language.Pattern.Compiler -- | Compiles a matching to a decision tree, using the given heuristic. match :: IsTag tag => Heuristic ident tag expr out -> (pat -> [Skel ident tag]) -> expr -> [(pat, out)] -> DecTree ident tag pat expr out -- | Gathers all the anomalies present in a matching. Nothing -- indicating the absence of an anomaly. data Anomalies ident tag pat Anomalies :: Maybe [pat] -> Maybe [Skel ident tag] -> Anomalies ident tag pat [redundantPatterns] :: Anomalies ident tag pat -> Maybe [pat] [unmatchedPatterns] :: Anomalies ident tag pat -> Maybe [Skel ident tag] -- | Simplified version of match, that simply gathers the anomalies -- of the decision tree. anomalies :: IsTag tag => (pat -> [Skel ident tag]) -> [pat] -> Anomalies ident tag pat -- | A generic description of a constructor pattern, made of a tag -- and subpatterns. data Cons ident tag pattern Cons :: tag -> [Skel ident tag] -> Cons ident tag -- | Smart constructor for Cons. asserts that the number of -- subpatterns matches the tag's arity. cons :: IsTag tag => tag -> [Skel ident tag] -> Cons ident tag data Skel ident tag -- | Carries the range of tags that could have been used in this pattern -- and, potentially, an identifier to bound. WildSkel :: [tag] -> Maybe ident -> Skel ident tag ConsSkel :: Cons ident tag -> Skel ident tag -- | AsSkel p i matches if p matches and binds i -- to the result of the expression it matches against AsSkel :: Skel ident tag -> ident -> Skel ident tag class Ord tag => IsTag tag -- | The range of tags a given tag could have had. t should always -- be an element of tagRange t. In other words: -- --
--   elem t (tagRange t) == True
--   
-- -- The range of a tag is used to generate missing patterns in -- non-exhaustive matches. It might be interesting to consider the order -- the range is given in, to improve the quality of error messages. For -- instance, if one allows pattern-matching on integers, instead of -- simply giving the range [minBound..maxBound], it could be better to -- give the range [0..maxBound] ++ [-1,-2..minBound] (or a range -- alternating positive and negative integers, starting at 0) -- could help generate simpler messages. tagRange :: IsTag tag => tag -> [tag] -- | The range of the tags that can appear in all the subfields of -- a constructor with the given tag. -- --

Example

-- -- Consider the _:_ tag for patterns of type [Bool] in -- Haskell. This pattern has two subpatterns, the head can be either -- True or False, while the tail can be either -- [] or _:_. Thus subTags would simply be, in -- pseudo-Haskell: -- --
--   [[trueTag, falseTag], [nilTag, consTag]]
--   
subTags :: IsTag tag => tag -> [[tag]] -- | Encodes the selection of a subexpression given a tag. data Select expr tag -- | An untouched expression NoSel :: expr -> Select expr tag -- | Sel e t n selects the n+1-th subexpression in -- e assuming e is caracterized by tag t. Such -- an expression will only be generated after it has been checked that -- e has indeed tag t. -- -- For example, Sel (e :: e') _::_ 1, would select the second -- field e :: e', in this case e'. Sel :: Select expr tag -> tag -> Int -> Select expr tag -- | A decision tree can be thought of as a cascade of switches, matching -- on the tag of expressions and subexpressions until reaching a -- result. They map fairly naturally to constructs in low level -- languages, such as C. data DecTree ident tag pat expr out -- | Pattern-matching failure, with a list of all the patterns that aren't -- matched. The list is lazily generated and may be infinite for -- tags with infinite ranges. Fail :: [Skel ident tag] -> DecTree ident tag pat expr out -- | Pattern-matching success Leaf :: [Binding ident (Select expr tag)] -> out -> Maybe [pat] -> DecTree ident tag pat expr out -- | The identifiers bound when reaching this leaf. The list of bindings is -- in the order of matching, as given by the heuristics. [leafBindings] :: DecTree ident tag pat expr out -> [Binding ident (Select expr tag)] -- | The result obtained when arriving at this leaf [leafOut] :: DecTree ident tag pat expr out -> out [leafRedundant] :: DecTree ident tag pat expr out -> Maybe [pat] -- | Branching on an tag expression Switch :: Select expr tag -> Map tag (DecTree ident tag pat expr out) -> Maybe (DecTree ident tag pat expr out) -> DecTree ident tag pat expr out -- | The expression whose tag is being scrutinised [switchOn] :: DecTree ident tag pat expr out -> Select expr tag -- | Branches to follow based on specific tags. Any expression not -- caracterized by any tag will fall back to the default branch. [switchBranches] :: DecTree ident tag pat expr out -> Map tag (DecTree ident tag pat expr out) -- | Branch to follow if the expression's tag is not present in -- the set of branches above. This branch may be absent if all -- tags are present in the switchBranches [switchCatchAll] :: DecTree ident tag pat expr out -> Maybe (DecTree ident tag pat expr out) -- | Binding of an identifier to an expression. Bindings of wildcards are -- conserved. data Binding ident expr (:=) :: Maybe ident -> expr -> Binding ident expr -- | The index of the column of patterns type Index = Int type Score = Int data Heuristic ident tag expr out -- | Computes the Score for a given column. It may use the entire -- pattern matrix but it is also given the index of the column, the -- expression being matched and the column being matched. Score :: ([[Skel ident tag]] -> Index -> Select expr tag -> [Skel ident tag] -> Score) -> Heuristic ident tag expr out -- | Combine two heuristics: compute an initial score with the first -- heuristic and, if several columns have obtained the same score, use -- the second heuristic to choose among them. Combine :: Heuristic ident tag expr out -> Heuristic ident tag expr out -> Heuristic ident tag expr out -- | Combine a list of heuristics from left-to-right, defaulting to using -- no heuristic. Defined as foldr Combine noHeuristic. seqHeuristics :: [Heuristic ident tag expr out] -> Heuristic ident tag expr out -- | This heuristic favours columns whose top pattern is a generalized -- constructor pattern. If the first pattern is a wildcard, the heuristic -- gives <math> and <math> otherwise. firstRow :: Heuristic ident tag expr out -- | This heuristic favours columns with the least number of wildcard -- patterns. If <math> is the number of wildcards in column -- <math>, then <math> is the score of column <math>. smallDefault :: Heuristic ident tag expr out -- | Favours columns resulting in smaller switches. The score of a column -- is the number of branches of the switch resulting of the compilation -- (including an eventually default branch), negated. smallBranchingFactor :: IsTag tag => Heuristic ident tag expr out -- | The sum of the arity of the constructors of this column, negated. arity :: Heuristic ident tag expr out -- | The score is the number of children of the emitted switch node that -- are leaves. leafEdge :: IsTag tag => Heuristic ident tag expr out -- | This heuristic favours columns that lead to fewer rows to test. fewerChildRule :: IsTag tag => Heuristic ident tag expr out -- | The score is the number of output needing this column. neededColumns :: IsTag tag => Heuristic ident tag expr out -- | The score is the number of consecutive outputs needing the column. neededPrefix :: IsTag tag => Heuristic ident tag expr out -- | A cheaper version of neededPrefix, where a pattern counts in -- the score if it is a constructor pattern. constructorPrefix :: IsTag tag => Heuristic ident tag expr out -- | Leaves the column in the same order by giving the score <math> -- to column <math>. noHeuristic :: Heuristic ident tag expr out -- | Reverse the order of the columns by giving the score <math> to -- column <math>. It can be useful in combination with another -- heuristic to reverse the left-to-right bias of this implementation. reverseNoHeuristic :: Heuristic ident tag expr out -- | This heuristic is called a pseudo-heuristic as it doesn't operate on -- the patterns but on the expression. It is most useful as a last resort -- heuristic in combination with others. shorterOccurence :: (Select expr tag -> Score) -> Heuristic ident tag expr out instance (GHC.Show.Show ident, GHC.Show.Show expr) => GHC.Show.Show (Language.Pattern.Compiler.Binding ident expr)