Decision tree learning, used in statistics, data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels.

In data mining, a decision tree describes data but not decisions; rather the resulting classification tree can be an input for decision making.

(https://en.wikipedia.org/wiki/Decision_tree_learning, Dec 6 2011)

- data DTree decider branch label
- data DTreeAlgebra decider branch label a = DTreeAlgebra {}
- buildDTree :: (Ord label, Ord branch, Decider decider attr branch) => DeciderGenerator attr decider -> (x -> attr) -> (x -> label) -> [x] -> DTree decider branch label
- foldD :: DTreeAlgebra dec branch label a -> DTree dec branch label -> a
- toDot :: (Show decider, Show branch, Show label) => DTree decider branch label -> String
- class Decider decider attr branch | decider -> attr branch where
- decide :: decider -> attr -> branch

- data Ord t => DecideOrd t = DecideOrd t
- data Eq t => DecideSet t = DecideSet [t]
- class AutoDecide attr decider | attr -> decider where
- autoDeciders :: [attr] -> [decider]

- genOrds :: Ord attr => [attr] -> [DecideOrd attr]
- genOrdsAvg :: Ord attr => (attr -> attr -> attr) -> [attr] -> [DecideOrd attr]
- genPair :: DeciderGenerator attra decidera -> DeciderGenerator attrb deciderb -> DeciderGenerator (attra, attrb) (Either decidera deciderb)
- genMany :: DeciderGenerator attr decider -> DeciderGenerator [attr] (Ixd decider)
- avgF :: Fractional a => a -> a -> a
- avgI :: Integral a => a -> a -> a

# Decision Tree

data DTree decider branch label

A decision tree data structure that allows arbitrary numbers of children. It has been proven that a binary tree is equally expressive, but considering that decision trees are a 'white box' model, we do not want to limit ourselves to the binary case because other numbers of children may make more sense to humans.

Converting between binary and arbitrary-child trees is feasible, though probably not very interesting.
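To make the discussion concrete, an arbitrary-arity tree of this kind might be shaped as follows. This is a hypothetical sketch; the library's actual constructors are not exported in this listing:

```haskell
-- Hypothetical sketch of an arbitrary-arity decision tree; the real
-- library's constructors are not shown in this documentation.
data DTree decider branch label
  = Leaf label                                           -- a leaf carries a class label
  | Node decider [(branch, DTree decider branch label)]  -- each branch value keys a subtree
  deriving (Eq, Show)

-- Example: an "x <= 20" split with one Bool-keyed branch per outcome.
example :: DTree String Bool String
example = Node "x <= 20" [ (True,  Leaf "small")
                         , (False, Leaf "large") ]
```

A node may carry any number of branches, which is what makes the non-binary case readable for humans.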

(Eq decider, Eq branch, Eq label) => Eq (DTree decider branch label)

(Show decider, Show branch, Show label) => Show (DTree decider branch label)

(Show decider, Show branch, Show label) => Convertible (DTree decider branch label) DisplayLatex

(Show decider, Show branch, Show label) => Convertible (DTree decider branch label) DisplayText

(Decider decider attr branch, Eq branch) => Classifier (DTree decider branch label) attr label

buildDTree :: (Ord label, Ord branch, Decider decider attr branch) => DeciderGenerator attr decider -> (x -> attr) -> (x -> label) -> [x] -> DTree decider branch label
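The documentation does not spell out how `buildDTree` selects its splits. Below is a heavily simplified sketch, with several explicit assumptions: deciders are fixed to a `DecideOrd`-style `(<= pivot)` split with `Bool` branches, the split criterion is a naive misclassification count, and the training set is assumed non-empty. The real library's criterion and generality may differ:

```haskell
import Data.List (group, nub, sort, sortOn)

-- Hypothetical simplified tree and decider (see the sketches above).
data DTree dec branch label = Leaf label | Node dec [(branch, DTree dec branch label)]

data DecideOrd t = DecideOrd t

decide :: Ord t => DecideOrd t -> t -> Bool
decide (DecideOrd p) x = x <= p

-- Most frequent label; assumes a non-empty list.
majority :: Ord label => [label] -> label
majority = head . last . sortOn length . group . sort

-- Simplified buildDTree: generate candidate splits, pick the one with the
-- fewest misclassified examples on each side, recurse until pure.
buildDTree :: (Ord label, Ord attr)
           => ([attr] -> [DecideOrd attr])   -- decider generator
           -> (x -> attr) -> (x -> label) -> [x]
           -> DTree (DecideOrd attr) Bool label
buildDTree gen attr lbl xs
  | length (nub (map lbl xs)) <= 1 = Leaf (majority (map lbl xs))
  | otherwise =
      case gen (map attr xs) of
        [] -> Leaf (majority (map lbl xs))
        ds ->
          let split d = (filter (decide d . attr) xs, filter (not . decide d . attr) xs)
              score d = let (l, r) = split d
                        in impurity (map lbl l) + impurity (map lbl r)
              best = head (sortOn score ds)
              (ls, rs) = split best
          in if null ls || null rs
               then Leaf (majority (map lbl xs))
               else Node best [ (True,  buildDTree gen attr lbl ls)
                              , (False, buildDTree gen attr lbl rs) ]
  where
    -- Misclassification count: examples not in the majority class.
    impurity ys = length ys - maximum (0 : map length (group (sort ys)))

-- Follow the decisions down to a leaf.
classify :: Ord attr => DTree (DecideOrd attr) Bool label -> attr -> label
classify (Leaf l) _    = l
classify (Node d bs) a = classify (maybe (error "missing branch") id
                                         (lookup (decide d a) bs)) a
```

Real implementations typically use entropy or Gini impurity rather than a raw misclassification count, but the overall generate-score-recurse structure is the same.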

foldD :: DTreeAlgebra dec branch label a -> DTree dec branch label -> a

Fold over a `DTree` using an algebra.
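The fields of `DTreeAlgebra` are not listed in this documentation, so the algebra shape below is an assumption; it shows how such a fold could work on a hypothetical tree:

```haskell
-- Hypothetical sketch: the real DTreeAlgebra's fields are not documented,
-- so this record shape is an assumption.
data DTree dec branch label
  = Leaf label
  | Node dec [(branch, DTree dec branch label)]

data DTreeAlgebra dec branch label a = DTreeAlgebra
  { leaf :: label -> a                 -- how to combine a leaf
  , node :: dec -> [(branch, a)] -> a  -- how to combine a decision node
  }

foldD :: DTreeAlgebra dec branch label a -> DTree dec branch label -> a
foldD alg (Leaf l)    = leaf alg l
foldD alg (Node d ts) = node alg d [ (b, foldD alg t) | (b, t) <- ts ]

-- Example algebra: count the leaves of a tree.
countLeaves :: DTree dec branch label -> Int
countLeaves = foldD (DTreeAlgebra { leaf = const 1
                                  , node = \_ ts -> sum (map snd ts) })
```

Any tree consumer (depth, rendering, classification statistics) can be phrased as such an algebra.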

toDot :: (Show decider, Show branch, Show label) => DTree decider branch label -> String

Render a decision tree to Graphviz Dot format.
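A minimal sketch of such a rendering over a hypothetical tree type; the node-naming scheme and the quoting convention here are assumptions, not the library's actual output:

```haskell
-- Hedged sketch of a Graphviz Dot renderer for a hypothetical tree type.
data DTree dec branch label
  = Leaf label
  | Node dec [(branch, DTree dec branch label)]

toDot :: (Show dec, Show branch, Show label) => DTree dec branch label -> String
toDot t = "digraph dtree {\n" ++ go "n" t ++ "}\n"
  where
    -- `show (show x)` conservatively produces a quoted, escaped Dot string.
    go name (Leaf l)    = name ++ " [label=" ++ show (show l) ++ "];\n"
    go name (Node d ts) =
      name ++ " [label=" ++ show (show d) ++ "];\n" ++
      concat [ name ++ " -> " ++ child ++ " [label=" ++ show (show b) ++ "];\n"
               ++ go child t'
             | (i, (b, t')) <- zip [0 :: Int ..] ts
             , let child = name ++ "_" ++ show i ]  -- child named by path index
```

The output can be piped to `dot -Tpng` to visualise the white-box structure of a learned tree.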

# Deciders

class Decider decider attr branch | decider -> attr branch where

`decide` defines the type and semantics of a split. For example, the split "attr <= 20" is created by `DecideOrd 20`.

For every possible value of type `branch`, an actual tree branch may be created. Allowing many distinct values in `branch` is a bad idea: too many of them may have little predictive value and exhaust the training database more quickly.

`decider`: the representation of the decider

`attr`: the data it needs

`branch`: the key that leads to a branch
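These pieces fit together as below. The `decide` implementation for `DecideOrd` follows the "attr <= 20" reading stated above, but it is a sketch, not the library's source:

```haskell
{-# LANGUAGE FunctionalDependencies #-}

-- The decider type determines both the attribute type it inspects and the
-- branch type it produces (the functional dependency).
class Decider decider attr branch | decider -> attr branch where
  decide :: decider -> attr -> branch

-- "attr <= pivot" split (datatype context from the original docs omitted).
data DecideOrd t = DecideOrd t

instance Ord attr => Decider (DecideOrd attr) attr Bool where
  decide (DecideOrd pivot) attr = attr <= pivot
```

`Bool` is a good `branch` type here: only two branches can ever be created, so the training data is not fragmented across many children.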

data Ord t => DecideOrd t

Decide with `Ord`: `DecideOrd pivot` represents the split `(<= pivot)`.

AutoDecide Double (DecideOrd Double)

AutoDecide Float (DecideOrd Float)

AutoDecide Int (DecideOrd Int)

AutoDecide Integer (DecideOrd Integer)

(Ord t, Read t) => Read (DecideOrd t)

(Ord t, Show t) => Show (DecideOrd t)

Ord attr => Decider (DecideOrd attr) attr Bool

Integral a => AutoDecide (Ratio a) (DecideOrd (Ratio a))

class AutoDecide attr decider | attr -> decider where

`AutoDecide` is used to generate possible splits based on actual attributes, in a straightforward fashion. Think of `AutoDecide` as a default implementation for `Decider` generation.

autoDeciders :: [attr] -> [decider]

AutoDecide Char (DecideSet Char)

AutoDecide Double (DecideOrd Double)

AutoDecide Float (DecideOrd Float)

AutoDecide Int (DecideOrd Int)

AutoDecide Integer (DecideOrd Integer)

AutoDecide [Char] (DecideSet [Char])

Integral a => AutoDecide (Ratio a) (DecideOrd (Ratio a))

(AutoDecide a xa, AutoDecide b xb) => AutoDecide (a, b) (Either xa xb)
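A hedged sketch of what one such instance could look like. Reading "straightforward fashion" as one `(<= pivot)` candidate per distinct observed value, dropping the maximum (whose split would send everything down one branch), is an assumption:

```haskell
{-# LANGUAGE FunctionalDependencies #-}

import Data.List (nub, sort)

-- The attribute type alone determines which decider type is generated.
class AutoDecide attr decider | attr -> decider where
  autoDeciders :: [attr] -> [decider]

data DecideOrd t = DecideOrd t deriving Show

-- Assumed reading: one "(<= pivot)" candidate per distinct observed value,
-- except the maximum, whose split separates nothing.
instance AutoDecide Int (DecideOrd Int) where
  autoDeciders xs = case nub (sort xs) of
    [] -> []
    ys -> map DecideOrd (init ys)
```

Thanks to the functional dependency, a caller never names the decider type: it falls out of the attribute type.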

# Composing `Decider`s

Though `autoDeciders` may provide good results, there may be deciders that never get generated. These functions let you deviate from `autoDeciders` and compose very specific deciders.

genOrds :: Ord attr => [attr] -> [DecideOrd attr]

Decider generator implementation for any ordered data; considers all sensible `(<= pivot)`s.
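Assuming `DeciderGenerator attr decider` is a synonym for `[attr] -> [decider]` (consistent with the signatures above), a sketch of this behaviour:

```haskell
import Data.List (nub, sort)

-- Assumed synonym; not stated explicitly in this documentation.
type DeciderGenerator attr decider = [attr] -> [decider]

data DecideOrd t = DecideOrd t deriving (Eq, Show)

-- "All sensible (<= pivot)s" is read here as one pivot per distinct
-- observed value except the maximum, whose split separates nothing.
genOrds :: Ord attr => DeciderGenerator attr (DecideOrd attr)
genOrds xs = case nub (sort xs) of
  [] -> []
  ys -> map DecideOrd (init ys)
```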

genOrdsAvg :: Ord attr => (attr -> attr -> attr) -> [attr] -> [DecideOrd attr]

Decider generator for any ordered data; considers all possible `(<= pivot)`s.
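A sketch under the assumption that the averaging function is used to place each pivot between consecutive distinct values, so that splits need not sit exactly on observed values:

```haskell
import Data.List (nub, sort)

data DecideOrd t = DecideOrd t deriving (Eq, Show)

-- Assumed reading: average each pair of consecutive distinct sorted values
-- to obtain a pivot between them.
genOrdsAvg :: Ord attr => (attr -> attr -> attr) -> [attr] -> [DecideOrd attr]
genOrdsAvg avg xs =
  let ys = nub (sort xs)
  in [ DecideOrd (avg a b) | (a, b) <- zip ys (drop 1 ys) ]
```

The averaging function is a parameter because "the midpoint" differs by type; `avgF` below is the `Fractional` choice.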

genPair :: DeciderGenerator attra decidera -> DeciderGenerator attrb deciderb -> DeciderGenerator (attra, attrb) (Either decidera deciderb)
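A plausible reading of `genPair`, again assuming `DeciderGenerator attr decider` is `[attr] -> [decider]`: run each generator on its own tuple component and tag the results with `Left`/`Right`:

```haskell
-- Assumed synonym; not stated explicitly in this documentation.
type DeciderGenerator attr decider = [attr] -> [decider]

-- Combine generators for the two components of a pair: deciders for the
-- first component are tagged Left, for the second Right.
genPair :: DeciderGenerator attra decidera
        -> DeciderGenerator attrb deciderb
        -> DeciderGenerator (attra, attrb) (Either decidera deciderb)
genPair ga gb xs = map Left  (ga (map fst xs))
                ++ map Right (gb (map snd xs))
```

Nesting `genPair` extends this scheme to records of several attributes.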

avgF :: Fractional a => a -> a -> a

`avgF a b = (a + b) / 2`, to be used with `genOrdsAvg`.
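For completeness, `avgF` as documented, alongside its `Integral` counterpart `avgI` from the synopsis; `avgI`'s exact rounding behaviour is not documented, so the `div`-based definition here is an assumption:

```haskell
-- Midpoint for Fractional types, as given in the docs.
avgF :: Fractional a => a -> a -> a
avgF a b = (a + b) / 2

-- Integral counterpart from the synopsis; rounding via `div` is an assumption.
avgI :: Integral a => a -> a -> a
avgI a b = (a + b) `div` 2
```

Typical use: `genOrdsAvg avgF` for `Double` attributes, `genOrdsAvg avgI` for `Int` attributes.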