DecisionTree-0.0: A very simple implementation of decision trees for discrete attributes. (internal documentation)ContentsIndex
Data.DecisionTree
Description

This module provides a very simple implementation of a decisiontree. It is "optimized" for readability, not so much for performance. I doubt it can be used for real (=huge) datasets, but it should be ok for a couple of hundred (thousand?) items.

You are encouraged to have a look at the source

It is build (for now) using the ID3 algorithm (or at least something closely resembling that). That means the attributes you choose must have a finite set of possible values.

Synopsis
type PreLabeled a b = (b, Datum a)
data DecisionTree a b
= Leaf b
| Node {
att :: Attribute a
child :: a -> DecisionTree a b
}
data Attribute a = A {
aName :: String
possibleValues :: [a]
}
data Datum a = D {
dName :: String
attributes :: [(Attribute a, a)]
}
showTree :: (Show a, Show b) => DecisionTree a b -> ShowS
build :: (Ord a, Ord b) => [Attribute a] -> [PreLabeled a b] -> DecisionTree a b
getValue :: Datum a -> Attribute a -> a
label :: PreLabeled a b -> b
decide :: Eq a => DecisionTree a b -> Datum a -> b
partition :: Ord a => [PreLabeled a b] -> Attribute a -> Map a [PreLabeled a b]
entropy :: Ord b => [b] -> Double
groupLabels :: Ord b => [b] -> Map b Int
information :: (Ord b, Ord a) => [PreLabeled a b] -> Attribute a -> Double
bestAttribute :: (Ord b, Ord a) => [PreLabeled a b] -> [Attribute a] -> (Double, Attribute a)
groupWith :: Ord k => (a -> k) -> (a -> v) -> (v -> v -> v) -> [a] -> Map k v
Documentation
type PreLabeled a b = (b, Datum a)
data DecisionTree a b
The type for our DecisionTree
Constructors
Leaf bLeafs have labels
Node
att :: Attribute aa node asks for this attribute
child :: a -> DecisionTree a band has children which can be found with a value of the attribute
show/hide Instances
(Show a, Show b) => Show (DecisionTree a b)
data Attribute a
A Datum has Attributes
Constructors
A
aName :: StringAttributes have a name
possibleValues :: [a]and a set of possible values
show/hide Instances
data Datum a
Things we want to find labels for
Constructors
D
dName :: StringThey have names
attributes :: [(Attribute a, a)]and attributes
show/hide Instances
Show a => Show (Datum a)
showTree :: (Show a, Show b) => DecisionTree a b -> ShowS
build :: (Ord a, Ord b) => [Attribute a] -> [PreLabeled a b] -> DecisionTree a b
Build a DecisionTree from the given Trainingset
getValue :: Datum a -> Attribute a -> a
Which value does this Datum have for the given Attribute?
label :: PreLabeled a b -> b
Extract a label
decide :: Eq a => DecisionTree a b -> Datum a -> b
Decide which label belongs to this Datum
partition :: Ord a => [PreLabeled a b] -> Attribute a -> Map a [PreLabeled a b]
Partitions the Dataset according to the possible values of the attribute
entropy :: Ord b => [b] -> Double

Computes the entropy of a Dataset

the Entropy is defined as: sum (p_i * log_2 p_i) where p_i = |{ x | x has Label i}|/|Dataset|

groupLabels :: Ord b => [b] -> Map b Int
information
:: (Ord b, Ord a)
=> [PreLabeled a b]the data
-> Attribute athe Attribute
-> Doublethe Information

How much information does this Attribute give us for the given Dataset it is defined as

entropy(set) - sum p_i * entropy {dat_i | dat has value i for attribute a}

bestAttribute :: (Ord b, Ord a) => [PreLabeled a b] -> [Attribute a] -> (Double, Attribute a)
Return the attribute which gives us greatest gain in information
groupWith
:: Ord k
=> a -> khow to extract a key from a Datum
-> a -> vhow to make a Datum into a value for the map
-> v -> v -> vhow to fuse two values (should we have > 1 Data for this key)
-> [a]the list we want to group
-> Map k v
groups a Dataset using a Map. According to #haskell "efficient" grouping needs Ord. I agree with that
Produced by Haddock version 2.3.0