Data.DecisionTree

Description

This module provides a very simple implementation of a decisiontree. It is "optimized" for readability, not so much for performance. I doubt it can be used for real (=huge) datasets, but it should be ok for a couple of hundred (thousand?) items.

You are encouraged to have a look at the source

It is build (for now) using the ID3 algorithm (or at least something closely resembling that). That means the attributes you choose must have a finite set of possible values.

Synopsis

type PreLabeled a b = (b, Datum a)

data DecisionTree a b

= Leaf b

| Node {

att :: Attribute a

child :: a -> DecisionTree a b

}

data Attribute a = A {

aName :: String

possibleValues :: [a]

}

data Datum a = D {

dName :: String

attributes :: [(Attribute a, a)]

}

showTree :: (Show a, Show b) => DecisionTree a b -> ShowS

build :: (Ord a, Ord b) => [Attribute a] -> [PreLabeled a b] -> DecisionTree a b

getValue :: Datum a -> Attribute a -> a

label :: PreLabeled a b -> b

decide :: Eq a => DecisionTree a b -> Datum a -> b

partition :: Ord a => [PreLabeled a b] -> Attribute a -> Map a [PreLabeled a b]

entropy :: Ord b => [b] -> Double

groupLabels :: Ord b => [b] -> Map b Int

information :: (Ord b, Ord a) => [PreLabeled a b] -> Attribute a -> Double

bestAttribute :: (Ord b, Ord a) => [PreLabeled a b] -> [Attribute a] -> (Double, Attribute a)

groupWith :: Ord k => (a -> k) -> (a -> v) -> (v -> v -> v) -> [a] -> Map k v

Documentation

type PreLabeled a b = (b, Datum a)

data DecisionTree a b

The type for our DecisionTree

Constructors

Leaf b

Leafs have labels

Node

att :: Attribute a	a node asks for this attribute
child :: a -> DecisionTree a b	and has children which can be found with a value of the attribute

Instances

(Show a, Show b) => Show (DecisionTree a b)

data Attribute a

A Datum has Attributes

Constructors

aName :: String	Attributes have a name
possibleValues :: [a]	and a set of possible values

Instances

Eq (Attribute a)

Show (Attribute a)

data Datum a

Things we want to find labels for

Constructors

dName :: String	They have names
attributes :: [(Attribute a, a)]	and attributes

Instances

Show a => Show (Datum a)

showTree :: (Show a, Show b) => DecisionTree a b -> ShowS

build :: (Ord a, Ord b) => [Attribute a] -> [PreLabeled a b] -> DecisionTree a b

Build a DecisionTree from the given Trainingset

getValue :: Datum a -> Attribute a -> a

Which value does this Datum have for the given Attribute?

label :: PreLabeled a b -> b

Extract a label

decide :: Eq a => DecisionTree a b -> Datum a -> b

Decide which label belongs to this Datum

partition :: Ord a => [PreLabeled a b] -> Attribute a -> Map a [PreLabeled a b]

Partitions the Dataset according to the possible values of the attribute

entropy :: Ord b => [b] -> Double

Computes the entropy of a Dataset

the Entropy is defined as: sum (p_i * log_2 p_i) where p_i = |{ x | x has Label i}|/|Dataset|

groupLabels :: Ord b => [b] -> Map b Int

information

:: (Ord b, Ord a)
=> [PreLabeled a b]	the data
-> Attribute a	the Attribute
-> Double	the Information
How much information does this Attribute give us for the given Dataset it is defined as entropy(set) - sum p_i * entropy {dat_i \| dat has value i for attribute a}

bestAttribute :: (Ord b, Ord a) => [PreLabeled a b] -> [Attribute a] -> (Double, Attribute a)

Return the attribute which gives us greatest gain in information

groupWith

:: Ord k
=> a -> k	how to extract a key from a Datum
-> a -> v	how to make a Datum into a value for the map
-> v -> v -> v	how to fuse two values (should we have > 1 Data for this key)
-> [a]	the list we want to group
-> Map k v
groups a Dataset using a Map. According to #haskell "efficient" grouping needs Ord. I agree with that

Produced by Haddock version 2.3.0