- module Data.HInduce.Classifier
- module Data.HInduce.Classifier.DecisionTree
- module Data.List.HIUtils
- module Text.Layout
- module Data.Convertible
- readCSV :: [Char] -> IO (Either ParseError CSV)
- data Iris = Iris { sepalLength :: Double, sepalWidth :: Double, petalLength :: Double, petalWidth :: Double, irisClass :: IrisClass }
- data IrisClass
- = Setosa
- | Versicolor
- | Virginica
- irisAttrs :: Iris -> [Double]
- irisAttrs' :: Iris -> ((Double, Double), (Double, Double))
- readIris :: IO [Iris]
- iris :: [Iris]
Re-exports
module Data.HInduce.Classifier
module Data.List.HIUtils
module Text.Layout
module Data.Convertible
Helpers (TODO move to module)
Iris data set
Taken from the UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets/Iris
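The helpers declared above turn each record into the attribute and label views that buildDTree expects: irisAttrs flattens the four measurements into a list, and irisClass is simply the record's class field. A minimal sketch, assuming the field names shown in the example output below (the package's actual definitions may differ):

-- Illustrative sketch only: the field names come from the Iris record,
-- the bodies themselves are assumptions.
irisAttrs :: Iris -> [Double]
irisAttrs i = [sepalLength i, sepalWidth i, petalLength i, petalWidth i]

-- Presumably the same data, grouped as (sepal, petal) pairs.
irisAttrs' :: Iris -> ((Double, Double), (Double, Double))
irisAttrs' i = ((sepalLength i, sepalWidth i), (petalLength i, petalWidth i))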
Let's build a decision tree and try it:
>>> let model = buildDTree (genMany autoDeciders) irisAttrs irisClass iris
>>> classify model [5,4,2,1]
Setosa
>>> iris !! 10
Iris {sepalLength = 5.4, sepalWidth = 3.7, petalLength = 1.5, petalWidth = 0.2, irisClass = Setosa}
Seems good! But can we really know that? Let's train and test on separate data.
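The split below uses oddIx and evenIx from Data.List.HIUtils, which presumably select the odd- and even-indexed elements of a list, giving two disjoint halves. A rough sketch of that behaviour (not necessarily the library's own code):

-- Assumed behaviour of oddIx / evenIx: keep the elements at odd
-- (respectively even) positions.
oddIx, evenIx :: [a] -> [a]
oddIx  xs = [x | (i, x) <- zip [0 :: Int ..] xs, odd  i]
evenIx xs = [x | (i, x) <- zip [0 :: Int ..] xs, even i]

In the test expression, irisAttrs &&& irisClass (the Control.Arrow combinator) pairs each held-out example's attributes with its true class, so confusion' can compare the model's predictions against the truth.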
>>> let model' = buildDTree (genMany autoDeciders) irisAttrs irisClass (oddIx iris)
>>> dt $ confusion' model' (map (irisAttrs &&& irisClass) $ evenIx iris)
Table: Confusion Matrix
              ||--> Actual
Predicted \/  Setosa              Versicolor            Virginica
Setosa        0.3333333333333333
Versicolor                        0.30666666666666664   4.0e-2
Virginica                         2.666666666666667e-2  0.29333333333333333
Now we see that even though only half of the data set was available when the model was induced, only a few misclassifications occur on the held-out half.
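Each cell of the confusion matrix is a fraction of the test set (the cells sum to 1), so the overall error rate is simply the sum of the two off-diagonal cells shown above:

>>> 4.0e-2 + 2.666666666666667e-2
6.666666666666667e-2

That is, only about 7% of the held-out examples are misclassified.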