Scoring functions commonly used for evaluation of NLP
systems. Most functions in this module work on sequences which are
Foldable, but some take a precomputed table of
Counts. This will give a speedup if you want to compute multiple
scores on the same data. For example, to compute the Mutual
Information, the Variation of Information, and the Adjusted Rand Index
on the same pair of clusterings:
let cs = counts $ zip "abcabc" "abaaba"
mapM_ (print . ($ cs)) [mi, ari, vi]
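A self-contained version of this snippet might look as follows; the import assumes the module is exposed as NLP.Scores, so adjust it if the actual module name differs.

import NLP.Scores (ari, counts, mi, vi)

main :: IO ()
main = do
  -- Build the contingency table once from the paired cluster labels.
  let cs = counts $ zip "abcabc" "abaaba"
  -- Reuse the same table to compute several clustering scores.
  mapM_ (print . ($ cs)) [mi, ari, vi]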
- accuracy :: (Eq a, Fractional c, Foldable t) => t a -> t a -> c
- recipRank :: (Eq a, Fractional b, Foldable t) => a -> t a -> b
- avgPrecision :: (Fractional n, Ord a, Foldable t) => Set a -> t a -> n
- ari :: (Ord a, Ord b) => Counts a b -> Double
- mi :: (Ord a, Ord b) => Counts a b -> Double
- vi :: (Ord a, Ord b) => Counts a b -> Double
- type Count = Double
- data Counts a b
- counts :: (Ord a, Ord b, Foldable t) => t (a, b) -> Counts a b
- sum :: (Foldable t, Num a) => t a -> a
- mean :: (Foldable t, Fractional n, Real a) => t a -> n
- jaccard :: (Fractional n, Ord a) => Set a -> Set a -> n
- entropy :: (Floating c, Foldable t) => t c -> c
Scores for classification and ranking
Accuracy: the proportion of elements in the first sequence that are equal to the elements at the corresponding positions in the second sequence. The two sequences should be of equal length.
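As a small illustrative sketch (the import again assumes the NLP.Scores module name), a prediction that matches the reference in three out of four positions scores 0.75:

import NLP.Scores (accuracy)

main :: IO ()
main =
  -- "abcd" and "abcx" agree in 3 of 4 positions, so the accuracy is 0.75.
  print (accuracy "abcd" "abcx" :: Double)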
Reciprocal rank: the reciprocal of the rank at which the first argument occurs in the sequence given as the second argument.
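For example, with the usual 1-based notion of rank (and under the same NLP.Scores module-name assumption), an item first found in third position gets a score of 1/3:

import NLP.Scores (recipRank)

main :: IO ()
main =
  -- 'c' first occurs at rank 3 in "xac", so its reciprocal rank is 1/3.
  print (recipRank 'c' "xac" :: Double)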
Average precision: the average of the precision values obtained at the ranks where relevant elements occur. See http://en.wikipedia.org/wiki/Information_retrieval#Average_precision
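A worked example under the same module-name assumption: with relevant elements {'a','c'} and the ranking "abc", precision is 1 at rank 1 and 2/3 at rank 3, giving an average precision of (1 + 2/3) / 2, roughly 0.83:

import qualified Data.Set as Set
import NLP.Scores (avgPrecision)

main :: IO ()
main =
  -- Relevant items 'a' and 'c' appear at ranks 1 and 3 of "abc":
  -- precision@1 = 1, precision@3 = 2/3, so AP = (1 + 2/3) / 2.
  print (avgPrecision (Set.fromList "ac") "abc" :: Double)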
Scores for clustering
Adjusted Rand Index: http://en.wikipedia.org/wiki/Rand_index
Mutual information: MI(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X). Also known as information gain.
Variation of information: VI(X,Y) = H(X) + H(Y) - 2 MI(X,Y)
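As a quick sanity check of these clustering scores (still assuming the NLP.Scores module name), two clusterings that agree perfectly up to relabeling should give a Variation of Information of 0 and an Adjusted Rand Index of 1:

import NLP.Scores (ari, counts, vi)

main :: IO ()
main = do
  -- The two labelings induce the same partition, just with renamed labels.
  let cs = counts $ zip "aabbcc" "xxyyzz"
  print (vi cs)   -- should be 0.0 for identical partitions
  print (ari cs)  -- should be 1.0 for identical partitions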
Auxiliary types and functions
Jaccard coefficient: J(A,B) = |A intersection B| / |A union B|
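For instance (again assuming the NLP.Scores module name), the sets {'a','b','c'} and {'b','c','d'} share two elements out of four in their union, so their Jaccard coefficient is 0.5:

import qualified Data.Set as Set
import NLP.Scores (jaccard)

main :: IO ()
main =
  -- |{b,c}| / |{a,b,c,d}| = 2 / 4 = 0.5
  print (jaccard (Set.fromList "abc") (Set.fromList "bcd") :: Double)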