nlp-scores-0.2.3: Scoring functions commonly used for evaluation in NLP and IR

Safe Haskell: Safe-Inferred

NLP.Scores

Contents

Description

Scoring functions commonly used for evaluation of NLP systems. Most functions in this module work on lists, but some take a precomputed table of Counts. This gives a speedup when you want to compute multiple scores on the same data. For example, to compute the Mutual Information, Adjusted Rand Index and Variation of Information on the same pair of clusterings:

>>> let cs = counts $ zip "abcabc" "abaaba"
>>> mapM_ (print . ($ cs)) [mi, ari, vi]

Synopsis

Scores for classification and ranking

accuracy :: (Eq a, Fractional n) => [a] -> [a] -> n

Accuracy: the proportion of elements in the first list equal to the elements at corresponding positions in the second list. The lists should be of equal length.
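If accuracy behaves exactly as described, three matching positions out of four give 0.75:

>>> accuracy "abcd" "abcf" :: Double
0.75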

recipRank :: (Eq a, Fractional n) => a -> [a] -> n

Reciprocal rank: the reciprocal of the rank at which the first argument occurs in the list given as the second argument.
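For illustration, assuming ranks are counted from 1 (the standard convention for reciprocal rank), an element found at the second position scores 1/2:

>>> recipRank 'b' "abcd" :: Double
0.5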

Scores for clustering

ari :: (Ord a, Ord b) => Counts a b -> Double

Adjusted Rand Index: the Rand index of the two clusterings, corrected for chance agreement.

mi :: (Ord a, Ord b) => Counts a b -> Double

Mutual information: MI(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X). Also known as information gain.
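As an illustration of the formula, here is a minimal standalone sketch of mutual information over a table of joint counts. The Map-of-pairs representation and the name miSketch are assumptions made for exposition; they are not this module's internal Counts type or its mi implementation.

import qualified Data.Map as M

-- Hypothetical sketch: MI(X,Y) = SUM_{x,y} p(x,y) log_2 (p(x,y) / (p(x) p(y))),
-- computed from a Map of joint counts. Not the library's implementation.
miSketch :: (Ord a, Ord b) => M.Map (a, b) Double -> Double
miSketch joint = sum
  [ pxy * logBase 2 (pxy / (px x * py y))
  | ((x, y), n) <- M.toList joint
  , n > 0
  , let pxy = n / total ]
  where
    total = sum (M.elems joint)
    -- marginal counts for each x and each y
    margX = M.fromListWith (+) [ (x, n) | ((x, _), n) <- M.toList joint ]
    margY = M.fromListWith (+) [ (y, n) | ((_, y), n) <- M.toList joint ]
    px x  = margX M.! x / total
    py y  = margY M.! y / total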

vi :: (Ord a, Ord b) => Counts a b -> Double

Variation of information: VI(X,Y) = H(X) + H(Y) - 2 MI(X,Y)
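Continuing the sketch above, VI follows directly from this identity. viSketch is likewise hypothetical; it reuses miSketch, the same import, and the same joint-count Map:

-- Hypothetical sketch: VI(X,Y) = H(X) + H(Y) - 2 MI(X,Y), with the
-- entropies taken over the marginal counts of the joint table.
viSketch :: (Ord a, Ord b) => M.Map (a, b) Double -> Double
viSketch joint = hX + hY - 2 * miSketch joint
  where
    total = sum (M.elems joint)
    h ns  = negate (sum [ p * logBase 2 p | n <- ns, n > 0, let p = n / total ])
    hX    = h (M.elems (M.fromListWith (+) [ (x, n) | ((x, _), n) <- M.toList joint ]))
    hY    = h (M.elems (M.fromListWith (+) [ (y, n) | ((_, y), n) <- M.toList joint ]))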

Auxiliary types and functions

type Count = Double

A count

data Counts a b

Count table
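The representation of Counts is internal to the module, but conceptually it is built from a list of paired assignments, as in the counts call in the introduction. A hypothetical sketch of the joint-count part, matching the Map representation used in the miSketch example above:

import qualified Data.Map as M

-- Hypothetical: aggregate a list of (x, y) pairs into joint counts,
-- analogous to what `counts` precomputes for the clustering scores.
jointCounts :: (Ord a, Ord b) => [(a, b)] -> M.Map (a, b) Double
jointCounts xs = M.fromListWith (+) [ (x, 1) | x <- xs ]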

sum :: Num a => [a] -> a

The sum of a list of numbers (without overflowing the stack, unlike Prelude's sum).
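The usual way to achieve this is a strict left fold; a sketch of the idea, not necessarily this module's exact definition:

import Data.List (foldl')

-- A strict left fold forces the accumulator at each step, so no
-- chain of thunks builds up on long lists.
sumStrict :: Num a => [a] -> a
sumStrict = foldl' (+) 0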

mean :: (Fractional n, Real a) => [a] -> n

The mean of a list of numbers.
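For example, assuming mean is the arithmetic mean as described:

>>> mean [1, 2, 3, 4] :: Double
2.5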

jaccard :: (Fractional n, Ord a) => Set a -> Set a -> n

Jaccard coefficient J(A,B) = |A intersection B| / |A union B|
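An example that follows directly from the formula, since |{'b','c'}| / |{'a','b','c','d'}| = 2/4:

>>> import Data.Set (fromList)
>>> jaccard (fromList "abc") (fromList "bcd") :: Double
0.5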

entropy :: [Count] -> Double

Entropy: H(X) = -SUM_i P(X=i) log_2(P(X=i))
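For example, assuming entropy normalizes the counts to probabilities, two equal counts give a uniform two-outcome distribution and hence one bit:

>>> entropy [1, 1]
1.0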