nlp-scores-0.5.2: Scoring functions commonly used for evaluation in NLP and IR

NLP.Scores

Contents

Description

Scoring functions commonly used for evaluation of NLP systems. Most functions in this module work on sequences which are instances of `Foldable`, but some take a precomputed table of `Counts`. This will give a speedup if you want to compute multiple scores on the same data. For example to compute the Mutual Information, Variation of Information and the Adjusted Rand Index on the same pair of clusterings:

````>>> ````let cs = counts "abcabc" "abaaba"
````>>> ````mapM_ (print . (\$ cs)) [mi, ari, vi]
````>>> ````0.9182958340544894
````>>> ````0.4444444444444445
````>>> ````0.6666666666666663
``````

Synopsis

# Scores for classification and ranking

accuracy :: (Eq a, Fractional c, Traversable t, Foldable s) => t a -> s a -> cSource

Accuracy: the proportion of elements in the first sequence equal to elements at corresponding positions in second sequence. Sequences should be of equal lengths.

recipRank :: (Eq a, Fractional b, Foldable t) => a -> t a -> bSource

Reciprocal rank: the reciprocal of the rank at which the first arguments occurs in the sequence given as the second argument.

# Scores for clustering

ari :: (Ord a, Ord b) => Counts a b -> DoubleSource

mi :: (Ord a, Ord b) => Counts a b -> DoubleSource

Mutual information: MI(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X). Also known as information gain.

vi :: (Ord a, Ord b) => Counts a b -> DoubleSource

Variation of information: VI(X,Y) = H(X) + H(Y) - 2 MI(X,Y)

# Auxiliary types and functions

type Count = DoubleSource

A count

data Counts a b Source

Count table

Instances

 (Ord a, Ord b) => Monoid (Counts a b)

counts :: (Ord a, Ord b, Traversable t, Foldable s) => t a -> s b -> Counts a bSource

Creates count table `Counts`

sum :: (Foldable t, Num a) => t a -> aSource

The sum of a sequence of numbers

mean :: (Foldable t, Fractional n, Real a) => t a -> nSource

The mean of a sequence of numbers.

jaccard :: (Fractional n, Ord a) => Set a -> Set a -> nSource

Jaccard coefficient J(A,B) = |AB| / |A union B|

entropy :: (Floating c, Foldable t) => t c -> cSource

Entropy: H(X) = -SUM_i P(X=i) log_2(P(X=i)). `entropy xs` is the entropy of the random variable represented by the sequence `xs`, where each element of `xs` is the count of the one particular value the random variable can take. If you need to compute the entropy from a sequence of outcomes, the following will work:

``` entropy . elems . histogram
```

histogram :: (Num a, Ord k, Foldable t) => t k -> Map k aSource

`histogram xs` is returns the map of the frequency counts of the elements in sequence `xs`

# Extracting joint and marginal counts from `Counts`

countJoint :: (Ord a, Ord b) => a -> b -> Counts a b -> CountSource

Joint count

countFst :: Ord k => k -> Counts k b -> CountSource

Count of first element

countSnd :: Ord k => k -> Counts a k -> CountSource

Count of second element

# Extracting lists of values from `Counts`

fstElems :: Counts k b -> [k]Source

List of values of first element

sndElems :: Counts a k -> [k]Source

List of values of second element