Safe Haskell	Safe-Inferred

Data.SearchEngine.BM25F

Contents

The ranking function
Explaining the score

Description

An implementation of BM25F ranking. See:

A quick overview: http://en.wikipedia.org/wiki/Okapi_BM25
The Probabilistic Relevance Framework: BM25 and Beyond http://www.soi.city.ac.uk/~ser/papers/foundations_bm25_review.pdf
An Introduction to Information Retrieval http://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf

The ranking function

score :: (Ix field, Bounded field, Ix feature, Bounded feature) => Context term field feature -> Doc term field feature -> [term] -> Float

The BM25F score for a document for a given set of terms.

data Context term field feature

Constructors

Context
Fields

numDocsTotal :: !Int
avgFieldLength :: field -> Float
numDocsWithTerm :: term -> Int
paramK1 :: !Float
paramB :: field -> Float
fieldWeight :: field -> Float
featureWeight :: feature -> Float
featureFunction :: feature -> FeatureFunction

data FeatureFunction

Constructors

LogarithmicFunction Float	log (lambda_i + f_i)
RationalFunction Float	f_i / (lambda_i + f_i)
SigmoidFunction Float Float	1 / (lambda + exp(-(lambda' * f_i)))
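
The three formulas above can be sketched as ordinary Haskell functions. This is a minimal illustration, not the library's code: FeatureFn and applyFeatureFn are hypothetical names, defined locally so the block compiles on its own, and the parameters follow the formulas as documented (lambda, lambda') applied to a raw feature value f.

```haskell
-- Hypothetical local mirror of the FeatureFunction constructors, so that
-- this sketch compiles without the library.
data FeatureFn
  = Logarithmic Float        -- log (lambda + f)
  | Rational Float           -- f / (lambda + f)
  | Sigmoid Float Float      -- 1 / (lambda + exp (-(lambda' * f)))

-- Apply a feature function to a raw feature value f (assumed helper name).
applyFeatureFn :: FeatureFn -> Float -> Float
applyFeatureFn (Logarithmic l) f = log (l + f)
applyFeatureFn (Rational l)    f = f / (l + f)
applyFeatureFn (Sigmoid l l')  f = 1 / (l + exp (negate (l' * f)))
```

For instance, applyFeatureFn (Rational 1) 1 evaluates f / (lambda + f) with both values equal to 1, giving 0.5. The rational and sigmoid forms saturate as f grows, while the logarithmic form keeps growing slowly.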

data Doc term field feature

Constructors

Doc
Fields

docFieldLength :: field -> Int
docFieldTermFrequency :: field -> term -> Int
docFeatureValue :: feature -> Float
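
To show how Context, Doc and score might fit together, here is a hedged usage sketch. It assumes the module exports the constructors and record fields exactly as listed above. The Field and Feature types and every number in it are made up for illustration only.

```haskell
import Data.Ix (Ix)
import Data.SearchEngine.BM25F

-- Hypothetical field and feature types for a tiny document collection.
data Field   = Title | Body deriving (Eq, Ord, Ix, Bounded, Show)
data Feature = Popularity   deriving (Eq, Ord, Ix, Bounded, Show)

-- Collection-wide statistics and tuning parameters (all values invented).
ctx :: Context String Field Feature
ctx = Context
  { numDocsTotal    = 1000
  , avgFieldLength  = \f -> case f of Title -> 5; Body -> 200
  , numDocsWithTerm = \t -> if t == "haskell" then 40 else 300
  , paramK1         = 1.5
  , paramB          = const 0.75
  , fieldWeight     = \f -> case f of Title -> 5; Body -> 1
  , featureWeight   = const 0.5
  , featureFunction = const (LogarithmicFunction 1)
  }

-- Statistics for a single document (all values invented).
doc :: Doc String Field Feature
doc = Doc
  { docFieldLength        = \f -> case f of Title -> 4; Body -> 150
  , docFieldTermFrequency = \f t -> case (f, t) of
      (Title, "haskell") -> 1
      (Body,  "haskell") -> 3
      (Body,  "search")  -> 2
      _                  -> 0
  , docFeatureValue       = const 120
  }

-- Score of the document for a two-term query.
exampleScore :: Float
exampleScore = score ctx doc ["haskell", "search"]
```

Because the statistics are plain functions, a real application would typically back them with lookups into its index rather than hard-coded case expressions.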

Explaining the score

data Explanation field feature term

A breakdown of the BM25F score, showing how it relates to the inputs so that you can compare the scores of different documents.

Constructors

Explanation

Fields

overallScore :: Float

The overall score is the sum of the termScores and the nonTermScores.

termScores :: [(term, Float)]

There is a score contribution from each query term. This is the score for the term across all fields in the document (but see termFieldScores).

nonTermScores :: [(feature, Float)]

The document can have an innate bonus score independent of the terms in the query. For example, this might be a popularity score.

termFieldScores :: [(term, [(field, Float)])]

This does not contribute to the overallScore. It indicates how the termScores relate to the per-field scores. Note, however, that the term score across all fields is not simply the sum of the per-field scores. The point of the BM25F scoring function is that a linear combination of per-field scores is wrong, and BM25F uses a more cunning non-linear combination.

However, it is still useful to see the scores for each field for a term, to see how they compare.
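
The non-linearity can be illustrated with a small self-contained sketch. It is a simplified model of the usual BM25F formulation, not the library's implementation: it omits the IDF factor, assumes the term frequencies are already length-normalised, and assumes a per-field score is what the term would score if only that field existed. The names saturate, fieldInputs, combined and perFieldSum are hypothetical.

```haskell
-- Saturation: maps a weighted term frequency into the range [0, 1).
saturate :: Float -> Float -> Float
saturate k1 tf = tf / (k1 + tf)

-- Hypothetical inputs for one term: (field weight, normalised term frequency),
-- e.g. a title field and a body field.
fieldInputs :: [(Float, Float)]
fieldInputs = [(5, 1), (1, 3)]

-- BM25F-style: add the weighted frequencies first, then saturate once.
combined :: Float -> Float
combined k1 = saturate k1 (sum [w * tf | (w, tf) <- fieldInputs])

-- Per-field view: saturate each field on its own, then add the results.
perFieldSum :: Float -> Float
perFieldSum k1 = sum [saturate k1 (w * tf) | (w, tf) <- fieldInputs]
```

With k1 = 1 the combined value is 8/9, while the per-field values are 5/6 and 3/4, which add up to more than the combined value. Saturation is applied once to the pooled evidence, so extra occurrences in a second field yield diminishing returns instead of a full second contribution.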

Instances

Functor (Explanation field feature)
(Show field, Show feature, Show term) => Show (Explanation field feature term)

explain :: (Ix field, Bounded field, Ix feature, Bounded feature) => Context term field feature -> Doc term field feature -> [term] -> Explanation field feature term