Safe Haskell	Safe-Inferred
Language	Haskell2010

Data.SearchEngine.BM25F

Contents

The ranking function
- Specialised variants
Explaining the score

Description

An implementation of BM25F ranking. See:

A quick overview: http://en.wikipedia.org/wiki/Okapi_BM25
The Probabilistic Relevance Framework: BM25 and Beyond http://www.staff.city.ac.uk/~sbrp622/papers/foundations_bm25_review.pdf
An Introduction to Information Retrieval http://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf

Synopsis

score :: (Ix field, Bounded field, Ix feature, Bounded feature) => Context term field feature -> Doc term field feature -> [term] -> Float
data Context term field feature = Context {
- numDocsTotal :: !Int
- avgFieldLength :: field -> Float
- numDocsWithTerm :: term -> Int
- paramK1 :: !Float
- paramB :: field -> Float
- fieldWeight :: field -> Float
- featureWeight :: feature -> Float
- featureFunction :: feature -> FeatureFunction
}
data FeatureFunction
data Doc term field feature = Doc {
- docFieldLength :: field -> Int
- docFieldTermFrequency :: field -> term -> Int
- docFeatureValue :: feature -> Float
}
scoreTermsBulk :: forall field term feature. (Ix field, Bounded field) => Context term field feature -> Doc term field feature -> term -> (field -> Int) -> Float
data Explanation field feature term = Explanation {
- overallScore :: Float
- termScores :: [(term, Float)]
- nonTermScores :: [(feature, Float)]
- termFieldScores :: [(term, [(field, Float)])]
}
explain :: (Ix field, Bounded field, Ix feature, Bounded feature) => Context term field feature -> Doc term field feature -> [term] -> Explanation field feature term

The ranking function

score :: (Ix field, Bounded field, Ix feature, Bounded feature) => Context term field feature -> Doc term field feature -> [term] -> Float

The BM25F score for a document for a given set of terms.

data Context term field feature

Constructors

Context
Fields

numDocsTotal :: !Int
avgFieldLength :: field -> Float
numDocsWithTerm :: term -> Int
paramK1 :: !Float
paramB :: field -> Float
fieldWeight :: field -> Float
featureWeight :: feature -> Float
featureFunction :: feature -> FeatureFunction

data FeatureFunction

Constructors

LogarithmicFunction Float	log (lambda_i + f_i)
RationalFunction Float	f_i / (lambda_i + f_i)
SigmoidFunction Float Float	1 / (lambda + exp(-(lambda' * f_i)))
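
The three formulas above can be read as functions of a raw feature value f_i. The following is a minimal, self-contained sketch of that reading. The type FeatureFn and the function applyFeatureFn are hypothetical stand-ins written for this illustration, not names exported by this module, and the library's own evaluation may differ in detail.

```haskell
-- Hypothetical stand-in for FeatureFunction, so the sketch compiles
-- without the package installed.
data FeatureFn
  = Logarithmic Float        -- log (lambda + f)
  | Rational    Float        -- f / (lambda + f)
  | Sigmoid     Float Float  -- 1 / (lambda + exp (-(lambda' * f)))

-- Apply a feature function to a raw feature value f (assumed helper).
applyFeatureFn :: FeatureFn -> Float -> Float
applyFeatureFn (Logarithmic l) f = log (l + f)
applyFeatureFn (Rational l)    f = f / (l + f)
applyFeatureFn (Sigmoid l l')  f = 1 / (l + exp (negate (l' * f)))

main :: IO ()
main = do
  print (applyFeatureFn (Logarithmic 1) 0)  -- log 1, prints 0.0
  print (applyFeatureFn (Rational 1) 1)     -- 1 / (1 + 1), prints 0.5
  print (applyFeatureFn (Sigmoid 1 1) 0)    -- 1 / (1 + exp 0), prints 0.5
```

With lambda = 1 the rational and sigmoid forms stay bounded as f_i grows, whereas the logarithmic form keeps growing slowly, which is one way to choose between them for a given feature.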

data Doc term field feature

Constructors

Doc
Fields

docFieldLength :: field -> Int
docFieldTermFrequency :: field -> term -> Int
docFeatureValue :: feature -> Float

Specialised variants

scoreTermsBulk :: forall field term feature. (Ix field, Bounded field) => Context term field feature -> Doc term field feature -> term -> (field -> Int) -> Float

Most of the time we want to score several different documents for the same set of terms, but sometimes we want to score one document for many terms, and in that case we can save a bit of work by doing it in bulk. Anything that depends only on the document, and not on the term, is calculated once and shared.

To take advantage of the sharing you must partially apply and name the per-doc score function, e.g.

let score :: term -> (field -> Int) -> Float
    score = BM25.scoreTermsBulk ctx doc
 in sum [ score t (\f -> counts ! (t, f)) | t <- ts ]
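
Why naming the partial application matters can be shown with a small standalone sketch. The function bulkScorerSketch below is hypothetical and heavily simplified (fields are list positions, field weights are omitted, and the idf is passed in directly). It is not this module's implementation. It only illustrates the pattern: per-document values are bound outside the returned lambda, so they are evaluated at most once per named scorer and reused for every term.

```haskell
-- Hypothetical, simplified bulk scorer. Arguments: k1, per-field b,
-- per-field average length, per-field document length.
bulkScorerSketch
  :: Float -> [Float] -> [Float] -> [Int]
  -> (Float -> [Int] -> Float)   -- idf of the term, per-field term counts
bulkScorerSketch k1 bs avgLens docLens = \idf counts ->
    let tf' = sum (zipWith (\n c -> fromIntegral c / n) norms counts)
     in idf * tf' / (k1 + tf')
  where
    -- Per-document work: one length normalisation factor per field.
    -- Bound outside the lambda, so it is shared across all terms.
    norms = zipWith3 (\b avg len -> (1 - b) + b * fromIntegral len / avg)
                     bs avgLens docLens

main :: IO ()
main = do
  -- Name the partially applied scorer once, then reuse it for every term.
  let score = bulkScorerSketch 1.5 [0.75, 0.75] [10, 100] [10, 100]
  print (sum [ score idf counts | (idf, counts) <- [(1.0, [1, 0]), (2.0, [0, 3])] ])
```

If the scorer were instead applied to all of its arguments afresh inside the list comprehension, the per-document normalisation would be eligible for recomputation on every term.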

Explaining the score

data Explanation field feature term

A breakdown of the BM25F score, to explain somewhat how it relates to the inputs, and so you can compare the scores of different documents.

Constructors

Explanation

Fields

overallScore :: Float
The overall score is the sum of the termScores and the nonTermScores.
termScores :: [(term, Float)]
There is a score contribution from each query term. This is the score for the term across all fields in the document (but see termFieldScores).
nonTermScores :: [(feature, Float)]
The document can have an innate bonus score independent of the terms in the query. For example this might be a popularity score.
termFieldScores :: [(term, [(field, Float)])]
This does not contribute to the overallScore. It is an indication of how the termScores relate to per-field scores. Note however that the term score for all fields is not simply the sum of the per-field scores. The point of the BM25F scoring function is that a linear combination of per-field scores is wrong, and BM25F does a more cunning non-linear combination.
However, it is still useful as an indication to see the scores for each field for a term, to see how they compare.
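
The difference between the two ways of combining fields can be seen in a small standalone sketch. It assumes the usual BM25 saturation tf / (k1 + tf) and leaves out idf and length normalisation for brevity. The names saturate, combinedScore and linearScore are hypothetical and not part of this module.

```haskell
-- BM25-style saturation: grows with tf but levels off below 1.
saturate :: Float -> Float -> Float
saturate k1 tf = tf / (k1 + tf)

-- BM25F style: add up the weighted field frequencies first, then
-- saturate once. Each pair is (field weight, term frequency in field).
combinedScore :: Float -> [(Float, Float)] -> Float
combinedScore k1 fields = saturate k1 (sum [ w * tf | (w, tf) <- fields ])

-- Naive alternative: saturate each field separately, then add up.
linearScore :: Float -> [(Float, Float)] -> Float
linearScore k1 fields = sum [ w * saturate k1 tf | (w, tf) <- fields ]

main :: IO ()
main = do
  let fields = [(1, 1), (1, 1)] :: [(Float, Float)]
  print (combinedScore 1 fields)  -- 2 / (1 + 2), roughly 0.67
  print (linearScore 1 fields)    -- 0.5 + 0.5
```

In the naive version a term that appears once in each of two fields scores as much as two independent saturated hits. In the combined version the occurrences share a single saturation curve, which is why the per-field entries in termFieldScores need not add up to the corresponding entry in termScores.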

Instances

Functor (Explanation field feature)
Instance details
Defined in Data.SearchEngine.BM25F
Methods
fmap :: (a -> b) -> Explanation field feature a -> Explanation field feature b
(<$) :: a -> Explanation field feature b -> Explanation field feature a

(Show term, Show feature, Show field) => Show (Explanation field feature term)
Instance details
Defined in Data.SearchEngine.BM25F
Methods
showsPrec :: Int -> Explanation field feature term -> ShowS
show :: Explanation field feature term -> String
showList :: [Explanation field feature term] -> ShowS

explain :: (Ix field, Bounded field, Ix feature, Bounded feature) => Context term field feature -> Doc term field feature -> [term] -> Explanation field feature term