full-text-search-0.2.0.0: In-memory full text search engine

Safe Haskell: Safe-Inferred

Data.SearchEngine.BM25F

Description

An implementation of BM25F ranking. See:

The ranking function

score :: (Ix field, Bounded field, Ix feature, Bounded feature) => Context term field feature -> Doc term field feature -> [term] -> Float

The BM25F score for a document for a given set of terms.

data Context term field feature

Constructors

Context 

Fields

numDocsTotal :: !Int
 
avgFieldLength :: field -> Float
 
numDocsWithTerm :: term -> Int
 
paramK1 :: !Float
 
paramB :: field -> Float
 
fieldWeight :: field -> Float
 
featureWeight :: feature -> Float
 
featureFunction :: feature -> FeatureFunction
 

data FeatureFunction

Constructors

LogarithmicFunction Float
log (lambda_i + f_i)
RationalFunction Float
f_i / (lambda_i + f_i)
SigmoidFunction Float Float
1 / (lambda + exp(-(lambda' * f_i)))
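
The three formulas above can be sketched as ordinary Haskell. This is a minimal illustration, not the package's own code: the data type is a local mirror of the documented constructors, and applyFeatureFunction is a hypothetical helper name introduced here. The argument f_i is taken to be the raw feature value of a document.

```haskell
-- Local mirror of the documented constructors (not imported from the package).
data FeatureFunction
  = LogarithmicFunction Float   -- log (lambda_i + f_i)
  | RationalFunction Float      -- f_i / (lambda_i + f_i)
  | SigmoidFunction Float Float -- 1 / (lambda + exp(-(lambda' * f_i)))

-- Hypothetical helper: evaluate a feature function at a raw feature value f_i.
applyFeatureFunction :: FeatureFunction -> Float -> Float
applyFeatureFunction (LogarithmicFunction lam) fi = log (lam + fi)
applyFeatureFunction (RationalFunction lam) fi = fi / (lam + fi)
applyFeatureFunction (SigmoidFunction lam lam') fi =
  1 / (lam + exp (negate (lam' * fi)))
```

For example, applyFeatureFunction (RationalFunction 1) 1 evaluates to 1 / (1 + 1) = 0.5. The rational form stays below 1 for positive inputs, so it caps the bonus a very large feature value can earn.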

data Doc term field feature

Constructors

Doc 

Fields

docFieldLength :: field -> Int
 
docFieldTermFrequency :: field -> term -> Int
 
docFeatureValue :: feature -> Float
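
As a usage sketch, the following builds a Context and a Doc from the fields documented above and passes them to score. It assumes the full-text-search package is installed and that this module exports the constructors and record fields as listed. The Field and Feature types, the example* names, and every number are made up for illustration.

```haskell
import Data.Ix (Ix)
import Data.SearchEngine.BM25F

-- Hypothetical field and feature types; score requires Ix and Bounded on both.
data Field = Title | Body
  deriving (Eq, Ord, Bounded, Ix)

data Feature = Popularity
  deriving (Eq, Ord, Bounded, Ix)

-- Corpus-wide statistics and tuning parameters (all values are invented).
exampleContext :: Context String Field Feature
exampleContext = Context
  { numDocsTotal    = 100
  , avgFieldLength  = \f -> case f of
      Title -> 5
      Body  -> 200
  , numDocsWithTerm = \t -> if t == "haskell" then 10 else 1
  , paramK1         = 1.5
  , paramB          = const 0.75
  , fieldWeight     = \f -> case f of
      Title -> 3
      Body  -> 1
  , featureWeight   = const 0.5
  , featureFunction = const (LogarithmicFunction 1)
  }

-- Per-document statistics for a single document (all values are invented).
exampleDoc :: Doc String Field Feature
exampleDoc = Doc
  { docFieldLength        = \f -> case f of
      Title -> 4
      Body  -> 150
  , docFieldTermFrequency = \f t -> case (f, t) of
      (Title, "haskell") -> 1
      (Body,  "haskell") -> 3
      _                  -> 0
  , docFeatureValue       = const 20
  }

-- Score the document against a two-term query.
exampleScore :: Float
exampleScore = score exampleContext exampleDoc ["haskell", "search"]
```

Most fields are functions rather than tables, so a caller can back them with whatever index structure it already maintains.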
 

Explaining the score

data Explanation field feature term

A breakdown of the BM25F score, to explain somewhat how it relates to the inputs, and so you can compare the scores of different documents.

Constructors

Explanation 

Fields

overallScore :: Float

The overall score is the sum of the termScores and the nonTermScores.

termScores :: [(term, Float)]

There is a score contribution from each query term. This is the score for the term across all fields in the document (but see termFieldScores).

nonTermScores :: [(feature, Float)]

The document can have an innate bonus score independent of the terms in the query. For example, this might be a popularity score.

termFieldScores :: [(term, [(field, Float)])]

This does not contribute to the overallScore. It is an indication of how the termScores relate to per-field scores. Note, however, that the term score for all fields is not simply the sum of the per-field scores. The point of the BM25F scoring function is that a linear combination of per-field scores is wrong, and BM25F does a more cunning non-linear combination.

However, it is still useful as an indication to see the scores for each field for a term, to see how they compare.
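
To illustrate why per-field scores do not add up to the term score, here is a small sketch of the textbook BM25 saturation step, tf / (k1 + tf). It is an assumption about the general shape of BM25F, not a copy of this package's formula, and it leaves out IDF and length normalisation. All three function names are hypothetical.

```haskell
-- Hypothetical BM25-style saturation: grows with tf but never reaches 1.
saturate :: Float -> Float -> Float
saturate k1 tf = tf / (k1 + tf)

-- BM25F style: add up the weighted per-field frequencies, then saturate once.
combinedScore :: Float -> [Float] -> Float
combinedScore k1 weightedTfs = saturate k1 (sum weightedTfs)

-- Naive alternative: saturate each field separately and add the results.
summedFieldScores :: Float -> [Float] -> Float
summedFieldScores k1 weightedTfs = sum (map (saturate k1) weightedTfs)
```

With k1 = 1 and weighted frequencies [1, 1], summedFieldScores gives 0.5 + 0.5 = 1, while combinedScore gives 2 / 3. Saturating once means a term that appears in many fields cannot keep accumulating score.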

Instances

Functor (Explanation field feature) 
(Show field, Show feature, Show term) => Show (Explanation field feature term) 

explain :: (Ix field, Bounded field, Ix feature, Bounded feature) => Context term field feature -> Doc term field feature -> [term] -> Explanation field feature term
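
A sketch of how an Explanation might be inspected once explain has produced it. The record is mirrored locally so the example runs without the package. The helpers recomputedScore and strongestTerms are hypothetical. The sketch assumes the overall score is the sum of the per-term and non-term contributions held in the record.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down (..))

-- Local mirror of the documented record (not imported from the package).
data Explanation field feature term = Explanation
  { overallScore    :: Float
  , termScores      :: [(term, Float)]
  , nonTermScores   :: [(feature, Float)]
  , termFieldScores :: [(term, [(field, Float)])]
  }

-- Hypothetical helper: rebuild the total from the contributing components.
-- termFieldScores is deliberately ignored, since it does not contribute.
recomputedScore :: Explanation field feature term -> Float
recomputedScore e =
  sum (map snd (termScores e)) + sum (map snd (nonTermScores e))

-- Hypothetical helper: query terms ordered from largest to smallest contribution.
strongestTerms :: Explanation field feature term -> [(term, Float)]
strongestTerms = sortBy (comparing (Down . snd)) . termScores
```

Comparing recomputedScore with overallScore for two documents shows whether a ranking difference comes from the query terms or from a non-term bonus.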