Safe Haskell	Safe-Inferred
Language	Haskell2010

Data.SearchEngine

Contents

Basic interface
Explain mode for query result rankings
Internal sanity check

Synopsis

type Term = Text
query :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> [Term] -> [key]
queryAutosuggest :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> ResultsFilter key -> [Term] -> Term -> ([(Term, Float)], [(key, Float)])
data ResultsFilter key
- = NoFilter
- | FilterPredicate (key -> Bool)
- | FilterBulkPredicate ([key] -> [Bool])
queryAutosuggestPredicate :: (Ix field, Bounded field, Ord key) => SearchEngine doc key field feature -> [Term] -> Term -> key -> Bool
queryAutosuggestMatchingDocuments :: (Ix field, Bounded field, Ord key) => SearchEngine doc key field feature -> [Term] -> Term -> [key]
initSearchEngine :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchConfig doc key field feature -> SearchRankParameters field feature -> SearchEngine doc key field feature
data SearchEngine doc key field feature
data SearchConfig doc key field feature = SearchConfig {
- documentKey :: doc -> key
- extractDocumentTerms :: doc -> field -> [Term]
- transformQueryTerm :: Term -> field -> Term
- documentFeatureValue :: doc -> feature -> Float
}
data SearchRankParameters field feature = SearchRankParameters {
- paramK1 :: !Float
- paramB :: field -> Float
- paramFieldWeights :: field -> Float
- paramFeatureWeights :: feature -> Float
- paramFeatureFunctions :: feature -> FeatureFunction
- paramResultsetSoftLimit :: !Int
- paramResultsetHardLimit :: !Int
- paramAutosuggestPrefilterLimit :: !Int
- paramAutosuggestPostfilterLimit :: !Int
}
data FeatureFunction
- = LogarithmicFunction Float
- | RationalFunction Float
- | SigmoidFunction Float Float
data NoFeatures
noFeatures :: NoFeatures -> a
insertDoc :: (Ord key, Ix field, Bounded field, Ix feature, Bounded feature) => doc -> SearchEngine doc key field feature -> SearchEngine doc key field feature
insertDocs :: (Ord key, Ix field, Bounded field, Ix feature, Bounded feature) => [doc] -> SearchEngine doc key field feature -> SearchEngine doc key field feature
deleteDoc :: (Ord key, Ix field, Bounded field) => key -> SearchEngine doc key field feature -> SearchEngine doc key field feature
queryExplain :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> [Term] -> [(Explanation field feature Term, key)]
data Explanation field feature term = Explanation {
- overallScore :: Float
- termScores :: [(term, Float)]
- nonTermScores :: [(feature, Float)]
- termFieldScores :: [(term, [(field, Float)])]
}
setRankParams :: SearchRankParameters field feature -> SearchEngine doc key field feature -> SearchEngine doc key field feature
invariant :: (Ord key, Ix field, Bounded field) => SearchEngine doc key field feature -> Bool

Basic interface

Querying

type Term = Text Source #

Terms are short strings, usually whole words.

query :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> [Term] -> [key] Source #

Execute a normal query. Find the documents in which one or more of the search terms appear and return them in ranked order.

The number of documents returned is limited by the paramResultsetSoftLimit and paramResultsetHardLimit paramaters. This also limits the cost of the query (which is primarily the cost of scoring each document).

The given terms are all assumed to be complete (as opposed to prefixes like with queryAutosuggest).

Query auto-completion / auto-suggestion

queryAutosuggest :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> ResultsFilter key -> [Term] -> Term -> ([(Term, Float)], [(key, Float)]) Source #

Execute an "auto-suggest" query. This is where one of the search terms is an incomplete prefix and we are looking for possible completions of that search term, and result documents to go with the possible completions.

An auto-suggest query only gives useful results when the SearchEngine is configured to use a non-term feature score. That is, when we can give documents an importance score independent of what terms we are looking for. This is because an auto-suggest query is backwards from a normal query: we are asking for relevant terms occurring in important or popular documents so we need some notion of important or popular. Without this we would just be ranking based on term frequency which while it makes sense for normal "forward" queries is pretty meaningless for auto-suggest "reverse" queries. Indeed for single-term auto-suggest queries the ranking function we use will assign 0 for all documents and completions if there is no non-term feature scores.

data ResultsFilter key Source #

In some applications it is necessary to enforce some security or visibility rule about the query results (e.g. in a typical DB-based application different users can see different data items). Typically it would be too expensive to build different search indexes for the different contexts and so the strategy is to use one index containing everything and filter for visibility in the results. This means the filter condition is different for different queries (e.g. performed on behalf of different users).

Filtering the results after a query is possible but not the most efficient thing to do because we've had to score all the not-visible documents. The better thing to do is to filter as part of the query, this way we can filter before the expensive scoring.

We provide one further optimisation: bulk predicates. In some applications it can be quicker to check the security/visibility of a whole bunch of results all in one go.

Constructors

NoFilter
FilterPredicate (key -> Bool)
FilterBulkPredicate ([key] -> [Bool])

queryAutosuggestPredicate :: (Ix field, Bounded field, Ord key) => SearchEngine doc key field feature -> [Term] -> Term -> key -> Bool Source #

Given an incomplete prefix query, return a predicate that indicates whether a key is in the set of documents that match possible completions of that query. This is equivalent to calling queryAutosuggestMatchingDocuments and testing whether the key is in the list, but should be more efficient.

This does not apply the pre-filter or post-filter limits.

queryAutosuggestMatchingDocuments :: (Ix field, Bounded field, Ord key) => SearchEngine doc key field feature -> [Term] -> Term -> [key] Source #

Given an incomplete prefix query, find the set of documents that match possible completions of that query. This should be less computationally expensive than queryAutosuggest as it does not do any ranking of documents. However, it does not apply the pre-filter or post-filter limits, and the list may be large when the query terms occur in many documents. The order of returned keys is unspecified.

Making a search engine instance

initSearchEngine :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchConfig doc key field feature -> SearchRankParameters field feature -> SearchEngine doc key field feature Source #

data SearchEngine doc key field feature Source #

data SearchConfig doc key field feature Source #

Constructors

SearchConfig
Fields documentKey :: doc -> key extractDocumentTerms :: doc -> field -> [Term] transformQueryTerm :: Term -> field -> Term documentFeatureValue :: doc -> feature -> Float

data SearchRankParameters field feature Source #

Constructors

SearchRankParameters
Fields paramK1 :: !Float paramB :: field -> Float paramFieldWeights :: field -> Float paramFeatureWeights :: feature -> Float paramFeatureFunctions :: feature -> FeatureFunction paramResultsetSoftLimit :: !Int paramResultsetHardLimit :: !Int paramAutosuggestPrefilterLimit :: !Int paramAutosuggestPostfilterLimit :: !Int

data FeatureFunction Source #

Constructors

LogarithmicFunction Float	log (lambda_i + f_i)
RationalFunction Float	f_i / (lambda_i + f_i)
SigmoidFunction Float Float	1 / (lambda + exp(-(lambda' * f_i))

Helper type for non-term features

data NoFeatures Source #

Instances

Instances details

Bounded NoFeatures Source #
Instance details Defined in Data.SearchEngine.Types Methods minBound :: NoFeatures # maxBound :: NoFeatures #
Ix NoFeatures Source #
Instance details Defined in Data.SearchEngine.Types Methods range :: (NoFeatures, NoFeatures) -> [NoFeatures] # index :: (NoFeatures, NoFeatures) -> NoFeatures -> Int # unsafeIndex :: (NoFeatures, NoFeatures) -> NoFeatures -> Int # inRange :: (NoFeatures, NoFeatures) -> NoFeatures -> Bool # rangeSize :: (NoFeatures, NoFeatures) -> Int # unsafeRangeSize :: (NoFeatures, NoFeatures) -> Int #
Show NoFeatures Source #
Instance details Defined in Data.SearchEngine.Types Methods showsPrec :: Int -> NoFeatures -> ShowS # show :: NoFeatures -> String # showList :: [NoFeatures] -> ShowS #
Eq NoFeatures Source #
Instance details Defined in Data.SearchEngine.Types Methods (==) :: NoFeatures -> NoFeatures -> Bool # (/=) :: NoFeatures -> NoFeatures -> Bool #
Ord NoFeatures Source #
Instance details Defined in Data.SearchEngine.Types Methods compare :: NoFeatures -> NoFeatures -> Ordering # (<) :: NoFeatures -> NoFeatures -> Bool # (<=) :: NoFeatures -> NoFeatures -> Bool # (>) :: NoFeatures -> NoFeatures -> Bool # (>=) :: NoFeatures -> NoFeatures -> Bool # max :: NoFeatures -> NoFeatures -> NoFeatures # min :: NoFeatures -> NoFeatures -> NoFeatures #

noFeatures :: NoFeatures -> a Source #

Managing documents to be searched

insertDoc :: (Ord key, Ix field, Bounded field, Ix feature, Bounded feature) => doc -> SearchEngine doc key field feature -> SearchEngine doc key field feature Source #

insertDocs :: (Ord key, Ix field, Bounded field, Ix feature, Bounded feature) => [doc] -> SearchEngine doc key field feature -> SearchEngine doc key field feature Source #

deleteDoc :: (Ord key, Ix field, Bounded field) => key -> SearchEngine doc key field feature -> SearchEngine doc key field feature Source #

Explain mode for query result rankings

queryExplain :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> [Term] -> [(Explanation field feature Term, key)] Source #

data Explanation field feature term Source #

A breakdown of the BM25F score, to explain somewhat how it relates to the inputs, and so you can compare the scores of different documents.

Constructors

Explanation

Fields

overallScore :: Float
The overall score is the sum of the termScores, positionScore and nonTermScore
termScores :: [(term, Float)]
There is a score contribution from each query term. This is the score for the term across all fields in the document (but see termFieldScores).
nonTermScores :: [(feature, Float)]
The document can have an inate bonus score independent of the terms in the query. For example this might be a popularity score.
termFieldScores :: [(term, [(field, Float)])]
This does not contribute to the overallScore. It is an indication of how the termScores relates to per-field scores. Note however that the term score for all fields is not simply sum of the per-field scores. The point of the BM25F scoring function is that a linear combination of per-field scores is wrong, and BM25F does a more cunning non-linear combination.
However, it is still useful as an indication to see scores for each field for a term, to see how the compare.

Instances

Instances details

Functor (Explanation field feature) Source #
Instance details Defined in Data.SearchEngine.BM25F Methods fmap :: (a -> b) -> Explanation field feature a -> Explanation field feature b # (<$) :: a -> Explanation field feature b -> Explanation field feature a #
(Show term, Show feature, Show field) => Show (Explanation field feature term) Source #
Instance details Defined in Data.SearchEngine.BM25F Methods showsPrec :: Int -> Explanation field feature term -> ShowS # show :: Explanation field feature term -> String # showList :: [Explanation field feature term] -> ShowS #

setRankParams :: SearchRankParameters field feature -> SearchEngine doc key field feature -> SearchEngine doc key field feature Source #

Internal sanity check

invariant :: (Ord key, Ix field, Bounded field) => SearchEngine doc key field feature -> Bool Source #