full-text-search-0.2.2.2: In-memory full text search engine
Safe HaskellSafe-Inferred
LanguageHaskell2010

Data.SearchEngine

Synopsis

Basic interface

Querying

type Term = Text Source #

Terms are short strings, usually whole words.

query :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> [Term] -> [key] Source #

Execute a normal query. Find the documents in which one or more of the search terms appear and return them in ranked order.

The number of documents returned is limited by the paramResultsetSoftLimit and paramResultsetHardLimit paramaters. This also limits the cost of the query (which is primarily the cost of scoring each document).

The given terms are all assumed to be complete (as opposed to prefixes like with queryAutosuggest).

Query auto-completion / auto-suggestion

queryAutosuggest :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> ResultsFilter key -> [Term] -> Term -> ([(Term, Float)], [(key, Float)]) Source #

Execute an "auto-suggest" query. This is where one of the search terms is an incomplete prefix and we are looking for possible completions of that search term, and result documents to go with the possible completions.

An auto-suggest query only gives useful results when the SearchEngine is configured to use a non-term feature score. That is, when we can give documents an importance score independent of what terms we are looking for. This is because an auto-suggest query is backwards from a normal query: we are asking for relevant terms occurring in important or popular documents so we need some notion of important or popular. Without this we would just be ranking based on term frequency which while it makes sense for normal "forward" queries is pretty meaningless for auto-suggest "reverse" queries. Indeed for single-term auto-suggest queries the ranking function we use will assign 0 for all documents and completions if there is no non-term feature scores.

data ResultsFilter key Source #

In some applications it is necessary to enforce some security or visibility rule about the query results (e.g. in a typical DB-based application different users can see different data items). Typically it would be too expensive to build different search indexes for the different contexts and so the strategy is to use one index containing everything and filter for visibility in the results. This means the filter condition is different for different queries (e.g. performed on behalf of different users).

Filtering the results after a query is possible but not the most efficient thing to do because we've had to score all the not-visible documents. The better thing to do is to filter as part of the query, this way we can filter before the expensive scoring.

We provide one further optimisation: bulk predicates. In some applications it can be quicker to check the security/visibility of a whole bunch of results all in one go.

Constructors

NoFilter 
FilterPredicate (key -> Bool) 
FilterBulkPredicate ([key] -> [Bool]) 

queryAutosuggestPredicate :: (Ix field, Bounded field, Ord key) => SearchEngine doc key field feature -> [Term] -> Term -> key -> Bool Source #

Given an incomplete prefix query, return a predicate that indicates whether a key is in the set of documents that match possible completions of that query. This is equivalent to calling queryAutosuggestMatchingDocuments and testing whether the key is in the list, but should be more efficient.

This does not apply the pre-filter or post-filter limits.

queryAutosuggestMatchingDocuments :: (Ix field, Bounded field, Ord key) => SearchEngine doc key field feature -> [Term] -> Term -> [key] Source #

Given an incomplete prefix query, find the set of documents that match possible completions of that query. This should be less computationally expensive than queryAutosuggest as it does not do any ranking of documents. However, it does not apply the pre-filter or post-filter limits, and the list may be large when the query terms occur in many documents. The order of returned keys is unspecified.

Making a search engine instance

initSearchEngine :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchConfig doc key field feature -> SearchRankParameters field feature -> SearchEngine doc key field feature Source #

data SearchEngine doc key field feature Source #

data SearchConfig doc key field feature Source #

Constructors

SearchConfig 

Fields

data FeatureFunction Source #

Constructors

LogarithmicFunction Float
log (lambda_i + f_i)
RationalFunction Float
f_i / (lambda_i + f_i)
SigmoidFunction Float Float
1 / (lambda + exp(-(lambda' * f_i))

Helper type for non-term features

Managing documents to be searched

insertDoc :: (Ord key, Ix field, Bounded field, Ix feature, Bounded feature) => doc -> SearchEngine doc key field feature -> SearchEngine doc key field feature Source #

insertDocs :: (Ord key, Ix field, Bounded field, Ix feature, Bounded feature) => [doc] -> SearchEngine doc key field feature -> SearchEngine doc key field feature Source #

deleteDoc :: (Ord key, Ix field, Bounded field) => key -> SearchEngine doc key field feature -> SearchEngine doc key field feature Source #

Explain mode for query result rankings

queryExplain :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> [Term] -> [(Explanation field feature Term, key)] Source #

data Explanation field feature term Source #

A breakdown of the BM25F score, to explain somewhat how it relates to the inputs, and so you can compare the scores of different documents.

Constructors

Explanation 

Fields

  • overallScore :: Float

    The overall score is the sum of the termScores, positionScore and nonTermScore

  • termScores :: [(term, Float)]

    There is a score contribution from each query term. This is the score for the term across all fields in the document (but see termFieldScores).

  • nonTermScores :: [(feature, Float)]

    The document can have an inate bonus score independent of the terms in the query. For example this might be a popularity score.

  • termFieldScores :: [(term, [(field, Float)])]

    This does not contribute to the overallScore. It is an indication of how the termScores relates to per-field scores. Note however that the term score for all fields is not simply sum of the per-field scores. The point of the BM25F scoring function is that a linear combination of per-field scores is wrong, and BM25F does a more cunning non-linear combination.

    However, it is still useful as an indication to see scores for each field for a term, to see how the compare.

Instances

Instances details
Functor (Explanation field feature) Source # 
Instance details

Defined in Data.SearchEngine.BM25F

Methods

fmap :: (a -> b) -> Explanation field feature a -> Explanation field feature b #

(<$) :: a -> Explanation field feature b -> Explanation field feature a #

(Show term, Show feature, Show field) => Show (Explanation field feature term) Source # 
Instance details

Defined in Data.SearchEngine.BM25F

Methods

showsPrec :: Int -> Explanation field feature term -> ShowS #

show :: Explanation field feature term -> String #

showList :: [Explanation field feature term] -> ShowS #

setRankParams :: SearchRankParameters field feature -> SearchEngine doc key field feature -> SearchEngine doc key field feature Source #

Internal sanity check

invariant :: (Ord key, Ix field, Bounded field) => SearchEngine doc key field feature -> Bool Source #