full-text-search-0.2.1.4: In-memory full text search engine

Data.SearchEngine

Synopsis

Basic interface

Querying

type Term = Text Source #

Terms are short strings, usually whole words.

query :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> [Term] -> [key] Source #

Execute a normal query. Find the documents in which one or more of the search terms appear and return them in ranked order.

The number of documents returned is limited by the paramResultsetSoftLimit and paramResultsetHardLimit paramaters. This also limits the cost of the query (which is primarily the cost of scoring each document).

The given terms are all assumed to be complete (as opposed to prefixes like with queryAutosuggest).

Query auto-completion / auto-suggestion

queryAutosuggest :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> ResultsFilter key -> [Term] -> Term -> ([(Term, Float)], [(key, Float)]) Source #

Execute an "auto-suggest" query. This is where one of the search terms is an incomplete prefix and we are looking for possible completions of that search term, and result documents to go with the possible completions.

An auto-suggest query only gives useful results when the SearchEngine is configured to use a non-term feature score. That is, when we can give documents an importance score independent of what terms we are looking for. This is because an auto-suggest query is backwards from a normal query: we are asking for relevant terms occurring in important or popular documents so we need some notion of important or popular. Without this we would just be ranking based on term frequency which while it makes sense for normal "forward" queries is pretty meaningless for auto-suggest "reverse" queries. Indeed for single-term auto-suggest queries the ranking function we use will assign 0 for all documents and completions if there is no non-term feature scores.

data ResultsFilter key Source #

In some applications it is necessary to enforce some security or visibility rule about the query results (e.g. in a typical DB-based application different users can see different data items). Typically it would be too expensive to build different search indexes for the different contexts and so the strategy is to use one index containing everything and filter for visibility in the results. This means the filter condition is different for different queries (e.g. performed on behalf of different users).

Filtering the results after a query is possible but not the most efficient thing to do because we've had to score all the not-visible documents. The better thing to do is to filter as part of the query, this way we can filter before the expensive scoring.

We provide one further optimisation: bulk predicates. In some applications it can be quicker to check the security/visibility of a whole bunch of results all in one go.

Constructors

 NoFilter FilterPredicate (key -> Bool) FilterBulkPredicate ([key] -> [Bool])

Making a search engine instance

initSearchEngine :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchConfig doc key field feature -> SearchRankParameters field feature -> SearchEngine doc key field feature Source #

data SearchEngine doc key field feature Source #

data SearchConfig doc key field feature Source #

Constructors

 SearchConfig FieldsdocumentKey :: doc -> key extractDocumentTerms :: doc -> field -> [Term] transformQueryTerm :: Term -> field -> Term documentFeatureValue :: doc -> feature -> Float

data SearchRankParameters field feature Source #

Constructors

 SearchRankParameters FieldsparamK1 :: !Float paramB :: field -> Float paramFieldWeights :: field -> Float paramFeatureWeights :: feature -> Float paramFeatureFunctions :: feature -> FeatureFunction paramResultsetSoftLimit :: !Int paramResultsetHardLimit :: !Int paramAutosuggestPrefilterLimit :: !Int paramAutosuggestPostfilterLimit :: !Int

Constructors

 LogarithmicFunction Float log (lambda_i + f_i) RationalFunction Float f_i / (lambda_i + f_i) SigmoidFunction Float Float 1 / (lambda + exp(-(lambda' * f_i))

Helper type for non-term features

Instances

 Source # Methods Source # Methods Source # Methods Source # MethodsshowList :: [NoFeatures] -> ShowS # Source # Methodsrange :: (NoFeatures, NoFeatures) -> [NoFeatures] #index :: (NoFeatures, NoFeatures) -> NoFeatures -> Int #unsafeIndex :: (NoFeatures, NoFeatures) -> NoFeatures -> IntinRange :: (NoFeatures, NoFeatures) -> NoFeatures -> Bool #rangeSize :: (NoFeatures, NoFeatures) -> Int #

Managing documents to be searched

insertDoc :: (Ord key, Ix field, Bounded field, Ix feature, Bounded feature) => doc -> SearchEngine doc key field feature -> SearchEngine doc key field feature Source #

insertDocs :: (Ord key, Ix field, Bounded field, Ix feature, Bounded feature) => [doc] -> SearchEngine doc key field feature -> SearchEngine doc key field feature Source #

deleteDoc :: (Ord key, Ix field, Bounded field) => key -> SearchEngine doc key field feature -> SearchEngine doc key field feature Source #

Explain mode for query result rankings

queryExplain :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> [Term] -> [(Explanation field feature Term, key)] Source #

data Explanation field feature term Source #

A breakdown of the BM25F score, to explain somewhat how it relates to the inputs, and so you can compare the scores of different documents.

Constructors

 Explanation FieldsoverallScore :: FloatThe overall score is the sum of the termScores, positionScore and nonTermScoretermScores :: [(term, Float)]There is a score contribution from each query term. This is the score for the term across all fields in the document (but see termFieldScores).nonTermScores :: [(feature, Float)]The document can have an inate bonus score independent of the terms in the query. For example this might be a popularity score.termFieldScores :: [(term, [(field, Float)])]This does not contribute to the overallScore. It is an indication of how the termScores relates to per-field scores. Note however that the term score for all fields is not simply sum of the per-field scores. The point of the BM25F scoring function is that a linear combination of per-field scores is wrong, and BM25F does a more cunning non-linear combination.However, it is still useful as an indication to see scores for each field for a term, to see how the compare.

Instances

 Functor (Explanation field feature) Source # Methodsfmap :: (a -> b) -> Explanation field feature a -> Explanation field feature b #(<\$) :: a -> Explanation field feature b -> Explanation field feature a # (Show field, Show feature, Show term) => Show (Explanation field feature term) Source # MethodsshowsPrec :: Int -> Explanation field feature term -> ShowS #show :: Explanation field feature term -> String #showList :: [Explanation field feature term] -> ShowS #

setRankParams :: SearchRankParameters field feature -> SearchEngine doc key field feature -> SearchEngine doc key field feature Source #

Internal sanity check

invariant :: (Ord key, Ix field, Bounded field) => SearchEngine doc key field feature -> Bool Source #