Safe Haskell | Safe-Inferred |
---|---|
Language | Haskell2010 |
Synopsis
- type Term = Text
- query :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> [Term] -> [key]
- queryAutosuggest :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> ResultsFilter key -> [Term] -> Term -> ([(Term, Float)], [(key, Float)])
- data ResultsFilter key
- = NoFilter
- | FilterPredicate (key -> Bool)
- | FilterBulkPredicate ([key] -> [Bool])
- queryAutosuggestPredicate :: (Ix field, Bounded field, Ord key) => SearchEngine doc key field feature -> [Term] -> Term -> key -> Bool
- queryAutosuggestMatchingDocuments :: (Ix field, Bounded field, Ord key) => SearchEngine doc key field feature -> [Term] -> Term -> [key]
- initSearchEngine :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchConfig doc key field feature -> SearchRankParameters field feature -> SearchEngine doc key field feature
- data SearchEngine doc key field feature
- data SearchConfig doc key field feature = SearchConfig {
- documentKey :: doc -> key
- extractDocumentTerms :: doc -> field -> [Term]
- transformQueryTerm :: Term -> field -> Term
- documentFeatureValue :: doc -> feature -> Float
- data SearchRankParameters field feature = SearchRankParameters {
- paramK1 :: !Float
- paramB :: field -> Float
- paramFieldWeights :: field -> Float
- paramFeatureWeights :: feature -> Float
- paramFeatureFunctions :: feature -> FeatureFunction
- paramResultsetSoftLimit :: !Int
- paramResultsetHardLimit :: !Int
- paramAutosuggestPrefilterLimit :: !Int
- paramAutosuggestPostfilterLimit :: !Int
- data FeatureFunction
- data NoFeatures
- noFeatures :: NoFeatures -> a
- insertDoc :: (Ord key, Ix field, Bounded field, Ix feature, Bounded feature) => doc -> SearchEngine doc key field feature -> SearchEngine doc key field feature
- insertDocs :: (Ord key, Ix field, Bounded field, Ix feature, Bounded feature) => [doc] -> SearchEngine doc key field feature -> SearchEngine doc key field feature
- deleteDoc :: (Ord key, Ix field, Bounded field) => key -> SearchEngine doc key field feature -> SearchEngine doc key field feature
- queryExplain :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> [Term] -> [(Explanation field feature Term, key)]
- data Explanation field feature term = Explanation {
- overallScore :: Float
- termScores :: [(term, Float)]
- nonTermScores :: [(feature, Float)]
- termFieldScores :: [(term, [(field, Float)])]
- setRankParams :: SearchRankParameters field feature -> SearchEngine doc key field feature -> SearchEngine doc key field feature
- invariant :: (Ord key, Ix field, Bounded field) => SearchEngine doc key field feature -> Bool
Basic interface
Querying
query :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> [Term] -> [key] Source #
Execute a normal query. Find the documents in which one or more of the search terms appear and return them in ranked order.
The number of documents returned is limited by the paramResultsetSoftLimit
and paramResultsetHardLimit
paramaters. This also limits the cost of the
query (which is primarily the cost of scoring each document).
The given terms are all assumed to be complete (as opposed to prefixes
like with queryAutosuggest
).
Query auto-completion / auto-suggestion
queryAutosuggest :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> ResultsFilter key -> [Term] -> Term -> ([(Term, Float)], [(key, Float)]) Source #
Execute an "auto-suggest" query. This is where one of the search terms is an incomplete prefix and we are looking for possible completions of that search term, and result documents to go with the possible completions.
An auto-suggest query only gives useful results when the SearchEngine
is
configured to use a non-term feature score. That is, when we can give
documents an importance score independent of what terms we are looking for.
This is because an auto-suggest query is backwards from a normal query: we
are asking for relevant terms occurring in important or popular documents
so we need some notion of important or popular. Without this we would just
be ranking based on term frequency which while it makes sense for normal
"forward" queries is pretty meaningless for auto-suggest "reverse"
queries. Indeed for single-term auto-suggest queries the ranking function
we use will assign 0 for all documents and completions if there is no
non-term feature scores.
data ResultsFilter key Source #
In some applications it is necessary to enforce some security or visibility rule about the query results (e.g. in a typical DB-based application different users can see different data items). Typically it would be too expensive to build different search indexes for the different contexts and so the strategy is to use one index containing everything and filter for visibility in the results. This means the filter condition is different for different queries (e.g. performed on behalf of different users).
Filtering the results after a query is possible but not the most efficient thing to do because we've had to score all the not-visible documents. The better thing to do is to filter as part of the query, this way we can filter before the expensive scoring.
We provide one further optimisation: bulk predicates. In some applications it can be quicker to check the security/visibility of a whole bunch of results all in one go.
NoFilter | |
FilterPredicate (key -> Bool) | |
FilterBulkPredicate ([key] -> [Bool]) |
queryAutosuggestPredicate :: (Ix field, Bounded field, Ord key) => SearchEngine doc key field feature -> [Term] -> Term -> key -> Bool Source #
Given an incomplete prefix query, return a predicate that indicates whether
a key is in the set of documents that match possible completions of that
query. This is equivalent to calling queryAutosuggestMatchingDocuments
and
testing whether the key is in the list, but should be more efficient.
This does not apply the pre-filter or post-filter limits.
queryAutosuggestMatchingDocuments :: (Ix field, Bounded field, Ord key) => SearchEngine doc key field feature -> [Term] -> Term -> [key] Source #
Given an incomplete prefix query, find the set of documents that match
possible completions of that query. This should be less computationally
expensive than queryAutosuggest
as it does not do any ranking of documents.
However, it does not apply the pre-filter or post-filter limits, and the list
may be large when the query terms occur in many documents. The order of
returned keys is unspecified.
Making a search engine instance
initSearchEngine :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchConfig doc key field feature -> SearchRankParameters field feature -> SearchEngine doc key field feature Source #
data SearchEngine doc key field feature Source #
data SearchConfig doc key field feature Source #
SearchConfig | |
|
data SearchRankParameters field feature Source #
SearchRankParameters | |
|
data FeatureFunction Source #
LogarithmicFunction Float | log (lambda_i + f_i) |
RationalFunction Float | f_i / (lambda_i + f_i) |
SigmoidFunction Float Float | 1 / (lambda + exp(-(lambda' * f_i)) |
Helper type for non-term features
data NoFeatures Source #
Instances
noFeatures :: NoFeatures -> a Source #
Managing documents to be searched
insertDoc :: (Ord key, Ix field, Bounded field, Ix feature, Bounded feature) => doc -> SearchEngine doc key field feature -> SearchEngine doc key field feature Source #
insertDocs :: (Ord key, Ix field, Bounded field, Ix feature, Bounded feature) => [doc] -> SearchEngine doc key field feature -> SearchEngine doc key field feature Source #
deleteDoc :: (Ord key, Ix field, Bounded field) => key -> SearchEngine doc key field feature -> SearchEngine doc key field feature Source #
Explain mode for query result rankings
queryExplain :: (Ix field, Bounded field, Ix feature, Bounded feature) => SearchEngine doc key field feature -> [Term] -> [(Explanation field feature Term, key)] Source #
data Explanation field feature term Source #
A breakdown of the BM25F score, to explain somewhat how it relates to the inputs, and so you can compare the scores of different documents.
Explanation | |
|
Instances
Functor (Explanation field feature) Source # | |
Defined in Data.SearchEngine.BM25F fmap :: (a -> b) -> Explanation field feature a -> Explanation field feature b # (<$) :: a -> Explanation field feature b -> Explanation field feature a # | |
(Show term, Show feature, Show field) => Show (Explanation field feature term) Source # | |
Defined in Data.SearchEngine.BM25F showsPrec :: Int -> Explanation field feature term -> ShowS # show :: Explanation field feature term -> String # showList :: [Explanation field feature term] -> ShowS # |
setRankParams :: SearchRankParameters field feature -> SearchEngine doc key field feature -> SearchEngine doc key field feature Source #