Condor-0.3: Information retrieval library

Portability: portable
Stability: alpha
Maintainer: Krzysztof Langner <klangner@gmail.com>
Safe Haskell: Safe-Inferred

Condor.Search.Index

Description

Memory-based index. This module contains functions that create, update and search an index. The default implementation uses algorithms for the English language (stemming, stop words, etc.).

For performance reasons, functions in this module work on Unicode strings from the Data.Text module.
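English-language analysis of this kind usually chains tokenization, lowercasing, stop-word removal and stemming. The following is a minimal sketch of such a pipeline over Data.Text. The stop-word list, the `analyze` function and the one-rule "stemmer" are hypothetical simplifications, not Condor's actual algorithms.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Toy English analyzer: tokenize, lowercase, drop stop words, strip a
-- plural "s". The stop-word list and suffix rule are hypothetical
-- simplifications; they are not Condor's actual algorithms.
import Data.Char (isAlphaNum)
import qualified Data.Set as Set
import Data.Text (Text)
import qualified Data.Text as T

-- A tiny, hypothetical stop-word list
stopWords :: Set.Set Text
stopWords = Set.fromList ["a", "an", "and", "is", "of", "the", "this"]

-- Split on any non-alphanumeric character and drop empty pieces
tokenize :: Text -> [Text]
tokenize = filter (not . T.null) . T.split (not . isAlphaNum)

-- Naive "stemmer": strip one trailing 's' from words longer than 3 characters
-- (a stand-in for a real algorithm such as the Porter stemmer)
stem :: Text -> Text
stem w
  | T.length w > 3, Just w' <- T.stripSuffix "s" w = w'
  | otherwise = w

-- Full pipeline: tokens -> lowercase -> remove stop words -> stem
analyze :: Text -> [Text]
analyze = map stem . filter (`Set.notMember` stopWords) . map T.toLower . tokenize

main :: IO ()
main = print (analyze "This is a document content. Documents and terms!")
-- prints ["document","content","document","term"]
```

In GHCi, `analyze "The cats"` yields `["cat"]`: "The" is dropped as a stop word and "cats" loses its plural "s".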

Basic usage:

 :set -XOverloadedStrings
 import Condor.Search.Index (emptyIndex, addDocument, search)
 import Condor.Commons.Document (docFromStrings)

 let idx = addDocument (docFromStrings "My document 1" "This is a document content.") emptyIndex
 search idx "content"
 ["My document 1"]


Documentation

data Index

Inverted index: maps each term to the documents that contain it.
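Because an inverted index maps each term to the documents containing it, looking up a term is a single map access. Below is a minimal sketch of that data structure built on plain `Data.Map` and `Data.Set` containers. `ToyIndex`, `insertDoc` and `lookupTerm` are hypothetical names. Condor's `Index` type is abstract, so its real representation may differ.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Toy model of an inverted index: each term maps to the set of names
-- of documents that contain it. Illustrates the general data structure
-- only; Condor's actual Index type is abstract.
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Text (Text)

type Term = Text
type DocName = Text  -- hypothetical stand-in for Condor's DocName

newtype ToyIndex = ToyIndex (Map.Map Term (Set.Set DocName))
  deriving Show

emptyToyIndex :: ToyIndex
emptyToyIndex = ToyIndex Map.empty

-- Record that document `name` contains each of the given terms
insertDoc :: DocName -> [Term] -> ToyIndex -> ToyIndex
insertDoc name terms (ToyIndex m) =
  ToyIndex (foldr (\t -> Map.insertWith Set.union t (Set.singleton name)) m terms)

-- Look up the documents containing a single term
lookupTerm :: Term -> ToyIndex -> [DocName]
lookupTerm t (ToyIndex m) = maybe [] Set.toList (Map.lookup t m)

main :: IO ()
main = do
  let idx = insertDoc "doc2" ["index", "search"]
          $ insertDoc "doc1" ["document", "index"] emptyToyIndex
  print (lookupTerm "index" idx)    -- ["doc1","doc2"]
  print (lookupTerm "missing" idx)  -- []
```

Storing each posting list as a `Set` means a document that repeats a term is still listed only once for it.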

Instances

Binary Index

An instance of Binary to encode and decode an Index in binary form.
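A `Binary` instance lets an index built in memory be serialized and restored later. The sketch below demonstrates the round trip on a hypothetical `ToyIndex` type. Its instance simply delegates to the `Binary` instance that the binary package provides for `Map`.

```haskell
-- Toy illustration of a Binary round trip for an index-like type.
-- ToyIndex is hypothetical; Condor's Index has its own Binary instance.
import Data.Binary (Binary (..), decode, encode)
import qualified Data.Map.Strict as Map

-- Term -> names of documents containing it (String used for simplicity)
newtype ToyIndex = ToyIndex (Map.Map String [String])
  deriving (Eq, Show)

-- Delegate serialization to the existing Binary instance for Map
instance Binary ToyIndex where
  put (ToyIndex m) = put m
  get = ToyIndex <$> get

main :: IO ()
main = do
  let idx = ToyIndex (Map.fromList [("content", ["My document 1"])])
      bytes = encode idx          -- lazy ByteString
  print (decode bytes == idx)     -- True: decoding restores the value
```

For a real Condor index, the generic `encodeFile` and `decodeFile` functions from Data.Binary work with any `Binary` instance, e.g. `encodeFile "index.bin" idx` (the file name is only an example).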

type Term = Text

A single term, for example a normalized word.

addDocument :: Document -> Index -> Index

Add a document to the index. This function uses English-language algorithms to split the document content into index terms.

addDocTerms :: DocName -> [Term] -> Index -> Index

Add a document, already split into terms, to the index. Use this function when the document content should be split into terms with a custom algorithm.
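With `addDocTerms`, term extraction is left to the caller. As an illustration of a custom algorithm, here is a hypothetical tokenizer for source-code identifiers that splits on punctuation and on camelCase boundaries. `codeTerms` and `splitCamel` are names invented for this sketch.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Hypothetical custom tokenizer for source-code identifiers.
import Data.Char (isAlphaNum, isUpper)
import Data.Text (Text)
import qualified Data.Text as T

-- Split before every upper-case letter: "addDocTerms" -> ["add", "Doc", "Terms"]
-- (runs of capitals, such as acronyms, are split letter by letter)
splitCamel :: Text -> [Text]
splitCamel t = case T.uncons t of
  Nothing -> []
  Just (c, rest) ->
    let (word, remaining) = T.break isUpper rest
    in T.cons c word : splitCamel remaining

-- Split on non-alphanumeric characters, then on camelCase, then lowercase
codeTerms :: Text -> [Text]
codeTerms =
  map T.toLower . concatMap splitCamel . filter (not . T.null) . T.split (not . isAlphaNum)

main :: IO ()
main = print (codeTerms "addDocTerms :: DocName -> [Term]")
-- prints ["add","doc","terms","doc","name","term"]
```

The result has type `[Text]`, the same as `[Term]`, so it could be passed as `addDocTerms name (codeTerms src) idx`, where `name` and `src` are placeholders.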

emptyIndex :: Index

Create an empty index configured for the English language.

docCount :: Index -> Int

Get the number of documents in the index.

search :: Index -> Text -> [DocName]

Search the index for terms given as a single string. This function uses English-language algorithms to split the query into tokens.

searchTerms :: Index -> [Term] -> [DocName]

Search the index for terms given as a list. Use this function when the query should be split into terms with a custom algorithm.
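A query that is already split into terms can be evaluated against a term-to-documents map in more than one way. This documentation does not say whether multiple terms are combined with AND or OR semantics. The following toy sketch therefore shows both. `searchAny`, `searchAll` and the sample data are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Toy evaluation of a multi-term query against an inverted index.
-- Whether Condor's searchTerms uses AND or OR is not stated here;
-- both strategies are shown.
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Text (Text)

type Term = Text
type DocName = Text

-- Hypothetical toy index: term -> documents containing it
type ToyIndex = Map.Map Term (Set.Set DocName)

postings :: ToyIndex -> Term -> Set.Set DocName
postings idx t = Map.findWithDefault Set.empty t idx

-- OR semantics: documents containing at least one query term
searchAny :: ToyIndex -> [Term] -> [DocName]
searchAny idx = Set.toList . Set.unions . map (postings idx)

-- AND semantics: documents containing every query term
searchAll :: ToyIndex -> [Term] -> [DocName]
searchAll _ [] = []
searchAll idx (t : ts) =
  Set.toList (foldr (Set.intersection . postings idx) (postings idx t) ts)

sample :: ToyIndex
sample = Map.fromList
  [ ("document", Set.fromList ["doc1", "doc2"])
  , ("content", Set.fromList ["doc1"])
  , ("index", Set.fromList ["doc2", "doc3"])
  ]

main :: IO ()
main = do
  print (searchAny sample ["content", "index"])     -- ["doc1","doc2","doc3"]
  print (searchAll sample ["document", "content"])  -- ["doc1"]
```

OR semantics favors recall, while AND semantics narrows results as more terms are added.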

termCount :: Index -> Int

Get the number of terms in the index.
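Both counters can be read off an in-memory index, as the sketch below shows. It assumes that `termCount` counts distinct terms and `docCount` counts distinct document names. `ToyIndex`, `addToy`, `toyDocCount` and `toyTermCount` are hypothetical. A separate set of document names is kept so that a document with no terms is still counted.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Hypothetical toy index that tracks document names separately from
-- the term -> documents map. Not Condor's internal representation.
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Text (Text)

data ToyIndex = ToyIndex
  { toyDocs  :: Set.Set Text                 -- all indexed document names
  , toyTerms :: Map.Map Text (Set.Set Text)  -- term -> documents containing it
  }

emptyToy :: ToyIndex
emptyToy = ToyIndex Set.empty Map.empty

addToy :: Text -> [Text] -> ToyIndex -> ToyIndex
addToy name terms (ToyIndex ds ts) =
  ToyIndex (Set.insert name ds)
           (foldr (\t -> Map.insertWith Set.union t (Set.singleton name)) ts terms)

-- Number of distinct documents (cf. docCount)
toyDocCount :: ToyIndex -> Int
toyDocCount = Set.size . toyDocs

-- Number of distinct terms (cf. termCount)
toyTermCount :: ToyIndex -> Int
toyTermCount = Map.size . toyTerms

main :: IO ()
main = do
  let idx = addToy "doc2" ["index", "search"]
          $ addToy "doc1" ["document", "index"] emptyToy
  print (toyDocCount idx, toyTermCount idx)  -- (2,3): "index" is counted once
```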