snowball-1.0.0: Bindings to the Snowball library.

Portability: Haskell2010
Safe Haskell: Trustworthy

NLP.Snowball

Description

Bindings to the Snowball library.

Pure interface

data Algorithm

Snowball algorithm used for stemming words.

stem :: Algorithm -> Text -> Text

Compute the stem of a word using the specified algorithm.

>>> stem English "fantastically"
"fantast"

stems :: Algorithm -> [Text] -> [Text]

Compute the stems of several words in one go. This can be more efficient than map (stem algorithm) because it uses a single Stemmer instance. A rewrite rule already turns the map version into a call to this function, but you can still call stems directly if you want to be sure it is used or simply find it more convenient.
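As a minimal sketch of the pure interface, the program below stems every word of a sentence with a single call to stems. The helper name stemSentence is hypothetical and not part of this package, and the input is assumed to be lowercase already.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import           Data.Text    (Text)
import qualified Data.Text    as T
import qualified Data.Text.IO as TIO
import           NLP.Snowball (Algorithm (English), stems)

-- | Hypothetical helper (not part of NLP.Snowball): split a sentence on
-- whitespace and stem all of its words in one call to 'stems'.
stemSentence :: Text -> [Text]
stemSentence = stems English . T.words

main :: IO ()
main = TIO.putStrLn (T.unwords (stemSentence "fantastically running words"))
```

Because stems is pure, it composes with ordinary list and Text functions without any IO plumbing.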

IO interface

data Stemmer

A thread-safe and memory-safe Snowball stemmer instance.

newStemmer :: Algorithm -> IO Stemmer

Create a new reusable Stemmer instance.

stemWith :: Stemmer -> Text -> IO Text

Use a Stemmer to stem a word. This can be used more efficiently than stem because you can keep a stemmer around and reuse it, but it requires IO to ensure thread safety.

In my benchmarks, this (and stemsWith) is faster than stem for a few hundred words, but slower for larger numbers of words. I don't know whether this is a problem with my benchmarks, with these bindings, or with the Snowball library itself, so run your own benchmarks if speed is a concern, and consider caching stems, e.g. in a HashMap.
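One way the caching suggestion could look is sketched below. It is an illustration rather than anything this package provides: cachedStemWith and cacheRef are hypothetical names, and the sketch assumes the unordered-containers package for Data.HashMap.Strict. The IORef update is not atomic across threads, so concurrent callers may occasionally stem the same word twice; the results remain correct.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.HashMap.Strict as HM
import           Data.IORef          (IORef, modifyIORef', newIORef, readIORef)
import           Data.Text           (Text)
import           NLP.Snowball        (Algorithm (English), Stemmer, newStemmer, stemWith)

-- | Hypothetical helper (not part of NLP.Snowball): look the word up in the
-- cache first and only call the stemmer on a miss.
cachedStemWith :: IORef (HM.HashMap Text Text) -> Stemmer -> Text -> IO Text
cachedStemWith cacheRef stemmer word = do
  cache <- readIORef cacheRef
  case HM.lookup word cache of
    Just cached -> pure cached
    Nothing     -> do
      stemmed <- stemWith stemmer word
      -- Remember the result so later lookups skip the stemmer.
      modifyIORef' cacheRef (HM.insert word stemmed)
      pure stemmed

main :: IO ()
main = do
  stemmer  <- newStemmer English
  cacheRef <- newIORef HM.empty
  -- The second occurrence is answered from the cache.
  mapM_ (\w -> cachedStemWith cacheRef stemmer w >>= print)
        ["fantastically", "fantastically"]
```

Whether the cache pays off depends on how often words repeat in the input, so it is worth benchmarking with and without it.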

stemsWith :: Stemmer -> [Text] -> IO [Text]

Use a Stemmer to stem multiple words in one go. This can be more efficient than mapM (stemWith stemmer) because the Stemmer is locked only once.
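The IO interface might be used roughly as follows: create one Stemmer and reuse it for both single words and batches. This is a sketch using only the functions documented on this page; the sample words are arbitrary.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.Text.IO as TIO
import           NLP.Snowball (Algorithm (English), newStemmer, stemWith, stemsWith)

main :: IO ()
main = do
  -- Create the stemmer once and keep it around for reuse.
  stemmer <- newStemmer English

  -- Stem a single word.
  single <- stemWith stemmer "fantastically"
  TIO.putStrLn single

  -- Stem a batch of words with one call, so the stemmer is locked once.
  batch <- stemsWith stemmer ["running", "jumps", "easily"]
  mapM_ TIO.putStrLn batch
```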