Safe Haskell	None
Language	Haskell2010

NLP.ML.AvgPerceptron

Description

Average Perceptron implementation of Part of speech tagging, adapted for Haskell from this python implementation, which is described on the blog post:

http://honnibal.wordpress.com/2013/09/11/a-good-part-of-speechpos-tagger-in-about-200-lines-of-python/

The Perceptron code can be found on github:

https://github.com/sloria/TextBlob/blob/dev/text/_perceptron.py

Synopsis

Documentation

data Perceptron Source

The perceptron model.

Constructors

Perceptron

Fields

weights :: Map Feature (Map Class Weight): Each feature gets its own weight vector, so weights is a dict-of-dicts
totals :: Map (Feature, Class) Weight: The accumulated values, for the averaging. These will be keyed by feature/clas tuples
tstamps :: Map (Feature, Class) Int: The last time the feature was changed, for the averaging. Also keyed by feature/clas tuples (tstamps is short for timestamps)
instances :: Int: Number of instances seen

Instances

Eq Perceptron
Read Perceptron
Show Perceptron
Generic Perceptron
Serialize Perceptron
NFData Perceptron
type Rep Perceptron

newtype Class Source

The classes that the perceptron assigns are represnted with a newtype-wrapped String.

Eventually, I think this should become a typeclass, so the classes can be defined by the users of the Perceptron (such as custom POS tag ADTs, or more complex classes).

Constructors

Class String

Instances

Eq Class
Ord Class
Read Class
Show Class
Generic Class
Serialize Class
type Rep Class

type Weight = Double Source

Typedef for doubles to make the code easier to read, and to make this simple to change if necessary.

newtype Feature Source

Constructors

Feat Text

Instances

Eq Feature
Ord Feature
Read Feature
Show Feature
Generic Feature
Serialize Feature
type Rep Feature

emptyPerceptron :: Perceptron Source

An empty perceptron, used to start training.

predict :: Perceptron -> Map Feature Int -> Maybe Class Source

Predict a class given a feature vector.

Ported from python:

def predict(self, features):
    '''Dot-product the features and current weights and return the best label.'''
    scores = defaultdict(float)
    for feat, value in features.items():
        if feat not in self.weights or value == 0:
            continue
        weights = self.weights[feat]
        for label, weight in weights.items():
            scores[label] += value * weight
    # Do a secondary alphabetic sort, for stability
    return max(self.classes, key=lambda label: (scores[label], label))

train :: Int -> Perceptron -> [(Map Feature Int, Class)] -> IO Perceptron Source

update :: Perceptron -> Class -> Class -> [Feature] -> Perceptron Source

Update the perceptron with a new example.

update(self, truth, guess, features)
   ...
        self.i += 1
        if truth == guess:
            return None
        for f in features:
            weights = self.weights.setdefault(f, {}) -- setdefault is Map.findWithDefault, and destructive.
            upd_feat(truth, f, weights.get(truth, 0.0), 1.0)
            upd_feat(guess, f, weights.get(guess, 0.0), -1.0)
        return None

averageWeights :: Perceptron -> Perceptron Source

Average the weights

Ported from Python:

def average_weights(self):
    for feat, weights in self.weights.items():
        new_feat_weights = {}
        for clas, weight in weights.items():
            param = (feat, clas)
            total = self._totals[param]
            total += (self.i - self._tstamps[param]) * weight
            averaged = round(total / float(self.i), 3)
            if averaged:
                new_feat_weights[clas] = averaged
        self.weights[feat] = new_feat_weights
    return None