Say we want to estimate a conditional distribution based on a very large set of observed data.
Naively, we could just collect all the data and estimate a large table, but
our table would have little or no counts for a feasible future observations.
In practice, we use smoothing to supplement rare contexts with data from similar, more often seen contexts. For instance,
using bigram probabilities when the given trigrams observations are too sparse.
Most of these smoothing techniques are special cases of general linear interpolation, which chooses the weight of
each level of smoothing based on the sparsity of the current context.
In this module, we give an implementation of this process that separates out count collection
from the smoothing model, using a Trie. The user specifies a Context instance that relates the full conditional context
to a sequences of SubContexts that characterize the levels of smoothing and the transitions in the Trie. We also give a small set of smoothing techniques
to combine these levels.
This work is based on Chapter 6 of ''Foundations of Statistical Natural Language Processing''
by Chris Manning and Hinrich Schutze.
|The set of observations of event conditioned on context. event must be an instance of Event and context of Context
|A CondObserved set for a single event and context.
|Events are conditioned on Contexts. When Contexts are sparse, we need a way to decompose into simpler SubContexts.
This class allows us to separate this decomposition from the collection of larger contexts.
|type SubMap a :: * -> * -> *||Source|
|A map over subcontexts (for efficiency)
|A function to enumerate subcontexts of a context
General Linear Interpolation. Produces a Conditional Distribution from observations.
It requires a GeneralLambda function which tells it how to weight each level of smoothing.
The GeneralLambda function can observe the counts of each level of context.
Note: We include a final level of backoff where everything is given an epsilon likelihood. To
ignore this, just give it lambda = 0.
|Weight each level by the likelihood that a new event will be seen at that level.
t / ((n * d) + t) where t is the total count, d is the number of distinct observations,
and n is a user defined constant.
|Weight each level by a fixed predefined amount.
|Produced by Haddock version 2.6.0|