record-encode-0.2.3: Generic encoding of records

Data.Record.Encode

Description

This library provides generic machinery (via GHC.Generics and generics-sop) to encode values of some algebraic type as points in a vector space.

Processing datasets that have one or more categorical variables (which in other words are values of a sum type) typically requires a series of boilerplate transformations, and the encodeOneHot function provided here does precisely that.

## Internals

This library makes use of generic programming to analyze both values and types (see the Generics module).

Synopsis

# One-hot encoding

encodeOneHot :: forall a. G a => a -> OneHot Source #

Computes the one-hot encoding of a value of a sum type. A sum type is defined as a choice between N type constructors, each having zero or more fields.

The number of constructors becomes the dimensionality of the embedding space, and the constructor position (as defined in its implementation) is interpreted as the index of the nonzero coordinate.

NB : This function computes the generic representation only up to the outermost constructor (see examples below).

The type of the input value must be an instance of Generic (from GHC.Generics) and of Generic (from the generics-sop library).

> :set -XDeriveGeneric

> import qualified GHC.Generics as G
> import qualified Generics.SOP as SOP
> import Data.Record.Encode

> data X = A | B | C deriving (Enum, G.Generic)
> instance SOP.Generic X


The B constructor is the second (i.e. position 1 counting from 0) of a choice of three :

>>> encodeOneHot B
OH {oDim = 3, oIx = 1}


The Just constructor is the second of a choice of two:

>>> encodeOneHot \$ Just B
OH {oDim = 2, oIx = 1}


The Nothing constructor is the first:

>>> encodeOneHot (Nothing :: Maybe Int)
OH {oDim = 2, oIx = 0}


## Types

data OneHot Source #

A one-hot encoding is a d-dimensional vector having a single component equal to 1 and all others equal to 0. We represent it here compactly as two integers: an integer dimension and an index (which must both be nonnegative).

Constructors

 OH FieldsoDim :: !IntDimension of embedding space (i.e. number of categories)oIx :: !IntIndex of nonzero coordinate
Instances
 Source # Instance detailsDefined in Data.Record.Encode Methods(==) :: OneHot -> OneHot -> Bool #(/=) :: OneHot -> OneHot -> Bool # Source # Instance detailsDefined in Data.Record.Encode MethodsshowsPrec :: Int -> OneHot -> ShowS #showList :: [OneHot] -> ShowS #

## Utilities

Compares two one-hot encodings for equality. Returns Nothing if the operand dimensions are not equal.

>>> compareOH (OH 3 2) (OH 3 1)
Just GT

>>> compareOH (OH 3 2) (OH 5 1)
Nothing


oneHotV :: Num a => OneHot -> Vector a Source #

Create a one-hot vector

# Generics-related

type G a = (GVariants (Rep a), Generic a, Generic a) Source #

Constraints necessary to encodeOneHot a value.

NB: GVariants is an internal typeclass, and this constraint is automatically satisfied if the type is an instance of Generic