hmep-0.1.1: HMEP Multi Expression Programming – a genetic programming variant

AI.MEP

Description

Copyright Bogdan Penkovsky (c) 2017

# Multiple Expression Programming

## Example application: trigonometry cheating

Suppose, you forgot certain trigonometric identities. For instance, you want to express cos^2(x) using sin(x). No problem, set the target function cos^2(x) in the dataset and add sin to the arithmetic set of operators `{+,-,*,/}`. See app/Main.hs.

After running

``` \$ stack build && stack exec hmep-demo
```

We obtain

``` Average loss in the initial population 15.268705681244962
Population 10: average loss 14.709728527360586
Population 20: average loss 13.497114190675477
Population 30: average loss 8.953185872653737
Population 40: average loss 8.953185872653737
Population 50: average loss 3.3219954564955856e-15
```

The average value of 3.3e-15 is close to zero, indicating that the exact expression was found!

The produced output was:

``` Interpreted expression:
v1 = sin x0
v2 = v1 * v1
result = 1 - v2
```

From here we can infer that ``` cos^2(x) = 1 - v2 = 1 - v1 * v1 = 1 - sin^2(x). ```

Sweet!

Synopsis

# Documentation

type Chromosome a = Vector (Gene a Int) Source

A chromosome is a vector of genes

data Gene a i Source

Either a terminal symbol or a three-address code (a function and two pointers)

Instances

 (Eq a, Eq i) => Eq (Gene a i) Source `Eq` instance for `Gene` (Show a, Show i) => Show (Gene a i) Source `Show` instance for `Gene`

type Population a = [Chromosome a] Source

List of chromosomes

type Phenotype a = (Double, Chromosome a, Vector Int) Source

Loss value, chromosome, and the best expression indices vector

data Config a Source

MEP configuration

Constructors

 Config Fieldsp'const :: DoubleProbability of constant generationp'var :: DoubleProbability of variable generation. The probability of operator generation is inferred automatically as `1 - p'const - p'var`.p'mutation :: DoubleMutation probabilityp'crossover :: DoubleCrossover probabilityc'length :: IntThe chromosome lengthc'popSize :: IntA (sub)population sizec'popN :: IntNumber of subpopulations (1 or more) [not implemented]c'ops :: Vector (F a)Functions pool with their symbolic representationsc'vars :: IntThe input dimensionality
```defaultConfig = Config
{
p'const = 0.1
, p'var = 0.4
, p'mutation = 0.1
, p'crossover = 0.9

, c'length = 50
, c'popSize = 100
, c'popN = 1
, c'ops = V.empty  -- <-- To be overridden
, c'vars = 1
}
```

type LossFunction a = (Vector a -> Vector a) -> (Vector Int, Double) Source

A function to minimize.

The argument is a vector evaluation function whose input is a vector (length `c'vars`) and ouput is a vector with a different length `c'length`.

The result is a vector of the best indices and a scalar loss value.

# Genetic algorithm

initialize :: PrimMonad m => Config Double -> RandT m (Population Double) Source

Randomly generate a new population

evaluatePopulation :: Num a => LossFunction a -> Population a -> Generation a Source

Using `LossFunction`, find how fit is each chromosome in the population

Arguments

 :: (Num result, Ord result) => (b -> b -> result) Distance function -> [(a, b)] Dataset -> (Vector a -> Vector b) Chromosome evaluation function (partially applied `evaluate`) -> (Vector Int, result)

Loss function for regression problems with one input and one output. Not normalized with respect to the dataset size.

Average population loss

The best phenotype in the generation

The worst phenotype in the generation

Arguments

 :: PrimMonad m => Config Double Common configuration -> LossFunction Double Custom loss function -> (Chromosome Double -> RandT m (Chromosome Double)) Mutation -> (Chromosome Double -> Chromosome Double -> RandT m (Chromosome Double, Chromosome Double)) Crossover -> (Generation Double -> RandT m (Chromosome Double)) A chromosome selection algorithm. Does not need to be random, but may be. -> Generation Double Evaluated population -> RandT m (Generation Double) New generation

Selection operator that produces the next evaluated population.

Standard algorithm: the best offspring O replaces the worst individual W in the current population if O is better than W.

binaryTournament :: (PrimMonad m, Ord a) => Generation a -> RandT m (Chromosome a) Source

Binary tournament selection

crossover :: PrimMonad m => Chromosome a -> Chromosome a -> RandT m (Chromosome a, Chromosome a) Source

Uniform crossover operator

Arguments

 :: PrimMonad m => Config Double Common configuration -> Chromosome Double -> RandT m (Chromosome Double)

Mutation operator with up to three mutations per chromosome

Arguments

 :: PrimMonad m => Double Probability of gene mutation -> Config Double Common configuration -> Chromosome Double -> RandT m (Chromosome Double)

Mutation operator with a fixed mutation probability of each gene

Arguments

 :: PrimMonad m => Config Double Common configuration -> RandT m (Chromosome Double)

Randomly initialize a new chromosome. By definition, the first gene is terminal (a constant or a variable).

# Expression interpretation

Generate code for the functions with a single output

# Random

data RandT m a :: (* -> *) -> * -> *

Instances

 Monad m => Monad (RandT m) Monad m => Functor (RandT m) Monad m => Applicative (RandT m)

runRandIO :: RandT IO a -> IO a Source

Alias for `mwc`: Take a `RandT` value and run it in `IO`, generating all the random values described by the `RandT`.

It initializes the random number generator. For performance reasons, it is recommended to minimize the number of calls to `runRandIO`.