hmep-0.1.1: HMEP Multi Expression Programming – a genetic programming variant

Safe HaskellNone




Copyright Bogdan Penkovsky (c) 2017

Multiple Expression Programming

Example application: trigonometry cheating

Suppose, you forgot certain trigonometric identities. For instance, you want to express cos^2(x) using sin(x). No problem, set the target function cos^2(x) in the dataset and add sin to the arithmetic set of operators {+,-,*,/}. See app/Main.hs.

After running

 $ stack build && stack exec hmep-demo

We obtain

 Average loss in the initial population 15.268705681244962
 Population 10: average loss 14.709728527360586
 Population 20: average loss 13.497114190675477
 Population 30: average loss 8.953185872653737
 Population 40: average loss 8.953185872653737
 Population 50: average loss 3.3219954564955856e-15

The average value of 3.3e-15 is close to zero, indicating that the exact expression was found!

The produced output was:

 Interpreted expression:
 v1 = sin x0
 v2 = v1 * v1
 result = 1 - v2

From here we can infer that cos^2(x) = 1 - v2 = 1 - v1 * v1 = 1 - sin^2(x).




type Chromosome a = Vector (Gene a Int) Source

A chromosome is a vector of genes

data Gene a i Source

Either a terminal symbol or a three-address code (a function and two pointers)


(Eq a, Eq i) => Eq (Gene a i) Source

Eq instance for Gene

(Show a, Show i) => Show (Gene a i) Source

Show instance for Gene

type Population a = [Chromosome a] Source

List of chromosomes

type Phenotype a = (Double, Chromosome a, Vector Int) Source

Loss value, chromosome, and the best expression indices vector

data Config a Source

MEP configuration




p'const :: Double

Probability of constant generation

p'var :: Double

Probability of variable generation. The probability of operator generation is inferred automatically as 1 - p'const - p'var.

p'mutation :: Double

Mutation probability

p'crossover :: Double

Crossover probability

c'length :: Int

The chromosome length

c'popSize :: Int

A (sub)population size

c'popN :: Int

Number of subpopulations (1 or more) [not implemented]

c'ops :: Vector (F a)

Functions pool with their symbolic representations

c'vars :: Int

The input dimensionality

defaultConfig :: Config Double Source

defaultConfig = Config
    p'const = 0.1
  , p'var = 0.4
  , p'mutation = 0.1
  , p'crossover = 0.9

  , c'length = 50
  , c'popSize = 100
  , c'popN = 1
  , c'ops = V.empty  -- <-- To be overridden
  , c'vars = 1

type LossFunction a = (Vector a -> Vector a) -> (Vector Int, Double) Source

A function to minimize.

The argument is a vector evaluation function whose input is a vector (length c'vars) and ouput is a vector with a different length c'length.

The result is a vector of the best indices and a scalar loss value.

Genetic algorithm

initialize :: PrimMonad m => Config Double -> RandT m (Population Double) Source

Randomly generate a new population

evaluatePopulation :: Num a => LossFunction a -> Population a -> Generation a Source

Using LossFunction, find how fit is each chromosome in the population

regressionLoss1 Source


:: (Num result, Ord result) 
=> (b -> b -> result)

Distance function

-> [(a, b)]


-> (Vector a -> Vector b)

Chromosome evaluation function (partially applied evaluate)

-> (Vector Int, result) 

Loss function for regression problems with one input and one output. Not normalized with respect to the dataset size.

avgLoss :: Generation Double -> Double Source

Average population loss

best :: Generation a -> Phenotype a Source

The best phenotype in the generation

worst :: Generation a -> Phenotype a Source

The worst phenotype in the generation

evolve Source


:: PrimMonad m 
=> Config Double

Common configuration

-> LossFunction Double

Custom loss function

-> (Chromosome Double -> RandT m (Chromosome Double))


-> (Chromosome Double -> Chromosome Double -> RandT m (Chromosome Double, Chromosome Double))


-> (Generation Double -> RandT m (Chromosome Double))

A chromosome selection algorithm. Does not need to be random, but may be.

-> Generation Double

Evaluated population

-> RandT m (Generation Double)

New generation

Selection operator that produces the next evaluated population.

Standard algorithm: the best offspring O replaces the worst individual W in the current population if O is better than W.

binaryTournament :: (PrimMonad m, Ord a) => Generation a -> RandT m (Chromosome a) Source

Binary tournament selection

crossover :: PrimMonad m => Chromosome a -> Chromosome a -> RandT m (Chromosome a, Chromosome a) Source

Uniform crossover operator

mutation3 Source


:: PrimMonad m 
=> Config Double

Common configuration

-> Chromosome Double 
-> RandT m (Chromosome Double) 

Mutation operator with up to three mutations per chromosome

smoothMutation Source


:: PrimMonad m 
=> Double

Probability of gene mutation

-> Config Double

Common configuration

-> Chromosome Double 
-> RandT m (Chromosome Double) 

Mutation operator with a fixed mutation probability of each gene

newChromosome Source


:: PrimMonad m 
=> Config Double

Common configuration

-> RandT m (Chromosome Double) 

Randomly initialize a new chromosome. By definition, the first gene is terminal (a constant or a variable).

Expression interpretation

generateCode :: Phenotype Double -> String Source

Generate code for the functions with a single output


data RandT m a :: (* -> *) -> * -> *


Monad m => Monad (RandT m) 
Monad m => Functor (RandT m) 
Monad m => Applicative (RandT m) 

runRandIO :: RandT IO a -> IO a Source

Alias for mwc: Take a RandT value and run it in IO, generating all the random values described by the RandT.

It initializes the random number generator. For performance reasons, it is recommended to minimize the number of calls to runRandIO.