Safe Haskell | None |
---|---|
Language | Haskell2010 |
Copyright Bogdan Penkovsky (c) 2017
Multiple Expression Programming
Example application: trigonometry cheating
Suppose, you forgot certain trigonometric identities.
For instance, you want to express cos^2(x) using sin(x).
No problem, set the target function cos^2(x) in the dataset
and add sin to the arithmetic set of operators {+,-,*,/}
.
See app/Main.hs.
After running
$ stack build && stack exec hmep-demo
We obtain
Average loss in the initial population 15.268705681244962 Population 10: average loss 14.709728527360586 Population 20: average loss 13.497114190675477 Population 30: average loss 8.953185872653737 Population 40: average loss 8.953185872653737 Population 50: average loss 3.3219954564955856e-15
The average value of 3.3e-15 is close to zero, indicating that the exact expression was found!
The produced output was:
Interpreted expression: v1 = sin x0 v2 = v1 * v1 result = 1 - v2
From here we can infer that
cos^2(x) = 1 - v2 = 1 - v1 * v1 = 1 - sin^2(x).
Sweet!
- type Chromosome a = Vector (Gene a Int)
- data Gene a i
- type Population a = [Chromosome a]
- type Phenotype a = (Double, Chromosome a, Vector Int)
- data Config a = Config {}
- defaultConfig :: Config Double
- type LossFunction a = (Vector a -> Vector a) -> (Vector Int, Double)
- initialize :: PrimMonad m => Config Double -> RandT m (Population Double)
- evaluatePopulation :: Num a => LossFunction a -> Population a -> Generation a
- regressionLoss1 :: (Num result, Ord result) => (b -> b -> result) -> [(a, b)] -> (Vector a -> Vector b) -> (Vector Int, result)
- avgLoss :: Generation Double -> Double
- best :: Generation a -> Phenotype a
- worst :: Generation a -> Phenotype a
- evolve :: PrimMonad m => Config Double -> LossFunction Double -> (Chromosome Double -> RandT m (Chromosome Double)) -> (Chromosome Double -> Chromosome Double -> RandT m (Chromosome Double, Chromosome Double)) -> (Generation Double -> RandT m (Chromosome Double)) -> Generation Double -> RandT m (Generation Double)
- binaryTournament :: (PrimMonad m, Ord a) => Generation a -> RandT m (Chromosome a)
- crossover :: PrimMonad m => Chromosome a -> Chromosome a -> RandT m (Chromosome a, Chromosome a)
- mutation3 :: PrimMonad m => Config Double -> Chromosome Double -> RandT m (Chromosome Double)
- smoothMutation :: PrimMonad m => Double -> Config Double -> Chromosome Double -> RandT m (Chromosome Double)
- newChromosome :: PrimMonad m => Config Double -> RandT m (Chromosome Double)
- generateCode :: Phenotype Double -> String
- data RandT m a :: (* -> *) -> * -> *
- runRandIO :: RandT IO a -> IO a
Documentation
type Chromosome a = Vector (Gene a Int) Source
A chromosome is a vector of genes
Either a terminal symbol or a three-address code (a function and two pointers)
type Population a = [Chromosome a] Source
List of chromosomes
type Phenotype a = (Double, Chromosome a, Vector Int) Source
Loss value, chromosome, and the best expression indices vector
MEP configuration
Config | |
|
defaultConfig :: Config Double Source
defaultConfig = Config { p'const = 0.1 , p'var = 0.4 , p'mutation = 0.1 , p'crossover = 0.9 , c'length = 50 , c'popSize = 100 , c'popN = 1 , c'ops = V.empty -- <-- To be overridden , c'vars = 1 }
type LossFunction a = (Vector a -> Vector a) -> (Vector Int, Double) Source
A function to minimize.
The argument is a vector evaluation function whose input
is a vector (length c'vars
) and ouput is
a vector with a different length c'length
.
The result is a vector of the best indices and a scalar loss value.
Genetic algorithm
initialize :: PrimMonad m => Config Double -> RandT m (Population Double) Source
Randomly generate a new population
evaluatePopulation :: Num a => LossFunction a -> Population a -> Generation a Source
Using LossFunction
, find how fit is each chromosome in the population
:: (Num result, Ord result) | |
=> (b -> b -> result) | Distance function |
-> [(a, b)] | Dataset |
-> (Vector a -> Vector b) | Chromosome evaluation function (partially applied |
-> (Vector Int, result) |
Loss function for regression problems with one input and one output. Not normalized with respect to the dataset size.
avgLoss :: Generation Double -> Double Source
Average population loss
best :: Generation a -> Phenotype a Source
The best phenotype in the generation
worst :: Generation a -> Phenotype a Source
The worst phenotype in the generation
:: PrimMonad m | |
=> Config Double | Common configuration |
-> LossFunction Double | Custom loss function |
-> (Chromosome Double -> RandT m (Chromosome Double)) | Mutation |
-> (Chromosome Double -> Chromosome Double -> RandT m (Chromosome Double, Chromosome Double)) | Crossover |
-> (Generation Double -> RandT m (Chromosome Double)) | A chromosome selection algorithm. Does not need to be random, but may be. |
-> Generation Double | Evaluated population |
-> RandT m (Generation Double) | New generation |
Selection operator that produces the next evaluated population.
Standard algorithm: the best offspring O replaces the worst individual W in the current population if O is better than W.
binaryTournament :: (PrimMonad m, Ord a) => Generation a -> RandT m (Chromosome a) Source
Binary tournament selection
crossover :: PrimMonad m => Chromosome a -> Chromosome a -> RandT m (Chromosome a, Chromosome a) Source
Uniform crossover operator
:: PrimMonad m | |
=> Config Double | Common configuration |
-> Chromosome Double | |
-> RandT m (Chromosome Double) |
Mutation operator with up to three mutations per chromosome
:: PrimMonad m | |
=> Double | Probability of gene mutation |
-> Config Double | Common configuration |
-> Chromosome Double | |
-> RandT m (Chromosome Double) |
Mutation operator with a fixed mutation probability of each gene
:: PrimMonad m | |
=> Config Double | Common configuration |
-> RandT m (Chromosome Double) |
Randomly initialize a new chromosome. By definition, the first gene is terminal (a constant or a variable).
Expression interpretation
generateCode :: Phenotype Double -> String Source
Generate code for the functions with a single output
Random
data RandT m a :: (* -> *) -> * -> *