Portability | non-portable |
---|---|
Stability | experimental |
Maintainer | Jan Snajder <jan.snajder@fer.hr> |
Safe Haskell | None |
The Genetic Programming Library.
Genetic programming is an evolutionary optimization technique inspired by biological evolution. It is similar to genetic algorithms except that the individual solutions are programs (or, more generally, expressions) representing a solution to a given problem. A genetic program is represented as an abstract syntax tree and associated with a custom-defined fitness value indicating the quality of the solution. Starting from a randomly generated initial population of genetic programs, the genetic operators of selection, crossover, and (occasionally) mutation are used to evolve programs of increasingly better quality.
Standard reference is: John Koza. Genetic programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.
In GenProg, a genetic program is represented by a value of an
algebraic datatype. To use a datatype as a genetic program, it
suffices to define it as an instance of the GenProg
typeclass.
A custom datatype can be made an instance of the GenProg
typeclass, provided it is an instance of the Data
typeclass (see
GenProg.GenExpr.Data).
An example of how to use this library is given below.
- class (Eq e, GenExpr e, MonadRandom m) => GenProg m e | e -> m where
- terminal :: m e
- nonterminal :: m e
- generateFullExpr :: GenProg m e => Int -> m e
- generateGrownExpr :: GenProg m e => Int -> m e
- depth :: GenExpr e => e -> Int
- nodes :: GenExpr e => e -> Int
- data Ind e
- unInd :: Ind e -> e
- mkInd :: GenProg m e => Fitness e -> e -> Ind e
- aFitness :: Ind e -> Double
- sFitness :: Ind e -> Double
- data Pop e
- unPop :: Pop e -> [Ind e]
- mkPop :: [Ind e] -> Pop e
- generatePop :: GenProg m e => EvolParams m e -> m (Pop e)
- replenishPop :: GenProg m e => EvolParams m e -> Pop e -> m (Pop e)
- mergePop :: GenProg m e => EvolParams m e -> Pop e -> Pop e -> Pop e
- best :: Pop e -> Ind e
- avgFitness :: Pop e -> Double
- avgDepth :: GenProg m e => Pop e -> Double
- avgNodes :: GenProg m e => Pop e -> Double
- crossoverInd :: GenProg m e => EvolParams m e -> Ind e -> Ind e -> m (Ind e, Ind e)
- mutateInd :: GenProg m e => EvolParams m e -> Ind e -> m (Ind e)
- crossoverPop :: GenProg m e => EvolParams m e -> Pop e -> m (Ind e, Ind e)
- mutatePop :: GenProg m e => EvolParams m e -> Pop e -> m (Pop e)
- data EvolState e = EvolState {}
- type Fitness e = e -> Double
- type Mutate m e = e -> m e
- defaultMutation :: GenProg m e => EvolParams m e -> Mutate m e
- type Terminate e = EvolState e -> Bool
- tSuccess :: (e -> Bool) -> Terminate e
- tFitness :: GenProg m e => Double -> Terminate e
- tGeneration :: Int -> Terminate e
- data EvolParams m e = EvolParams {}
- defaultEvolParams :: GenProg m e => EvolParams m e
- evolve :: GenProg m e => EvolParams m e -> m (EvolState e)
- evolveFrom :: GenProg m e => EvolParams m e -> Pop e -> m (EvolState e)
- evolveTrace :: GenProg m e => EvolParams m e -> m [EvolState e]
- evolveTraceFrom :: GenProg m e => EvolParams m e -> Pop e -> m [EvolState e]
Genetic programs
class (Eq e, GenExpr e, MonadRandom m) => GenProg m e | e -> m whereSource
A typeclass defining a genetic program interface. Datatypes e
that are to be used as genetic programs must be instances of the
GenExpr
typeclass and must implement this interface.
Generates a random terminal T
.
nonterminal :: m eSource
Generates a random nonterminal (functional) node F(T,...,T)
whose
arguments are again terminals (this condition is not verified).
Expressions
generateFullExpr :: GenProg m e => Int -> m eSource
Generates a random expression fully expanded to the specified depth.
generateGrownExpr :: GenProg m e => Int -> m eSource
Generates a random expression of limited depth. The maximum depth of the resulting expression may be less than the specified depth limit, and paths may be of different length.
depth :: GenExpr e => e -> IntSource
The depth of an expression. Equals 1 for single-node expressions.
Individuals
A genetically programmed individual, representing a basic unit of evolution. (Basically a wrapper around a genetically programmable expression.)
aFitness :: Ind e -> DoubleSource
Adjusted fitness of an individual. Adjusted fitness equals
1/(1+s)
, where s
is the standardized fitness as computed by
fitness
. To reduce computational costs, this value is computed
only once and then cached.
Population
A population of individuals. (Basically a wrapper around a list of individuals.)
generatePop :: GenProg m e => EvolParams m e -> m (Pop e)Source
Generate population of given size and given depth limit using
ramped half-and-half method (Koza, 1992): for each depth value from 0 to
the initial depth limit iDepth
, 50% of individuals are generated using
generateFullExpr
and 50% are generated using
generateGrownExpr
. Afterwards, duplicates are removed, thus the
size of the resulting population may actually be less than the
specified size.
replenishPop :: GenProg m e => EvolParams m e -> Pop e -> m (Pop e)Source
Replenishes a population up to popSize
by randomly
generating new individuals.
mergePop :: GenProg m e => EvolParams m e -> Pop e -> Pop e -> Pop eSource
Merges two populations by taking popSize
best-fitted individuals
from the union of the two populations.
avgFitness :: Pop e -> DoubleSource
Population's average standardized fitness.
avgNodes :: GenProg m e => Pop e -> DoubleSource
Average number of expression nodes in the population.
Genetic operators
The following functions are not meant to be used directly. They are exposed for debugging purposes.
crossoverInd :: GenProg m e => EvolParams m e -> Ind e -> Ind e -> m (Ind e, Ind e)Source
Crossover operation of two individuals, resulting in two
offsprings. Crossover is performed by choosing at random two nodes
in each expressions, and then by exchanging the subexpressions
rooted at these nodes between the two individuals. The probability
that an internal (functional) node is chosen as crossover point is
set by the ciProb
parameter in EvolParams
, whereas the
probability that an external (terminal) node is chosen equals
1-ciProb
. Among internal and external nodes, nodes are chosen
uniformly at random. If the depth of a created offspring exceeds
the depth limit cDepth
specified by evolution parameters
EvolParams
, that offspring is discarded and a parent is
reproduced (i.e., copied as-is).
mutateInd :: GenProg m e => EvolParams m e -> Ind e -> m (Ind e)Source
Mutates an individual by applying the mutation function mutate
to a randomly selected node. The probability that an internal
(functional) node is chosen for muration is set by the miProb
parameter in EvolParams
, whereas the probability that an external
(terminal) node is chosen equals 1-miProb
. Among internal and
external nodes, nodes are chosen uniformly at random. If the depth
of the mutated expression exceeds the depth limit cDepth
specified by evolution parameters EvolParams
, the individual is
left unaltered.
crossoverPop :: GenProg m e => EvolParams m e -> Pop e -> m (Ind e, Ind e)Source
Applies crossover to two randomly chosen individuals from a population. The probability of an individual being chosen as parent is fitness-proportionate (individuals with better fitness have better chanches of being chosen for crossover).
mutatePop :: GenProg m e => EvolParams m e -> Pop e -> m (Pop e)Source
Applies mutation operation to individuals from a population. The
probability of mutating each individual is determined by mProb
parameter
from EvalParams
.
Evolution state
The state of the evolution.
Control parameters
type Fitness e = e -> DoubleSource
Standardized fitness. It takes on values from 0 (best fitness) to +infinity (worst fitness).
defaultMutation :: GenProg m e => EvolParams m e -> Mutate m eSource
Default mutation. Replaces a node, irrespective of its value,
with a randomly generated subexpression whose depth is limited to
iDepth
.
tSuccess :: (e -> Bool) -> Terminate eSource
Termination predicate: terminate if any individual satisfies the specified predicate.
tFitness :: GenProg m e => Double -> Terminate eSource
Termination predicate: terminate if best individual's standardized fitness is greater than or equal to the specified value.
tGeneration :: Int -> Terminate eSource
Termination predicate: terminate after running for the specified number of iterations.
data EvolParams m e Source
Parameters governing the evolution.
Default evolution parameters,
as used in (Koza, 1992), are defined by defaultEvolParams
and indicated below. At least the fitness function fitness
should
be overriden.
EvolParams | |
|
defaultEvolParams :: GenProg m e => EvolParams m eSource
Evolution
evolve :: GenProg m e => EvolParams m e -> m (EvolState e)Source
Creates an initial population and evolves it until termination predicate is satisfied, returning the last evolution state.
evolveFrom :: GenProg m e => EvolParams m e -> Pop e -> m (EvolState e)Source
Evolves a given initial population until termination
predicate is satisfied, returning the last evolution state.
If the size of the initial population is less than
popSize
, the population will be replenished (see replenishPop
).
evolveTrace :: GenProg m e => EvolParams m e -> m [EvolState e]Source
Creates an initial population and runs evolution until termination predicate is satisfied. Returns a list of successive evolution states.
evolveTraceFrom :: GenProg m e => EvolParams m e -> Pop e -> m [EvolState e]Source
Runs evolution on a given initial population until termination
predicate is satisfied and returns a list of successive evolution
states. If the size of the initial population is less than
popSize
, the population will be replenished (see replenishPop
).
Example
This is a simple, worked through example of how to use the GenProg
library. Given a target number n
, out aim is to evolve an arithmetic
expression that evaluates to n
. For example, given 13
as the
target number, one possible solution is (3 * 5) - 2
. The constants
allowed to appear in the expression are restricted to integers from 1
to 9. The allowed operations are +
, -
, *
, and integer division
without remainder.
We begin by defining the datatype for the genetically programed expression:
-- The following language extensions need to be enabled: -- DeriveDataTypeable, FlexibleInstances, MultiParamTypeClasses import GenProg import Data.Generics import Control.Monad import Control.Monad.Random data E = Plus E E | Minus E E | Times E E | Div E E | Const Int deriving (Typeable,Data,Eq,Show)
In order to evolve arithmetic expressions, we need to be able to compute their values. To this end we define
eval :: E -> Maybe Int eval (Const c) = Just c eval (Plus e1 e2) = liftM2 (+) (eval e1) (eval e2) eval (Minus e1 e2) = liftM2 (-) (eval e1) (eval e2) eval (Times e1 e2) = liftM2 (*) (eval e1) (eval e2) eval (Div e1 e2) | ok = liftM2 div x1 x2 | otherwise = Nothing where (x1,x2) = (eval e1,eval e2) ok = x2 /= Just 0 && liftM2 mod x1 x2 == Just 0
Dividing by zero and dividing with a remainder are not allowed and in
such cases we return Nothing
.
Because we have made E
an instance of the Data
typeclass, it can
be readily used as a genetically programmable expression. Next step is
to make E
an instance of the GenProg
typeclass:
instance GenProg (Rand StdGen) E where
terminal = Const liftM
getRandomR (1,9)
nonterminal = do
r <- getRandomR (0,3)
[liftM2 Plus terminal terminal,
liftM2 Minus terminal terminal,
liftM2 Times terminal terminal,
liftM2 Div terminal terminal] !! r
Thus, a random terminal node contains one of the constants from 1 to
9. A nonterminal node can be one of the four arithmetic operations,
each with terminal nodes as arguments. Note that computations are run
within the standard random generator monad (Rand StdGen
).
The fitness function evaluates the accurateness of the arithmetic
expression with respect to the target number. If the value of the
expression is far off from the target number n
, the standardized
fitness should be high. Moreover, we would like to keep the expression
as simple as possible. To this end, we include a parsimony factor
that is proportional to the number of nodes an expression has. We
define the overall standardized fitness as
myFitness :: Int -> E -> Double myFitness n e = error + size where error = realToFrac $ maybe maxBound (abs . (n-)) (eval e) size = (realToFrac $ nodes e) / 100
The number of nodes is divided by a factor of 100 to make it less important than the numeric accuracy of the expression.
We now have everything in place to get the evolution going. We will use
default evolution parameters and choose 12345
as the target number:
>>>
let params = defaultEvolParams { fitness = myFitness 12345 }
Let us first create a random number generator:
>>>
let g = mkStdGen 0
We are doing this because we want our results to be reproducible, and
because we want to be able to compare the results of different
evolution runs. Normally, you would use getStdGen
to get a random
generator with random seed.
To run the evolution and get the best evolved individual, we type
>>>
let i = cachedBest $ evalRand (evolve params) g
To check out its standardized fitness, we type
>>>
sFitness i
39.61
Let us see how the actual expression looks like:
>>>
unInd i
Times (Minus (Minus (Minus (Plus (Const 4) (Const 4)) (Plus (Const 6) (Const 7))) (Minus (Minus (Const 5) (Const 9)) (Plus (Minus (Const 5) (Const 9)) (Minus (Const 4) (Const 4))))) (Plus (Times (Plus (Const 5) (Const 1)) (Const 6)) (Times (Plus (Const 9) (Const 3)) (Minus (Const 1) (Const 8))))) (Div (Times (Plus (Plus (Const 3) (Const 5)) (Times (Const 4) (Const 7))) (Plus (Const 4) (Const 4))) (Minus (Minus (Plus (Const 2) (Const 8)) (Plus (Const 6) (Const 7))) (Plus (Minus (Const 5) (Const 9)) (Minus (Const 4) (Const 4)))))
The number of nodes is
>>>
nodes $ unInd i
61
Let us see to what number the expression evaluates:
>>>
eval $ unInd i
Just 12384
So in this run we didn't get a perfect match, but we were close. Let us see if we can do better.
When doing genetic programming, it is always a good idea to experiment a bit with the parameters. There are no parameters that work best for any given problem. You can learn a lot about how parameters influence the evolution by analysing how the evolution progresses in time. This can be accomplised by evolving an evolution trace:
>>>
let trace = evalRand (evolveTrace params) g
We can now analyse how the standardized fitness of the best individual improves during the evolution:
>>>
map (sFitness . best . pop) trace
[9591.35,2343.59,1935.59,2343.59,903.51,903.45,585.59,585.59,327.45,225.41, 225.41,135.43,57.49,39.61,39.61,39.61,39.61,39.61,57.43,57.47,57.43,57.45, 57.33,57.43,57.43,57.45,57.43,57.43,57.35,57.35,57.43,57.27,57.33,57.33,57.43, 57.29,57.33,57.41,57.29,57.43,57.33,57.35,57.35,57.33,57.39,57.39,57.39,57.33, 57.37,57.37]
We see that at some point the fitness decreases and then increases again. This indicates that the best fitted individual was lost by evolving from one generation to the other. We can prevent this by employing the elitist strategy. Let us see what happens if we preserve a best fitted individual in each generation:
>>>
let trace = evalRand (evolveTrace params {elitists = 1}) g
>>>
map (sFitness . best . pop) trace
[9591.35,2343.59,711.61,711.61,711.61,711.61,57.55,57.53,57.39,57.39,57.39, 57.39,57.37,57.37,57.37,57.37,57.37,57.37,57.37,57.37,57.35,57.35,57.35, 57.35,57.35,57.35,57.35,57.35,57.35,57.35,57.33,57.33,57.33,57.33,57.33, 57.33,57.33,57.33,57.33,25.31,25.31,25.31,25.31,25.31,25.31,25.296,25.296, 25.296,25.296,25.296]
This gives us better fitness, but still not an exact match:
>>>
let i = cachedBest $ last trace
>>>
eval $ unInd i
Just 12320
In the previous evolution run fitness converged relatively fast, but then remained stuck. To stir up things a little, let us allow for some mutation. Setting mutation probability to 5%, while retaining the elitist strategy, we get
>>>
let trace = evalRand (evolveTrace params {elitists = 1, mProb = 0.05}) g
>>>
map (sFitness . best . pop) trace
[9591.35,9591.35,9591.35,9591.35,9591.35,9591.35,9159.35,8403.23,7239.11, 6087.15,6087.15,1479.13,819.21,60.13,51.19,5.19,5.19,5.19,5.19,5.19,1.23, 1.23,1.23,1.23,1.23,1.23,1.21,1.21,1.21,1.21,0.23998,0.23998,0.23998,0.23998, 0.23998,0.23998,0.23998,0.23998,0.23998,0.23998,0.23998,0.23998,0.23998, 0.23998,0.23998,0.23998,0.23998,0.23998,0.23998,0.23998]
This time we've got a perfect match:
>>>
let i = cachedBest $ last trace
>>>
eval $ unInd i
Just 12345
while at the same time the expression is rather compact:
>>>
unInd i
Plus (Times (Const 4) (Plus (Const 9) (Const 4))) (Plus (Plus (Times (Plus (Const 4) (Const 3)) (Times (Times (Const 3) (Const 9)) (Times (Const 5) (Plus (Const 9) (Const 4))))) (Const 3)) (Const 5))>>>
nodes $ unInd i
23