criterion: Robust, reliable performance measurement and analysis

Copyright: (c) 2009-2014 Bryan O'Sullivan
Safe Haskell: None




Types for benchmarking.

The core type is Benchmarkable, which admits both pure functions and IO actions.

For a pure function of type a -> b, the benchmarking harness calls this function repeatedly, each time with a different Int64 argument (the number of times to run the function in a loop), and reduces the result the function returns to weak head normal form.

For an action of type IO a, the benchmarking harness calls the action repeatedly, but does not reduce the result.



data Config

Top-level benchmarking configuration.




confInterval :: Double

Confidence interval for bootstrap estimation (greater than 0, less than 1).

forceGC :: Bool

Obsolete, unused. This option used to force garbage collection between every benchmark run, but it no longer has an effect (we now unconditionally force garbage collection). This option remains solely for backwards API compatibility.

timeLimit :: Double

Number of seconds to run a single benchmark. (In practice, execution time will very slightly exceed this limit.)

resamples :: Int

Number of resamples to perform when bootstrapping.

regressions :: [([String], String)]

Regressions to perform.

rawDataFile :: Maybe FilePath

File to write binary measurement and analysis data to. If not specified, this will be a temporary file.

reportFile :: Maybe FilePath

File to write report output to, with template expanded.

csvFile :: Maybe FilePath

File to write CSV summary to.

junitFile :: Maybe FilePath

File to write JUnit-compatible XML results to.

verbosity :: Verbosity

Verbosity level to use when running and analysing benchmarks.

template :: FilePath

Template file to use if writing a report.

data Verbosity

Control the amount of information displayed.



Benchmark descriptions

newtype Benchmarkable

A pure function or impure action that can be benchmarked. The Int64 parameter indicates the number of times to run the given function or action.


Benchmarkable (Int64 -> IO ()) 
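
As a rough illustration of the Int64 parameter, the sketch below wraps an IO action in a locally defined copy of the newtype and runs it the requested number of times. The helpers repeatAction and runBatch are hypothetical names, not part of criterion's API, and the library's own loops are likely more careful about overhead than this one.

```haskell
import Data.Int (Int64)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Local stand-in with the same shape as the constructor shown above.
newtype Benchmarkable = Benchmarkable (Int64 -> IO ())

-- Hypothetical helper: run an action n times, discarding its result.
repeatAction :: IO a -> Benchmarkable
repeatAction act = Benchmarkable go
  where
    go n
      | n <= 0    = return ()
      | otherwise = act >> go (n - 1)

-- Hypothetical helper: ask a Benchmarkable to perform a batch of iterations.
runBatch :: Benchmarkable -> Int64 -> IO ()
runBatch (Benchmarkable f) = f

main :: IO ()
main = do
  counter <- newIORef (0 :: Int)
  -- Request a batch of five iterations, then show how many actually ran.
  runBatch (repeatAction (modifyIORef' counter (+ 1))) 5
  readIORef counter >>= print
```

Passing the iteration count into the benchmarked value lets the harness time a whole batch at once, rather than timing each call separately.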

data Benchmark where

Specification of a collection of benchmarks and environments. A benchmark may consist of:

  • An environment that creates input data for benchmarks, created with env.
  • A single Benchmarkable item with a name, created with bench.
  • A (possibly nested) group of Benchmarks, created with bgroup.


Environment :: NFData env => IO env -> (env -> Benchmark) -> Benchmark 
Benchmark :: String -> Benchmarkable -> Benchmark 
BenchGroup :: String -> [Benchmark] -> Benchmark 



data Measured

A collection of measurements made while benchmarking.

Measurements related to garbage collection are tagged with GC. They will only be available if a benchmark is run with "+RTS -T".

Packed storage. When GC statistics cannot be collected, GC values will be set to huge negative values. If a field is labeled with "GC" below, use fromInt and fromDouble to safely convert to "real" values.




measTime :: !Double

Total wall-clock time elapsed, in seconds.

measCpuTime :: !Double

Total CPU time elapsed, in seconds. Includes both user and kernel (system) time.

measCycles :: !Int64

Cycles, in unspecified units that may be CPU cycles. (On i386 and x86_64, this is measured using the rdtsc instruction.)

measIters :: !Int64

Number of loop iterations measured.

measAllocated :: !Int64

(GC) Number of bytes allocated. Access using fromInt.

measNumGcs :: !Int64

(GC) Number of garbage collections performed. Access using fromInt.

measBytesCopied :: !Int64

(GC) Number of bytes copied during garbage collection. Access using fromInt.

measMutatorWallSeconds :: !Double

(GC) Wall-clock time spent doing real work ("mutation"), as distinct from garbage collection. Access using fromDouble.

measMutatorCpuSeconds :: !Double

(GC) CPU time spent doing real work ("mutation"), as distinct from garbage collection. Access using fromDouble.

measGcWallSeconds :: !Double

(GC) Wall-clock time spent doing garbage collection. Access using fromDouble.

measGcCpuSeconds :: !Double

(GC) CPU time spent doing garbage collection. Access using fromDouble.

fromInt :: Int64 -> Maybe Int64

Convert a (possibly unavailable) GC measurement to a true value. If the measurement is a huge negative number that corresponds to "no data", this will return Nothing.

toInt :: Maybe Int64 -> Int64

Convert from a true value back to the packed representation used for GC measurements.

fromDouble :: Double -> Maybe Double

Convert a (possibly unavailable) GC measurement to a true value. If the measurement is a huge negative number that corresponds to "no data", this will return Nothing.

toDouble :: Maybe Double -> Double

Convert from a true value back to the packed representation used for GC measurements.
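
The round trip between packed and "real" values can be pictured with the following sketch. It defines local stand-ins instead of importing criterion, and it assumes that the "no data" sentinel is minBound for Int64 fields and negative infinity for Double fields. The text above only says "huge negative", so treat those two choices as illustrative.

```haskell
import Data.Int (Int64)

-- Assumption: minBound marks "no data" for integer GC fields.
fromIntSketch :: Int64 -> Maybe Int64
fromIntSketch i
  | i == minBound = Nothing
  | otherwise     = Just i

toIntSketch :: Maybe Int64 -> Int64
toIntSketch Nothing  = minBound
toIntSketch (Just i) = i

-- Assumption: negative infinity marks "no data" for floating-point GC fields.
fromDoubleSketch :: Double -> Maybe Double
fromDoubleSketch d
  | isInfinite d && d < 0 = Nothing
  | otherwise             = Just d

toDoubleSketch :: Maybe Double -> Double
toDoubleSketch Nothing  = -1 / 0
toDoubleSketch (Just d) = d

main :: IO ()
main = do
  print (fromIntSketch (toIntSketch Nothing))
  print (fromIntSketch (toIntSketch (Just 4096)))
  print (fromDoubleSketch (toDoubleSketch Nothing))
  print (fromDoubleSketch (toDoubleSketch (Just 0.25)))
```

Storing a sentinel in an unboxed strict field avoids a Maybe wrapper on every measurement, at the cost of requiring callers to convert before using a GC value.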

measureAccessors :: Map String (Measured -> Maybe Double, String)

Field names and accessors for a Measured record.

measureKeys :: [String]

Field names in a Measured record, in the order in which they appear.

rescale :: Measured -> Measured

Normalise every measurement as if measIters was 1.

(measIters itself is left unaffected.)
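
The idea behind this normalisation can be sketched on a cut-down record. Sample and rescaleSketch below are hypothetical stand-ins with only three fields. The sketch uses truncating integer division for simplicity and ignores the "no data" sentinels used by GC fields, so it is not a description of how the library itself computes the result.

```haskell
import Data.Int (Int64)

-- Simplified stand-in for Measured with just three fields.
data Sample = Sample
  { sTime  :: !Double  -- total wall-clock seconds for the batch
  , sAlloc :: !Int64   -- total bytes allocated for the batch
  , sIters :: !Int64   -- loop iterations in the batch
  } deriving (Eq, Show)

-- Divide totals by the iteration count; leave the count itself alone.
rescaleSketch :: Sample -> Sample
rescaleSketch s
  | n <= 0    = s
  | otherwise = s { sTime  = sTime s / fromIntegral n
                  , sAlloc = sAlloc s `div` n
                  }
  where
    n = sIters s

main :: IO ()
main = print (rescaleSketch (Sample 2.0 8000 4))
```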

Benchmark construction

env


:: NFData env 
=> IO env

Create the environment. The environment will be evaluated to normal form before being passed to the benchmark.

-> (env -> Benchmark)

Take the newly created environment and make it available to the given benchmarks.

-> Benchmark 

Run a benchmark (or collection of benchmarks) in the given environment. The purpose of an environment is to lazily create input data to pass to the functions that will be benchmarked.

A common example of environment data is input that is read from a file. Another is a large data structure constructed in-place.

Motivation. In earlier versions of criterion, all benchmark inputs were always created when a program started running. By deferring the creation of an environment until its associated benchmarks need it, we avoid two problems that this strategy caused:

  • Memory pressure distorted the results of unrelated benchmarks. If one benchmark needed e.g. a gigabyte-sized input, it would force the garbage collector to do extra work when running some other benchmark that had no use for that input. Since the data created by an environment is only available when it is in scope, it should be garbage collected before other benchmarks are run.
  • The time cost of generating all needed inputs could be significant in cases where no inputs (or just a few) were really needed. This occurred often, for instance when just one out of a large suite of benchmarks was run, or when a user would list the collection of benchmarks without running any.

Creation. An environment is created right before its related benchmarks are run. The IO action that creates the environment is run, then the newly created environment is evaluated to normal form (hence the NFData constraint) before being passed to the function that receives the environment.

Complex environments. If you need to create an environment that contains multiple values, simply pack the values into a tuple.

Lazy pattern matching. In situations where a "real" environment is not needed, e.g. if a list of benchmark names is being generated, undefined will be passed to the function that receives the environment. This avoids the overhead of generating an environment that will not actually be used.

The function that receives the environment must use lazy pattern matching to deconstruct the tuple, as use of strict pattern matching will cause a crash if undefined is passed in.
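
The difference between the two kinds of pattern can be seen without criterion at all. In this sketch, lazyNames, strictNames and survivesUndefined are hypothetical names: each of the first two takes an environment tuple and returns a list of benchmark names, and the third checks whether that list can be produced when the environment is undefined.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Lazy pattern: the tuple is only inspected if a component is used.
lazyNames :: ([Int], String) -> [String]
lazyNames ~(_small, _big) = ["small/length", "big/length"]

-- Strict pattern: the tuple constructor is forced as soon as the function is called.
strictNames :: ([Int], String) -> [String]
strictNames (_small, _big) = ["small/length", "big/length"]

-- True if the list of names can be produced when the environment is undefined.
survivesUndefined :: (([Int], String) -> [String]) -> IO Bool
survivesUndefined f = do
  r <- try (evaluate (length (f undefined)))
  return (case r of
            Left e  -> const False (e :: SomeException)
            Right _ -> True)

main :: IO ()
main = do
  survivesUndefined lazyNames   >>= print  -- the lazy pattern never touches the tuple
  survivesUndefined strictNames >>= print  -- the strict pattern forces undefined and throws
```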

Example. This program runs benchmarks in an environment that contains two values. The first value is a list of numbers; the second is the contents of a text file. Pay attention to the use of a lazy pattern to deconstruct the tuple in the function that returns the benchmarks to be run.

import Criterion.Main

setupEnv :: IO ([Int], String)
setupEnv = do
  let small = replicate 1000 1
  big <- readFile "/usr/dict/words"
  return (small, big)

main :: IO ()
main = defaultMain [
   -- notice the lazy pattern match here!
   env setupEnv $ \ ~(small,big) -> bgroup "main" [
     bgroup "small" [
       bench "length" $ whnf length small
     , bench "length . filter" $ whnf (length . filter (==1)) small
     ]
   , bgroup "big" [
       bench "length" $ whnf length big
       -- big is a String, so the filter compares against a character
     , bench "length . filter" $ whnf (length . filter (=='e')) big
     ]
   ] ]

Discussion. The environment created in the example above is intentionally not ideal. As Haskell's scoping rules suggest, the variable big is in scope for the benchmarks that use only small. It would be better to create a separate environment for big, so that it will not be kept alive while the unrelated benchmarks are being run.

bench


:: String

A name to identify the benchmark.

-> Benchmarkable

An activity to be benchmarked.

-> Benchmark 

Create a single benchmark.

bgroup


:: String

A name to identify the group of benchmarks.

-> [Benchmark]

Benchmarks to group under this name.

-> Benchmark 

Group several benchmarks together under a common name.

addPrefix


:: String


-> String


-> String 

Add the given prefix to a name. If the prefix is empty, the name is returned unmodified. Otherwise, the prefix and name are separated by a '/' character.

benchNames :: Benchmark -> [String]

Retrieve the names of all benchmarks. Grouped benchmarks are prefixed with the name of the group they're in.
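
A minimal sketch of how prefixing and name collection fit together, using a simplified tree instead of the real Benchmark type (the environment case is left out). Tree, addPrefixSketch and namesSketch are hypothetical stand-ins written from the descriptions above, not the library's source.

```haskell
-- Simplified stand-in for Benchmark: the environment case is omitted.
data Tree
  = Leaf String          -- a single named benchmark
  | Group String [Tree]  -- a named group of benchmarks

-- An empty prefix leaves the name alone; otherwise join with '/'.
addPrefixSketch :: String -> String -> String
addPrefixSketch ""  name = name
addPrefixSketch pfx name = pfx ++ "/" ++ name

-- Collect fully qualified names, threading the group prefix downwards.
namesSketch :: Tree -> [String]
namesSketch = go ""
  where
    go pfx (Leaf name)     = [addPrefixSketch pfx name]
    go pfx (Group name ts) = concatMap (go (addPrefixSketch pfx name)) ts

main :: IO ()
main = mapM_ putStrLn (namesSketch suite)
  where
    suite = Group "fib" [Leaf "10", Group "big" [Leaf "30"]]
```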

Evaluation control

whnf :: (a -> b) -> a -> Benchmarkable

Apply a function to an argument, and evaluate the result to weak head normal form (WHNF).

nf :: NFData b => (a -> b) -> a -> Benchmarkable

Apply a function to an argument, and evaluate the result to normal form (NF).

nfIO :: NFData a => IO a -> Benchmarkable

Perform an action, then evaluate its result to normal form (NF). This is particularly useful for forcing a lazy IO action to be completely performed.

whnfIO :: IO a -> Benchmarkable

Perform an action, then evaluate its result to weak head normal form (WHNF). This is useful for forcing an IO action whose result is an expression to be evaluated down to a more useful value.
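
The practical difference between the two evaluation depths shows up with a partially defined value. This sketch uses only base: forceAll is a hypothetical stand-in for normal-form evaluation of a list of Int (criterion relies on NFData for the general case), and completes reports whether an evaluation finishes without throwing.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- A list whose first cell exists but whose tail is undefined.
partial :: [Int]
partial = 1 : undefined

-- Walk the whole spine and force each element.
forceAll :: [Int] -> ()
forceAll = foldr seq ()

-- True if evaluating the argument to WHNF completes without throwing.
completes :: a -> IO Bool
completes x = do
  r <- try (evaluate x)
  return (case r of
            Left e  -> const False (e :: SomeException)
            Right _ -> True)

main :: IO ()
main = do
  completes partial            >>= print  -- WHNF: only the outermost (:) cell is needed
  completes (forceAll partial) >>= print  -- full evaluation reaches the undefined tail
```

For benchmarking, this means that whnf on a function returning a lazy structure may measure only the cost of producing its outermost constructor.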

Result types

data Outliers

Outliers from sample data, calculated using the boxplot technique.




samplesSeen :: !Int64

lowSevere :: !Int64

More than 3 times the interquartile range (IQR) below the first quartile.

lowMild :: !Int64

Between 1.5 and 3 times the IQR below the first quartile.

highMild :: !Int64

Between 1.5 and 3 times the IQR above the third quartile.

highSevere :: !Int64

More than 3 times the IQR above the third quartile.
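
The four categories above amount to comparing each sample against fences placed 1.5 and 3 IQRs outside the quartiles. A minimal sketch follows; the Fence type and classify function are hypothetical, and the treatment of values lying exactly on a fence is an assumption.

```haskell
-- Hypothetical labels for the possible classifications of one sample.
data Fence = LowSevere | LowMild | Inlier | HighMild | HighSevere
  deriving (Eq, Show)

-- Classify a value given the first and third quartiles of its sample.
classify :: Double -> Double -> Double -> Fence
classify q1 q3 x
  | x < q1 - 3 * iqr   = LowSevere
  | x < q1 - 1.5 * iqr = LowMild
  | x > q3 + 3 * iqr   = HighSevere
  | x > q3 + 1.5 * iqr = HighMild
  | otherwise          = Inlier
  where
    iqr = q3 - q1

main :: IO ()
main = mapM_ (print . classify 10 20) [-25, -10, 15, 40, 60]
```

With quartiles 10 and 20 the IQR is 10, so the mild fences sit at -5 and 35 and the severe fences at -20 and 50.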

data OutlierEffect

A description of the extent to which outliers in the sample data affect the sample mean and standard deviation.



Less than 1% effect.


Between 1% and 10%.


Between 10% and 50%.


Above 50% (i.e. measurements are useless).

data OutlierVariance

Analysis of the extent to which outliers in a sample affect its standard deviation (and to some extent, its mean).




ovEffect :: OutlierEffect

Qualitative description of effect.

ovDesc :: String

Brief textual description of effect.

ovFraction :: Double

Quantitative description of effect (a fraction between 0 and 1).
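
One way to picture the relationship between the fraction and the qualitative effect is a simple threshold function. The Effect type and its level names below are illustrative, the thresholds come from the percentages listed for OutlierEffect, and which side of each boundary is inclusive is an assumption.

```haskell
-- Illustrative level names for the four bands of outlier effect.
data Effect = Unaffected | Slight | Moderate | Severe
  deriving (Eq, Show)

-- Map a fraction between 0 and 1 to a qualitative level.
effectOf :: Double -> Effect
effectOf frac
  | frac < 0.01 = Unaffected  -- less than 1%
  | frac < 0.1  = Slight      -- between 1% and 10%
  | frac < 0.5  = Moderate    -- between 10% and 50%
  | otherwise   = Severe      -- above 50%

main :: IO ()
main = mapM_ (print . effectOf) [0.005, 0.05, 0.3, 0.9]
```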

data Regression

Results of a linear regression.




regResponder :: String

Name of the responding variable.

regCoeffs :: Map String Estimate

Map from name to value of predictor coefficients.

regRSquare :: Estimate

R² goodness-of-fit estimate.

data KDE

Data for a KDE chart of performance.



data Report

Report of a sample analysis.




reportNumber :: Int

A simple index indicating that this is the nth report.

reportName :: String

The name of this report.

reportKeys :: [String]

See measureKeys.

reportMeasured :: Vector Measured

Raw measurements. These are not corrected for the estimated measurement overhead that can be found via the anOverhead field of reportAnalysis.

reportAnalysis :: SampleAnalysis

Report analysis.

reportOutliers :: Outliers

Analysis of outliers.

reportKDEs :: [KDE]

Data for a KDE of times.

data SampleAnalysis

Result of a bootstrap analysis of a non-parametric sample.




anRegress :: [Regression]

Estimates calculated via linear regression.

anOverhead :: Double

Estimated measurement overhead, in seconds. Estimation is performed via linear regression.

anMean :: Estimate

Estimated mean.

anStdDev :: Estimate

Estimated standard deviation.

anOutlierVar :: OutlierVariance

Description of the effects of outliers on the estimated variance.