Copyright	(c) 2009-2014 Bryan O'Sullivan
License	BSD-style
Maintainer	bos@serpentine.com
Stability	experimental
Portability	GHC
Safe Haskell	None
Language	Haskell2010

Gauge.Benchmark

Contents

Evaluating IO Actions or Pure Functions
Benchmarkable
Benchmarks
- Constructing Benchmarks
- Benchmarks with Environment
Listing benchmarks
Running Benchmarks

Description

Constructing and running benchmarks.

Synopsis

Evaluating IO Actions or Pure Functions

To benchmark, an IO action or a pure function must be evaluated to weak head normal form (WHNF) or normal form NF. This library provides APIs to reduce IO actions or pure functions to WHNF (e.g. whnf and whnfIO) or NF (e.g. nf or nfIO).

Suppose we want to benchmark the following pure function:

firstN :: Int -> [Int]
firstN k = take k [(0::Int)..]

We construct a benchmark evaluating it to NF as follows:

nf firstN 1000

We can also evaluate a pure function to WHNF, however we must remember that it only evaluates the result up to, well, WHNF. To naive eyes it might appear that the following code ought to benchmark the production of the first 1000 list elements:

whnf firstN 1000

Since this forces the expression to only WHNF, what this would actually benchmark is merely how long it takes to produce the first list element!

Benchmarkable

A Benchmarkable is the basic type which in turn is used to construct a Benchmark. It is a container for code that can be benchmarked. The value contained inside a Benchmarkable could be an IO action or a pure function.

data Benchmarkable Source #

A pure function or impure action that can be benchmarked. The function to be benchmarked is wrapped into a function (runRepeatedly) that takes an Int64 parameter which indicates the number of times to run the given function or action. The wrapper is constructed automatically by the APIs provided in this library to construct Benchmarkable.

When perRun is not set then runRepeatedly is invoked to perform all iterations in one measurement interval. When perRun is set, runRepeatedly is always invoked with 1 iteration in one measurement interval, before a measurement allocEnv is invoked and after the measurement cleanEnv is invoked. The performance counters for each iteration are then added together for all iterations.

Constructors

NFData a => Benchmarkable
Fields allocEnv :: Int64 -> IO a cleanEnv :: Int64 -> a -> IO () runRepeatedly :: a -> Int64 -> IO () perRun :: Bool

Constructing Benchmarkable

toBenchmarkable :: (Int64 -> IO ()) -> Benchmarkable Source #

This is a low level function to construct a Benchmarkable value from an impure wrapper action, where the Int64 parameter dictates the number of times the action wrapped inside would run. You would normally be using the other higher level APIs rather than this function to construct a benchmarkable.

Benchmarking IO actions

Benchmarking an IO action is straightforward compared to benchmarking pure code. An IO action resembling type IO a can be turned into a Benchmarkable using nfIO to reduce it to normal form or using whnfIO to reduce it to WHNF.

nfIO :: NFData a => IO a -> Benchmarkable Source #

Perform an action, then evaluate its result to normal form. This is particularly useful for forcing a lazy IO action to be completely performed.

whnfIO :: IO a -> Benchmarkable Source #

Perform an action, then evaluate its result to weak head normal form (WHNF). This is useful for forcing an IO action whose result is an expression to be evaluated down to a more useful value.

Benchmarking pure code

Benchmarking pure functions is a bit tricky. Because GHC optimises aggressively when compiling with -O, it is potentially easy to write innocent-looking benchmark code that will only be evaluated once, for which all but the first iteration of the timing loop will be timing the cost of doing nothing.

To work around this, the benchmark applies the function to its argument and evaluates the application. Unlike an IO action we need both the function and its argument. Therefore, the types of APIs to benchmark a pure function look like this:

nf :: NFData b => (a -> b) -> a -> Benchmarkable
whnf :: (a -> b) -> a -> Benchmarkable

As both of these types suggest, when you want to benchmark a function, you must supply two values:

The first element is the function, saturated with all but its last argument.
The second element is the last argument to the function.

nf :: NFData b => (a -> b) -> a -> Benchmarkable Source #

Apply an argument to a function, and evaluate the result to normal form (NF).

whnf :: (a -> b) -> a -> Benchmarkable Source #

Apply an argument to a function, and evaluate the result to weak head normal form (WHNF).

Benchmarking with Environment

perBatchEnv Source #

Arguments

:: (NFData env, NFData b)
=> (Int64 -> IO env)	Create an environment for a batch of N runs. The environment will be evaluated to normal form before running.
-> (env -> IO b)	Function returning the IO action that should be benchmarked with the newly generated environment.
-> Benchmarkable

Create a Benchmarkable where a fresh environment is allocated for every batch of runs of the benchmarkable.

The environment is evaluated to normal form before the benchmark is run.

When using whnf, whnfIO, etc. Gauge creates a Benchmarkable whichs runs a batch of N repeat runs of that expressions. Gauge may run any number of these batches to get accurate measurements. Environments created by env and envWithCleanup, are shared across all these batches of runs.

This is fine for simple benchmarks on static input, but when benchmarking IO operations where these operations can modify (and especially grow) the environment this means that later batches might have their accuracy effected due to longer, for example, longer garbage collection pauses.

An example: Suppose we want to benchmark writing to a Chan, if we allocate the Chan using environment and our benchmark consists of writeChan env (), the contents and thus size of the Chan will grow with every repeat. If Gauge runs a 1,000 batches of 1,000 repeats, the result is that the channel will have 999,000 items in it by the time the last batch is run. Since GHC GC has to copy the live set for every major GC this means our last set of writes will suffer a lot of noise of the previous repeats.

By allocating a fresh environment for every batch of runs this function should eliminate this effect.

perBatchEnvWithCleanup Source #

Arguments

:: (NFData env, NFData b)
=> (Int64 -> IO env)	Create an environment for a batch of N runs. The environment will be evaluated to normal form before running.
-> (Int64 -> env -> IO ())	Clean up the created environment.
-> (env -> IO b)	Function returning the IO action that should be benchmarked with the newly generated environment.
-> Benchmarkable

Same as perBatchEnv, but but allows for an additional callback to clean up the environment. Resource clean up is exception safe, that is, it runs even if the Benchmark throws an exception.

perRunEnv Source #

Arguments

:: (NFData env, NFData b)
=> IO env	Action that creates the environment for a single run.
-> (env -> IO b)	Function returning the IO action that should be benchmarked with the newly genereted environment.
-> Benchmarkable

Create a Benchmarkable where a fresh environment is allocated for every run of the operation to benchmark. This is useful for benchmarking mutable operations that need a fresh environment, such as sorting a mutable Vector.

As with env and perBatchEnv the environment is evaluated to normal form before the benchmark is run.

This introduces extra noise and result in reduce accuracy compared to other Gauge benchmarks. But allows easier benchmarking for mutable operations than was previously possible.

perRunEnvWithCleanup Source #

Arguments

:: (NFData env, NFData b)
=> IO env	Action that creates the environment for a single run.
-> (env -> IO ())	Clean up the created environment.
-> (env -> IO b)	Function returning the IO action that should be benchmarked with the newly genereted environment.
-> Benchmarkable

Same as perRunEnv, but but allows for an additional callback to clean up the environment. Resource clean up is exception safe, that is, it runs even if the Benchmark throws an exception.

Benchmarks

data Benchmark where Source #

Specification of a collection of benchmarks and environments. A benchmark may consist of:

An environment that creates input data for benchmarks, created with env.
A single Benchmarkable item with a name, created with bench.
A (possibly nested) group of Benchmarks, created with bgroup.

Constructors

Environment :: NFData env => IO env -> (env -> IO a) -> (env -> Benchmark) -> Benchmark
Benchmark :: String -> Benchmarkable -> Benchmark
BenchGroup :: String -> [Benchmark] -> Benchmark

Instances

Show Benchmark Source #
Methods showsPrec :: Int -> Benchmark -> ShowS # show :: Benchmark -> String # showList :: [Benchmark] -> ShowS #

Constructing Benchmarks

bench Source #

Arguments

:: String	A name to identify the benchmark.
-> Benchmarkable	An activity to be benchmarked.
-> Benchmark

Create a single benchmark.

bgroup Source #

Arguments

:: String	A name to identify the group of benchmarks.
-> [Benchmark]	Benchmarks to group under this name.
-> Benchmark

Group several benchmarks together under a common name.

Benchmarks with Environment

env Source #

Arguments

:: NFData env
=> IO env	Create the environment. The environment will be evaluated to normal form before being passed to the benchmark.
-> (env -> Benchmark)	Take the newly created environment and make it available to the given benchmarks.
-> Benchmark

Run a benchmark (or collection of benchmarks) in the given environment. The purpose of an environment is to lazily create input data to pass to the functions that will be benchmarked.

A common example of environment data is input that is read from a file. Another is a large data structure constructed in-place.

By deferring the creation of an environment when its associated benchmarks need the its, we avoid two problems that this strategy caused:

Memory pressure distorted the results of unrelated benchmarks. If one benchmark needed e.g. a gigabyte-sized input, it would force the garbage collector to do extra work when running some other benchmark that had no use for that input. Since the data created by an environment is only available when it is in scope, it should be garbage collected before other benchmarks are run.
The time cost of generating all needed inputs could be significant in cases where no inputs (or just a few) were really needed. This occurred often, for instance when just one out of a large suite of benchmarks was run, or when a user would list the collection of benchmarks without running any.

Creation. An environment is created right before its related benchmarks are run. The IO action that creates the environment is run, then the newly created environment is evaluated to normal form (hence the NFData constraint) before being passed to the function that receives the environment.

Complex environments. If you need to create an environment that contains multiple values, simply pack the values into a tuple.

Lazy pattern matching. In situations where a "real" environment is not needed, e.g. if a list of benchmark names is being generated, undefined will be passed to the function that receives the environment. This avoids the overhead of generating an environment that will not actually be used.

The function that receives the environment must use lazy pattern matching to deconstruct the tuple, as use of strict pattern matching will cause a crash if undefined is passed in.

Example. This program runs benchmarks in an environment that contains two values. The first value is the contents of a text file; the second is a string. Pay attention to the use of a lazy pattern to deconstruct the tuple in the function that returns the benchmarks to be run.

setupEnv = do
  let small = replicate 1000 (1 :: Int)
  big <- map length . words <$> readFile "/usr/dict/words"
  return (small, big)

main = defaultMain [
   -- notice the lazy pattern match here!
   env setupEnv $ \ ~(small,big) -> bgroup "main" [
   bgroup "small" [
     bench "length" $ whnf length small
   , bench "length . filter" $ whnf (length . filter (==1)) small
   ]
 ,  bgroup "big" [
     bench "length" $ whnf length big
   , bench "length . filter" $ whnf (length . filter (==1)) big
   ]
 ] ]

Discussion. The environment created in the example above is intentionally not ideal. As Haskell's scoping rules suggest, the variable big is in scope for the benchmarks that use only small. It would be better to create a separate environment for big, so that it will not be kept alive while the unrelated benchmarks are being run.

envWithCleanup Source #

Arguments

:: NFData env
=> IO env	Create the environment. The environment will be evaluated to normal form before being passed to the benchmark.
-> (env -> IO a)	Clean up the created environment.
-> (env -> Benchmark)	Take the newly created environment and make it available to the given benchmarks.
-> Benchmark

Same as env, but but allows for an additional callback to clean up the environment. Resource clean up is exception safe, that is, it runs even if the Benchmark throws an exception.

Listing benchmarks

benchNames :: Benchmark -> [String] Source #

Retrieve the names of all benchmarks. Grouped benchmarks are prefixed with the name of the group they're in.

Running Benchmarks

runBenchmark Source #

Arguments

:: (String -> Bool)	Select benchmarks by name.
-> Benchmark
-> BenchmarkAnalysis	Analysis function
-> Gauge ()

Run benchmarkables, selected by a given selector function, under a given benchmark and analyse the output using the given analysis function.

data BenchmarkAnalysis Source #

The function to run after measurement

Constructors

BenchmarkNormal (String -> Vector Measured -> Gauge a)
BenchmarkIters Int64