Copyright | (c) 2018 Composewell Technologies |
---|---|
License | BSD3 |
Maintainer | harendra.kumar@gmail.com |
Safe Haskell | None |
Language | Haskell2010 |
BenchShow generates text reports and graphs from benchmarking results. It allows you to manipulate the format of the report and the benchmarking data to present it in many useful ways. BenchShow uses robust statistical analysis, employing three different statistical estimators, to provide as stable a run-to-run comparison of benchmark results as possible. For stable results, make sure that no other tasks are running on the benchmark host while benchmarking is going on. For even more stable results, we recommend using a desktop or server machine rather than a laptop for benchmarking.
Generating benchmark results
To generate benchmark results, use the gauge or criterion benchmarking libraries: define some benchmarks and run them with --csv=results.csv. The resulting results.csv may look like the following; for simplicity we have removed some of the fields:
    Name,time,maxrss
    vector/fold,6.41620933137583e-4,2879488
    streamly/fold,6.399582632376517e-4,2879488
    vector/map,6.388913781259641e-4,2854912
    streamly/map,6.533649051066093e-4,2793472
    vector/zip,6.514202653014291e-4,2707456
    streamly/zip,6.443344209329669e-4,2711552
If you run the benchmarks again (maybe after a change), the new results are appended to the file. BenchShow can compare two or more result sets in different ways. We will use the above data for the examples below; you can copy it, paste it into a file, and use that as input to a BenchShow application.
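For reference, here is a minimal sketch of a gauge program that could produce a file like the one above; the benchmark bodies are placeholders of our own, only the benchmark names and the --csv flag come from the text:

    import Gauge

    main :: IO ()
    main = defaultMain
        [ bench "vector/fold" $ nf sum ([1..1000] :: [Int])
        , bench "streamly/fold" $ nf sum ([1..1000] :: [Int])
        -- similarly vector/map, streamly/map, vector/zip, streamly/zip
        ]

Running the resulting executable with --csv=results.csv writes one row per benchmark, as shown above.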
gauge supports generating a raw CSV file using the --csvraw option. The raw CSV file has results for many more benchmarking fields besides time, e.g. maxrss, allocated, and many more.
Reports and Charts
The most common use case is to see the time and peak memory usage of a program for each benchmark. The report API with the Fields presentation style generates a multi-column report for a quick overview of all benchmarks. Units of the fields are automatically determined based on the range of values:
    report "results.csv" Nothing defaultConfig { presentation = Fields }
    (default)(Median)
    Benchmark     time(μs) maxrss(MiB)
    ------------- -------- -----------
    vector/fold     641.62        2.75
    streamly/fold   639.96        2.75
    vector/map      638.89        2.72
    streamly/map    653.36        2.66
    vector/zip      651.42        2.58
    streamly/zip    644.33        2.59
We can generate an equivalent visual report using graph; it generates one bar chart for each column:
    graph "results.csv" "output" defaultConfig
By default all the benchmarks are placed in a single benchmark group named default.
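Putting the two calls together, a minimal self-contained program might look like the following sketch (it assumes the BenchShow module exported by the package):

    import BenchShow

    main :: IO ()
    main = do
        -- multi-column text report, one column per benchmark field
        report "results.csv" Nothing defaultConfig { presentation = Fields }
        -- one bar chart per field, written using the "output" path prefix
        graph "results.csv" "output" defaultConfig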
Grouping
Let's write a benchmark classifier to put the streamly and vector benchmarks in their own groups:
    import Data.List.Split (splitOn)

    classifier :: String -> Maybe (String, String)
    classifier name =
        case splitOn "/" name of
            grp : bench -> Just (grp, concat bench)
            _ -> Nothing
Now we can show the two benchmark groups as columns, each showing the time field for that group. We can generate separate reports comparing different benchmark fields (e.g. time and maxrss) for all the groups:
    report "results.csv" Nothing defaultConfig { classifyBenchmark = classifier }
    (time)(Median)
    Benchmark streamly(μs) vector(μs)
    --------- ------------ ----------
    fold            639.96     641.62
    map             653.36     638.89
    zip             644.33     651.42
We can do the same graphically as well; just replace report with graph in the code above. Each group is placed as a cluster on the graph, and multiple clusters are placed side by side on the same scale for easy comparison.
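For example, the graph version of the call above would be (a sketch reusing the "output" prefix from the earlier example):

    graph "results.csv" "output" defaultConfig { classifyBenchmark = classifier }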
Difference
We can make the first group the baseline and report the subsequent groups as a difference from the baseline:
    report "results.csv" Nothing
        defaultConfig
            { classifyBenchmark = classifier
            , presentation = Groups Diff
            }
    (time)(Median)(Diff using min estimator)
    Benchmark streamly(μs)(base) vector(μs)(-base)
    --------- ------------------ -----------------
    fold                  639.96             +1.66
    map                   653.36            -14.47
    zip                   644.33             +7.09
In a chart, the second cluster plots the difference vector - streamly.
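The corresponding chart can be generated with the same configuration (a sketch):

    graph "results.csv" "output"
        defaultConfig
            { classifyBenchmark = classifier
            , presentation = Groups Diff
            }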
Percentage Difference
Absolute difference does not give us a good idea of how good or bad the comparison is. We can report percentage difference instead:
    report "results.csv" Nothing
        defaultConfig
            { classifyBenchmark = classifier
            , presentation = Groups PercentDiff
            }
    (time)(Median)(Diff using min estimator)
    Benchmark streamly(μs)(base) vector(%)(-base)
    --------- ------------------ ----------------
    fold                  639.96            +0.26
    map                   653.36            -2.22
    zip                   644.33            +1.10
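The percentage is relative to the baseline group. For fold, for instance, the absolute difference of +1.66 μs from the earlier report, over the 639.96 μs baseline, works out to 1.66 / 639.96 × 100 ≈ +0.26%.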
Graphically:
Statistical Estimators
When multiple samples are available for each benchmark, we report the Median by default. However, other estimators like Mean and Regression (a value arrived at by linear regression) can be used:
    report "results.csv" Nothing
        defaultConfig
            { classifyBenchmark = classifier
            , presentation = Groups PercentDiff
            , estimator = Regression
            }
    (time)(Regression Coeff.)(Diff using min estimator)
    Benchmark streamly(μs)(base) vector(%)(-base)
    --------- ------------------ ----------------
    fold                  639.96            +0.26
    map                   653.36            -2.22
    zip                   644.33            +1.10
Graphically:
Difference Strategy
A DiffStrategy controls how the difference between two groups being compared is arrived at. By default we use the MinEstimators strategy, which computes the difference using all the available estimators and takes the minimum of them all. We can use the SingleEstimator strategy instead if we so desire; it uses the estimator configured for the report via the estimator field of the configuration.
    report "results.csv" Nothing
        defaultConfig
            { classifyBenchmark = classifier
            , presentation = Groups PercentDiff
            , estimator = Regression
            , diffStrategy = SingleEstimator
            }
    (time)(Regression Coeff.)(Diff)
    Benchmark streamly(μs)(base) vector(%)(-base)
    --------- ------------------ ----------------
    fold                  639.96            +0.26
    map                   653.36            -2.22
    zip                   644.33            +1.10
Graphically:
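For comparison, the default strategy can also be selected explicitly with the same configuration (a sketch; it reproduces the min-estimator report shown earlier):

    report "results.csv" Nothing
        defaultConfig
            { classifyBenchmark = classifier
            , presentation = Groups PercentDiff
            , diffStrategy = MinEstimators
            }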
Sorting
Percentage difference does not immediately tell us the worst affected benchmarks. We can sort the results by the difference:
    report "results.csv" Nothing
        defaultConfig
            { classifyBenchmark = classifier
            , presentation = Groups PercentDiff
            , selectBenchmarks = \f ->
                  reverse
                $ map fst
                $ sortBy (comparing snd)
                $ either error id
                $ f (ColumnIndex 1) Nothing
            }
    (time)(Median)(Diff using min estimator)
    Benchmark streamly(μs)(base) vector(%)(-base)
    --------- ------------------ ----------------
    zip                   644.33            +1.10
    fold                  639.96            +0.26
    map                   653.36            -2.22
This tells us that zip is relatively the worst benchmark for vector compared to streamly, as it takes 1.10% more time, whereas map is the best, taking 2.22% less time.
Graphically:
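For completeness, the sorting snippet needs sortBy and comparing from the standard library; a self-contained sketch of the whole program might look like this:

    import BenchShow
    import Data.List (sortBy)
    import Data.List.Split (splitOn)
    import Data.Ord (comparing)

    classifier :: String -> Maybe (String, String)
    classifier name =
        case splitOn "/" name of
            grp : bench -> Just (grp, concat bench)
            _ -> Nothing

    main :: IO ()
    main = report "results.csv" Nothing
        defaultConfig
            { classifyBenchmark = classifier
            , presentation = Groups PercentDiff
            , selectBenchmarks = \f ->
                  reverse
                $ map fst
                $ sortBy (comparing snd)
                $ either error id
                $ f (ColumnIndex 1) Nothing
            }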
Regression
We can append benchmark results from multiple runs to the same file. These runs can then be compared. We can run benchmarks before and after a change and then report the regressions by percentage change in sorted order:
Given the following results file with two runs appended:
    Name,time
    streamly/fold,1.755309435106302e-2
    streamly/zip,2.960114434592148e-2
    streamly/map,2.4673020708256527e-2
    Name,time
    streamly/fold,8.970816964261911e-3
    streamly/zip,8.439519884529081e-3
    streamly/map,6.972814233286865e-3
This code generates the report that follows:
    report "results.csv" Nothing
        defaultConfig
            { classifyBenchmark = classifier
            , presentation = Groups PercentDiff
            , selectBenchmarks = \f ->
                  reverse
                $ map fst
                $ sortBy (comparing snd)
                $ either error id
                $ f (ColumnIndex 1) Nothing
            }
    (time)(Median)(Diff using min estimator)
    Benchmark streamly(0)(μs)(base) streamly(1)(%)(-base)
    --------- --------------------- ---------------------
    zip                      644.33                +23.28
    map                      653.36                 +7.65
    fold                     639.96                -15.63
It tells us that in the second run the worst affected benchmark is zip, taking 23.28 percent more time compared to the baseline.
Graphically:
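The (0) and (1) suffixes in the column headers refer to the first and second runs in the results file, in the order they were appended, e.g. by running the same benchmark executable with --csv=results.csv once before and once after the change; the first run serves as the baseline.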