bench-show-0.3.1: Show, plot and compare benchmark results

Copyright: (c) 2018 Composewell Technologies
License: BSD3
Maintainer: harendra.kumar@gmail.com
Safe Haskell: None
Language: Haskell2010

BenchShow.Tutorial

Description

BenchShow generates text reports and graphs from benchmarking results. It allows you to manipulate the format of the report and the benchmarking data to present it in many useful ways. BenchShow uses robust statistical analysis, with three different statistical estimators, to make run-to-run comparisons of benchmark results as stable as possible. For stable results, make sure that you are not executing any other tasks on the benchmark host while benchmarking is going on. For even more stable results, we recommend using a desktop or server machine instead of a laptop for benchmarking.

    Generating benchmark results

    To generate benchmark results, use the gauge or criterion benchmarking library, define some benchmarks, and run them with --csv=results.csv.
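
    For illustration, here is a minimal, hypothetical gauge benchmark suite whose benchmark names match the ones used in this tutorial (the criterion API is essentially the same); the benchmark bodies are mere placeholders, not the real vector or streamly code:

       import Gauge.Main (bench, bgroup, defaultMain, nf)

       main :: IO ()
       main = defaultMain
           [ bgroup "vector"          -- names come out as "vector/fold" etc.
               [ bench "fold" $ nf (foldl (+) 0) [1..10000 :: Int]
               , bench "map"  $ nf (map (+1)) [1..10000 :: Int]
               , bench "zip"  $ nf (zip [(1 :: Int) ..]) [1..10000 :: Int]
               ]
           , bgroup "streamly"
               [ bench "fold" $ nf (foldl (+) 0) [1..10000 :: Int]
               , bench "map"  $ nf (map (+1)) [1..10000 :: Int]
               , bench "zip"  $ nf (zip [(1 :: Int) ..]) [1..10000 :: Int]
               ]
           ]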

    The resulting results.csv may look like the following; for simplicity, we have removed some of the fields:

    Name,time,maxrss
    vector/fold,6.41620933137583e-4,2879488
    streamly/fold,6.399582632376517e-4,2879488
    vector/map,6.388913781259641e-4,2854912
    streamly/map,6.533649051066093e-4,2793472
    vector/zip,6.514202653014291e-4,2707456
    streamly/zip,6.443344209329669e-4,2711552
    

    If you run the benchmarks again (perhaps after a change) the new results are appended to the file. BenchShow can then take two or more result sets and compare them in different ways. We will use the above data for the examples below; you can copy it, paste it into a file, and use that file as input to a BenchShow application.

    gauge supports generating a raw csv file using the --csvraw option. The raw csv file contains results for many more benchmarking fields besides time, e.g. maxrss, allocated, and others.

    Reports and Charts

    The most common use case is to see the time and peak memory usage of a program for each benchmark. The report API with the Fields presentation style generates a multi-column report for a quick overview of all benchmarks. Units of the fields are automatically determined based on the range of values:

    report "results.csv" Nothing defaultConfig { presentation = Fields }
    
    (default)(Median)
    Benchmark     time(μs) maxrss(MiB)
    ------------- -------- -----------
    vector/fold     641.62        2.75
    streamly/fold   639.96        2.75
    vector/map      638.89        2.72
    streamly/map    653.36        2.66
    vector/zip      651.42        2.58
    streamly/zip    644.33        2.59
    

    We can generate an equivalent visual report using graph, which generates one bar chart for each column:

    graph "results.csv" "output" defaultConfig
    

    By default all the benchmarks are placed in a single benchmark group named default.
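
    For reference, report and graph are ordinary IO actions from the bench-show library, so the calls above can be wrapped in a small executable along these lines (assuming bench-show is listed as a dependency):

       import BenchShow

       main :: IO ()
       main = do
           -- multi-column text report on standard output
           report "results.csv" Nothing defaultConfig { presentation = Fields }
           -- one bar chart per field, using "output" as the output name
           graph "results.csv" "output" defaultConfig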

    Grouping

    Let's write a benchmark classifier to put the streamly and vector benchmarks in their own groups:

       -- "streamly/fold" is put in the group "streamly" as benchmark "fold"
       import Data.List.Split (splitOn)  -- from the "split" package

       classifier name =
           case splitOn "/" name of
               grp : bench -> Just (grp, concat bench)
               _           -> Nothing
    
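    For example, in GHCi:

       >>> classifier "streamly/fold"
       Just ("streamly","fold")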

    Now we can show the two benchmark groups as columns, each showing the time field for that group. We can generate separate reports comparing different benchmark fields (e.g. time and maxrss) across all the groups:

       report "results.csv" Nothing
         defaultConfig { classifyBenchmark = classifier }
    
    (time)(Median)
    Benchmark streamly(μs) vector(μs)
    --------- ------------ ----------
    fold            639.96     641.62
    map             653.36     638.89
    zip             644.33     651.42
    

    We can do the same graphically as well, just replace report with graph in the code above. Each group is placed as a cluster on the graph, and multiple clusters are placed side by side on the same scale for easy comparison. For example, using an arbitrarily chosen output name:
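
       -- "grouped" is just the output name for the generated charts
       graph "results.csv" "grouped"
         defaultConfig { classifyBenchmark = classifier }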

    Difference

    We can make the first group the baseline and report the subsequent groups as a difference from the baseline:

       report "results.csv" Nothing
         defaultConfig
             { classifyBenchmark = classifier
             , presentation = Groups Diff
             }
    
    (time)(Median)(Diff using min estimator)
    Benchmark streamly(μs)(base) vector(μs)(-base)
    --------- ------------------ -----------------
    fold                  639.96             +1.66
    map                   653.36            -14.47
    zip                   644.33             +7.09
    

    In a chart, the second cluster plots the difference vector - streamly.
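
    The corresponding chart can be generated by passing the same configuration to graph; the output name "diff" below is arbitrary:

       graph "results.csv" "diff"
         defaultConfig
             { classifyBenchmark = classifier
             , presentation = Groups Diff
             }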

    Percentage Difference

    Absolute difference does not give us a good idea of how good or bad the comparison is. We can report the percentage difference instead:

       report "results.csv" Nothing
         defaultConfig
             { classifyBenchmark = classifier
             , presentation = Groups PercentDiff
             }
    
    (time)(Median)(Diff using min estimator)
    Benchmark streamly(μs)(base) vector(%)(-base)
    --------- ------------------ ----------------
    fold                  639.96            +0.26
    map                   653.36            -2.22
    zip                   644.33            +1.10
    

    Graphically:

    Statistical Estimators

    When multiple samples are available for each benchmark, we report the Median by default. However, other estimators like Mean and Regression (a value arrived at by linear regression) can be used:

       report "results.csv" Nothing
         defaultConfig
             { classifyBenchmark = classifier
             , presentation = Groups PercentDiff
             , estimator = Regression
             }
    
    (time)(Regression Coeff.)(Diff using min estimator)
    Benchmark streamly(μs)(base) vector(%)(-base)
    --------- ------------------ ----------------
    fold                  639.96            +0.26
    map                   653.36            -2.22
    zip                   644.33            +1.10
    

    Graphically:

    Difference Strategy

    A DiffStrategy controls how the difference between the two groups being compared is arrived at. By default we use the MinEstimators strategy, which computes the difference using each of the available estimators and takes the minimum of them all. We can use the SingleEstimator strategy instead if we so desire; it uses the estimator configured for the report via the estimator field of the configuration.

       report "results.csv" Nothing
         defaultConfig
             { classifyBenchmark = classifier
             , presentation = Groups PercentDiff
             , estimator = Regression
             , diffStrategy = SingleEstimator
             }
    
    (time)(Regression Coeff.)(Diff )
    Benchmark streamly(μs)(base) vector(%)(-base)
    --------- ------------------ ----------------
    fold                  639.96            +0.26
    map                   653.36            -2.22
    zip                   644.33            +1.10
    

    Graphically:

    Sorting

    Percentage difference does not immediately tell us the worst affected benchmarks. We can sort the results by the difference:

       report "results.csv" Nothing
         defaultConfig
             { classifyBenchmark = classifier
             , presentation = Groups PercentDiff
             -- sortBy is from Data.List, comparing from Data.Ord
             , selectBenchmarks = \f ->
                        reverse
                      $ map fst
                      $ sortBy (comparing snd)
                      $ either error id $ f (ColumnIndex 1) Nothing
             }
    
    (time)(Median)(Diff using min estimator)
    Benchmark streamly(μs)(base) vector(%)(-base)
    --------- ------------------ ----------------
    zip                   644.33            +1.10
    fold                  639.96            +0.26
    map                   653.36            -2.22
    

    This tells us that zip is the worst performing benchmark of vector relative to streamly, taking 1.10% more time, whereas map is the best, taking 2.22% less time.

    Graphically:

    Regression

    We can append benchmark results from multiple runs to the same file. These runs can then be compared. We can run benchmarks before and after a change and then report the regressions by percentage change in sorted order:

    Given the following results file with two runs appended:

    Name,time
    streamly/fold,1.755309435106302e-2
    streamly/zip,2.960114434592148e-2
    streamly/map,2.4673020708256527e-2
    Name,time
    streamly/fold,8.970816964261911e-3
    streamly/zip,8.439519884529081e-3
    streamly/map,6.972814233286865e-3
    

    This code generates the report that follows:

       report "results.csv" Nothing
         defaultConfig
             { classifyBenchmark = classifier
             , presentation = Groups PercentDiff
             , selectBenchmarks = \f ->
                        reverse
                      $ map fst
                      $ sortBy (comparing snd)
                      $ either error id $ f (ColumnIndex 1) Nothing
             }
    
    (time)(Median)(Diff using min estimator)
    Benchmark streamly(0)(μs)(base) streamly(1)(%)(-base)
    --------- --------------------- ---------------------
    zip                      644.33                +23.28
    map                      653.36                 +7.65
    fold                     639.96                -15.63
    

    It tells us that in the second run the worst affected benchmark is zip, taking 23.28 percent more time compared to the baseline.

    Graphically:
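
       -- same configuration as the report above; "regression" is an
       -- arbitrary output name for the generated charts
       graph "results.csv" "regression"
         defaultConfig
             { classifyBenchmark = classifier
             , presentation = Groups PercentDiff
             , selectBenchmarks = \f ->
                        reverse
                      $ map fst
                      $ sortBy (comparing snd)
                      $ either error id $ f (ColumnIndex 1) Nothing
             }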