Changelog for golds-gym-0.4.0.0
Changelog
[0.4.0]
Added
-
QuickCheck property tests for statistical computations in Runner module
- 28 property tests covering mathematical invariants and edge cases
- Tests for
calculateTrimmedMean,calculateMAD,calculateIQR,detectOutliers - Tests for
compareStatstolerance logic (hybrid percentage/absolute) - Tests for
checkVariancewarning generation - Custom generators for valid timing data and benchmark configurations
- New test suite:
golds-gym-properties
-
Evaluation strategy combinators for proper laziness handling
nf- Force result to normal form (deep evaluation)whnf- Force result to weak head normal form (shallow evaluation)nfIO- Normal form for IO actionswhnfIO- Weak head normal form for IO actionsnfAppIO- Normal form for functions returning IOwhnfAppIO- WHNF for functions returning IOio- Plain IO action (for backward compatibility)- Vendored evaluation loops from tasty-bench with attribution
-
BenchActiontype wrappingWord64 -> IO ()for benchmarkable actions- Enables direct benchmarking of pure functions without manual
evaluatecalls
- Enables direct benchmarking of pure functions without manual
Changed
-
BREAKING: All benchmark API functions now accept
BenchActioninstead ofIO ()benchGolden :: String -> BenchAction -> SpecbenchGoldenWith :: BenchConfig -> String -> BenchAction -> SpecbenchGoldenWithExpectation :: String -> BenchConfig -> [Expectation] -> BenchAction -> Spec- Migration: wrap existing
IO ()actions withiocombinator - New: use
nf/whnffor pure functions instead of manualevaluate
-
Output format: Baseline now appears before Actual in comparison tables
-
Error messages: Tolerance values in failure messages are now extracted from expectations rather than config defaults
Fixed
- Evaluation strategy bug where GHC could share computation across benchmark iterations
- Error message wording for performance improvements (now says "decreased by" instead of "increased by -X%")
- Flaky micro-benchmarks stabilized by increasing iteration counts (500-2000 iterations)
Dependencies
- Added
deepseq >= 1.4 && < 2forNFDataconstraint - Added
QuickCheck >= 2.14 && < 3for property tests (test suite only)
[0.3.0]
Added
-
Lens-based expectation combinators for custom performance assertions
- New
Test.Hspec.BenchGolden.Lensesmodule with van Laarhoven lenses forGoldenStatsfields - Lenses:
_statsMean,_statsMedian,_statsTrimmedMean,_statsStddev,_statsMAD,_statsIQR,_statsMin,_statsMax - Smart metric selectors:
metricForandvarianceForautomatically choose appropriate lens based onBenchConfig Expectationtype for composable performance assertionsTolerancevariants:Percent,Absolute,Hybrid,MustImprove,MustRegress- Boolean composition operators:
(&&~)for AND,(||~)for OR - Infix operators:
(@~)for percentage,(@<)for absolute,(@<<)for must-improve,(@>>)for must-regress benchGoldenWithExpectationcombinator for custom lens-based expectations- Enables assertions like "must be 10% faster", "median within 10%", "IQR < 0.1ms"
- New
-
Tolerance helper functions for manual comparison
withinPercent,withinAbsolute,withinHybridfor tolerance checkingmustImprove,mustRegressfor directional performance expectationspercentDiff,absDiffutilities
Changed
- Refactored comparison logic to use lenses internally (non-breaking)
compareStatsnow usesmetricForinstead of if/else branchingcheckVariancenow usesvarianceForfor cleaner metric selection
Dependencies
- Added
microlens >= 0.4 && < 0.6
[0.2.0] - 2026-01-30
Added
-
Hybrid tolerance mechanism to prevent false failures from measurement noise
- New
absoluteToleranceMsfield inBenchConfig(default:Just 0.01ms) - Benchmarks now pass if EITHER percentage tolerance OR absolute tolerance is satisfied
- Eliminates random failures for sub-millisecond operations where measurement noise causes large percentage variations despite negligible absolute differences
- New
-
Enhanced failure messages showing both tolerance thresholds
- Regression/improvement messages now display:
"tolerance: 15.0% or 0.010 ms"when absolute tolerance is configured
- Regression/improvement messages now display:
-
Example benchmarks demonstrating tolerance configurations
- Hybrid tolerance (default)
- Percentage-only tolerance
- Strict absolute tolerance
- Relaxed tolerance for CI environments
Changed
- Default
BenchConfignow includesabsoluteToleranceMs = Just 0.01(10 microseconds) BenchResultconstructors (Regression,Improvement) now includeMaybe Doublefor absolute tolerance- Updated "sort already sorted" example to use robust statistics to handle outliers
- Increased tolerance for percentage-only example to 30% to reduce false failures
Fixed
- Random benchmark failures for fast operations (< 1ms) due to measurement noise
- False regressions when absolute time differences are negligible but percentage variations are large
- Inconsistent test results across runs for sub-millisecond operations
[0.1.0] - 2026-01-30
Added
- Initial release of golds-gym
- Golden testing framework for performance benchmarks
- Architecture-specific golden files
- Integration with hspec and benchpress
- Configurable tolerance for mean time comparison
- Robust statistics mode (trimmed mean, MAD, outlier detection)
- Variance warnings
- Environment variables for accepting/skipping benchmarks
- Support for both standard and robust statistical methods