Perf.Cycle

Cycle
an unwrapped Word64

tick_
tick_ measures the number of cycles it takes to read the rdtsc chip twice: the difference is then how long it took to read the clock the second time.

Below are indicative measurements using tick_:

    onetick <- tick_
    ticks' <- replicateM 10 tick_
    manyticks <- replicateM 1000000 tick_
    let average = L.fold ((/) <$> L.sum <*> L.genericLength)
    let avticks = average (fromIntegral <$> manyticks)
    let qticks = deciles 10 manyticks
    let tick999 = percentile 0.999 manyticks

    one tick_: 78 cycles
    next 10: [20,18,20,20,20,20,18,16,20,20]
    average over 1m: 20.08 cycles
    99.999% perc: 7,986
    99.9% perc: 50.97
    99th perc: 24.99
    40th perc: 18.37
    [min, 10th, 20th, .. 90th, max]:
     12.00 16.60 17.39 17.88 18.37 18.86 19.46 20.11 20.75 23.04 5.447e5

The distribution of tick_ measurements is highly skewed, with the maximum being around 50k cycles, which is of the order of a GC. The important point on the distribution is around the 30th to 50th percentile, where you get a clean measure, usually free of GC activity and cache misses.

warmup
Warm up the register, to avoid a high first measurement. Without a warmup, one or more larger values can occur at the start of a measurement spree, and these are often in the zone of an L2 miss.

    t <- tick_ -- first measure can be very high
    _ <- warmup 100
    t <- tick_ -- should be around 20 (3k for ghci)

tick
`tick f a` strictly applies f to a, and returns a (Cycle, f a)

    _ <- warmup 100
    (cs, _) <- tick f a

    one tick: 197012 cycles
    average over 1000: 10222.79 cycles -- 10 cycles per operation
    [min, 30th, median, 90th, 99th, max]:
     1.002e4 1.011e4 1.013e4 1.044e4 1.051e4 2.623e4

tickIO
evaluates and measures an `IO a`

    (cs, _) <- tickIO (pure (f a))

app
needs more testing

ticks
n measurements of a tick. Returns a list of Cycles and the last evaluated f a.

GHC is very good at memoization, and any function that measures a computation multiple times is fraught. When a computation actually gets memoized is an inexact science.
Current readings are:

    sum to 1000.0
    Perf.ticks n f a                         8.37e3 cycles
    Main.ticks n f a                         8.38e3 cycles
    Perf.ticksIO n (pure $ f a)              8.38e3 cycles
    Perf.qtick n f a                         8.38e3 cycles
    Main.qtick n f a                         8.38e3 cycles
    replicateM n (tick f a)                  8.37e3 cycles
    replicateM' n (tick f a)                 9.74e3 cycles
    replicateM n (tickIO (pure (f a)))       1.21e4 cycles
    replicateM n (tick (app (f a)) ())       9.72e3 cycles
    replicateM n (tick identity (f n))       18.2 cycles
    replicateM n (tick (const (f a)) ())     9.71e3 cycles
    (replicateM n . tick f) <$> [1,10,100,1000,10000]:
     16.3 16.2 16.3 16.2 16.2
    Perf.tickns n f [1,10,100,1000,10000]:
     16.2 16.2 16.2 16.2 16.2

    let n = 1000
    (cs, fa) <- ticks n f a

qtick
returns the 40th percentile measurement and the last evaluated f a

    (c, fa) <- qtick n f a

ticksIO
n measurements of a tickIO. Returns an IO tuple: a list of Cycles and the last evaluated f a.

    (cs, fa) <- ticksIO n (pure $ f a)

tickns
n measurements on each of a list of a's to be applied to f. Currently memoizing its ass off.

    tickns n f [1,10,100,1000]

force
extra oomph for those hard to reach evaluations

replicateM'
a replicateM with good attributes

average
average of a Cycle foldable

    cAv <- average . fst <$> ticks n f a

deciles
compute deciles

    c5 <- deciles 5 . fst <$> ticks n f a

percentile
compute a percentile

    c <- percentile 0.4 . fst <$> ticks n f a

Instances for Word64: ToInteger, AdditiveGroup, AdditiveInvertible, Additive, AdditiveCommutative, AdditiveAssociative, AdditiveUnital, AdditiveMagma


Perf.Measure (perf-0.2.0)

Measure (fields: measure, prestep, poststep)
A Measure consists of a monadic effect prior to measuring, a monadic effect to finalise the measurement, and the value measured.

For example, the measure specified below will return 1 every time a measurement is requested, thus forming the base of a simple counter for loopy code.

    let count = Measure 0 (pure ()) (pure 1)

runMeasure
Measure a single effect.

    >>> r <- runMeasure count (pure "joy")
    >>> r
    (1,"joy")

runMeasureN
Measure once, but run an effect multiple times.

    >>> r <- runMeasureN 1000 count (pure "joys")
    >>> r
    (1,"joys")

cost
cost of a measurement in terms of the Measure's own units

    >>> r <- cost count
    >>> r
    1

cputime
a measure using getCPUTime from System.CPUTime (unit is picoseconds)

    >>> r <- runMeasure cputime (pure $ foldl' (+) 0 [0..1000])
    (34000000,500500)

realtime
a measure using getCurrentTime (unit is NominalDiffTime, which prints as seconds)

    >>> r <- runMeasure realtime (pure $ foldl' (+) 0 [0..1000])
    (0.000046s,500500)

count
a measure used to count iterations

    >>> r <- runMeasure count (pure ())
    >>> r
    (1,())

cycles
a Measure using the rdtsc chip set (units are in cycles)

    >>> r <- runMeasureN 1000 cycles (pure ())
    (120540,()) -- ghci-level
    (18673,()) -- compiled with -O2

Instances for NominalDiffTime: AdditiveGroup, AdditiveInvertible, Additive, AdditiveCommutative, AdditiveAssociative, AdditiveUnital, AdditiveMagma


Perf (perf-0.2.0)

Perf
The obligatory transformer over Identity

PerfT
PerfT is polymorphic in the type of measurement being performed. The monad stores and produces a Map of labelled measurement values.

perf
Lift a monadic computation to a PerfT m, providing a label and a Measure.

perfN
Lift a monadic computation to a PerfT m, and carry out the computation multiple times.

runPerfT
Consume the PerfT layer and return a (result, measurement).

    >>> :set -XOverloadedStrings
    >>> (result, cs) <- runPerfT $ perf "sum" cycles (pure $ foldl' (+) 0 [0..10000])
    (50005000,fromList [("sum",562028)])

evalPerfT
Consume the PerfT layer and return the original monadic result. Fingers crossed, PerfT structure should be completely compiled away.

    >>> result <- evalPerfT $ perf "sum" cycles (pure $ foldl' (+) 0 [0..10000])
    50005000

execPerfT
Consume a PerfT layer and return the measurement.

    >>> cs <- execPerfT $ perf "sum" cycles (pure $ foldl' (+) 0 [0..10000])
    fromList [("sum",562028)]
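
The Perf.Measure description (an effect before measuring, an effect to finalise, and the value measured) can be illustrated with a small stand-alone sketch. This is not the perf library's implementation: the type SimpleMeasure, its fields prestepS and poststepS, and the helpers runSimple, countS and cputimeS are all hypothetical names invented here, and the sketch uses only base. It assumes the prestep result is handed to the poststep, which is one plausible way to turn two clock reads into a difference.

```haskell
import Data.List (foldl')
import System.CPUTime (getCPUTime)

-- Hypothetical stand-in for the documented Measure idea (not the perf API):
-- an effect run before the action, and an effect run after it that
-- produces the reading from whatever the first effect returned.
data SimpleMeasure a b = SimpleMeasure
  { prestepS  :: IO a
  , poststepS :: a -> IO b
  }

-- Run the prestep, the action, then the poststep; return (reading, result).
runSimple :: SimpleMeasure a b -> IO r -> IO (b, r)
runSimple m action = do
  start   <- prestepS m
  result  <- action
  reading <- poststepS m start
  pure (reading, result)

-- A counter: every measurement reads 1, as in the documented count example.
countS :: SimpleMeasure () Integer
countS = SimpleMeasure (pure ()) (\_ -> pure 1)

-- CPU time in picoseconds: the second clock read minus the first.
cputimeS :: SimpleMeasure Integer Integer
cputimeS = SimpleMeasure getCPUTime (\start -> subtract start <$> getCPUTime)

main :: IO ()
main = do
  r <- runSimple countS (pure "joy")
  print r  -- (1,"joy")
  (t, total) <- runSimple cputimeS (pure $! foldl' (+) 0 [0 .. 1000 :: Integer])
  print total  -- 500500
  putStrLn ("elapsed CPU time is non-negative: " ++ show (t >= 0))
```

The elapsed figure varies from run to run and depends on the clock resolution, so the sketch only checks its sign rather than printing a specific value.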

Instances for PerfT: MonadIO, Functor, Applicative, Monad

External names referenced:

    getCPUTime        System.CPUTime (base)
    getCurrentTime    Data.Time.Clock.POSIX (time-1.6.0.1)
    NominalDiffTime   Data.Time.Clock.UTC (time-1.6.0.1)
    rdtsc             System.CPUTime.Rdtsc (rdtsc-1.3.0.1)

Also named in the interface: execPerfT, runPerf_
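
The Perf.Cycle notes say the useful part of a skewed cycle distribution is around the 30th to 50th percentile, and that qtick reports the 40th. The sketch below shows why a low percentile is steadier than the mean when a sample contains one very large reading. It is an illustration only: percentileNR and meanOf are hypothetical helpers using the nearest-rank rule, which is an assumption and may differ from how the library's percentile is computed. The sample reuses the ten tick_ readings quoted earlier, plus one outlier added for illustration.

```haskell
import Data.List (sort)

-- Nearest-rank percentile for p in [0,1]; Nothing on an empty sample.
-- Hypothetical helper, not Perf.Cycle's percentile.
percentileNR :: Double -> [Double] -> Maybe Double
percentileNR _ [] = Nothing
percentileNR p xs = Just (sorted !! idx)
  where
    sorted = sort xs
    n      = length sorted
    rank   = ceiling (p * fromIntegral n) :: Int
    idx    = max 0 (min (n - 1) (rank - 1))  -- clamp into valid indices

-- Arithmetic mean; Nothing on an empty sample.
meanOf :: [Double] -> Maybe Double
meanOf [] = Nothing
meanOf xs = Just (sum xs / fromIntegral (length xs))

main :: IO ()
main = do
  -- Ten readings from the tick_ example plus one illustrative outlier.
  let sample = [20, 18, 20, 20, 20, 20, 18, 16, 20, 20, 7986]
  print (percentileNR 0.4 sample)  -- Just 20.0
  print (percentileNR 1.0 sample)  -- Just 7986.0
  print (meanOf sample)            -- dragged far above 20 by the outlier
```

A single slow reading moves the mean by hundreds of cycles while the 40th percentile stays at 20, which is the behaviour the documentation relies on when it prefers a percentile over an average.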