Perf.Cycle

Cycle
    An unwrapped Word64.

tick_
    tick_ measures the number of cycles it takes to read the rdtsc counter twice: the difference is then how long it took to read the clock the second time.

    Below are indicative measurements using tick_:

        onetick <- tick_
        ticks' <- replicateM 10 tick_
        manyticks <- replicateM 1000000 tick_
        let average = L.fold ((/) <$> L.sum <*> L.genericLength)
        let avticks = average (fromIntegral <$> manyticks)
        let qticks = deciles 10 manyticks
        let tick999 = percentile 0.999 manyticks

        one tick_: 78 cycles
        next 10: [20,18,20,20,20,20,18,16,20,20]
        average over 1m: 20.08 cycles
        99.999% perc: 7,986
        99.9% perc: 50.97
        99th perc: 24.99
        40th perc: 18.37
        [min, 10th, 20th, .. 90th, max]:
         12.00 16.60 17.39 17.88 18.37 18.86 19.46 20.11 20.75 23.04 5.447e5

    The distribution of tick_ measurements is highly skewed, with the maximum being around 50k cycles, which is of the order of a GC. The important point on the distribution is around the 30th to 50th percentile, where you get a clean measure, usually free of GC activity and cache misses.

warmup
    Warm up the register, to avoid a high first measurement. Without a warmup, one or more larger values can occur at the start of a measurement spree, and they are often in the zone of an L2 miss.

        t <- tick_ -- first measure can be very high
        _ <- warmup 100
        t <- tick_ -- should be around 20 (3k for ghci)

tick'
    tick where the arguments are lazy, so the measurement may include evaluation of thunks that may constitute f and/or a.

tick
    `tick f a` strictly evaluates f and a, then deeply evaluates f a, returning a (Cycle, f a).

        _ <- warmup 100
        (cs, _) <- tick f a

        sum to 1000
        first measure: 1202 cycles
        second measure: 18 cycles

    Note that feeding the same computation through tick twice will tend to kick off sharing (aka memoization aka let floating). Given the importance of sharing to GHC optimisations this is the intended behaviour.
    If you want to turn this off then see -fno-full-laziness (and maybe -fno-cse).

tickIO
    Measures and deeply evaluates an `IO a`.

        (cs, _) <- tickIO (pure (f a))

ticks
    n measurements of a tick. Returns a list of Cycles and the last evaluated f a.

    GHC is very good at finding ways to share computation, and anything measuring a computation multiple times is a prime candidate for aggressive GHC treatment. Internally, ticks uses a NOINLINE pragma and a NOINLINE on tick to help reduce the chances of memoization, but this is an inexact science in the hands of the author, at least, so interpret with caution.

        let n = 1000
        (cs, fa) <- ticks n f a

    Baseline speed can be highly sensitive to the nature of the function trimmings. Polymorphic functions can tend to be slightly slower, and functions with lambda expressions can experience dramatic slowdowns.

        fMono :: Int -> Int
        fMono x = foldl' (+) 0 [1 .. x]

        fPoly :: (Enum b, Num b, Additive b) => b -> b
        fPoly x = foldl' (+) 0 [1 .. x]

        fLambda :: Int -> Int
        fLambda = \x -> foldl' (+) 0 [1 .. x]

        sum to 1000
        n = 1000 prime run: 1.13e3
        run               first    2nd      3rd      4th      5th      40th %
        ticks             1.06e3   712      702      704      676      682 cycles
        ticks (lambda)    1.19e3   718      682      684      678      682 cycles
        ticks (poly)      1.64e3   1.34e3   1.32e3   1.32e3   1.32e3   1.31e3 cycles

ticksIO
    n measurements of a tickIO. Returns an IO tuple: a list of Cycles and the last evaluated f a.

        (cs, fa) <- ticksIO n (pure $ f a)

        run               first    2nd      3rd      4th      5th      40th %
        ticksIO           834      752      688      714      690      709 cycles
        ticksIO (lambda)  822      690      720      686      688      683 cycles
        ticksIO (poly)    1.01e3   688      684      682      712      686 cycles

ns
    Make a series of measurements on a list of a's to be applied to f, for a tick function.

    Tends to be fragile to sharing issues, but very useful for determining the order of a computation.

        ns ticks n f [1,10,100,1000]

        sum to's [1,10,100,1000]
        tickns n fMono: 17.8 23.5 100 678

average
    Average of a Cycle foldable.

        cAv <- average . fst <$> ticks n f a

deciles
    Compute deciles.

        c5 <- deciles 5 . fst <$> ticks n f a

percentile
    Compute a percentile.

        c <- percentile 0.4 . fst <$> ticks n f a

tickWHNF, tickWHNF', tickWHNFIO, ticksWHNF, ticksWHNFIO
    WHNF versions.


Perf.Measure

Measure
    A Measure consists of a monadic effect prior to measuring, a monadic effect to finalise the measurement, and the value measured.

    For example, the measure specified below will return 1 every time measurement is requested, thus forming the base of a simple counter for loopy code.

        let count = Measure 0 (pure ()) (pure 1)

runMeasure
    Measure a single effect.

        r <- runMeasure count (pure "joy")
        r
        (1,"joy")

runMeasureN
    Measure once, but run an effect multiple times.

        r <- runMeasureN 1000 count (pure "joys")
        r
        (1,"joys")

cost
    Cost of a measurement in terms of the Measure's own units.

        r <- cost count
        r
        1

cputime
    A measure using getCPUTime from System.CPUTime (unit is picoseconds).

        r <- runMeasure cputime (pure $ foldl' (+) 0 [0..1000])
        (34000000,500500)

realtime
    A measure using getCurrentTime (unit is NominalDiffTime, which prints as seconds).

        r <- runMeasure realtime (pure $ foldl' (+) 0 [0..1000])
        (0.000046s,500500)

count
    A measure used to count iterations.

        r <- runMeasure count (pure ())
        r
        (1,())

cycles
    A Measure using the rdtsc counter (units are in cycles).

        r <- runMeasureN 1000 cycles (pure ())
        (120540,()) -- ghci-level
        (18673,()) -- compiled with -O2


Perf

Perf
    The obligatory transformer over Identity.

PerfT
    PerfT is polymorphic in the type of measurement being performed. The monad stores and produces a Map of labelled measurement values.

perf
    Lift a monadic computation to a PerfT m, providing a label and a Measure.

perfN
    Lift a monadic computation to a PerfT m, and carry out the computation multiple times.

runPerfT
    Consume the PerfT layer and return a (result, measurement).

        :set -XOverloadedStrings
        (result, cs) <- runPerfT $ perf "sum" cycles (pure $ foldl' (+) 0 [0..10000])
        (50005000,fromList [("sum",562028)])

evalPerfT
    Consume the PerfT layer and return the original monadic result. Fingers crossed, PerfT structure should be completely compiled away.

        result <- evalPerfT $ perf "sum" cycles (pure $ foldl' (+) 0 [0..10000])
        50005000

execPerfT
    Consume a PerfT layer and return the measurement.

        cs <- execPerfT $ perf "sum" cycles (pure $ foldl' (+) 0 [0..10000])
        fromList [("sum",562028)]


Package: perf-0.3.0 (modules Perf.Cycle, Perf.Measure, Perf).
External names referenced: getCPUTime (base, System.CPUTime); getCurrentTime and NominalDiffTime (time-1.6.0.1); rdtsc (rdtsc-1.3.0.1, System.CPUTime.Rdtsc).
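
The Measure description in Perf.Measure (an effect before measuring, an effect to finalise, and the value measured) can be illustrated with a minimal sketch that uses only base. Every name below (MeasureSketch, preStep, postStep, runSketch, runSketchN, countSketch, cpuTimeSketch) is hypothetical and chosen for illustration; this is not the perf library's own definition, and the real Measure type may well be shaped differently.

```haskell
import Control.Monad (replicateM_)
import System.CPUTime (getCPUTime)

-- | Hypothetical stand-in for the Measure idea; not the perf library's type.
-- 's' is whatever the pre-step captures, 'b' is the measurement produced.
data MeasureSketch s b = MeasureSketch
  { preStep  :: IO s       -- effect run before the measured action
  , postStep :: s -> IO b  -- effect run afterwards, yielding the measurement
  }

-- | Run one action between the two steps, returning (measurement, result).
runSketch :: MeasureSketch s b -> IO a -> IO (b, a)
runSketch m action = do
  s <- preStep m
  a <- action
  b <- postStep m s
  pure (b, a)

-- | Measure once around n runs of the action and return the last result.
-- The action always runs at least once, even when n is below 1.
runSketchN :: Int -> MeasureSketch s b -> IO a -> IO (b, a)
runSketchN n m action = do
  s <- preStep m
  replicateM_ (n - 1) action
  a <- action
  b <- postStep m s
  pure (b, a)

-- | Reports 1 for every measurement, so it counts measurements, not runs.
countSketch :: MeasureSketch () Int
countSketch = MeasureSketch (pure ()) (\() -> pure 1)

-- | Elapsed CPU time in picoseconds, via getCPUTime from System.CPUTime.
cpuTimeSketch :: MeasureSketch Integer Integer
cpuTimeSketch =
  MeasureSketch getCPUTime (\start -> subtract start <$> getCPUTime)
```

In GHCi, `runSketch countSketch (pure "joy")` yields `(1,"joy")`, and `runSketchN 1000 countSketch (pure "joys")` yields `(1,"joys")`, mirroring the count examples in the documentation. Capturing state in the pre-step and handing it to the post-step keeps the difference calculation out of the measured action itself.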