This module defines the commonly used data structures and basic types of the heap profiling framework.
Profiling information is a sequence of time-stamped samples, so the ideal data structure should have an efficient snoc operation. It should also make it easy to extract an interval given by a start and an end time. On top of the raw data, we also want to access some statistics as efficiently as possible.
We can distinguish two phases: looking at the profile during execution and examining it afterwards. In the first case we might not want statistics, just live monitoring, while we probably want to analyse archived profiles more deeply. Therefore, it makes sense to define two separate data structures for these two purposes and give them a common interface for extracting the necessary data. The simple case is covered by the Profile type defined here, while a more complex structure providing fast off-line queries is defined in the Profiling.Heap.Stats module.
- type CostCentreId = Int
- type CostCentreName = ByteString
- type Time = Double
- type Cost = Int64
- type ProfileSample = [(CostCentreId, Cost)]
- data Profile = Profile {
- prSamples :: ![(Time, ProfileSample)]
- prNames :: !(IntMap CostCentreName)
- prNamesInv :: !(Trie CostCentreId)
- prJob :: !String
- prDate :: !String
- emptyProfile :: Profile
- class ProfileQuery p where
- job :: p -> String
- date :: p -> String
- ccNames :: p -> IntMap CostCentreName
- ccName :: p -> Int -> CostCentreName
- samples :: p -> [(Time, ProfileSample)]
- samplesIvl :: p -> Time -> Time -> [(Time, ProfileSample)]
- minTime :: p -> Time
- maxTime :: p -> Time
- maxCost :: p -> Cost
- maxCostTotal :: p -> Cost
- maxCostIvl :: p -> Time -> Time -> Cost
- maxCostTotalIvl :: p -> Time -> Time -> Cost
- integral :: p -> ProfileSample
- integralIvl :: p -> Time -> Time -> ProfileSample
- type ProfileSink = SinkInput -> IO ()
- data SinkInput
Documentation
type CostCentreId = Int
Cost centres are identified by integers for simplicity (so we can use IntMap).
type CostCentreName = ByteString
At this level cost centre names have no internal structure that we would care about. While in some cases they reflect the call hierarchy, we are not splitting them at this point, because all kinds of names can appear here.
type ProfileSample = [(CostCentreId, Cost)]
A sampling point is simply a list of cost centres with the associated cost. There is no need for a fancy data structure here, since we normally process every value in this collection, and it's usually not big either, only holding a few dozen entries at most.
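Because a sample is a plain association list, whole-sample processing is a single linear pass. The following is a minimal sketch under stated assumptions: the type synonyms are local stand-ins copied from the synopsis so that the block compiles on its own, and `sampleTotal`, `samplePeak` and `exampleSample` are hypothetical names that are not part of the module.

```haskell
import Data.Int (Int64)

-- Local stand-ins mirroring the synopsis, so the sketch compiles on its own.
type CostCentreId = Int
type Cost = Int64
type ProfileSample = [(CostCentreId, Cost)]

-- Total cost of one sampling point (hypothetical helper).
sampleTotal :: ProfileSample -> Cost
sampleTotal = sum . map snd

-- Largest single cost in the sample; 0 for an empty sample (hypothetical helper).
samplePeak :: ProfileSample -> Cost
samplePeak = foldr (max . snd) 0

-- Made-up data for illustration only.
exampleSample :: ProfileSample
exampleSample = [(1, 4096), (2, 1024), (7, 512)]
```

With only a few dozen entries per sample, these list traversals should be cheap enough that a more elaborate container would probably not pay off.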
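Profile data structure
data Profile
A raw heap profile that is easy to grow further, so it is used during loading.
Profile
prSamples :: ![(Time, ProfileSample)]
prNames :: !(IntMap CostCentreName)
prNamesInv :: !(Trie CostCentreId)
prJob :: !String
prDate :: !String
emptyProfile :: Profile
An initial Profile structure that can be used in accumulations.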
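One way to picture such an accumulation is a left fold that starts from an empty value and adds one item at a time. The sketch below is illustrative only. `MiniProfile` is a simplified stand-in for Profile that omits the Trie-based inverse name map and the job and date fields. `Event`, `addEvent`, `loadEvents` and `chronological` are hypothetical names. Keeping the newest sample at the head of the list is an assumption, since the source does not say in what order prSamples is stored.

```haskell
import Data.Int (Int64)
import qualified Data.IntMap as IM
import Data.List (foldl')

type CostCentreId = Int
type CostCentreName = String  -- stand-in; the real module uses ByteString
type Time = Double
type Cost = Int64
type ProfileSample = [(CostCentreId, Cost)]

-- Simplified stand-in for Profile (assumption, see the lead-in).
data MiniProfile = MiniProfile
  { mpSamples :: ![(Time, ProfileSample)]      -- newest sample first (assumption)
  , mpNames   :: !(IM.IntMap CostCentreName)
  }

-- Plays the role of emptyProfile in this sketch.
emptyMini :: MiniProfile
emptyMini = MiniProfile { mpSamples = [], mpNames = IM.empty }

-- Hypothetical loader events: a name declaration or a time-stamped sample.
data Event
  = Name CostCentreId CostCentreName
  | Sample Time ProfileSample

-- Adding a sample is a constant-time cons onto the reversed list,
-- which acts as a cheap snoc on the chronological sequence.
addEvent :: MiniProfile -> Event -> MiniProfile
addEvent p (Name ccid name) = p { mpNames = IM.insert ccid name (mpNames p) }
addEvent p (Sample t smp)   = p { mpSamples = (t, smp) : mpSamples p }

-- Accumulate a whole event stream, starting from the empty profile.
loadEvents :: [Event] -> MiniProfile
loadEvents = foldl' addEvent emptyMini

-- Recover the samples in chronological order.
chronological :: MiniProfile -> [(Time, ProfileSample)]
chronological = reverse . mpSamples
```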
Query interface
class ProfileQuery p where
The ProfileQuery class contains all kinds of reading operations. The minimal definition consists of job, date, ccNames and samples. All the statistics have default implementations, which are mostly okay for a single query, but they are generally highly inefficient.
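The split between a minimal definition and derived defaults can be sketched with a cut-down class. This is not the module's actual code: `MiniQuery`, its `mq`-prefixed methods and `ListProfile` are hypothetical names, the job and date methods are left out, String stands in for ByteString, and the default bodies are only one plausible way to derive the results.

```haskell
import Data.Int (Int64)
import qualified Data.IntMap as IM

type CostCentreId = Int
type CostCentreName = String  -- stand-in; the real module uses ByteString
type Time = Double
type Cost = Int64
type ProfileSample = [(CostCentreId, Cost)]

-- Cut-down, hypothetical analogue of ProfileQuery.
class MiniQuery p where
  -- Minimal definition: these two must be provided by every instance.
  mqNames   :: p -> IM.IntMap CostCentreName
  mqSamples :: p -> [(Time, ProfileSample)]

  -- Derived operation with a default; unknown ids yield an empty name here.
  mqName :: p -> Int -> CostCentreName
  mqName p ccid = IM.findWithDefault "" ccid (mqNames p)

  -- Default that walks the entire sample list on every call: fine for a
  -- single query, wasteful when asked repeatedly.
  mqMaxTime :: p -> Time
  mqMaxTime p = case mqSamples p of
    []   -> 0
    smps -> fst (last smps)

-- A trivial instance that supplies only the minimal definition.
data ListProfile = ListProfile (IM.IntMap CostCentreName) [(Time, ProfileSample)]

instance MiniQuery ListProfile where
  mqNames   (ListProfile ns _) = ns
  mqSamples (ListProfile _ ss) = ss
```

A structure built for off-line analysis could override such defaults with cached or indexed versions while keeping the same interface.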
job :: p -> String
Job information (command line).
date :: p -> String
Job start time.
ccNames :: p -> IntMap CostCentreName
Cost centre id to name mapping.
ccName :: p -> Int -> CostCentreName
Find cost centre name by id.
samples :: p -> [(Time, ProfileSample)]
The measurements in a list ordered by time.
samplesIvl :: p -> Time -> Time -> [(Time, ProfileSample)]
The samples between two given times.
minTime :: p -> Time
The time of the first sample.
maxTime :: p -> Time
The time of the last sample.
maxCost :: p -> Cost
The highest individual cost at any time.
maxCostTotal :: p -> Cost
The highest total cost at any time.
maxCostIvl :: p -> Time -> Time -> Cost
The highest individual cost in the interval.
maxCostTotalIvl :: p -> Time -> Time -> Cost
The highest total cost in the interval.
integral :: p -> ProfileSample
The total cost of each cost centre. Not a time integral; samples are simply summed.
integralIvl :: p -> Time -> Time -> ProfileSample
The total cost of each cost centre in the interval.
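The statistics above can all be derived from the time-ordered sample list. The sketch below shows one straightforward way to do so and is not taken from the module's source. All function names are hypothetical, the interval is assumed to be closed at both ends, and 0 is assumed as the result for an empty sample list.

```haskell
import Data.Int (Int64)
import qualified Data.IntMap as IM

type CostCentreId = Int
type Time = Double
type Cost = Int64
type ProfileSample = [(CostCentreId, Cost)]

-- Samples with t1 <= time <= t2; relies on the list being ordered by time.
samplesBetween :: Time -> Time -> [(Time, ProfileSample)] -> [(Time, ProfileSample)]
samplesBetween t1 t2 = takeWhile ((<= t2) . fst) . dropWhile ((< t1) . fst)

-- Highest individual cost over all samples.
peakCost :: [(Time, ProfileSample)] -> Cost
peakCost smps = maximum (0 : [c | (_, smp) <- smps, (_, c) <- smp])

-- Highest per-sample total over all samples.
peakTotal :: [(Time, ProfileSample)] -> Cost
peakTotal smps = maximum (0 : map (sum . map snd . snd) smps)

-- Per-cost-centre sum over all samples: plain summation, not a time integral.
summedCosts :: [(Time, ProfileSample)] -> ProfileSample
summedCosts smps = IM.toList (IM.fromListWith (+) (concatMap snd smps))

-- Made-up data for illustration only.
exampleSamples :: [(Time, ProfileSample)]
exampleSamples =
  [ (0.0, [(1, 10), (2, 5)])
  , (1.0, [(1, 3), (2, 20)])
  , (2.0, [(1, 7)])
  ]
```

The interval variants then amount to composing a statistic with the filter, for example `peakTotal . samplesBetween t1 t2`. Each call rescans the list, which matches the remark that the default implementations suit a single query better than repeated ones.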
Streaming interface
type ProfileSink = SinkInput -> IO ()
We might not want to hold on to all the past output, just do some stream processing. We can achieve this using a callback function that's invoked whenever a new profile sample is available. The type of this function can be ProfileSink. Besides the actual costs, it is also necessary to send over the names that belong to the short cost centre identifiers, as well as the fact that no more data will come. The SinkInput type expresses these possibilities.
data SinkInput
SinkSample !Time !ProfileSample | A snapshot of costs at a given time. |
SinkId !CostCentreId !CostCentreName | The name behind a cost centre id used in the samples. |
SinkStop | Indication that no more data will come. |
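A sink that keeps only a small running summary, rather than the whole history, might look like the sketch below. The type declarations are local copies of the ones documented here so that the block is self-contained. `LiveState`, `step` and `mkPeakSink` are hypothetical names, and storing the state in an IORef is just one simple choice among several.

```haskell
import Data.IORef (newIORef, modifyIORef', readIORef)
import Data.Int (Int64)
import qualified Data.ByteString.Char8 as B
import qualified Data.IntMap as IM

type CostCentreId = Int
type CostCentreName = B.ByteString
type Time = Double
type Cost = Int64
type ProfileSample = [(CostCentreId, Cost)]

-- Local copies of the documented streaming types.
data SinkInput
  = SinkSample !Time !ProfileSample
  | SinkId !CostCentreId !CostCentreName
  | SinkStop

type ProfileSink = SinkInput -> IO ()

-- Hypothetical running summary kept instead of the full sample history.
data LiveState = LiveState
  { lsNames :: IM.IntMap CostCentreName  -- names announced so far
  , lsPeak  :: Cost                      -- highest per-sample total seen
  , lsDone  :: Bool                      -- True once SinkStop has arrived
  }

-- Pure update for a single message.
step :: SinkInput -> LiveState -> LiveState
step (SinkSample _ smp) st = st { lsPeak = max (lsPeak st) (sum (map snd smp)) }
step (SinkId ccid name) st = st { lsNames = IM.insert ccid name (lsNames st) }
step SinkStop           st = st { lsDone = True }

-- Build a sink together with an action that reads the current summary.
mkPeakSink :: IO (ProfileSink, IO LiveState)
mkPeakSink = do
  ref <- newIORef (LiveState IM.empty 0 False)
  let sink input = modifyIORef' ref (step input)
  return (sink, readIORef ref)
```

Keeping the update as a pure function over `LiveState` makes it easy to test without any IO, and the callback itself only wraps it in a state update.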