Safe Haskell	None
Language	Haskell2010

DataSketches.Quantiles.RelativeErrorQuantile

Contents

Construction
- Configuration settings
Sketch summaries
Updating the sketch
Internals used in test. DO NOT USE.

Description

The Relative Error Quantile (REQ) sketch provides extremely high accuracy at a chosen end of the rank domain. This is best illustrated with some rank domain accuracy plots that compare the KLL quantiles sketch to the REQ sketch.

This first plot illustrates the typical error behavior of the KLL sketch (also the quantiles/DoublesSketch). The error is flat for all ranks (0, 1). The green and yellow lines correspond to +/- one RSE at 68% confidence; the blue and red lines, +- two RSE at 95% confidence; and, the purple and brown lines +- 3 RSE at 99% confidence. The reason all the curves pinch at 0 and 1.0, is because the sketch knows with certainty that a request for a quantile at rank = 0 is the minimum value of the stream; and a request for a quantiles at rank = 1.0, is the maximum value of the stream. Both of which the sketch tracks.

KLL Gaussian Error Quantiles

The next plot is the exact same data and queries fed to the REQ sketch set for High Rank Accuracy (HRA) mode. In this plot, starting at a rank of about 0.3, the contour lines start converging and actually reach zero error at rank 1.0. Therefore the error (the inverse of accuracy) is relative to the requested rank, thus the name of the sketch. This means that the user can perform getQuantile(rank) queries, where rank = .99999 and get accurate results.

ReqSketch Gaussian Error Quantiles - HighRankAccuracy

This next plot is also the same data and queries, except the REQ sketch was configured for Low Rank Accuracy (LRA). In this case the user can perform getQuantiles(rank) queries, where rank = .00001 and get accurate results.

ReqSketch Gaussian Error Quantiles - LowRankAccuracy

Synopsis

data ReqSketch s
mkReqSketch :: forall m. PrimMonad m => Word32 -> RankAccuracy -> m (ReqSketch (PrimState m))
data RankAccuracy
- = HighRanksAreAccurate
- | LowRanksAreAccurate
data Criterion
- = (:<)
- | (:<=)
count :: PrimMonad m => ReqSketch (PrimState m) -> m Word64
null :: PrimMonad m => ReqSketch (PrimState m) -> m Bool
sum :: PrimMonad m => ReqSketch (PrimState m) -> m Double
maximum :: PrimMonad m => ReqSketch (PrimState m) -> m Double
minimum :: PrimMonad m => ReqSketch (PrimState m) -> m Double
retainedItemCount :: PrimMonad m => ReqSketch (PrimState m) -> m Int
relativeStandardError :: Int -> Double -> RankAccuracy -> Word64 -> Double
countWithCriterion :: (PrimMonad m, s ~ PrimState m) => ReqSketch s -> Double -> m Word64
probabilityMassFunction :: PrimMonad m => ReqSketch (PrimState m) -> [Double] -> m [Double]
quantile :: PrimMonad m => ReqSketch (PrimState m) -> Double -> m Double
quantiles :: PrimMonad m => ReqSketch (PrimState m) -> [Double] -> m [Double]
rank :: PrimMonad m => ReqSketch (PrimState m) -> Double -> m Double
rankLowerBound :: PrimMonad m => ReqSketch (PrimState m) -> Double -> Int -> m Double
ranks :: (PrimMonad m, s ~ PrimState m) => ReqSketch s -> [Double] -> m [Double]
rankUpperBound :: PrimMonad m => ReqSketch (PrimState m) -> Double -> Int -> m Double
cumulativeDistributionFunction :: PrimMonad m => ReqSketch (PrimState m) -> [Double] -> m (Maybe [Double])
getK :: ReqSketch s -> Word32
merge :: (PrimMonad m, s ~ PrimState m) => ReqSketch s -> ReqSketch s -> m (ReqSketch s)
insert :: PrimMonad m => ReqSketch (PrimState m) -> Double -> m ()
rankAccuracy :: ReqSketch s -> RankAccuracy
isEstimationMode :: PrimMonad m => ReqSketch (PrimState m) -> m Bool
isLessThanOrEqual :: ReqSketch s -> Bool
data CumulativeDistributionInvariants
- = CumulativeDistributionInvariantsSplitsAreEmpty
- | CumulativeDistributionInvariantsSplitsAreNotFinite
- | CumulativeDistributionInvariantsSplitsAreNotUniqueAndMontonicallyIncreasing
mkAuxiliaryFromReqSketch :: PrimMonad m => ReqSketch (PrimState m) -> m ReqAuxiliary
computeTotalRetainedItems :: PrimMonad m => ReqSketch (PrimState m) -> m Int

Construction

data ReqSketch s Source #

This Relative Error Quantiles Sketch is the Haskell implementation based on the paper "Relative Error Streaming Quantiles", https://arxiv.org/abs/2004.01668, and loosely derived from a Python prototype written by Pavel Vesely, ported from the Java equivalent.

This implementation differs from the algorithm described in the paper in the following:

The algorithm requires no upper bound on the stream length. Instead, each relative-compactor counts the number of compaction operations performed so far (via variable state). Initially, the relative-compactor starts with INIT_NUMBER_OF_SECTIONS. Each time the number of compactions (variable state) exceeds 2^{numSections - 1}, we double numSections. Note that after merging the sketch with another one variable state may not correspond to the number of compactions performed at a particular level, however, since the state variable never exceeds the number of compactions, the guarantees of the sketch remain valid.

The size of each section (variable k and sectionSize in the code and parameter k in the paper) is initialized with a value set by the user via variable k. When the number of sections doubles, we decrease sectionSize by a factor of sqrt(2). This is applied at each level separately. Thus, when we double the number of sections, the nominal compactor size increases by a factor of approx. sqrt(2) (+/- rounding).

The merge operation here does not perform "special compactions", which are used in the paper to allow for a tight mathematical analysis of the sketch.

This implementation provides a number of capabilities not discussed in the paper or provided in the Python prototype.

The Python prototype only implemented high accuracy for low ranks. This implementation provides the user with the ability to choose either high rank accuracy or low rank accuracy at the time of sketch construction.

The Python prototype only implemented a comparison criterion of "<". This implementation allows the user to switch back and forth between the "<=" criterion and the "<=" criterion.

Instances

Instances details

TakeSnapshot ReqSketch Source #
Instance details Defined in DataSketches.Quantiles.RelativeErrorQuantile Associated Types data Snapshot ReqSketch Source # Methods takeSnapshot :: PrimMonad m => ReqSketch (PrimState m) -> m (Snapshot ReqSketch) Source #
Show (Snapshot ReqSketch) Source #
Instance details Defined in DataSketches.Quantiles.RelativeErrorQuantile Methods showsPrec :: Int -> Snapshot ReqSketch -> ShowS # show :: Snapshot ReqSketch -> String # showList :: [Snapshot ReqSketch] -> ShowS #
Generic (ReqSketch s) Source #
Instance details Defined in DataSketches.Quantiles.RelativeErrorQuantile Associated Types type Rep (ReqSketch s) :: Type -> Type # Methods from :: ReqSketch s -> Rep (ReqSketch s) x # to :: Rep (ReqSketch s) x -> ReqSketch s #
NFData (ReqSketch s) Source #
Instance details Defined in DataSketches.Quantiles.RelativeErrorQuantile Methods rnf :: ReqSketch s -> () #
data Snapshot ReqSketch Source #
Instance details Defined in DataSketches.Quantiles.RelativeErrorQuantile data Snapshot ReqSketch = ReqSketchSnapshot { snapshotRankAccuracySetting :: !RankAccuracy snapshotCriterion :: !Criterion snapshotTotalN :: !Word64 snapshotMinValue :: !Double snapshotMaxValue :: !Double snapshotRetainedItems :: !Int snapshotMaxNominalCapacitiesSize :: !Int snapshotCompactors :: !(Vector (Snapshot ReqCompactor)) }
type Rep (ReqSketch s) Source #
Instance details Defined in DataSketches.Quantiles.RelativeErrorQuantile type Rep (ReqSketch s) = D1 ('MetaData "ReqSketch" "DataSketches.Quantiles.RelativeErrorQuantile" "data-sketches-0.3.0.1-6J1wEmzzafm56VnpwQnbyl" 'False) (C1 ('MetaCons "ReqSketch" 'PrefixI 'True) (((S1 ('MetaSel ('Just "k") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Word32) :: (S1 ('MetaSel ('Just "rankAccuracySetting") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 RankAccuracy) :: S1 ('MetaSel ('Just "criterion") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Criterion))) :: (S1 ('MetaSel ('Just "sketchRng") 'SourceUnpack 'SourceStrict 'DecidedStrict) (Rec0 (Gen s)) :: (S1 ('MetaSel ('Just "totalN") 'SourceUnpack 'SourceStrict 'DecidedStrict) (Rec0 (URef s Word64)) :: S1 ('MetaSel ('Just "minValue") 'SourceUnpack 'SourceStrict 'DecidedStrict) (Rec0 (URef s Double))))) :: ((S1 ('MetaSel ('Just "maxValue") 'SourceUnpack 'SourceStrict 'DecidedStrict) (Rec0 (URef s Double)) :: (S1 ('MetaSel ('Just "sumValue") 'SourceUnpack 'SourceStrict 'DecidedStrict) (Rec0 (URef s Double)) :: S1 ('MetaSel ('Just "retainedItems") 'SourceUnpack 'SourceStrict 'DecidedStrict) (Rec0 (URef s Int)))) :: (S1 ('MetaSel ('Just "maxNominalCapacitiesSize") 'SourceUnpack 'SourceStrict 'DecidedStrict) (Rec0 (URef s Int)) :: (S1 ('MetaSel ('Just "aux") 'SourceUnpack 'SourceStrict 'DecidedStrict) (Rec0 (MutVar s (Maybe ReqAuxiliary))) :*: S1 ('MetaSel ('Just "compactors") 'SourceUnpack 'SourceStrict 'DecidedStrict) (Rec0 (MutVar s (Vector (ReqCompactor s)))))))))

mkReqSketch Source #

Arguments

:: forall m. PrimMonad m
=> Word32	K
-> RankAccuracy
-> m (ReqSketch (PrimState m))

The K parameter can be increased to trade increased space efficiency for higher accuracy in rank and quantile calculations. Due to the way the compaction algorithm works, it must be an even number between 4 and 1024.

A good starting number when in doubt is 6.

Configuration settings

data RankAccuracy Source #

Constructors

HighRanksAreAccurate	High ranks are prioritized for better accuracy.
LowRanksAreAccurate	Low ranks are prioritized for better accuracy

Instances

Instances details

Eq RankAccuracy Source #
Instance details Defined in DataSketches.Quantiles.RelativeErrorQuantile.Types Methods (==) :: RankAccuracy -> RankAccuracy -> Bool # (/=) :: RankAccuracy -> RankAccuracy -> Bool #
Show RankAccuracy Source #
Instance details Defined in DataSketches.Quantiles.RelativeErrorQuantile.Types Methods showsPrec :: Int -> RankAccuracy -> ShowS # show :: RankAccuracy -> String # showList :: [RankAccuracy] -> ShowS #

data Criterion Source #

Constructors

(:<)
(:<=)

Instances

Instances details

Eq Criterion Source #
Instance details Defined in DataSketches.Quantiles.RelativeErrorQuantile.Types Methods (==) :: Criterion -> Criterion -> Bool # (/=) :: Criterion -> Criterion -> Bool #
Show Criterion Source #
Instance details Defined in DataSketches.Quantiles.RelativeErrorQuantile.Types Methods showsPrec :: Int -> Criterion -> ShowS # show :: Criterion -> String # showList :: [Criterion] -> ShowS #
InequalitySearch Criterion Source #
Instance details Defined in DataSketches.Quantiles.RelativeErrorQuantile.Types Methods inequalityCompare :: Ord a => Criterion -> a -> a -> a -> Ordering Source # getIndex :: (PrimMonad m, MVector v a, Ord a) => Criterion -> v (PrimState m) a -> Int -> Int -> a -> m Int Source # resolve :: Criterion -> Int -> (Int, Int) -> (Int, Int) -> Int Source #

Sketch summaries

count :: PrimMonad m => ReqSketch (PrimState m) -> m Word64 Source #

Get the total number of items inserted into the sketch

null :: PrimMonad m => ReqSketch (PrimState m) -> m Bool Source #

Returns true if this sketch is empty.

sum :: PrimMonad m => ReqSketch (PrimState m) -> m Double Source #

maximum :: PrimMonad m => ReqSketch (PrimState m) -> m Double Source #

Gets the largest value seen by this sketch

minimum :: PrimMonad m => ReqSketch (PrimState m) -> m Double Source #

Gets the smallest value seen by this sketch

retainedItemCount :: PrimMonad m => ReqSketch (PrimState m) -> m Int Source #

relativeStandardError Source #

Arguments

:: Int	k - the given value of k
-> Double	rank - the given normalized rank, a number in [0,1].
-> RankAccuracy
-> Word64	totalN - an estimate of the total number of items submitted to the sketch.
-> Double	an a priori estimate of relative standard error (RSE, expressed as a number in [0,1]).

Returns an a priori estimate of relative standard error (RSE, expressed as a number in [0,1]). Derived from Lemma 12 in https://arxiv.org/abs/2004.01668v2, but the constant factors were modified based on empirical measurements.

countWithCriterion :: (PrimMonad m, s ~ PrimState m) => ReqSketch s -> Double -> m Word64 Source #

Returns the approximate count of items satisfying the criterion set in the ReqSketch criterion field.

probabilityMassFunction Source #

Arguments

:: PrimMonad m
=> ReqSketch (PrimState m)
-> [Double]	splitPoints - an array of m unique, monotonically increasing double values that divide the real number line into m+1 consecutive disjoint intervals. The definition of an "interval" is inclusive of the left splitPoint (or minimum value) and exclusive of the right splitPoint, with the exception that the last interval will include the maximum value. It is not necessary to include either the min or max values in these splitpoints.
-> m [Double]	An array of m+1 doubles each of which is an approximation to the fraction of the input stream values (the mass) that fall into one of those intervals. The definition of an "interval" is inclusive of the left splitPoint and exclusive of the right splitPoint, with the exception that the last interval will include maximum value.

Returns an approximation to the Probability Mass Function (PMF) of the input stream given a set of splitPoints (values). The resulting approximations have a probabilistic guarantee that be obtained, a priori, from the getRSE(int, double, boolean, long) function.

If the sketch is empty this returns an empty list.

quantile Source #

Arguments

:: PrimMonad m
=> ReqSketch (PrimState m)
-> Double	normRank - the given normalized rank
-> m Double	the approximate quantile given the normalized rank.

Gets the approximate quantile of the given normalized rank based on the lteq criterion.

quantiles Source #

Arguments

:: PrimMonad m
=> ReqSketch (PrimState m)
-> [Double]	normRanks - the given array of normalized ranks.
-> m [Double]	the array of quantiles that correspond to the given array of normalized ranks.

Gets an array of quantiles that correspond to the given array of normalized ranks.

rank Source #

Arguments

:: PrimMonad m
=> ReqSketch (PrimState m)
-> Double	value - the given value
-> m Double	the normalized rank of the given value in the stream.

Computes the normalized rank of the given value in the stream. The normalized rank is the fraction of values less than the given value; or if lteq is true, the fraction of values less than or equal to the given value.

rankLowerBound Source #

Arguments

:: PrimMonad m
=> ReqSketch (PrimState m)
-> Double	rank - the given rank, a value between 0 and 1.0.
-> Int	numStdDev - the number of standard deviations. Must be 1, 2, or 3.
-> m Double	an approximate lower bound rank.

Returns an approximate lower bound rank of the given normalized rank.

ranks :: (PrimMonad m, s ~ PrimState m) => ReqSketch s -> [Double] -> m [Double] Source #

Gets an array of normalized ranks that correspond to the given array of values. TODO, make it ifaster

rankUpperBound Source #

Arguments

:: PrimMonad m
=> ReqSketch (PrimState m)
-> Double	rank - the given rank, a value between 0 and 1.0.
-> Int	numStdDev - the number of standard deviations. Must be 1, 2, or 3.
-> m Double	an approximate upper bound rank.

Returns an approximate upper bound rank of the given rank.

cumulativeDistributionFunction Source #

Arguments

:: PrimMonad m
=> ReqSketch (PrimState m)
-> [Double]	Returns an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF, of the input stream given a set of splitPoint (values). The resulting approximations have a probabilistic guarantee that be obtained, a priori, from the getRSE(int, double, boolean, long) function. If the sketch is empty this returns `Nothing`.
-> m (Maybe [Double])

Returns an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF, of the input stream given a set of splitPoint (values).

getK :: ReqSketch s -> Word32 Source #

Updating the sketch

merge :: (PrimMonad m, s ~ PrimState m) => ReqSketch s -> ReqSketch s -> m (ReqSketch s) Source #

Merge other sketch into this one.

insert :: PrimMonad m => ReqSketch (PrimState m) -> Double -> m () Source #

Updates this sketch with the given item.

rankAccuracy :: ReqSketch s -> RankAccuracy Source #

isEstimationMode :: PrimMonad m => ReqSketch (PrimState m) -> m Bool Source #

Returns true if this sketch is in estimation mode.

isLessThanOrEqual :: ReqSketch s -> Bool Source #

Returns the current comparison criterion.

Internals used in test. DO NOT USE.

If you see this error, please file an issue in the GitHub repository.

data CumulativeDistributionInvariants Source #

Constructors

CumulativeDistributionInvariantsSplitsAreEmpty
CumulativeDistributionInvariantsSplitsAreNotFinite
CumulativeDistributionInvariantsSplitsAreNotUniqueAndMontonicallyIncreasing

Instances

Instances details

Eq CumulativeDistributionInvariants Source #
Instance details Defined in DataSketches.Quantiles.RelativeErrorQuantile Methods (==) :: CumulativeDistributionInvariants -> CumulativeDistributionInvariants -> Bool # (/=) :: CumulativeDistributionInvariants -> CumulativeDistributionInvariants -> Bool #
Show CumulativeDistributionInvariants Source #
Instance details Defined in DataSketches.Quantiles.RelativeErrorQuantile Methods showsPrec :: Int -> CumulativeDistributionInvariants -> ShowS # show :: CumulativeDistributionInvariants -> String # showList :: [CumulativeDistributionInvariants] -> ShowS #
Exception CumulativeDistributionInvariants Source #
Instance details Defined in DataSketches.Quantiles.RelativeErrorQuantile Methods toException :: CumulativeDistributionInvariants -> SomeException # fromException :: SomeException -> Maybe CumulativeDistributionInvariants # displayException :: CumulativeDistributionInvariants -> String #

mkAuxiliaryFromReqSketch :: PrimMonad m => ReqSketch (PrimState m) -> m ReqAuxiliary Source #

computeTotalRetainedItems :: PrimMonad m => ReqSketch (PrimState m) -> m Int Source #