reservoir-0.2.0.0: Unweighted reservoir sampling

Safe HaskellSafe
LanguageHaskell2010

System.Random.Reservoir

Description

Unweighted reservoir algorithm suitable for sampling from data streams of large or unknown size.

Synopsis

Documentation

data Res g a Source #

Wrapper for the state kept by algorithm R. Keeps the sample as an IntMap, the number of elements seen, and the random seed.

Constructors

Res 

Fields

  • resSample :: !(IntMap a)

    The current sample. Should usually not be modified by hand.

  • numSeen :: !Int

    The number of elements seen. Should almost never be modified by hand.

  • resSeed :: !g

    The random seed for the sample

Instances

Functor (Res g) Source # 

Methods

fmap :: (a -> b) -> Res g a -> Res g b #

(<$) :: a -> Res g b -> Res g a #

(Show g, Show a) => Show (Res g a) Source # 

Methods

showsPrec :: Int -> Res g a -> ShowS #

show :: Res g a -> String #

showList :: [Res g a] -> ShowS #

getSample :: Res g a -> [a] Source #

Extracts the sample as a list.

emptyRes :: RandomGen g => g -> Res g a Source #

Creates a Res with nothing in the sample and with the counter at zero.

reservoirSample Source #

Arguments

:: RandomGen g 
=> Int

The maximum number of elements to be in the sample.

-> a

The next element to be considered

-> Res g a

The current wrapped sample

-> Res g a 

Jeffrey Vitter's "Algorithm R" for randomly choosing a fixed-size sample from a stream of possibly very large or unknown length. At any point in the sampling, all streamed elements have equal probability of appearing in the sample. Intended to be partially applied on the sample size and used with fold-like functions.