Safe Haskell | None |
---|---|
Language | Haskell2010 |
- data HyperLogLogPlus p k
- insert :: forall p k a. (KnownNat p, KnownNat k, Hashable64 a) => a -> HyperLogLogPlus p k -> HyperLogLogPlus p k
- insertHash :: forall p k. (KnownNat p, KnownNat k) => Hash64 -> HyperLogLogPlus p k -> HyperLogLogPlus p k
- size :: forall p k. (KnownNat p, KnownNat k) => HyperLogLogPlus p k -> Word64
- intersection :: forall p k. (KnownNat p, KnownNat k) => [HyperLogLogPlus p k] -> Word64
- cast :: forall p1 k1 p2 k2. (KnownNat p1, KnownNat k1, KnownNat p2, KnownNat k2, 4 <= p2, p2 <= 18) => HyperLogLogPlus p1 k1 -> Maybe (HyperLogLogPlus p2 k2)
Documentation
data HyperLogLogPlus p k
HyperLogLog++ cardinality estimation paired with MinHash for intersection estimation
p
- precision of HLL structurek
- precision of MinHash structure (max size)
Create new counter:
>>>
:set -XDataKinds
>>>
:load Data.HyperLogLogPlus
>>>
type HLL = HyperLogLogPlus 12 8192
>>>
mempty :: HLL
HyperLogLogPlus [ p = 12 k = 8192 ] [ minSet size = 0 ]
HyperLogLogPlus and MinHash precisions are specified in a type.
HLL precision p
should be between 4 and 18, starting from 10
for good accuracy.
MinHash precision k
ideally should be greater or equal 8192
for decent intersection estimation.
Estimating number of unique items:
>>>
size (foldr insert mempty [1 .. 75000] :: HLL)
75090
Combine multiple counters:
>>>
size $ (foldr insert mempty [1 .. 5000] :: HLL) <> (foldr insert mempty [3000 .. 10000] :: HLL)
10044
Compute estimated set intersection:
>>>
intersection $ [(foldr insert mempty [1 .. 15000] :: HLL), (foldr insert mempty [12000 .. 20000] :: HLL)]
3100
Eq (HyperLogLogPlus p k) | |
(KnownNat p, KnownNat k) => Show (HyperLogLogPlus p k) | |
(KnownNat p, KnownNat k, (<=) 4 p, (<=) p 18) => Monoid (HyperLogLogPlus p k) | |
KnownNat k => Semigroup (HyperLogLogPlus p k) |
insert :: forall p k a. (KnownNat p, KnownNat k, Hashable64 a) => a -> HyperLogLogPlus p k -> HyperLogLogPlus p k
Insert hashable value
insertHash :: forall p k. (KnownNat p, KnownNat k) => Hash64 -> HyperLogLogPlus p k -> HyperLogLogPlus p k
Insert already hashed value
size :: forall p k. (KnownNat p, KnownNat k) => HyperLogLogPlus p k -> Word64
Compute estimated size of HyperLogLogPlus. If number of inserted values is smaller than MinHash precision this will return exact value
intersection :: forall p k. (KnownNat p, KnownNat k) => [HyperLogLogPlus p k] -> Word64
Returns an estimate of the size of the intersection of the given HyperLogLogPlus objects
cast :: forall p1 k1 p2 k2. (KnownNat p1, KnownNat k1, KnownNat p2, KnownNat k2, 4 <= p2, p2 <= 18) => HyperLogLogPlus p1 k1 -> Maybe (HyperLogLogPlus p2 k2)
Cast HyperLogLogPlus to new precision levels
- New HLL precision should less or equal to old one
- New MinHash precision has to be less or equal to old one, or it can be larger, but only if number of inserted hashes in old structure is smaller than old precision (size limit)