data-sketches

[ library, unclassified ]

Please see the README on GitHub at https://github.com/iand675/datasketches-haskell#readme

Modules

DataSketches
- Quantiles
  - DataSketches.Quantiles.RelativeErrorQuantile
    - Internal
    - DataSketches.Quantiles.RelativeErrorQuantile.Types

Downloads

data-sketches-0.3.0.1.tar.gz (Cabal source package)

Versions	0.1.0.0, 0.1.0.1, 0.1.0.2, 0.2.0.0, 0.2.0.1, 0.3.0.0, 0.3.0.1, 0.3.1.0
Change log	ChangeLog.md
Dependencies	base (>=4.7 && <5), cereal, deepseq, ghc-prim, mtl, mwc-random, pretty-show, prettyprinter, primitive, vector, vector-algorithms
License	LicenseRef-Apache
Copyright	2021 Ian Duncan, Rob Bassi, Mercury Technologies
Author	Ian Duncan, Rob Bassi
Maintainer	ian@iankduncan.com
Home page	https://github.com/iand675/datasketches-haskell#readme
Bug tracker	https://github.com/iand675/datasketches-haskell/issues
Source repo	head: git clone https://github.com/iand675/datasketches-haskell
Uploaded	by IanDuncan at 2021-08-24T21:31:52Z
Distributions	LTSHaskell:0.3.1.0, NixOS:0.3.1.0, Stackage:0.3.1.0
Reverse Dependencies	1 direct, 13 indirect
Downloads	2045 total (53 in the last 30 days)
Status	Docs available. Last success reported on 2021-08-24

Readme for data-sketches-0.3.0.1

streaming-quantiles

The Business Challenge: Analyzing Big Data Quickly.

In the analysis of big data there are often problem queries that don’t scale because they require huge compute resources and time to generate exact results. Examples include count distinct, quantiles, most-frequent items, joins, matrix computations, and graph analysis.

If approximate results are acceptable, there is a class of specialized algorithms, called streaming algorithms or sketches, that can produce results orders of magnitude faster and with mathematically proven error bounds. For interactive queries there may not be other viable alternatives, and in the case of real-time analysis, sketches are the only known solution.

For any system that needs to extract useful information from big data, these sketches are a required toolkit that should be tightly integrated into its analysis capabilities. This technology has helped Yahoo (Verizon Media) successfully reduce data processing times from days or hours to minutes or seconds on a number of its internal platforms.

This project is dedicated to providing a broad selection of sketch algorithms of production quality. Contributions are welcome from those interested in further development of this science and art.

Why use this project?

Sketches are fast. The sketch algorithms in this library process data in a single pass and are suitable for both real-time and batch processing. Sketches enable streaming computation of set expression cardinalities, quantiles, frequency estimation and more. In addition, designing a system around sketching simplifies the system's architecture and reduces the overall compute resources required for these heretofore difficult computations.
Built-in Theta Sketch set operators (Union, Intersection, Difference) produce sketches as a result (and not just a number), enabling full set expressions of cardinality, such as ((A ∪ B) ∩ (C ∪ D)) \ (E ∪ F). This capability, along with predictable and superior accuracy (compared with Include/Exclude approaches), enables unprecedented analysis capabilities for fast queries.
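
To make the set expression above concrete, here is a minimal sketch that evaluates ((A ∪ B) ∩ (C ∪ D)) \ (E ∪ F) exactly, using Data.Set from the containers package. It is only an illustration of what such an expression computes: it does not use this package's API (the module list above shows only the quantile modules), and the names setA through setF and their contents are hypothetical. A Theta Sketch would estimate the same cardinality from bounded-size summaries instead of holding every element in memory.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Hypothetical inputs: exact sets standing in for six streams of IDs.
setA, setB, setC, setD, setE, setF :: Set Int
setA = Set.fromList [1, 2, 3]
setB = Set.fromList [3, 4, 5]
setC = Set.fromList [2, 3, 4]
setD = Set.fromList [5, 6]
setE = Set.fromList [2]
setF = Set.fromList [5, 9]

-- ((A ∪ B) ∩ (C ∪ D)) \ (E ∪ F), computed exactly.
-- Each operator returns a set, so operations compose into larger expressions.
result :: Set Int
result =
  ((setA `Set.union` setB) `Set.intersection` (setC `Set.union` setD))
    `Set.difference` (setE `Set.union` setF)

-- The quantity a sketch-based version would approximate.
resultCardinality :: Int
resultCardinality = Set.size result
```

With these inputs, result is the set {3, 4} and resultCardinality is 2. The exact version needs memory proportional to the number of distinct elements, which is the cost that sketches are designed to avoid.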