statistics: A library of statistical types, data, and functions

[ bsd2, library, math, statistics ]

This library provides a number of common functions and types useful in statistics. We focus on high performance, numerical robustness, and use of good algorithms. Where possible, we provide references to the statistical literature.

The library's facilities can be divided into four broad categories:

  • Working with widely used discrete and continuous probability distributions. (There are dozens of exotic distributions in use; we focus on the most common.)

  • Computing with sample data: quantile estimation, kernel density estimation, histograms, bootstrap methods, significance testing, and autocorrelation analysis.

  • Random variate generation under several different distributions.

  • Common statistical tests for significant differences between samples.

Changes in 0.10.1.0

  • Kolmogorov-Smirnov nonparametric test added.

  • Pearson's chi squared test added.

  • A type class for generating random variates from a given distribution is added.

  • Modules Statistics.Math and Statistics.Constants are moved to the math-functions package. They are still available but marked as deprecated.
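To illustrate what the Kolmogorov-Smirnov test measures, here is a minimal sketch of the one-sample statistic D: the largest distance between the empirical CDF of a sample and a reference CDF. It uses only base and is not this library's implementation; the names ksStatistic and uniformCdf are hypothetical, chosen for this example.

```haskell
import Data.List (sort)

-- | One-sample Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)|,
-- where F_n is the empirical CDF of the sample and F is the reference CDF.
-- Returns 0 for an empty sample. (Hypothetical helper, not a library function.)
ksStatistic :: (Double -> Double) -> [Double] -> Double
ksStatistic cdf xs
  | null xs   = 0
  | otherwise = maximum (zipWith gap [0 ..] (sort xs))
  where
    n = fromIntegral (length xs) :: Double
    -- The empirical CDF jumps from i/n to (i+1)/n at the i-th sorted point
    -- (counting from 0), so the largest gap lies on one side of a jump.
    gap :: Int -> Double -> Double
    gap i x = max (abs (fromIntegral (i + 1) / n - f))
                  (abs (f - fromIntegral i / n))
      where f = cdf x

-- | CDF of the uniform distribution on [0, 1], used here as the reference.
uniformCdf :: Double -> Double
uniformCdf x = max 0 (min 1 x)
```

For example, ksStatistic uniformCdf [0.25, 0.5, 0.75] evaluates to 0.25, the gap at the first and last sample points.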

Changes in 0.10.0.1

  • dct and idct now have type Vector Double -> Vector Double.

Changes in 0.10.0.0

  • The type classes Mean and Variance are split in two. This is required for distributions which do not have finite variance or mean.

  • The S.Sample.KernelDensity module has been renamed, and completely rewritten to be much more robust. The older module oversmoothed multi-modal data. (The older module is still available under the name S.Sample.KernelDensity.Simple.)

  • Histogram computation is added, in S.Sample.Histogram.

  • Forward and inverse discrete Fourier and cosine transforms are added, in S.Transform.

  • Root finding is added, in S.Math.RootFinding.

  • The complCumulative function is added to the Distribution class in order to accurately assess probabilities P(X>x), which are used in one-tailed tests.

  • A stdDev function is added to the Variance class for distributions.

  • The constructor S.Distribution.normalDistr now takes standard deviation instead of variance as its parameter.

  • A bug in S.Quantile.weightedAvg is fixed. It produced a wrong answer if a sample contained only one element.

  • Bugs in quantile estimation for the chi-square and gamma distributions are fixed.

  • Integer overflow in mannWhitneyUCriticalValue is fixed. It produced incorrect critical values for moderately large samples: sample sizes of around 20 on 32-bit machines and around 40 on 64-bit ones.

  • A bug in mannWhitneyUSignificant is fixed. If either sample was larger than 20, it produced a completely incorrect answer.

  • One- and two-tailed tests in S.Tests.NonParametric are selected with sum types instead of Bool.

  • Test results are returned as an enumeration instead of Bool.

  • Performance improvements for Mann-Whitney U and Wilcoxon tests.

  • Module S.Tests.NonParametric is split into S.Tests.MannWhitneyU and S.Tests.WilcoxonT.

  • sortBy is added to S.Function.

  • Mean and variance for the gamma distribution are fixed.

  • Much faster cumulative probability functions for Poisson and hypergeometric distributions.

  • Better density functions for gamma and Poisson distributions.

  • Student-T, Fisher-Snedecor F, and Cauchy-Lorentz distributions are added.

  • The function S.Function.create is removed. Use generateM from the vector package instead.

  • A function to perform approximate comparison of doubles is added to S.Function.Comparison.

  • Regularized incomplete beta function and its inverse are added to S.Function.
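The motivation for complCumulative noted above can be seen in a small sketch. Far into the upper tail, computing P(X>x) as 1 minus the CDF loses all significant digits, while a direct formula does not. The example below uses a rate-1 exponential distribution and only base; the function names are hypothetical and this is not the library's code.

```haskell
-- | CDF of the exponential distribution with rate 1: P(X <= x).
expCumulative :: Double -> Double
expCumulative x
  | x <= 0    = 0
  | otherwise = 1 - exp (negate x)

-- | Upper tail P(X > x), computed directly from its closed form.
expComplCumulative :: Double -> Double
expComplCumulative x
  | x <= 0    = 1
  | otherwise = exp (negate x)

-- | Upper tail computed the naive way, as 1 - CDF. Once exp (-x) falls
-- below the rounding threshold near 1, the CDF rounds to exactly 1 and
-- this returns exactly 0.
naiveUpperTail :: Double -> Double
naiveUpperTail x = 1 - expCumulative x
```

At x = 40 the naive form returns exactly 0, whereas the direct form returns a small positive value (on the order of 1e-18). A one-tailed test based on the naive form would report a p-value of zero.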



Downloads

Note: This package has metadata revisions in the cabal description newer than included in the tarball. To unpack the package including the revisions, use 'cabal get'.

Versions 0.1, 0.2, 0.2.1, 0.2.2, 0.3, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.3.6, 0.4.0, 0.4.1, 0.5.0.0, 0.5.1.0, 0.5.1.1, 0.5.1.2, 0.6.0.0, 0.6.0.1, 0.6.0.2, 0.7.0.0, 0.8.0.0, 0.8.0.1, 0.8.0.2, 0.8.0.3, 0.8.0.4, 0.8.0.5, 0.9.0.0, 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.2.0, 0.10.3.0, 0.10.3.1, 0.10.4.0, 0.10.4.1, 0.10.5.0, 0.10.5.1, 0.10.5.2, 0.11.0.0, 0.11.0.1, 0.11.0.2, 0.11.0.3, 0.12.0.0, 0.13.1.0, 0.13.1.1, 0.13.2.0, 0.13.2.1, 0.13.2.2, 0.13.2.3, 0.13.3.0, 0.14.0.0, 0.14.0.1, 0.14.0.2, 0.15.0.0, 0.15.1.0, 0.15.1.1, 0.15.2.0, 0.16.0.0, 0.16.0.1, 0.16.0.2, 0.16.1.0, 0.16.1.1, 0.16.1.2, 0.16.2.0, 0.16.2.1
Dependencies base (<5), deepseq (>=1.1.0.2 && <1.4), erf, math-functions (>=0.1.1), monad-par (>=0.1.0.1), mwc-random (>=0.11.0.0), primitive (>=0.3), vector (>=0.7.1), vector-algorithms (>=0.4) [details]
License BSD-3-Clause
Copyright 2009, 2010, 2011 Bryan O'Sullivan
Author Bryan O'Sullivan <bos@serpentine.com>
Maintainer Bryan O'Sullivan <bos@serpentine.com>
Revised Revision 1 made by HerbertValerioRiedel at 2015-01-05T20:58:15Z
Category Math, Statistics
Home page https://github.com/bos/statistics
Bug tracker https://github.com/bos/statistics/issues
Source repo head: git clone https://github.com/bos/statistics
head: hg clone https://bitbucket.org/bos/statistics
Uploaded by BryanOSullivan at 2012-01-13T22:18:55Z
Distributions Arch:0.16.2.1, Debian:0.15.2.0, Fedora:0.16.2.0, FreeBSD:0.13.2.3, LTSHaskell:0.16.2.1, NixOS:0.16.2.1, Stackage:0.16.2.1, openSUSE:0.16.2.1
Reverse Dependencies 65 direct, 3629 indirect [details]
Downloads 119841 total (569 in the last 30 days)
Rating 2.25 (votes: 2) [estimated by Bayesian average]
Status Docs uploaded by user
Build status unknown [no reports yet]

Readme for statistics-0.10.1.0


Statistics: efficient, general purpose statistics

This package provides the Statistics module, a Haskell library for working with statistical data in a space- and time-efficient way.

Where possible, we give citations and computational complexity estimates for the algorithms used.

Performance

This library has been carefully optimised for high performance. To obtain the best runtime efficiency, it is imperative to compile libraries and applications that use this library using a high level of optimisation.

Suggested GHC options:

-O -funbox-strict-fields

To illustrate, here are the times (in seconds) to generate and sum 250 million random Word32 values, on a laptop with a 2.4GHz Core2 Duo P8600 processor, running Fedora 11 and GHC 6.10.3:

no flags   200+
-O           1.249
-O -fvia-C   0.991

As the numbers above suggest, compiling without optimisation will yield unacceptable performance.
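The -funbox-strict-fields flag acts on constructor fields marked strict. As a rough sketch of the kind of code that benefits (the names Acc and meanOf are hypothetical and not part of this library), a single-pass mean can carry its state in a strict accumulator. Here the flag is set per module with a pragma; -O would normally be passed on the command line or in the build configuration.

```haskell
{-# OPTIONS_GHC -funbox-strict-fields #-}
import Data.List (foldl')

-- | Running count and sum. The strictness annotations (!) are what
-- -funbox-strict-fields acts on: it lets GHC store the Int and Double
-- unboxed inside the constructor instead of as pointers to heap objects.
data Acc = Acc !Int !Double

-- | Arithmetic mean in a single pass; 0 for an empty list.
meanOf :: [Double] -> Double
meanOf xs = finish (foldl' step (Acc 0 0) xs)
  where
    -- foldl' forces each new Acc, and the strict fields force its contents,
    -- so no chain of unevaluated additions builds up.
    step (Acc n s) x = Acc (n + 1) (s + x)
    finish (Acc 0 _) = 0
    finish (Acc n s) = s / fromIntegral n
```

For example, meanOf [1, 2, 3, 4] evaluates to 2.5.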

Get involved!

Please report bugs via the GitHub issue tracker.

Master git mirror:

  • git clone git://github.com/bos/statistics.git

There's also a Mercurial mirror:

  • hg clone https://bitbucket.org/bos/statistics

(You can create and contribute changes using either Mercurial or git.)

Authors

This library is written and maintained by Bryan O'Sullivan, bos@serpentine.com.