datasets-0.2.0.1: Classical data sets for statistics and machine learning

Safe Haskell: None
Language: Haskell98

Numeric.Datasets


Description

The datasets package defines two kinds of datasets:

  • small datasets that are embedded in the package as pure values, either directly or indirectly via `file-embed`, so loading them requires no network access or IO;
  • other datasets that must be fetched over the network with getDataset and are cached in a local temporary directory.

This module defines the getDataset function for fetching datasets and utilities for defining new datasets. It is only necessary to import this module when using fetched datasets; embedded datasets can be imported directly.


Using datasets

getDataset :: Dataset a -> IO [a]

Load a dataset, using the system temporary directory as the cache.
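Read literally, the description above suggests that getDataset simply supplies the system temporary directory to the dataset function. The sketch below assumes exactly that; it is not the package's source, and `squares` is a hypothetical dataset used only for illustration. It relies on `getTemporaryDirectory` from the directory package and can be loaded in GHCi.

```haskell
import System.Directory (getTemporaryDirectory)

-- Local copy of the documented type synonym: a dataset takes the cache
-- directory and returns an IO action that loads its rows.
type Dataset a = FilePath -> IO [a]

-- Sketch of getDataset, assuming it passes the system temporary
-- directory straight through as the cache location.
getDataset :: Dataset a -> IO [a]
getDataset ds = getTemporaryDirectory >>= ds

-- Hypothetical toy dataset that never touches the cache directory.
squares :: Dataset Int
squares _cacheDir = pure [n * n | n <- [1 .. 5]]
```

With these definitions, `getDataset squares` returns `[1,4,9,16,25]`.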

type Dataset a
   = FilePath  -- directory for caching downloaded datasets
  -> IO [a]

A dataset is a function from the caching directory to the IO action that loads the data.
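Because a dataset is just a function of the cache directory, one can in principle be written by hand. The following is a minimal sketch under that reading: `numbersDataset` and the file name `numbers.txt` are hypothetical, and writing a fixed string on a cache miss stands in for the network download a real fetched dataset would perform.

```haskell
import System.Directory (createDirectoryIfMissing, doesFileExist)
import System.FilePath ((</>))

type Dataset a = FilePath -> IO [a]

-- Hypothetical dataset: its raw data lives in "numbers.txt" inside the
-- cache directory. On a cache miss the file is written first, standing in
-- for a real download; then one Int is parsed per line.
numbersDataset :: Dataset Int
numbersDataset cacheDir = do
  createDirectoryIfMissing True cacheDir
  let path = cacheDir </> "numbers.txt"
  cached <- doesFileExist path
  -- A real dataset would download the file here instead.
  if cached then pure () else writeFile path "1\n2\n3\n"
  contents <- readFile path
  let rows = map read (lines contents)
  -- Force the whole file so the lazily read handle is closed before returning.
  length rows `seq` pure rows
```

Calling `numbersDataset` twice with the same directory writes the file only on the first call; both calls return `[1,2,3]`.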

Defining datasets

data Source

Constructors

URL String 

csvDatasetPreprocess :: FromRecord a => (ByteString -> ByteString) -> Source -> Dataset a

Define a dataset from a CSV file source, applying a pre-processing function to the raw ByteString before it is parsed.

csvDataset :: FromRecord a => Source -> Dataset a

Define a dataset from a CSV file source, without pre-processing.
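The two CSV constructors can be thought of as: fetch the file, optionally preprocess it, then parse each row. The sketch below illustrates that pipeline without cassava. `RowParser` is a hypothetical stand-in for a `FromRecord` instance, `Fetch` stands in for `getFileFromSource`, and the splitter handles only unquoted comma-separated rows. Defining the plain variant as the preprocessing variant applied to `id` is an assumption about how the two relate, not something this page confirms.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL

type Dataset a = FilePath -> IO [a]

-- Hypothetical stand-in for cassava's FromRecord: parse one row of fields.
type RowParser a = [BL.ByteString] -> Maybe a

-- Hypothetical stand-in for getFileFromSource: yields the raw file
-- contents given the cache directory.
type Fetch = FilePath -> IO BL.ByteString

-- Sketch: fetch, preprocess, split into lines and comma-separated fields
-- (no quoting support), then parse every non-empty row.
csvDatasetPreprocessSketch
  :: RowParser a -> (BL.ByteString -> BL.ByteString) -> Fetch -> Dataset a
csvDatasetPreprocessSketch parseRow pre fetch cacheDir = do
  raw <- fetch cacheDir
  let rows = filter (not . BL.null) (BL.lines (pre raw))
  case traverse (parseRow . BL.split ',') rows of
    Just xs -> pure xs
    Nothing -> ioError (userError "could not parse a CSV row")

-- The plain variant, assumed here to be "no preprocessing".
csvDatasetSketch :: RowParser a -> Fetch -> Dataset a
csvDatasetSketch parseRow = csvDatasetPreprocessSketch parseRow id
```

Passing `BL.unlines . drop 1 . BL.lines` as the preprocessing step, for example, discards a header row before parsing.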

getFileFromSource :: FilePath -> Source -> IO ByteString

Get the raw contents of the specified Source as a ByteString.
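A plausible reading of getFileFromSource, given that fetched datasets are cached locally, is: look for a cached copy in the given directory, and download and store the file only on a cache miss. The sketch below assumes that behaviour. The `download` parameter and the cache-file naming scheme are hypothetical (the documented function takes no such argument), which keeps the example free of any particular HTTP library.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Char (isAlphaNum)
import System.Directory (createDirectoryIfMissing, doesFileExist)
import System.FilePath ((</>))

-- Mirrors the documented type: the only constructor is URL.
data Source = URL String

-- Sketch with the network call abstracted away: 'download' is a
-- hypothetical parameter (e.g. an HTTP GET).
getFileFromSourceWith
  :: (String -> IO BL.ByteString) -> FilePath -> Source -> IO BL.ByteString
getFileFromSourceWith download cacheDir (URL url) = do
  createDirectoryIfMissing True cacheDir
  -- Hypothetical cache-file naming: keep only filename-safe characters.
  let path = cacheDir </> map safe url
  cached <- doesFileExist path
  if cached
    then BL.readFile path
    else do
      bytes <- download url
      BL.writeFile path bytes
      pure bytes
  where
    safe c = if isAlphaNum c then c else '_'
```

Because the download action is a parameter, a test can pass one that fails, to check that a second lookup is served from the cache.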

Helper functions for parsing

dashToCamelCase :: String -> String

Convert dash-separated text to CamelCase.
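A minimal sketch of one possible implementation, assuming each dash is removed and the character after it is upper-cased while the first character is left unchanged (the exact behaviour of the package's version is not documented here):

```haskell
import Data.Char (toUpper)

-- Sketch: drop each dash and upper-case the character that follows it.
-- Assumption: the first character is left as-is.
dashToCamelCase :: String -> String
dashToCamelCase ('-' : c : rest) = toUpper c : dashToCamelCase rest
dashToCamelCase (c : rest) = c : dashToCamelCase rest
dashToCamelCase [] = []
```

Under these assumptions, `dashToCamelCase "Married-civ-spouse"` gives `"MarriedCivSpouse"`, which matches a constructor name that a derived Read instance can parse.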

parseDashToCamelField :: Read a => Field -> Parser a

Parse a field with its Read instance, after first converting dashes to CamelCase.

parseReadField :: Read a => Field -> Parser a

Parse a field using its Read instance.
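The two field parsers can be sketched without depending on cassava. Here `Field` is a strict ByteString (as in cassava), `Either String` stands in for cassava's `Parser`, and `MaritalStatus` is a hypothetical enumeration chosen so that its constructor names match the camel-cased field values. The local `dashToCamelCase` assumes each dash is dropped and the following character upper-cased.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (toUpper)
import Text.Read (readMaybe)

-- cassava's Field is a strict ByteString; here Either String plays the
-- role of cassava's Parser monad.
type Field = B.ByteString

-- Assumed behaviour: remove each dash and upper-case the next character.
dashToCamelCase :: String -> String
dashToCamelCase ('-' : c : rest) = toUpper c : dashToCamelCase rest
dashToCamelCase (c : rest) = c : dashToCamelCase rest
dashToCamelCase [] = []

-- Parse a field with its Read instance, reporting unparsable input.
parseReadField :: Read a => Field -> Either String a
parseReadField f =
  maybe (Left ("no parse for field " ++ show f)) Right (readMaybe (B.unpack f))

-- Camel-case the field first, then parse it with Read.
parseDashToCamelField :: Read a => Field -> Either String a
parseDashToCamelField = parseReadField . B.pack . dashToCamelCase . B.unpack

-- Hypothetical enumeration whose constructor names match camel-cased values.
data MaritalStatus = NeverMarried | MarriedCivSpouse | Divorced
  deriving (Show, Read, Eq)
```

In code built on cassava, the `Left` case would typically be turned into a parser failure instead, for example with `fail`.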

dropLines :: Int -> ByteString -> ByteString

Drop the given number of lines from the start of a ByteString.
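A possible implementation, assuming the lines are dropped from the start and that ByteString here is the lazy Char8 variant (the strict Data.ByteString.Char8 module offers the same functions):

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL

-- Sketch: repeatedly skip everything up to and including the next newline.
-- Dropping more lines than exist yields an empty ByteString.
dropLines :: Int -> BL.ByteString -> BL.ByteString
dropLines n s
  | n <= 0 = s
  | otherwise = dropLines (n - 1) (BL.drop 1 (BL.dropWhile (/= '\n') s))
```

Since `dropLines 1` has type `ByteString -> ByteString`, it fits the pre-processing argument of `csvDatasetPreprocess`, for instance to skip a one-line preamble, provided both use the same ByteString type.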

fixAmericanDecimals :: ByteString -> ByteString

Turn US-style decimals that start with a period (e.g. .2) into a form Haskell's Read can parse (e.g. 0.2).
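One way to perform this rewrite is to insert a `0` before any period that is followed by a digit but not preceded by one. The sketch below takes that approach over a lazy Char8 ByteString; it is an illustration, not the package's code, and it treats every such period as a decimal point.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Char (isDigit)

-- Sketch: insert a '0' before any '.' that is followed by a digit but not
-- preceded by one, so ".2" becomes "0.2" and "-.75" becomes "-0.75".
fixAmericanDecimals :: BL.ByteString -> BL.ByteString
fixAmericanDecimals = BL.pack . go Nothing . BL.unpack
  where
    -- 'prev' is the previous input character, if any.
    go :: Maybe Char -> String -> String
    go _ [] = []
    go prev (x : rest)
      | x == '.' && startsWithDigit rest && maybe True (not . isDigit) prev =
          '0' : '.' : go (Just '.') rest
      | otherwise = x : go (Just x) rest

    startsWithDigit (d : _) = isDigit d
    startsWithDigit [] = False
```

For example, `fixAmericanDecimals (BL.pack ".2,3.5,-.75")` yields `"0.2,3.5,-0.75"`, while `3.5` is left untouched because its period follows a digit.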

fixedWidthToCSV :: ByteString -> ByteString

Convert fixed-width formatted data to CSV.
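A simple approximation treats every run of whitespace as a column separator and re-joins each line's fields with commas. The sketch below assumes that fields never contain spaces and that no column is ever blank; a true fixed-width parser would instead cut each line at known column offsets.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL

-- Sketch: split each line on runs of whitespace and join the fields with
-- commas. Assumes no field contains spaces and no column is empty.
fixedWidthToCSV :: BL.ByteString -> BL.ByteString
fixedWidthToCSV =
  BL.unlines . map (BL.intercalate (BL.pack ",") . BL.words) . BL.lines
```

For example, the line `  1.0   2.5  abc` becomes `1.0,2.5,abc`.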