datasets-0.3.0: Classical data sets for statistics and machine learning

Safe Haskell: None
Language: Haskell2010

Numeric.Datasets

Description

The datasets package defines three kinds of datasets:

  • Tiny datasets (up to a few tens of rows) are embedded directly in the library source code, as lists of values.
  • Small data sets are embedded indirectly (via file-embed) in the package as pure values; no IO is needed to load them, since the data is read and parsed at compile time.
  • Larger data sets are fetched over the network and cached in a local temporary directory for subsequent use.

This module defines the getDataset function for fetching datasets, as well as utilities for defining new data sets and modifying their options. It is only necessary to import this module when using fetched data sets; embedded data sets can be used directly.

Please refer to the dataset modules for examples.

Documentation

getDataset :: Dataset h a -> IO [a] Source #

Load a dataset, using the system temporary directory as a cache
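
For example, a data set can be declared from a remote CSV source and then loaded with getDataset. This is a minimal sketch: the Point record and the example.com URL are placeholders, and it assumes the Url h in Source is the URL type from Network.HTTP.Req (built with https and /:).

{-# LANGUAGE DataKinds         #-}
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE OverloadedStrings #-}

import Numeric.Datasets (Dataset, Source (..), csvDataset, getDataset)
import Network.HTTP.Req (Scheme (..), https, (/:))
import Data.Csv (FromRecord)
import GHC.Generics (Generic)

-- Hypothetical two-column CSV data set; the record type and URL are placeholders.
data Point = Point { px :: Double, py :: Double }
  deriving (Show, Generic)

instance FromRecord Point

points :: Dataset 'Https Point
points = csvDataset (URL (https "example.com" /: "points.csv"))

main :: IO ()
main = do
  rows <- getDataset points   -- downloads on first use, then reads from the cache
  print (take 5 rows)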

data Dataset h a Source #

A Dataset contains metadata for loading, caching, preprocessing and parsing data.

Constructors

Dataset 

data Source h Source #

A Dataset source can be either a URL (for remotely-hosted datasets) or the filepath of a local file.

Constructors

URL (Url h) 
File FilePath 

Parsing datasets

readDataset Source #

Arguments

:: ReadAs a

How to parse the raw data string

-> ByteString

The data string

-> [a] 

Parse a ByteString into a list of Haskell values
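
A raw ByteString can also be parsed directly, without going through a Dataset. A minimal sketch: the Point record is made up, and the input is assumed to be headerless, comma-separated CSV (as csvRecord expects below).

{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE OverloadedStrings #-}

import Numeric.Datasets (readDataset, csvRecord)
import Data.Csv (FromRecord)
import GHC.Generics (Generic)

-- Hypothetical record for a two-column, headerless CSV input.
data Point = Point { px :: Double, py :: Double }
  deriving (Show, Generic)

instance FromRecord Point

main :: IO ()
main = print (readDataset csvRecord "1.0,2.0\n3.0,4.0\n" :: [Point])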

data ReadAs a where Source #

ReadAs is a datatype describing the format in which a data set is stored

csvRecord :: FromRecord a => ReadAs a Source #

A ReadAs value for CSV records using the default decoding options (i.e. columns separated by commas)

Defining datasets

csvDataset :: FromRecord a => Source h -> Dataset h a Source #

Define a dataset from a source for a CSV file

csvHdrDataset :: FromNamedRecord a => Source h -> Dataset h a Source #

Define a dataset from a source for a CSV file with a known header

csvHdrDatasetSep :: FromNamedRecord a => Char -> Source h -> Dataset h a Source #

Define a dataset from a source for a CSV file with a known header and separator

csvDatasetSkipHdr :: FromRecord a => Source h -> Dataset h a Source #

Define a dataset from a source for a CSV file, skipping the header line

jsonDataset :: FromJSON a => Source h -> Dataset h a Source #

Define a dataset from a source for a JSON file
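
Putting these together, a new data set is typically one declaration per source. A rough sketch with made-up record types and URLs, again assuming the URL combinators from Network.HTTP.Req:

{-# LANGUAGE DataKinds         #-}
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE OverloadedStrings #-}

import Numeric.Datasets (Dataset, Source (..), csvHdrDataset, csvHdrDatasetSep)
import Network.HTTP.Req (Scheme (..), https, (/:))
import Data.Csv (FromNamedRecord)
import GHC.Generics (Generic)

-- Hypothetical record matching a CSV file with a "name,score" header line.
data Score = Score { name :: String, score :: Double }
  deriving (Show, Generic)

instance FromNamedRecord Score

-- Comma-separated file with a header row.
scores :: Dataset 'Https Score
scores = csvHdrDataset (URL (https "example.com" /: "scores.csv"))

-- The same data as a tab-separated file.
scoresTsv :: Dataset 'Https Score
scoresTsv = csvHdrDatasetSep '\t' (URL (https "example.com" /: "scores.tsv"))

jsonDataset is used in the same way for a type with a FromJSON instance, and a File source can replace the URL for locally stored data.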

Dataset options

withPreprocess :: (ByteString -> ByteString) -> Dataset h a -> Dataset h a Source #

Add a preprocessing stage to a Dataset: the raw data will be transformed by the given function before it is parsed.

withTempDir :: FilePath -> Dataset h a -> Dataset h a Source #

Specify the temporary directory used to cache the dataset after it has first been downloaded.
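
Both options compose by ordinary function application. Continuing the hypothetical scores dataset from the sketch above:

-- Drop a one-line preamble before parsing, and cache the download under
-- ./cache instead of the system temporary directory ('scores' and the Score
-- type are the hypothetical definitions from the earlier sketch).
scoresLocal :: Dataset 'Https Score
scoresLocal = withTempDir "./cache" (withPreprocess (dropLines 1) scores)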

Preprocessing functions

These functions are intended to be used as the first argument of withPreprocess, in order to clean up the raw data before it reaches the parser.

dropLines :: Int -> ByteString -> ByteString Source #

Drop a given number of lines from the start of a ByteString

fixedWidthToCSV :: ByteString -> ByteString Source #

Convert fixed-width formatted data to CSV

removeEscQuotes :: ByteString -> ByteString Source #

Filter out escaped double quotes from a field

fixAmericanDecimals :: ByteString -> ByteString Source #

Turn US-style decimals starting with a period (e.g. .2) into something cassava can parse (e.g. 0.2)
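
Since these are plain ByteString-to-ByteString functions, they can also be composed and applied by hand together with readDataset. A sketch with made-up input, assuming csvRecord treats its input as headerless CSV:

{-# LANGUAGE OverloadedStrings #-}

import Numeric.Datasets (readDataset, csvRecord, dropLines, fixAmericanDecimals)

-- Drop the header line and normalise ".5"-style decimals, then parse the
-- remaining rows into pairs of Doubles (cassava has FromRecord instances for tuples).
parsed :: [(Double, Double)]
parsed = readDataset csvRecord
       . fixAmericanDecimals
       . dropLines 1
       $ "x,y\n1.0,.5\n2.0,.25\n"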

Helper functions

parseReadField :: Read a => Field -> Parser a Source #

Parse a CSV field using its Read instance

parseDashToCamelField :: Read a => Field -> Parser a Source #

Parse a field, first converting dash-separated words to CamelCase so that the Read instance can be used
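
These helpers are typically used to write FromField instances for enumeration-like columns. A minimal sketch with a made-up Colour type:

import Numeric.Datasets (parseReadField)
import Data.Csv (FromField (..))

-- Hypothetical column stored as the literal strings "Red", "Green" or "Blue".
data Colour = Red | Green | Blue
  deriving (Show, Read)

instance FromField Colour where
  parseField = parseReadField
  -- For a column stored as e.g. "light-blue", parseDashToCamelField could be
  -- used instead (assuming a matching LightBlue constructor).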

yearToUTCTime :: Double -> UTCTime Source #

Convert a fractional year to a UTCTime, with second-level precision (leap seconds are not taken into account)
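
For example (a sketch; the exact timestamps depend on the year length used in the conversion):

import Numeric.Datasets (yearToUTCTime)
import Data.Time (UTCTime)

-- Fractional years, as found in some time-series data sets, mapped to UTCTime.
timestamps :: [UTCTime]
timestamps = map yearToUTCTime [1998.0, 1998.25, 1998.5, 1998.75]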

Dataset source URLs