datasets-0.4.0: Classical data sets for statistics and machine learning

Safe Haskell: None
Language: Haskell2010

Numeric.Datasets.Netflix

Description

Netflix prize dataset

From the README: The movie rating files contain over 100 million ratings from 480 thousand randomly-chosen, anonymous Netflix customers over 17 thousand movie titles. The data were collected between October, 1998 and December, 2005 and reflect the distribution of all ratings received during this period. The ratings are on a scale from 1 to 5 (integral) stars. To protect customer privacy, each customer id has been replaced with a randomly-assigned id. The date of each rating and the title and year of release for each movie id are also provided.

The competition ended in September 2009, and the company subsequently withdrew the dataset from public distribution (see http://netflixprize.com/).

The repository includes a tiny subset of the original dataset for development purposes. Because the data are loaded with `file-embed`, the directories are hardcoded (see the Datasets section below); users may either symlink or copy the full dataset into the given directories.

Dataset parsing and shaping

parseTrainingSet :: Num a => Either String [(UserId, MovieId, RD a)] Source #

Parse the whole training set, convert it to coordinate format (one (UserId, MovieId, RD a) triple per rating) and concatenate the results into a single list.
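The conversion to coordinate format can be illustrated with a minimal sketch. It assumes the layout described in the original Netflix README, where each training file starts with a `MovieID:` header line followed by `CustomerID,Rating,Date` rows. The names `RatingS`, `parseFile` and `parseAll` are hypothetical stand-ins that use only `base`; this is not the library's own parser, and plain `Int` and `String` replace `UserId`, `MovieId` and `RD`.

```haskell
import Text.Read (readMaybe)

-- Hypothetical stand-in for the library's RD type (its fields are not shown here).
data RatingS = RatingS { rating :: Int, date :: String }
  deriving (Eq, Show)

-- Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (a, [])       -> [a]
  (a, _ : rest) -> a : splitOn c rest

-- Parse the "MovieID:" header line of one training file.
parseHeader :: String -> Either String Int
parseHeader l = case break (== ':') l of
  (ds, ":") -> maybe (Left ("bad header: " ++ l)) Right (readMaybe ds)
  _         -> Left ("bad header: " ++ l)

-- Parse one "CustomerID,Rating,Date" row into a (user, movie, rating) triple.
parseRow :: Int -> String -> Either String (Int, Int, RatingS)
parseRow m l = case splitOn ',' l of
  [u, r, d] -> case (readMaybe u, readMaybe r) of
    (Just u', Just r') -> Right (u', m, RatingS r' d)
    _                  -> Left ("bad row: " ++ l)
  _ -> Left ("bad row: " ++ l)

-- One file becomes a list of triples that all share the header's movie id.
parseFile :: String -> Either String [(Int, Int, RatingS)]
parseFile txt = case lines txt of
  []         -> Left "empty file"
  (h : rows) -> do
    m <- parseHeader h
    traverse (parseRow m) (filter (not . null) rows)

-- Parse every file and concatenate; the first failure aborts the whole parse.
parseAll :: [String] -> Either String [(Int, Int, RatingS)]
parseAll = fmap concat . traverse parseFile
```

Loading this in GHCi and evaluating `parseAll` on the contents of a few files gives one flat list, which is the shape that the `Either String [...]` result type of `parseTrainingSet` suggests.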

parseTestSet :: Either String [(UserId, MovieId, Day)] Source #

Parse the whole test set, convert it to coordinate format (one (UserId, MovieId, Day) triple per entry) and concatenate the results into a single list.
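As a rough illustration of how a test-set row could be turned into a dated triple, the sketch below assumes rows of the form `CustomerID,Date` with `YYYY-MM-DD` dates (the layout described in the original Netflix README), and that `Day` is the type from the `time` package. The helpers `parseDay` and `parseTestRow` are hypothetical and are not part of this module.

```haskell
import Data.Time.Calendar (Day, fromGregorianValid)
import Text.Read (readMaybe)

-- Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (a, [])       -> [a]
  (a, _ : rest) -> a : splitOn c rest

-- Parse a "YYYY-MM-DD" string; invalid calendar dates yield Nothing.
parseDay :: String -> Maybe Day
parseDay s = case splitOn '-' s of
  [y, m, d] -> do
    y' <- readMaybe y
    m' <- readMaybe m
    d' <- readMaybe d
    fromGregorianValid y' m' d'
  _ -> Nothing

-- Parse one "CustomerID,Date" row for a given movie id into a
-- (user, movie, day) triple.
parseTestRow :: Int -> String -> Either String (Int, Int, Day)
parseTestRow m l = case break (== ',') l of
  (u, ',' : d) -> case (readMaybe u, parseDay d) of
    (Just u', Just d') -> Right (u', m, d')
    _                  -> Left ("bad test row: " ++ l)
  _ -> Left ("bad test row: " ++ l)
```

Using `fromGregorianValid` rather than `fromGregorian` means an impossible date is reported as a parse failure instead of being silently clipped to a valid one.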

parseMovies :: Either String [Movie] Source #

Parse the whole movies file into a single list of Movie items.
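The following sketch shows one way a line of the movies file could be parsed. It assumes the `MovieID,YearOfRelease,Title` layout from the original Netflix README, that a non-numeric year should be treated as missing, and that titles may themselves contain commas. `MovieS` and `parseMovieLine` are hypothetical names; the fields of the library's `Movie` type are not shown on this page, so a stand-in record is used.

```haskell
import Text.Read (readMaybe)

-- Hypothetical stand-in for the library's Movie type.
data MovieS = MovieS
  { movieId     :: Int
  , releaseYear :: Maybe Int  -- Nothing when the year field is not a number
  , title       :: String
  } deriving (Eq, Show)

-- Split only on the first two commas so that commas inside the title survive.
parseMovieLine :: String -> Either String MovieS
parseMovieLine l = case break (== ',') l of
  (i, ',' : rest) -> case break (== ',') rest of
    (y, ',' : t) -> case readMaybe i of
      Just i' -> Right (MovieS i' (readMaybe y) t)
      Nothing -> Left ("bad movie id: " ++ l)
    _ -> Left ("missing title: " ++ l)
  _ -> Left ("missing fields: " ++ l)

-- Parse a whole file, skipping blank lines; the first bad line aborts the parse.
parseMovieFile :: String -> Either String [MovieS]
parseMovieFile = traverse parseMovieLine . filter (not . null) . lines
```

Splitting on the first two commas only, instead of on every comma, is the design choice that keeps a title such as `Title, With Comma` intact.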

Types

data RD a Source #

A type for date-tagged movie ratings

Constructors

RD 

Fields

Instances
Eq a => Eq (RD a) Source # 
Instance details

Defined in Numeric.Datasets.Netflix

Methods

(==) :: RD a -> RD a -> Bool #

(/=) :: RD a -> RD a -> Bool #

Show a => Show (RD a) Source # 
Instance details

Defined in Numeric.Datasets.Netflix

Methods

showsPrec :: Int -> RD a -> ShowS #

show :: RD a -> String #

showList :: [RD a] -> ShowS #

data UserId Source #

User ID (anonymized)

Instances
Eq UserId Source # 
Instance details

Defined in Numeric.Datasets.Netflix

Methods

(==) :: UserId -> UserId -> Bool #

(/=) :: UserId -> UserId -> Bool #

Show UserId Source # 
Instance details

Defined in Numeric.Datasets.Netflix

data MovieId Source #

Movie ID

Instances
Eq MovieId Source # 
Instance details

Defined in Numeric.Datasets.Netflix

Methods

(==) :: MovieId -> MovieId -> Bool #

(/=) :: MovieId -> MovieId -> Bool #

Show MovieId Source # 
Instance details

Defined in Numeric.Datasets.Netflix

data Train Source #

Training set item

Constructors

Train 
Instances
Eq Train Source # 
Instance details

Defined in Numeric.Datasets.Netflix

Methods

(==) :: Train -> Train -> Bool #

(/=) :: Train -> Train -> Bool #

Show Train Source # 
Instance details

Defined in Numeric.Datasets.Netflix

Methods

showsPrec :: Int -> Train -> ShowS #

show :: Train -> String #

showList :: [Train] -> ShowS #

newtype Test Source #

Test set item

Constructors

Test 
Instances
Eq Test Source # 
Instance details

Defined in Numeric.Datasets.Netflix

Methods

(==) :: Test -> Test -> Bool #

(/=) :: Test -> Test -> Bool #

Show Test Source # 
Instance details

Defined in Numeric.Datasets.Netflix

Methods

showsPrec :: Int -> Test -> ShowS #

show :: Test -> String #

showList :: [Test] -> ShowS #

data Movie Source #

Movie dataset item

Constructors

Movie 
Instances
Eq Movie Source # 
Instance details

Defined in Numeric.Datasets.Netflix

Methods

(==) :: Movie -> Movie -> Bool #

(/=) :: Movie -> Movie -> Bool #

Show Movie Source # 
Instance details

Defined in Numeric.Datasets.Netflix

Methods

showsPrec :: Int -> Movie -> ShowS #

show :: Movie -> String #

showList :: [Movie] -> ShowS #

data RatingDate Source #

A date-tagged movie rating

Constructors

RatingDate 

Fields

Instances
Eq RatingDate Source # 
Instance details

Defined in Numeric.Datasets.Netflix

Show RatingDate Source # 
Instance details

Defined in Numeric.Datasets.Netflix

Datasets

trainingSet :: [(FilePath, ByteString)] Source #

The training set (a set of text files) is assumed to be in the directory `datafiles/netflix/training/` relative to the repository root.

testSet :: [(FilePath, ByteString)] Source #

The test set (one text file) is assumed to be in the directory `datafiles/netflix/test/` relative to the repository root.

movies :: [(FilePath, ByteString)] Source #

The movies dataset (one text file) is assumed to be in the directory `datafiles/netflix/movies/` relative to the repository root.
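Each of the three values above is a list that pairs a file path with that file's raw contents. A small sketch of how such a list might be inspected before parsing is given below; `fileSizes` and `firstLines` are hypothetical helpers written against the `bytestring` package, and they work on any list of this shape rather than depending on this module.

```haskell
import qualified Data.ByteString.Char8 as BC

-- Size in bytes of every embedded file.
fileSizes :: [(FilePath, BC.ByteString)] -> [(FilePath, Int)]
fileSizes = map (fmap BC.length)

-- The first n lines of every embedded file, decoded as Strings.
firstLines :: Int -> [(FilePath, BC.ByteString)] -> [(FilePath, [String])]
firstLines n = map (fmap (map BC.unpack . take n . BC.lines))
```

For example, `firstLines 1 trainingSet` would be expected to show the header line of each embedded training file, assuming the module is imported alongside these helpers.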