Safe Haskell | None |
---|---|
Language | Haskell2010 |
Netflix prize dataset
From the README:
The movie rating files contain over 100 million ratings from 480 thousand randomly-chosen, anonymous Netflix customers over 17 thousand movie titles. The data were collected between October, 1998 and December, 2005 and reflect the distribution of all ratings received during this period. The ratings are on a scale from 1 to 5 (integral) stars. To protect customer privacy, each customer id has been replaced with a randomly-assigned id. The date of each rating and the title and year of release for each movie id are also provided.
The competition ended in September 2009, and the company subsequently withdrew the dataset from public availability.
We include in this repository a tiny subset of the original dataset for development purposes.
For further information, see http://netflixprize.com/.
- trainingSet :: [(FilePath, ByteString)]
- testSet :: [(FilePath, ByteString)]
- movies :: [(FilePath, ByteString)]
- data RatingDate = RatingDate { userId :: UserId, ratingDate :: Day }
- newtype UserId = UserId {}
- data Train = Train { trainRating :: RatingDate, rating :: Int }
- newtype MovieId = MovieId {}
- data Movie = Movie { movieId :: MovieId, releaseYear :: Day, movieTitle :: ByteString }
- newtype Test = Test {}
- data TrainCol = TrainC { tcMovieId :: MovieId, tcTrainSet :: [Train] }
- data RD a = RD {}
- toCoordsCol :: Num a => TrainCol -> [(UserId, MovieId, RD a)]
- parseTrainingSet :: Num a => Either String [(UserId, MovieId, RD a)]
- parseTrainingSet' :: Num a => Either String [[(UserId, MovieId, RD a)]]
- trainingSetParser :: Parser ByteString TrainCol
- testSetParser :: Parser ByteString [(MovieId, [Test])]
- moviesParser :: Parser ByteString [Movie]
- trainRow :: Parser ByteString Train
- testRow :: Parser ByteString Test
- moviesRow :: Parser ByteString Movie
- parseRows :: Parser ByteString a -> Parser ByteString [a]
- stanza :: Parser ByteString a -> Parser ByteString (MovieId, [a])
- date :: Parser ByteString Day
- comma :: Parser Char
- dash :: Parser Char
- decc :: Parser ByteString Int
- ident :: Parser ByteString Integer
Dataset files
The directories are scanned recursively and their contents are presented as (FilePath, ByteString) pairs.
trainingSet :: [(FilePath, ByteString)]
testSet :: [(FilePath, ByteString)]
movies :: [(FilePath, ByteString)]
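Because each of these values is an ordinary list of (FilePath, ByteString) pairs, it can be inspected with standard list functions. The following is a minimal sketch under stated assumptions: `sampleFiles` is a hypothetical stand-in for a value such as `trainingSet`, its file names and rows are made up for illustration, and `lineCounts` and `fileContents` are helper names that are not part of this module.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.ByteString       (ByteString)
import qualified Data.ByteString.Char8 as B8

-- Hypothetical stand-in for a value such as 'trainingSet';
-- the paths and rows below are illustrative only.
sampleFiles :: [(FilePath, ByteString)]
sampleFiles =
  [ ("mv_0000001.txt", "1:\n10,3,2005-09-06\n20,5,2005-05-13\n")
  , ("mv_0000002.txt", "2:\n30,4,2005-09-05\n")
  ]

-- | Number of lines in each file, paired with its path.
lineCounts :: [(FilePath, ByteString)] -> [(FilePath, Int)]
lineCounts files =
  [ (path, length (B8.lines contents)) | (path, contents) <- files ]

-- | Contents of a single file, if the path is present.
fileContents :: FilePath -> [(FilePath, ByteString)] -> Maybe ByteString
fileContents = lookup
```

Loaded in GHCi, `lineCounts sampleFiles` pairs each path with its line count, and `fileContents` returns `Nothing` for a path that is not in the list.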
Data types
data RatingDate = RatingDate { userId :: UserId, ratingDate :: Day }
data Train = Train { trainRating :: RatingDate, rating :: Int }
Movies file
data Movie = Movie { movieId :: MovieId, releaseYear :: Day, movieTitle :: ByteString }
Qualifying file (test set)
Additional types and helper functions
data TrainCol = TrainC { tcMovieId :: MovieId, tcTrainSet :: [Train] }
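The synopsis lists `toCoordsCol`, which turns one `TrainCol` into (UserId, MovieId, RD a) coordinate triples. The sketch below shows one way such a flattening could look. It is not the module's definition: the types are local copies, the `Integer` wrapped by `UserId` and `MovieId` and the two fields of `RD` are assumptions (this page does not show them), and `toCoordsSketch` and `exampleCol` are hypothetical names.

```haskell
import Data.Time.Calendar (Day, fromGregorian)

-- Local copies of the documented types. The wrapped Integer is an assumption.
newtype UserId  = UserId Integer  deriving (Eq, Show)
newtype MovieId = MovieId Integer deriving (Eq, Show)

data RatingDate = RatingDate { userId :: UserId, ratingDate :: Day }
  deriving (Eq, Show)
data Train = Train { trainRating :: RatingDate, rating :: Int }
  deriving (Eq, Show)
data TrainCol = TrainC { tcMovieId :: MovieId, tcTrainSet :: [Train] }
  deriving (Eq, Show)

-- Hypothetical payload: the fields of the real 'RD' are not shown on this page.
data RD a = RD { rdRating :: a, rdDate :: Day } deriving (Eq, Show)

-- | Flatten one movie's ratings into (user, movie, payload) triples,
-- converting the integral star rating to any numeric type.
toCoordsSketch :: Num a => TrainCol -> [(UserId, MovieId, RD a)]
toCoordsSketch (TrainC mid rows) =
  [ (userId rd, mid, RD (fromIntegral r) (ratingDate rd)) | Train rd r <- rows ]

-- Illustrative input with made-up ids and dates.
exampleCol :: TrainCol
exampleCol = TrainC (MovieId 1)
  [ Train (RatingDate (UserId 10) (fromGregorian 2005 9 6)) 3
  , Train (RatingDate (UserId 20) (fromGregorian 2005 5 13)) 5
  ]
```

Each triple carries the movie id of its column, so the results of several columns can be concatenated into a single sparse (user, movie) coordinate list.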
Netflix dataset parsers
testSetParser :: Parser ByteString [(MovieId, [Test])]
moviesParser :: Parser ByteString [Movie]
Netflix dataset row type parsers
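The row parsers listed in the synopsis (`trainRow`, `testRow`, `moviesRow`, `date`) each consume a single comma-separated line. As a rough illustration, the sketch below parses a training row of the form `user,rating,YYYY-MM-DD` with attoparsec. It assumes that row layout, returns a plain tuple instead of the module's `Train` type, and uses the hypothetical names `dateP`, `trainRowP` and `exampleRow`; the module's own parsers may be written differently.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Attoparsec.ByteString.Char8 (Parser)
import qualified Data.Attoparsec.ByteString.Char8 as A
import           Data.Time.Calendar (Day, fromGregorianValid)

-- | Parse a date written as YYYY-MM-DD, rejecting impossible dates.
dateP :: Parser Day
dateP = do
  y <- A.decimal <* A.char '-'
  m <- A.decimal <* A.char '-'
  d <- A.decimal
  maybe (fail "invalid calendar date") pure (fromGregorianValid y m d)

-- | Parse one training row as (user id, rating, date).
-- Assumed layout: user,rating,YYYY-MM-DD
trainRowP :: Parser (Integer, Int, Day)
trainRowP =
  (,,) <$> (A.decimal <* A.char ',')
       <*> (A.decimal <* A.char ',')
       <*> dateP

-- Illustrative input with a made-up user id.
exampleRow :: Either String (Integer, Int, Day)
exampleRow = A.parseOnly trainRowP "7,4,2005-09-06"
```

Validating with `fromGregorianValid` means an input such as `2005-02-30` fails to parse instead of being silently clipped to a nearby valid day.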
Attoparsec parser combinators
parseRows :: Parser ByteString a -> Parser ByteString [a]
stanza :: Parser ByteString a -> Parser ByteString (MovieId, [a])
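Judging by their types, `parseRows` lifts a row parser to a parser of many rows, and `stanza` pairs a movie id header with the rows that follow it. The sketch below is one possible way to write combinators with these shapes, not the module's actual definitions. It assumes a header line of the form `<movie id>:` followed by newline-separated rows; `rowsOf`, `stanzaOf`, `ratingRow` and `example` are hypothetical names, and the `Integer` inside the local `MovieId` is an assumption.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Attoparsec.ByteString.Char8 (Parser)
import qualified Data.Attoparsec.ByteString.Char8 as A

-- Local copy of the documented newtype. The wrapped Integer is an assumption.
newtype MovieId = MovieId Integer deriving (Eq, Show)

-- | Zero or more rows separated by line breaks.
rowsOf :: Parser a -> Parser [a]
rowsOf row = row `A.sepBy` A.endOfLine

-- | A header line "<movie id>:" followed by that movie's rows.
stanzaOf :: Parser a -> Parser (MovieId, [a])
stanzaOf row = do
  mid  <- MovieId <$> A.decimal <* A.char ':' <* A.endOfLine
  rows <- rowsOf row
  pure (mid, rows)

-- | Simplified row for the example: user id and rating only.
ratingRow :: Parser (Integer, Int)
ratingRow = (,) <$> (A.decimal <* A.char ',') <*> A.decimal

-- Illustrative input with made-up ids.
example :: Either String (MovieId, [(Integer, Int)])
example = A.parseOnly (stanzaOf ratingRow) "1:\n10,3\n20,5\n"
```

With `sepBy`, a trailing line break is left unconsumed, so a parser for several consecutive stanzas would need to skip it before reading the next header.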