hadoop-formats-0.2.1.1: Read/write file formats commonly used by Hadoop.

Safe HaskellNone
LanguageHaskell2010

Data.Hadoop.SequenceFile

Description

This module allows for lazy decoding of hadoop sequence files from a lazy L.ByteString. In the future an incremental API using strict ByteString will be provided, but for now if you need that level of control you need to use the attoparsec parsers in Data.Hadoop.SequenceFile.Parser directly.

Basic Examples

import           Control.Applicative ((<$>))
import qualified Data.ByteString.Lazy as L
import qualified Data.Foldable as F
import           Data.Int (Int32)
import           Data.Text (Text)
import qualified Data.Text.IO as T

import           Data.Hadoop.SequenceFile

-- | Print all the keys in a sequence file.
printKeys :: FilePath -> IO ()
printKeys path = do
    bs <- L.readFile path
    let records = decode bs :: Stream (RecordBlock Text Int32)
    F.for_ records $ \rb -> do
        F.mapM_ T.putStrLn (rbKeys rb)

-- | Count the number of records in a sequence file.
recordCount :: FilePath -> IO ()
recordCount path = do
    bs <- L.readFile path
    let records = decode bs :: Stream (RecordBlock Text Int32)
    print $ F.sum $ rbCount <$> records

Integration with Conduit

sourceRecords :: MonadIO m => FilePath -> Source m (RecordBlock Text ByteString)
sourceRecords path = do
    bs <- liftIO (L.readFile path)
    F.traverse_ yield (decode bs)

Synopsis

Documentation

data Stream a Source

A lazy stream of values.

Constructors

Error !String 
Value !a (Stream a) 
Done 

Instances

Functor Stream 
Foldable Stream 
Eq a => Eq (Stream a) 
Ord a => Ord (Stream a) 
Show a => Show (Stream a) 

class (Collection a ~ c a) => Writable c a where Source

Equivalent to the java interface org.apache.hadoop.io.Writable. All serializable key or value types in the Hadoop Map-Reduce framework implement this interface.

Methods

javaType :: a -> Text Source

Gets the package qualified name of a in Java land. Does not inspect the value of a, simply uses it for type information.

decoder :: Decoder (c a) Source

Gets a decoder for this writable type.

data RecordBlock k v Source

A block of key/value pairs. The key at index i always relates to the value at index i. Both vectors will always be the same size.

Constructors

RecordBlock 

Fields

rbCount :: Int

The number of records.

rbKeys :: Collection k

The keys.

rbValues :: Collection v

The values.

decode :: (Writable ck k, Writable cv v) => L.ByteString -> Stream (RecordBlock k v) Source

Decode a lazy L.ByteString in to a stream of record blocks.