repa-flow-4.2.2.1: Data-parallel data flows.

Safe Haskell: None
Language: Haskell98

Data.Repa.Flow.IO.Bucket

Documentation

data Bucket

A bucket represents a portion of a whole data set on disk, and contains a file handle that points to the next piece of data to be read or written.

The bucket could be created from a portion of a single flat file, or be one file of a pre-split data set. The main advantage over a plain Handle is that a Bucket can represent a small portion of a single large file.

Constructors

Bucket 

Fields

bucketFilePath :: Maybe FilePath

Physical location of the file, if known.

bucketStartPos :: Integer

Starting position of the bucket in the file, in bytes.

bucketLength :: Maybe Integer

Maximum length of the bucket, in bytes.

If Nothing then the length is indeterminate, which is used when writing to files.

bucketHandle :: Handle

File handle for the bucket.

If several buckets have been created from a single file, then all buckets will have handles bound to that file, but they will be at different positions.
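To illustrate how several handles can be bound to one file at different positions, here is a minimal sketch that uses only System.IO from base rather than the Bucket API. The file name bucket-demo.txt and the helper readAtOffsets are hypothetical, chosen for illustration, and the sketch assumes a platform that allows a file to be opened more than once for reading.

```haskell
import System.IO

-- | Open the same file twice and position each handle at a different
--   byte offset, mimicking two buckets that share one underlying file.
--   (Hypothetical helper, not part of repa-flow.)
readAtOffsets :: FilePath -> Integer -> Integer -> IO (String, String)
readAtOffsets path off1 off2 = do
  h1 <- openFile path ReadMode
  h2 <- openFile path ReadMode
  hSeek h1 AbsoluteSeek off1
  hSeek h2 AbsoluteSeek off2
  l1 <- hGetLine h1
  l2 <- hGetLine h2
  hClose h1
  hClose h2
  return (l1, l2)

main :: IO ()
main = do
  -- "alpha\n" occupies 6 bytes, so offset 6 is the start of "beta".
  writeFile "bucket-demo.txt" "alpha\nbeta\ngamma\n"
  (a, b) <- readAtOffsets "bucket-demo.txt" 0 6
  putStrLn a
  putStrLn b
```

Each handle keeps its own position, so reading from one does not move the other.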

openBucket :: FilePath -> IOMode -> IO Bucket

Open a file as a single bucket.

hBucket :: Handle -> IO Bucket

Wrap an existing file handle as a bucket.

The starting position is set to 0.
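As a usage sketch for openBucket and hBucket, the following program relies only on the signatures and record fields listed on this page, and it needs the repa-flow package to be installed. The file name bucket-demo.txt is hypothetical. The values printed for the fields are not documented here, so none are claimed.

```haskell
import Data.Repa.Flow.IO.Bucket
import System.IO (IOMode (ReadMode), openFile)

main :: IO ()
main = do
  -- Hypothetical input file, created here so the example is self-contained.
  writeFile "bucket-demo.txt" "alpha\nbeta\n"

  -- Open the whole file as one bucket and inspect its fields.
  b1 <- openBucket "bucket-demo.txt" ReadMode
  print (bucketFilePath b1)
  print (bucketStartPos b1)
  print (bucketLength b1)
  bClose b1

  -- Wrap a handle we opened ourselves as a bucket.
  h  <- openFile "bucket-demo.txt" ReadMode
  b2 <- hBucket h
  print (bucketStartPos b2)
  open <- bIsOpen b2
  print open
  bClose b2
```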

Reading

fromFiles :: [FilePath] -> (Array B Bucket -> IO b) -> IO b

Open some files as buckets and use them as Sources.

fromFiles'

Arguments

:: (Bulk l FilePath, Target l Bucket) 
=> Array l FilePath

Files to open.

-> (Array l Bucket -> IO b)

Consumer.

-> IO b 

Like fromFiles, but take an array of file paths.

fromDir :: FilePath -> (Array B Bucket -> IO b) -> IO b

Open all the files in a directory as separate buckets.

This operation may fail with the same exceptions as getDirectoryContents.

fromSplitFile

Arguments

:: Int

Number of buckets.

-> (Word8 -> Bool)

Detect the end of a record.

-> FilePath

File to open.

-> (Array B Bucket -> IO b)

Consumer.

-> IO b 

Open a file containing atomic records and split it into the given number of evenly sized buckets.

The records are separated by a special terminating character, which the given predicate detects. The file is split cleanly on record boundaries, so we get a whole number of records in each bucket. As the records can be of varying size, the buckets are not guaranteed to have exactly the same length, in either records or bytes, though we try to give them approximately the same number of bytes.
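The splitting rule described above can be sketched as a pure function over an in-memory byte list. This is an illustrative model under stated assumptions, not the library's implementation: the name splitRanges, the (start, length) result, and the strategy of rounding each ideal cut point forward to the next record terminator are all hypothetical.

```haskell
import Data.Word (Word8)

-- | Split the input into at most n (start, length) ranges, each ending
--   just after a record terminator (or at the end of the input).
--   (Hypothetical helper, not part of repa-flow.)
splitRanges :: Int -> (Word8 -> Bool) -> [Word8] -> [(Int, Int)]
splitRanges n isEnd bytes = go 0 1
  where
    total     = length bytes
    perBucket = total `div` max 1 n

    -- Position just past the first terminator found at index p or later.
    nextBoundary p =
      case break isEnd (drop p bytes) of
        (pre, _ : _) -> p + length pre + 1
        _            -> total

    go start k
      | start >= total = []
      | k >= n         = [(start, total - start)]   -- last bucket takes the rest
      | otherwise      =
          let end = nextBoundary (max start (k * perBucket - 1))
          in  (start, end - start) : go end (k + 1)

main :: IO ()
main = do
  let input = map (fromIntegral . fromEnum) "aa\nbbbb\ncc\ndddd\n"
  -- Newline (byte 10) terminates each record.
  print (splitRanges 3 (== 10) input)
  -- prints [(0,8),(8,3),(11,5)]
```

The 16 input bytes cannot be cut into three equal parts without breaking a record, so the ranges differ in length while each still holds whole records.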

fromSplitFileAt

Arguments

:: Int

Number of buckets.

-> (Word8 -> Bool)

Detect the end of a record.

-> FilePath

File to open.

-> Integer

Starting offset.

-> (Array B Bucket -> IO b)

Consumer.

-> IO b 

Like fromSplitFile but start at the given offset.

Writing

toFiles

Arguments

:: [FilePath]

File paths.

-> (Array B Bucket -> IO b)

Worker writes data to buckets.

-> IO b 

Open some files for writing as individual buckets and pass them to the given consumer.

toFiles'

Arguments

:: (Bulk l FilePath, Target l Bucket) 
=> Array l FilePath

File paths.

-> (Array l Bucket -> IO b)

Worker writes data to buckets.

-> IO b 

Like toFiles, but take an array of FilePaths.

toDir

Arguments

:: Int

Number of buckets to create.

-> FilePath

Path to directory.

-> (Array B Bucket -> IO b)

Consumer.

-> IO b 

Create a new directory of the given name, containing the given number of buckets.

If the directory is named somedir then the files are named somedir/000000, somedir/000001, somedir/000002 and so on.

toDirs

Arguments

:: Int

Number of buckets to create per directory.

-> [FilePath]

Paths to directories.

-> (Array (E B DIM2) Bucket -> IO b)

Consumer.

-> IO b 

Given a list of directories, create those directories and open the given number of output files per directory.

In the resulting array of buckets, the outer dimension indexes each directory, and the inner one indexes each file in its directory.

For each directory somedir the files are named somedir/000000, somedir/000001, somedir/000002 and so on.
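The naming scheme and the two-dimensional layout can be modelled with nested lists of paths. The sketch below only computes file names with Text.Printf from base; it does not create files or call toDirs. The helper bucketGrid and the directory names out-a and out-b are hypothetical.

```haskell
import Text.Printf (printf)

-- | One inner list per directory, one six-digit file name per bucket.
--   (Hypothetical helper, not part of repa-flow.)
bucketGrid :: Int -> [FilePath] -> [[FilePath]]
bucketGrid n dirs =
  [ [ printf "%s/%06d" d i | i <- [0 .. n - 1] ] | d <- dirs ]

main :: IO ()
main = do
  let grid = bucketGrid 2 ["out-a", "out-b"]
  -- Outer index 1 selects "out-b"; inner index 0 selects its first file.
  putStrLn (grid !! 1 !! 0)
  -- prints out-b/000000
```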

Bucket IO

bClose :: Bucket -> IO ()

Close a bucket, releasing the contained file handle.

bIsOpen :: Bucket -> IO Bool

Check if the bucket is currently open.

bAtEnd :: Bucket -> IO Bool

Check if the contained file handle is at the end of the bucket.
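One plausible reading of "at the end of the bucket" is that the handle has either reached end of file or moved past the bucket's start position plus its length. The sketch below models that reading with base only. The Range type and atEnd function are hypothetical stand-ins, and the library's actual check may differ.

```haskell
import Control.Monad (replicateM_)
import System.IO

-- | Hypothetical stand-in for a bucket: a handle plus the byte range
--   it is allowed to touch.
data Range = Range
  { rangeStart  :: Integer
  , rangeLength :: Maybe Integer
  , rangeHandle :: Handle
  }

-- | True at end of file, or once the handle position reaches the end
--   of the range when a length is known.
atEnd :: Range -> IO Bool
atEnd r = do
  eof <- hIsEOF (rangeHandle r)
  case rangeLength r of
    Nothing  -> return eof
    Just len -> do
      pos <- hTell (rangeHandle r)
      return (eof || pos >= rangeStart r + len)

main :: IO ()
main = do
  writeFile "range-demo.bin" "0123456789"
  h <- openBinaryFile "range-demo.bin" ReadMode
  hSeek h AbsoluteSeek 2
  let r = Range { rangeStart = 2, rangeLength = Just 4, rangeHandle = h }
  atEnd r >>= print            -- position 2, range ends at 6
  replicateM_ 4 (hGetChar h)   -- consume the four bytes in the range
  atEnd r >>= print            -- position 6, range exhausted
  hClose h
```

With a range of four bytes starting at offset 2, the first check prints False and the second prints True, even though the file itself has bytes remaining.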

bSeek :: Bucket -> SeekMode -> Integer -> IO ()

Seek to a position within a bucket.

bGetArray :: Bucket -> Integer -> IO (Array F Word8)

Get some data from a bucket.

bPutArray :: Bucket -> Array F Word8 -> IO ()

Put some data in a bucket.