tar-conduit-0.1.0: Parse tar files using conduit for streaming

Safe HaskellNone
LanguageHaskell2010

Data.Conduit.Tar

Contents

Description

This module is about stream-processing tar archives. It is currently not very well tested. See the documentation of withEntries for an usage sample.

Synopsis

Basic functions

withEntry :: MonadThrow m => (Header -> ConduitM ByteString o m r) -> ConduitM TarChunk o m r Source #

Process a single tar entry. See withEntries for more details.

withEntries :: MonadThrow m => (Header -> ConduitM ByteString o m ()) -> ConduitM TarChunk o m () Source #

This function handles each entry of the tar archive according to the behaviour of the function passed as first argument.

Here is a full example function, that reads a compressed tar archive and for each entry that is a simple file, it prints its file path and SHA256 digest. Note that this function can throw exceptions!

import qualified Crypto.Hash.Conduit as CH
import qualified Data.Conduit.Tar    as CT

import Conduit
import Crypto.Hash (Digest, SHA256)
import Control.Monad (when)
import Data.Conduit.Zlib (ungzip)
import Data.ByteString (ByteString)

filedigests :: FilePath -> IO ()
filedigests fp = runConduitRes (  sourceFileBS fp          -- read the raw file
                               .| ungzip                   -- gunzip
                               .| CT.untar                 -- decode the tar archive
                               .| CT.withEntries hashentry -- process each file
                               .| printC                   -- print the results
                               )
    where
        hashentry :: Monad m => CT.Header -> Conduit ByteString m (FilePath, Digest SHA256)
        hashentry hdr = when (CT.headerFileType hdr == CT.FTNormal) $ do
            hash <- CH.sinkHash
            yield (CT.headerFilePath hdr, hash)

The hashentry function handles a single entry, based on its first Header argument. In this example, a Consumer is used to process the whole entry.

Note that the benefits of stream processing are easily lost when working with a Consumer. For example, the following implementation would have used an unbounded amount of memory:

        hashentry hdr = when (CT.headerFileType hdr == CT.FTNormal) $ do
            content <- mconcat <$> sinkList
            yield (CT.headerFilePath hdr, hash content)

Helper functions

Types

data TarException Source #

This the the exception type that is used in this module.

More constructors are susceptible to be added without bumping the major version of this module.

type Size = Int Source #