Copyright | (c) 2007 Bjorn Bringert 2008 Andrea Vezzosi 2008-2012 Duncan Coutts |
---|---|
License | BSD3 |
Maintainer | duncan@community.haskell.org |
Portability | portable |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
Reading, writing and manipulating ".tar
" archive files.
This module uses common names and so is designed to be imported qualified:
import qualified Codec.Archive.Tar as Tar
Synopsis
- create :: FilePath -> FilePath -> [FilePath] -> IO ()
- extract :: FilePath -> FilePath -> IO ()
- append :: FilePath -> FilePath -> [FilePath] -> IO ()
- read :: ByteString -> Entries FormatError
- write :: [Entry] -> ByteString
- pack :: FilePath -> [FilePath] -> IO [Entry]
- packAndCheck :: (GenEntry FilePath FilePath -> Maybe SomeException) -> FilePath -> [FilePath] -> IO [Entry]
- unpack :: Exception e => FilePath -> Entries e -> IO ()
- unpackAndCheck :: Exception e => (GenEntry FilePath FilePath -> Maybe SomeException) -> FilePath -> Entries e -> IO ()
- data GenEntry tarPath linkTarget
- type Entry = GenEntry TarPath LinkTarget
- entryPath :: GenEntry TarPath linkTarget -> FilePath
- entryContent :: GenEntry tarPath linkTarget -> GenEntryContent linkTarget
- data GenEntryContent linkTarget
- = NormalFile ByteString !FileSize
- | Directory
- | SymbolicLink !linkTarget
- | HardLink !linkTarget
- | CharacterDevice !DevMajor !DevMinor
- | BlockDevice !DevMajor !DevMinor
- | NamedPipe
- | OtherEntryType !TypeCode ByteString !FileSize
- type EntryContent = GenEntryContent LinkTarget
- data GenEntries tarPath linkTarget e
- = Next (GenEntry tarPath linkTarget) (GenEntries tarPath linkTarget e)
- | Done
- | Fail e
- type Entries e = GenEntries TarPath LinkTarget e
- mapEntries :: (GenEntry tarPath linkTarget -> Either e' (GenEntry tarPath linkTarget)) -> GenEntries tarPath linkTarget e -> GenEntries tarPath linkTarget (Either e e')
- mapEntriesNoFail :: (GenEntry tarPath linkTarget -> GenEntry tarPath linkTarget) -> GenEntries tarPath linkTarget e -> GenEntries tarPath linkTarget e
- foldEntries :: (GenEntry tarPath linkTarget -> a -> a) -> a -> (e -> a) -> GenEntries tarPath linkTarget e -> a
- foldlEntries :: (a -> GenEntry tarPath linkTarget -> a) -> a -> GenEntries tarPath linkTarget e -> Either (e, a) a
- unfoldEntries :: (a -> Either e (Maybe (GenEntry tarPath linkTarget, a))) -> a -> GenEntries tarPath linkTarget e
- encodeLongNames :: GenEntry FilePath FilePath -> [Entry]
- decodeLongNames :: Entries e -> GenEntries FilePath FilePath (Either e DecodeLongNamesError)
- data DecodeLongNamesError
- data FormatError
Documentation
Tar archive files are used to store a collection of other files in a single file. They consists of a sequence of entries. Each entry describes a file or directory (or some other special kind of file). The entry stores a little bit of meta-data, in particular the file or directory name.
Unlike some other archive formats, a tar file contains no index. The information about each entry is stored next to the entry. Because of this, tar files are almost always processed linearly rather than in a random-access fashion.
The functions in this package are designed for working on tar files linearly and lazily. This makes it possible to do many operations in constant space rather than having to load the entire archive into memory.
It can read and write standard POSIX tar files and also the GNU and old Unix V7 tar formats. The convenience functions that are provided in the Codec.Archive.Tar.Entry module for creating archive entries are primarily designed for standard portable archives. If you need to construct GNU format archives or exactly preserve file ownership and permissions then you will need to write some extra helper functions.
This module contains just the simple high level operations without exposing the all the details of tar files. If you need to inspect tar entries in more detail or construct them directly then you also need the module Codec.Archive.Tar.Entry.
High level "all in one" operations
:: FilePath | Path of the ".tar" file to write. |
-> FilePath | Base directory |
-> [FilePath] | Files and directories to archive, relative to base dir |
-> IO () |
Create a new ".tar"
file from a directory of files.
It is equivalent to calling the standard tar
program like so:
$ tar -f tarball.tar -C base -c dir
This assumes a directory ./base/dir
with files inside, eg
./base/dir/foo.txt
. The file names inside the resulting tar file will be
relative to dir
, eg dir/foo.txt
.
This is a high level "all in one" operation. Since you may need variations on this function it is instructive to see how it is written. It is just:
import qualified Data.ByteString.Lazy as BL BL.writeFile tar . Tar.write =<< Tar.pack base paths
Notes:
The files and directories must not change during this operation or the result is not well defined.
The intention of this function is to create tarballs that are portable between systems. It is not suitable for doing file system backups because file ownership and permissions are not fully preserved. File ownership is not preserved at all. File permissions are set to simple portable values:
rw-r--r--
for normal filesrwxr-xr-x
for executable filesrwxr-xr-x
for directories
Extract all the files contained in a ".tar"
file.
It is equivalent to calling the standard tar
program like so:
$ tar -x -f tarball.tar -C dir
So for example if the tarball.tar
file contains foo/bar.txt
then this
will extract it to dir/foo/bar.txt
.
This is a high level "all in one" operation. Since you may need variations on this function it is instructive to see how it is written. It is just:
import qualified Data.ByteString.Lazy as BL Tar.unpack dir . Tar.read =<< BL.readFile tar
Notes:
Extracting can fail for a number of reasons. The tarball may be incorrectly formatted. There may be IO or permission errors. In such cases an exception will be thrown and extraction will not continue.
Since the extraction may fail part way through it is not atomic. For this reason you may want to extract into an empty directory and, if the extraction fails, recursively delete the directory.
Security: only files inside the target directory will be written. Tarballs containing entries that point outside of the tarball (either absolute paths or relative paths) will be caught and an exception will be thrown.
Notes
Compressed tar archives
Tar files are commonly used in conjunction with compression, as in
.tar.gz
or .tar.bz2
files. This module does not directly
handle compressed tar files however they can be handled easily by
composing functions from this module and the modules
Codec.Compression.GZip
or
Codec.Compression.BZip
.
Creating a compressed .tar.gz
file is just a minor variation on the
create
function, but where throw compression into the pipeline:
import qualified Data.ByteString.Lazy as BL import qualified Codec.Compression.GZip as GZip BL.writeFile tar . GZip.compress . Tar.write =<< Tar.pack base dir
Similarly, extracting a compressed .tar.gz
is just a minor variation
on the extract
function where we use decompression in the pipeline:
import qualified Data.ByteString.Lazy as BL import qualified Codec.Compression.GZip as GZip Tar.unpack dir . Tar.read . GZip.decompress =<< BL.readFile tar
Security
This is pretty important. A maliciously constructed tar archives could
contain entries that specify bad file names. It could specify absolute
file names like /etc/passwd
or relative files outside of the
archive like ../../../something
. This security problem is commonly
called a "directory traversal vulnerability". Historically, such
vulnerabilities have been common in packages handling tar archives.
The extract
and unpack
functions check for bad file names. See the
checkSecurity
function for more details.
If you need to do any custom
unpacking then you should use this.
Tarbombs
A "tarbomb" is a .tar
file where not all entries are in a
subdirectory but instead files extract into the top level directory. The
extract
function does not check for these however if you want to do
that you can use the checkTarbomb
function like so:
import Control.Exception (SomeException(..)) import Control.Applicative ((<|>)) import qualified Data.ByteString.Lazy as BL Tar.unpackAndCheck (\x -> SomeException <$> checkEntryTarbomb expectedDir x <|> SomeException <$> checkEntrySecurity x) dir . Tar.read =<< BL.readFile tar
In this case extraction will fail if any file is outside of expectedDir
.
Converting between internal and external representation
Note, you cannot expect write . read
to give exactly the same output
as input. You can expect the information to be preserved exactly however.
This is because read
accepts common format variations while write
produces the standard format.
read :: ByteString -> Entries FormatError Source #
write :: [Entry] -> ByteString Source #
Create the external representation of a tar archive by serialising a list of tar entries.
- The conversion is done lazily.
Packing and unpacking files to/from internal representation
These functions are for packing and unpacking portable archives. They are not suitable in cases where it is important to preserve file ownership and permissions or to archive special files like named pipes and Unix device files.
:: FilePath | Base directory |
-> [FilePath] | Files and directories to pack, relative to the base dir |
-> IO [Entry] |
Creates a tar archive from a list of directory or files. Any directories specified will have their contents included recursively. Paths in the archive will be relative to the given base directory.
This is a portable implementation of packing suitable for portable archives.
In particular it only constructs NormalFile
, Directory
and SymbolicLink
entries. Hard links are treated like ordinary files. Special files like
FIFOs (named pipes), sockets or device files will cause problems.
- This function returns results lazily. Subdirectories are scanned
and files are read one by one as the list of entries is consumed.
Do not change their contents before the output of
pack
was consumed in full.
:: (GenEntry FilePath FilePath -> Maybe SomeException) | |
-> FilePath | Base directory |
-> [FilePath] | Files and directories to pack, relative to the base dir |
-> IO [Entry] |
Like pack
, but allows to specify additional sanity/security
checks on the input filenames. This is useful if you know which
check will be used on client side
in unpack
/ unpackAndCheck
.
Since: 0.6.0.0
Create local files and directories based on the entries of a tar archive.
This is a portable implementation of unpacking suitable for portable
archives. It handles NormalFile
and Directory
entries and has simulated
support for SymbolicLink
and HardLink
entries. Links are implemented by
copying the target file. This therefore works on Windows as well as Unix.
All other entry types are ignored, that is they are not unpacked and no
exception is raised.
If the Entries
ends in an error then it is raised an an exception. Any
files or directories that have been unpacked before the error was
encountered will not be deleted. For this reason you may want to unpack
into an empty directory so that you can easily clean up if unpacking fails
part-way.
On its own, this function only checks for security (using checkEntrySecurity
).
Use unpackAndCheck
if you need more checks.
:: Exception e | |
=> (GenEntry FilePath FilePath -> Maybe SomeException) | Checks to run on each entry before unpacking |
-> FilePath | Base directory |
-> Entries e | Entries to upack |
-> IO () |
Like unpack
, but run custom sanity/security checks instead of checkEntrySecurity
.
For example,
import Control.Exception (SomeException(..)) import Control.Applicative ((<|>)) unpackAndCheck (\x -> SomeException <$> checkEntryPortability x <|> SomeException <$> checkEntrySecurity x) dir entries
Since: 0.6.0.0
Types
Tar entry type
This module provides only very simple and limited read-only access to
the GenEntry
type. If you need access to the details or if you need to
construct your own entries then also import Codec.Archive.Tar.Entry.
data GenEntry tarPath linkTarget Source #
Polymorphic tar archive entry. High-level interfaces
commonly work with GenEntry
FilePath
FilePath
,
while low-level ones use GenEntry
TarPath
LinkTarget
.
Since: 0.6.0.0
type Entry = GenEntry TarPath LinkTarget Source #
Monomorphic tar archive entry, ready for serialization / deserialization.
entryPath :: GenEntry TarPath linkTarget -> FilePath Source #
Low-level function to get a native FilePath
of the file or directory
within the archive, not accounting for long names. It's likely
that you want to apply decodeLongNames
and use entryTarPath
afterwards instead of entryPath
.
entryContent :: GenEntry tarPath linkTarget -> GenEntryContent linkTarget Source #
The real content of the entry. For NormalFile
this includes the
file data. An entry usually contains a NormalFile
or a Directory
.
data GenEntryContent linkTarget Source #
Polymorphic content of a tar archive entry. High-level interfaces
commonly work with GenEntryContent
FilePath
,
while low-level ones use GenEntryContent
LinkTarget
.
Portable archives should contain only NormalFile
and Directory
.
Since: 0.6.0.0
NormalFile ByteString !FileSize | |
Directory | |
SymbolicLink !linkTarget | |
HardLink !linkTarget | |
CharacterDevice !DevMajor !DevMinor | |
BlockDevice !DevMajor !DevMinor | |
NamedPipe | |
OtherEntryType !TypeCode ByteString !FileSize |
Instances
type EntryContent = GenEntryContent LinkTarget Source #
Monomorphic content of a tar archive entry, ready for serialization / deserialization.
Sequences of tar entries
data GenEntries tarPath linkTarget e Source #
Polymorphic sequence of archive entries.
High-level interfaces
commonly work with GenEntries
FilePath
FilePath
,
while low-level ones use GenEntries
TarPath
LinkTarget
.
The point of this type as opposed to just using a list is that it makes the failure case explicit. We need this because the sequence of entries we get from reading a tarball can include errors.
Converting from a list can be done with just foldr Next Done
. Converting
back into a list can be done with foldEntries
however in that case you
must be prepared to handle the Fail
case inherent in the Entries
type.
The Monoid
instance lets you concatenate archives or append entries to an
archive.
Since: 0.6.0.0
Next (GenEntry tarPath linkTarget) (GenEntries tarPath linkTarget e) infixr 5 | |
Done | |
Fail e |
Instances
Foldable (GenEntries tarPath linkTarget) Source # | Since: 0.6.0.0 |
Defined in Codec.Archive.Tar.Types fold :: Monoid m => GenEntries tarPath linkTarget m -> m # foldMap :: Monoid m => (a -> m) -> GenEntries tarPath linkTarget a -> m # foldMap' :: Monoid m => (a -> m) -> GenEntries tarPath linkTarget a -> m # foldr :: (a -> b -> b) -> b -> GenEntries tarPath linkTarget a -> b # foldr' :: (a -> b -> b) -> b -> GenEntries tarPath linkTarget a -> b # foldl :: (b -> a -> b) -> b -> GenEntries tarPath linkTarget a -> b # foldl' :: (b -> a -> b) -> b -> GenEntries tarPath linkTarget a -> b # foldr1 :: (a -> a -> a) -> GenEntries tarPath linkTarget a -> a # foldl1 :: (a -> a -> a) -> GenEntries tarPath linkTarget a -> a # toList :: GenEntries tarPath linkTarget a -> [a] # null :: GenEntries tarPath linkTarget a -> Bool # length :: GenEntries tarPath linkTarget a -> Int # elem :: Eq a => a -> GenEntries tarPath linkTarget a -> Bool # maximum :: Ord a => GenEntries tarPath linkTarget a -> a # minimum :: Ord a => GenEntries tarPath linkTarget a -> a # sum :: Num a => GenEntries tarPath linkTarget a -> a # product :: Num a => GenEntries tarPath linkTarget a -> a # | |
Traversable (GenEntries tarPath linkTarget) Source # | Since: 0.6.0.0 |
Defined in Codec.Archive.Tar.Types traverse :: Applicative f => (a -> f b) -> GenEntries tarPath linkTarget a -> f (GenEntries tarPath linkTarget b) # sequenceA :: Applicative f => GenEntries tarPath linkTarget (f a) -> f (GenEntries tarPath linkTarget a) # mapM :: Monad m => (a -> m b) -> GenEntries tarPath linkTarget a -> m (GenEntries tarPath linkTarget b) # sequence :: Monad m => GenEntries tarPath linkTarget (m a) -> m (GenEntries tarPath linkTarget a) # | |
Functor (GenEntries tarPath linkTarget) Source # | |
Defined in Codec.Archive.Tar.Types fmap :: (a -> b) -> GenEntries tarPath linkTarget a -> GenEntries tarPath linkTarget b # (<$) :: a -> GenEntries tarPath linkTarget b -> GenEntries tarPath linkTarget a # | |
Monoid (GenEntries tarPath linkTarget e) Source # | |
Defined in Codec.Archive.Tar.Types mempty :: GenEntries tarPath linkTarget e # mappend :: GenEntries tarPath linkTarget e -> GenEntries tarPath linkTarget e -> GenEntries tarPath linkTarget e # mconcat :: [GenEntries tarPath linkTarget e] -> GenEntries tarPath linkTarget e # | |
Semigroup (GenEntries tarPath linkTarget e) Source # | Since: 0.5.1.0 |
Defined in Codec.Archive.Tar.Types (<>) :: GenEntries tarPath linkTarget e -> GenEntries tarPath linkTarget e -> GenEntries tarPath linkTarget e # sconcat :: NonEmpty (GenEntries tarPath linkTarget e) -> GenEntries tarPath linkTarget e # stimes :: Integral b => b -> GenEntries tarPath linkTarget e -> GenEntries tarPath linkTarget e # | |
(Show tarPath, Show linkTarget, Show e) => Show (GenEntries tarPath linkTarget e) Source # | |
Defined in Codec.Archive.Tar.Types showsPrec :: Int -> GenEntries tarPath linkTarget e -> ShowS # show :: GenEntries tarPath linkTarget e -> String # showList :: [GenEntries tarPath linkTarget e] -> ShowS # | |
(NFData tarPath, NFData linkTarget, NFData e) => NFData (GenEntries tarPath linkTarget e) Source # | |
Defined in Codec.Archive.Tar.Types rnf :: GenEntries tarPath linkTarget e -> () # | |
(Eq tarPath, Eq linkTarget, Eq e) => Eq (GenEntries tarPath linkTarget e) Source # | |
Defined in Codec.Archive.Tar.Types (==) :: GenEntries tarPath linkTarget e -> GenEntries tarPath linkTarget e -> Bool # (/=) :: GenEntries tarPath linkTarget e -> GenEntries tarPath linkTarget e -> Bool # |
type Entries e = GenEntries TarPath LinkTarget e Source #
Monomorphic sequence of archive entries, ready for serialization / deserialization.
:: (GenEntry tarPath linkTarget -> Either e' (GenEntry tarPath linkTarget)) | Function to apply to each entry |
-> GenEntries tarPath linkTarget e | Input sequence |
-> GenEntries tarPath linkTarget (Either e e') |
This is like the standard map
function on lists, but for Entries
. It
includes failure as a extra possible outcome of the mapping function.
If your mapping function cannot fail it may be more convenient to use
mapEntriesNoFail
mapEntriesNoFail :: (GenEntry tarPath linkTarget -> GenEntry tarPath linkTarget) -> GenEntries tarPath linkTarget e -> GenEntries tarPath linkTarget e Source #
Like mapEntries
but the mapping function itself cannot fail.
foldEntries :: (GenEntry tarPath linkTarget -> a -> a) -> a -> (e -> a) -> GenEntries tarPath linkTarget e -> a Source #
This is like the standard foldr
function on lists, but for Entries
.
Compared to foldr
it takes an extra function to account for the
possibility of failure.
This is used to consume a sequence of entries. For example it could be used to scan a tarball for problems or to collect an index of the contents.
foldlEntries :: (a -> GenEntry tarPath linkTarget -> a) -> a -> GenEntries tarPath linkTarget e -> Either (e, a) a Source #
A foldl
-like function on Entries. It either returns the final
accumulator result, or the failure along with the intermediate accumulator
value.
unfoldEntries :: (a -> Either e (Maybe (GenEntry tarPath linkTarget, a))) -> a -> GenEntries tarPath linkTarget e Source #
This is like the standard unfoldr
function on lists, but for Entries
.
It includes failure as an extra possibility that the stepper function may
return.
It can be used to generate Entries
from some other type. For example it is
used internally to lazily unfold entries from a ByteString
.
Long file names
encodeLongNames :: GenEntry FilePath FilePath -> [Entry] Source #
Translate high-level entries with POSIX FilePath
s for files and symlinks
into entries suitable for serialization by emitting additional
OtherEntryType
'K'
and OtherEntryType
'L'
nodes.
Input FilePath
s must be POSIX file names, not native ones.
Since: 0.6.0.0
decodeLongNames :: Entries e -> GenEntries FilePath FilePath (Either e DecodeLongNamesError) Source #
Translate low-level entries (usually freshly deserialized) into
high-level entries with POSIX FilePath
s for files and symlinks
by parsing and eliminating
OtherEntryType
'K'
and OtherEntryType
'L'
nodes.
Resolved FilePath
s are still POSIX file names, not native ones.
Since: 0.6.0.0
data DecodeLongNamesError Source #
Errors raised by decodeLongNames
.
Since: 0.6.0.0
TwoTypeKEntries | Two adjacent |
TwoTypeLEntries | Two adjacent |
NoLinkEntryAfterTypeKEntry |
|
Instances
Exception DecodeLongNamesError Source # | |
Show DecodeLongNamesError Source # | |
Defined in Codec.Archive.Tar.LongNames showsPrec :: Int -> DecodeLongNamesError -> ShowS # show :: DecodeLongNamesError -> String # showList :: [DecodeLongNamesError] -> ShowS # | |
Eq DecodeLongNamesError Source # | |
Defined in Codec.Archive.Tar.LongNames (==) :: DecodeLongNamesError -> DecodeLongNamesError -> Bool # (/=) :: DecodeLongNamesError -> DecodeLongNamesError -> Bool # | |
Ord DecodeLongNamesError Source # | |
Defined in Codec.Archive.Tar.LongNames compare :: DecodeLongNamesError -> DecodeLongNamesError -> Ordering # (<) :: DecodeLongNamesError -> DecodeLongNamesError -> Bool # (<=) :: DecodeLongNamesError -> DecodeLongNamesError -> Bool # (>) :: DecodeLongNamesError -> DecodeLongNamesError -> Bool # (>=) :: DecodeLongNamesError -> DecodeLongNamesError -> Bool # max :: DecodeLongNamesError -> DecodeLongNamesError -> DecodeLongNamesError # min :: DecodeLongNamesError -> DecodeLongNamesError -> DecodeLongNamesError # |
Error handling
Reading tar files can fail if the data does not match the tar file format correctly.
The style of error handling by returning structured errors. The pure
functions in the library do not throw exceptions, they return the errors
as data. The IO actions in the library can throw exceptions, in particular
the unpack
action does this. All the error types used are an instance of
the standard Exception
class so it is possible to throw
and catch
them.
Errors from reading tar files
data FormatError Source #
Errors that can be encountered when parsing a Tar archive.
TruncatedArchive | |
ShortTrailer | |
BadTrailer | |
TrailingJunk | |
ChecksumIncorrect | |
NotTarFormat | |
UnrecognisedTarFormat | |
HeaderBadNumericEncoding |
Instances
Exception FormatError Source # | |
Defined in Codec.Archive.Tar.Read | |
Show FormatError Source # | |
Defined in Codec.Archive.Tar.Read showsPrec :: Int -> FormatError -> ShowS # show :: FormatError -> String # showList :: [FormatError] -> ShowS # | |
NFData FormatError Source # | |
Defined in Codec.Archive.Tar.Read rnf :: FormatError -> () # | |
Eq FormatError Source # | |
Defined in Codec.Archive.Tar.Read (==) :: FormatError -> FormatError -> Bool # (/=) :: FormatError -> FormatError -> Bool # |