hashed-storage-0.3.3: Hashed file storage support code.

This module contains plain tree indexing code.

The index is a binary file that overlays a hashed tree over the working copy. Every working file and directory has an entry in the index that contains its path, its hash, and validity data. The validity data is a last-seen timestamp plus the file size. The file hashes are SHA-256 digests of the file's content.

There are two entry types, a file entry and a directory entry. Both have a common binary format (see Item). The on-disk format is best described by peekItem.

For each file, the index stores a copy of the timestamp taken at the instant the hash was computed. When the size and timestamp of a file in the working copy match those in the index, we assume that the hash stored in the index for that file is valid. These hashes are then exposed in the resulting Tree object, and can be leveraged by e.g. diffTrees to compare many files quickly.
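The validity rule above can be sketched in a few lines. This is a minimal model only: the `Validity` and `Entry` records and the function names are hypothetical stand-ins, not the library's types (the real entry layout is the one described by Item and peekItem).

```haskell
import Data.Int (Int64)

-- Hypothetical validity record: what the index remembers about a file.
data Validity = Validity
  { vSize  :: Int64  -- file size in bytes
  , vMTime :: Int64  -- last-seen modification time, seconds since the epoch
  } deriving (Eq, Show)

-- Hypothetical index entry: validity data plus the cached hash.
data Entry h = Entry
  { entryValidity :: Validity
  , entryHash     :: h
  }

-- The cached hash is trusted only when both size and timestamp match.
cachedHashValid :: Validity -> Validity -> Bool
cachedHashValid indexed onDisk = indexed == onDisk

-- Reuse the cached hash when it is still valid. Thanks to laziness, the
-- (expensive) freshly computed hash is only evaluated on a mismatch.
currentHash :: Entry h -> Validity -> h -> h
currentHash entry onDisk rehashed
  | cachedHashValid (entryValidity entry) onDisk = entryHash entry
  | otherwise                                    = rehashed
```

Comparing two small integers is far cheaper than reading and hashing a file, which is why the index can serve hashes for an unchanged working copy without opening any files.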

The index also keeps hashes of directories. These are assumed to be valid whenever every entry in the complete subtree has a valid timestamp. As soon as a size or timestamp mismatch is found, the working file in question is opened, and its hash (along with its timestamp and size) is recomputed and updated in place in the index file. Every entry lives at a fixed offset and has a fixed size, so the update does not move any other data. The same applies to directories: when the hash of a file changes, the hashes of all of its parent directories are recomputed. This is done efficiently: each directory is updated at most once during a run.
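One way to get the "at most once per directory" property is a single bottom-up pass, sketched below on an in-memory tree. This is an illustration under stated assumptions, not the library's implementation: the `Node` type, the `combine` function, and the use of `Int` as a toy hash are all hypothetical, and the real code works on the binary index file rather than on a Haskell value.

```haskell
-- Hypothetical in-memory model of index entries; hashes are toy Ints.
data Node
  = File String Bool Int   -- name, "size and timestamp match" flag, cached hash
  | Dir String Int [Node]  -- name, cached hash, children
  deriving (Eq, Show)

nodeHash :: Node -> Int
nodeHash (File _ _ h) = h
nodeHash (Dir _ h _)  = h

-- Toy directory hash: folds the child hashes together in order.
combine :: [Int] -> Int
combine = foldl (\acc h -> acc * 31 + h) 17

-- Refresh a node bottom-up. Returns the refreshed node and whether its
-- hash changed. Each directory is visited exactly once, so its hash is
-- recomputed at most once, and only if some child hash changed.
refresh :: (String -> Int) -> Node -> (Node, Bool)
refresh rehash (File n ok h)
  | ok        = (File n True h, False)
  | otherwise = let h' = rehash n in (File n True h', h' /= h)
refresh rehash (Dir n h kids)
  | or changed = let h' = combine (map nodeHash kids')
                 in (Dir n h' kids', h' /= h)
  | otherwise  = (Dir n h kids', False)
  where
    (kids', changed) = unzip (map (refresh rehash) kids)
```

Here `rehash` stands in for opening the working file and hashing its content. A directory whose subtree is entirely valid keeps its cached hash untouched.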

readIndex :: FilePath -> (Tree -> Hash) -> IO Tree
updateIndexFrom :: FilePath -> (Tree -> Hash) -> Tree -> IO Tree
readIndex :: FilePath -> (Tree -> Hash) -> IO Tree
Read an index and build a Tree object from it, relative to the current working directory. Any parts of the index that are out of date are updated in place, so the result always reflects an up-to-date index. Note that the Tree is stubby, and only the pieces of the index that are expanded are actually updated. To implement a subtree query, use Tree.filter and then expand the result. Otherwise, expand the whole tree to avoid unexpected problems.
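The interaction between stubs, filtering, and expansion can be illustrated with a toy model. Everything in this sketch is hypothetical: `Item`, `Tree`, `expand`, `filterTop`, and `stubCounting` are simplified stand-ins defined here, not the library's own Tree API. A stub is modelled as an IO action that refreshes its subtree when run, and a counter records how many stubs were actually refreshed.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)

-- Toy model of a stubby tree.
data Item
  = FileItem String  -- a file with its (toy) hash
  | Stub (IO Tree)   -- unexpanded subtree; running the action refreshes it
  | SubTree Tree     -- already expanded subtree

newtype Tree = Tree [(String, Item)]

-- Expand every stub reachable from the root, recursively.
expand :: Tree -> IO Tree
expand (Tree items) = Tree <$> mapM go items
  where
    go (n, Stub load) = do
      t <- load >>= expand
      return (n, SubTree t)
    go (n, SubTree t) = do
      t' <- expand t
      return (n, SubTree t')
    go other = return other

-- Keep only the top-level entries whose name satisfies the predicate.
filterTop :: (String -> Bool) -> Tree -> Tree
filterTop p (Tree items) = Tree [ item | item@(n, _) <- items, p n ]

-- A stub that bumps a counter whenever it is refreshed.
stubCounting :: IORef Int -> Tree -> Item
stubCounting counter t = Stub (modifyIORef' counter (+ 1) >> return t)

-- Filter down to "src", then expand: only that one stub is refreshed.
demo :: IO Int
demo = do
  counter <- newIORef 0
  let leaf = Tree [("Main.hs", FileItem "toy-hash")]
      tree = Tree [ ("src",  stubCounting counter leaf)
                  , ("docs", stubCounting counter leaf) ]
  _ <- expand (filterTop (== "src") tree)
  readIORef counter
```

In this model `demo` refreshes one stub, not two, which mirrors the point above: parts of the tree that are never expanded are never brought up to date.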
updateIndexFrom :: FilePath -> (Tree -> Hash) -> Tree -> IO Tree
Add and remove files in the index to make it match the given Tree object. It is an error for the Tree to contain a file or directory that does not exist in plain form in the current working directory.
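Conceptually, making the index match a Tree amounts to a set difference in each direction. The sketch below shows that idea on plain lists of paths; `Plan`, `syncPlan`, and `applyPlan` are hypothetical names for illustration, and the real function operates on the binary index and a Tree rather than on path lists.

```haskell
import Data.List ((\\), sort)

-- Hypothetical description of the work needed to sync the index.
data Plan = Plan
  { toAdd    :: [FilePath]  -- in the Tree but missing from the index
  , toRemove :: [FilePath]  -- in the index but absent from the Tree
  } deriving (Eq, Show)

-- Compare the paths currently indexed with the paths the Tree wants.
syncPlan :: [FilePath] -> [FilePath] -> Plan
syncPlan indexed wanted = Plan
  { toAdd    = sort (wanted \\ indexed)
  , toRemove = sort (indexed \\ wanted)
  }

-- Apply a plan to the list of indexed paths.
applyPlan :: Plan -> [FilePath] -> [FilePath]
applyPlan plan indexed = sort ((indexed \\ toRemove plan) ++ toAdd plan)
```

For instance, with `["a", "b"]` indexed and `["b", "c"]` wanted, the plan adds `"c"` and removes `"a"`, and applying it yields exactly the wanted paths.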