optimal-blocks-0.1.0: Optimal Block boundary determination for rsync-like behaviours

Safe HaskellNone
LanguageHaskell2010

Algorithm.OptimalBlocks.BuzzHash

Description

This is a rolling hash used to calculate the hashes of fixed-length sequences of characters within a given ByteString. BuzzHash is nice in that it makes this calculation very efficient.

Synopsis

Documentation

hashes Source

Arguments

:: WindowSize

How many bytes to put into each hash

-> ByteString

The ByteString to calculate hashes of.

-> Vector Word64 

Determine the hash of every len sequence of bytes in the given ByteString. Because this uses BuzzHash, a rolling hash, this runs in O(n) time dependent on the length of bs, and independent of len.

This will generate (length bs - len + 1) 64-bit hashes.

This function really doesn't serve any purpose anymore, and is just kept around as a sanity check for the much faster split function.

split Source

Arguments

:: WindowSize

How many bytes to use when generating the pattern hash

-> BlockShift

log2 of the desired block size

-> ByteString

The ByteString to split

-> (ByteString, ByteString) 

Split up a ByteString into a complete block and a remainder. If no break point is found in the given string, the first element of the resulting pair will be empty.

slowSplit Source

Arguments

:: WindowSize

How many bytes to use for pattern search

-> BlockShift

log2 of the desired block size

-> ByteString

The ByteString to split

-> (ByteString, ByteString) 

Same thing as split, but actually calculates the hash of every window explicitly. This is really slow, and is only used to validate the results of split

data Hash Source

A representation of a hash that allows rolling hashes to be easily calculated.

Constructors

Hash 

Fields

windowLength :: !WindowSize
 
currentVal :: !Word64
 

Instances

init :: ByteString -> Hash Source

Create a Hash instance using an entire ByteString. This doesn't have any sort of length argument to do partial ByteStrings because ByteString supports efficient slices on its own.

roll :: WindowSize -> Word64 -> Word8 -> Word8 -> Word64 Source

Roll the Hash to the next byte over in the ByteString being hashed. This doesn't do any sort of checking to ensure that old and new are actually correct, so this is probably easy to mess up when using it manually. The expected usage is that one would initialize a hash using init on the beginning of some ByteString, and then to calculate the hash of each sequence of bytes one would manually track the first and last byte of each window on the ByteString. hashes does this for you...

h :: Word8 -> Word64 Source

Upgrade an 8-bit word to a 64-bit one that is very "different" from all the other possible 8-bit words. This library uses SipHash to do this.