darcs-2.14.3: a distributed, interactive, smart revision control system

Copyright2003 David Roundy
2005 Benedikt Schmidt
Safe HaskellNone



LCS stands for Longest Common Subsequence, and it is a relatively challenging problem to find an LCS efficiently. This module implements the algorithm described in:

"An O(ND) Difference Algorithm and its Variations", Eugene Myers, Algorithmica Vol. 1 No. 2, 1986, pp. 251-266; especially the variation described in section 4.2 and most refinements implemented in GNU diff (D is the edit-distance).

There is currently no heuristic to reduce the running time and produce suboptimal output for large inputs with many differences. It behaves like GNU diff with the -d option in this regard.

In the first step, a hash value for every line is calculated and collisions are marked with a special value. This reduces a string comparison to an int comparison for line tuples where at least one of the hash values is not equal to the special value. After that, lines which only exists in one of the files are removed and marked as changed which reduces the running time of the following difference algorithm. GNU diff additionally removes lines that appear very often in the other file in some cases. The last step tries to create longer changed regions and line up deletions in the first file to insertions in the second by shifting changed lines forward and backward.



getChanges :: [ByteString] -> [ByteString] -> [(Int, [ByteString], [ByteString])] Source #

create a list of changes between a and b, each change has the form (starta, lima, startb, limb) which means that a[starta, lima) has to be replaced by b[startb, limb)

shiftBoundaries :: BSTArray s -> BSTArray s -> PArray -> Int -> Int -> ST s () Source #

try to create nicer diffs by shifting around regions of changed lines

aLen :: IArray a e => a Int e -> Int Source #