lcs-0.2: Find longest common sublist of two lists

Portabilitynon-portable (uses STUArray)




This is an implementation of the Hunt-Szymanski LCS algorithm. Derived from the description in "String searching algorithms" by Graham A Stephen, ISBN 981021829X.



We take two sequences, xs and ys, of length #xs and #ys.

First we make an array


such that

 (matchlist[i] = js) => ((j `elem` js) <=> (xs !! i == ys !! j))
                     && sort js == reverse js

i.e. matchlist[i] is the indices of elements of ys equal to the ith element of xs, in descending order.

Let #xys be the minimum of #xs and #ys. Trivially this is the maximum possible length of the LCS of xs and ys. Then we can imagine an array


such that k[i][l] = j where j is the smallest value such that the LCS of xs[0..i] and ys[0..j] has length l. We use #ys to mean there is no such j.

We will not need to whole array at once, though. Instead we use an array


representing a row of kk for a particular i. Initially it is for i = -1, so kk[0] = -1 and kk[l] = #ys otherwise. As the algorithm progresses we will increase i by one at the outer level and compute the replacement values for k's elements.

But we want more than just the length of the LCS, we also want the LCS itself. Another array


stores the list of xs indices an LCS of length l, if one is known, at revres[l].

Now, suppose kk contains k[i-1]. We consider each j in matchlist[i] in turn. We find the l such that k[l-1] < j <= k[l]. If j < k[l] then we updated k[l] to be j and set revres[l] to be i:revres[l-1].

Finding l is basically binary search, but there are some tricks we can do. First, as the js are decreasing the last l we had for this i is an upper bound on this l. Second, we use another array


to store the l we got last time for this j, initially all 1. As the values in kk[j] monotonically decrease this is a lower bound for l. We also test to see whether this old l is still l before we start the binary search.


lcs :: Ord a => [a] -> [a] -> [a]Source

The lcs function takes two lists and returns a list with a longest common subsequence of the two.