lcs-0.2: Find longest common sublist of two lists

Portabilitynon-portable (uses STUArray)
Stabilityprovisional
Maintainerigloo@earth.li

Data.List.LCS.HuntSzymanski

Contents

Description

This is an implementation of the Hunt-Szymanski LCS algorithm. Derived from the description in "String searching algorithms" by Graham A Stephen, ISBN 981021829X.

Synopsis

Algorithm

We take two sequences, xs and ys, of length #xs and #ys.

First we make an array

 matchlist[i=0..(#xs-1)]

such that

 (matchlist[i] = js) => ((j `elem` js) <=> (xs !! i == ys !! j))
                     && sort js == reverse js

i.e. matchlist[i] is the indices of elements of ys equal to the ith element of xs, in descending order.

Let #xys be the minimum of #xs and #ys. Trivially this is the maximum possible length of the LCS of xs and ys. Then we can imagine an array

 k[i=0..#xs][l=0..#xys]

such that k[i][l] = j where j is the smallest value such that the LCS of xs[0..i] and ys[0..j] has length l. We use #ys to mean there is no such j.

We will not need to whole array at once, though. Instead we use an array

 kk[l=0..#xys]

representing a row of kk for a particular i. Initially it is for i = -1, so kk[0] = -1 and kk[l] = #ys otherwise. As the algorithm progresses we will increase i by one at the outer level and compute the replacement values for k's elements.

But we want more than just the length of the LCS, we also want the LCS itself. Another array

 revres[l=0..#xys]

stores the list of xs indices an LCS of length l, if one is known, at revres[l].

Now, suppose kk contains k[i-1]. We consider each j in matchlist[i] in turn. We find the l such that k[l-1] < j <= k[l]. If j < k[l] then we updated k[l] to be j and set revres[l] to be i:revres[l-1].

Finding l is basically binary search, but there are some tricks we can do. First, as the js are decreasing the last l we had for this i is an upper bound on this l. Second, we use another array

 lastl[j=0..#ys-1]

to store the l we got last time for this j, initially all 1. As the values in kk[j] monotonically decrease this is a lower bound for l. We also test to see whether this old l is still l before we start the binary search.

LCS

lcs :: Ord a => [a] -> [a] -> [a]Source

The lcs function takes two lists and returns a list with a longest common subsequence of the two.