Portability | non-portable (uses STUArray) |
---|---|

Stability | provisional |

Maintainer | igloo@earth.li |

This is an implementation of the Hunt-Szymanski LCS algorithm. Derived from the description in "String searching algorithms" by Graham A Stephen, ISBN 981021829X.

# Algorithm

We take two sequences, `xs`

and `ys`

, of length `#xs`

and `#ys`

.

First we make an array

matchlist[i=0..(#xs-1)]

such that

(matchlist[i] = js) => ((j `elem` js) <=> (xs !! i == ys !! j)) && sort js == reverse js

i.e. `matchlist[i]`

is the indices of elements of `ys`

equal to the
ith element of `xs`

, in descending order.

Let `#xys`

be the minimum of `#xs`

and `#ys`

. Trivially this is the maximum
possible length of the LCS of `xs`

and `ys`

. Then we can imagine an array

k[i=0..#xs][l=0..#xys]

such that `k[i][l] = j`

where `j`

is the smallest value such that the
LCS of `xs[0..i]`

and `ys[0..j]`

has length `l`

. We use `#ys`

to
mean there is no such `j`

.

We will not need to whole array at once, though. Instead we use an array

kk[l=0..#xys]

representing a row of `kk`

for a particular `i`

. Initially it is for
`i = -1`

, so `kk[0] = -1`

and `kk[l] = #ys`

otherwise. As the algorithm
progresses we will increase `i`

by one at the outer level and compute
the replacement values for `k`

's elements.

But we want more than just the length of the LCS, we also want the LCS itself. Another array

revres[l=0..#xys]

stores the list of `xs`

indices an LCS of length `l`

, if one is known,
at `revres[l]`

.

Now, suppose `kk`

contains `k[i-1]`

. We consider each `j`

in `matchlist[i]`

in turn. We find the `l`

such that `k[l-1] < j <= k[l]`

. If `j < k[l]`

then
we updated `k[l]`

to be `j`

and set `revres[l]`

to be `i:revres[l-1]`

.

Finding `l`

is basically binary search, but there are some tricks we can
do. First, as the `j`

s are decreasing the last `l`

we had for this `i`

is
an upper bound on this `l`

. Second, we use another array

lastl[j=0..#ys-1]

to store the `l`

we got last time for this `j`

, initially all `1`

. As the
values in `kk[j]`

monotonically decrease this is a lower bound for `l`

.
We also test to see whether this old `l`

is still `l`

before we start the
binary search.