stringsearch-0.3.6.4: Fast searching, splitting and replacing of ByteStrings

Portabilitynon-portable (BangPatterns)
StabilityProvisional
MaintainerDaniel Fischer <daniel.is.fischer@googlemail.com>
Safe HaskellNone

Data.ByteString.Lazy.Search.KMP

Contents

Description

Fast search of lazy ByteString values using the Knuth-Morris-Pratt algorithm.

A description of the algorithm can be found at http://en.wikipedia.org/wiki/Knuth-Morris-Pratt_algorithm.

Original authors: Justin Bailey (jgbailey at gmail.com) and Chris Kuklewicz (haskell at list.mightyreason.com).

Synopsis

Overview

This module provides two functions for finding the occurrences of a pattern in a target string using the Knuth-Morris-Pratt algorithm. It exists mostly for systematic reasons, the functions from Data.ByteString.Lazy.Search are much faster, except for very short patterns or long patterns with a short period if overlap is allowed. In the latter case, indices from this module may be the best choice since the Boyer-Moore function's performance degrades if there are many matches and the DFA function's automaton needs much space for long patterns. In the former case, for some pattern/target combinations DFA has better performance, for others KMP, usually the difference is small.

Complexity and Performance

The preprocessing of the pattern is O(patternLength) in time and space. The time complexity of the searching phase is O(targetLength) for both functions.

In most cases, these functions are considerably slower than the Boyer-Moore variants, performance is close to that of those from Data.ByteString.Search.DFA.

Partial application

Both functions can be usefully partially applied. Given only a pattern, the auxiliary data will be computed only once, allowing for efficient re-use.

Functions

indicesSource

Arguments

:: ByteString

Strict pattern to find

-> ByteString

Lazy string to search

-> [Int64]

Offsets of matches

indices finds the starting indices of all possibly overlapping occurrences of the pattern in the target string. If the pattern is empty, the result is [0 .. length target].

nonOverlappingIndicesSource

Arguments

:: ByteString

Strict pattern to find

-> ByteString

Lazy string to search

-> [Int64]

Offsets of matches

nonOverlappingIndices finds the starting indices of all non-overlapping occurrences of the pattern in the target string. It is more efficient than removing indices from the list produced by indices.

Convenience

strictify :: ByteString -> ByteStringSource

strictify transforms a lazy ByteString into a strict ByteString, to make it a suitable pattern for the searching functions.