The case-insensitive-match package

[Tags:benchmark, bsd3, library, program, test]

A way to do case-insensitive string matching and comparison with less overhead and more speed. The Data.CaseInsensitive.Eq module offers simplified syntax and optimized instances for ByteString, String and Text. In particular, the ByteString implementation assumes ISO-8859-1 (8-bit) encoding and benchmarks significantly faster than other implementations.


Properties

Versions 0.1.0.0, 0.1.1.0
Change log CHANGELOG
Dependencies base (==4.*), bytestring, case-insensitive-match, text
License BSD3
Copyright (c) 2016 Michael Hatfield
Author Michael Hatfield
Maintainer github@michael-hatfield.com
Category Text
Home page https://github.com/mikehat/case-insensitive-match
Bug tracker https://github.com/mikehat/case-insensitive-match
Source repository head: git clone git://github.com/mikehat/case-insensitive-match.git -b master
this: git clone git://github.com/mikehat/case-insensitive-match.git -b master (tag 0.1.1.0)
Uploaded Thu Aug 4 03:44:46 UTC 2016 by mikehat
Distributions NixOS:0.1.1.0
Downloads 75 total (9 in the last 30 days)
Status Docs uploaded by user
Build status unknown [no reports yet]

Readme for case-insensitive-match

case-insensitive-match 0.1.1.0

Here is a simplified library for matching and comparing strings in a case-insensitive manner. The only dependencies are base, bytestring and text.

Usage is simple

-- normal string comparison
"href"    /=  "HREF"
"apples"  /=  "oranges"
"Smith"   <   "jones"

-- case-insensitive comparison
"href"    ^== "HREF"
"apples"  ^/= "oranges"
"jones"   ^<  "Smith"

-- sorting some data structure
get_names p = (last_name p,first_name p)
sortBy (caseInsensitiveComparing get_names) people
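
To make the intended semantics concrete, the following minimal sketch reproduces the comparisons above using only base, by lower-casing both sides before comparing. It is the naive baseline, not this package's implementation: `Person`, `ciEq`, `ciCompare` and `sortPeople` are hypothetical names introduced for illustration, and `Data.Char.toLower` folds one character at a time rather than performing full Unicode case folding.

```haskell
module Main (main) where

import Data.Char (toLower)
import Data.List (sortBy)
import Data.Ord (comparing)

-- Hypothetical record standing in for "some data structure".
data Person = Person { lastName :: String, firstName :: String }
  deriving (Show, Eq)

-- Naive baseline: fold both strings to lower case, then compare.
ciEq :: String -> String -> Bool
ciEq a b = map toLower a == map toLower b

ciCompare :: String -> String -> Ordering
ciCompare = comparing (map toLower)

-- Sort by (last name, first name), ignoring case in both fields.
sortPeople :: [Person] -> [Person]
sortPeople = sortBy (comparing key)
  where
    key p = (map toLower (lastName p), map toLower (firstName p))

main :: IO ()
main = do
  print ("href" `ciEq` "HREF")          -- True
  print (ciCompare "jones" "Smith")     -- LT
  mapM_ print (sortPeople [Person "smith" "Al", Person "Jones" "bo"])
```

Every comparison in this baseline allocates folded copies of both strings, which is the kind of overhead the package aims to avoid.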

Benchmarks

The benchmarks are fairly comprehensive, offering comparisons with other algorithms, including the case-insensitive package and simple case folding using the base package. Before running the bench-others executable, check the source code, or you'll end up with a long series of 360 benchmark tests. You'll want something like bench-others -m glob ByteString/Short/*/*, which runs only 36 benchmarks. The hierarchy is <data-type>/<string-length>/<match-type>/<algorithm>. As usual, performance comparisons depend heavily on the use case, but for matching shorter strings that are often unequal, this algorithm is the fastest of those benchmarked.
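
The package description states that the ByteString implementation assumes ISO-8859-1. As a rough illustration of what byte-wise matching under that assumption can look like, here is a minimal sketch. `foldLatin1` and `ciEqLatin1` are hypothetical names, not part of the package's API, and the package's actual algorithm may differ. The sketch rejects on length before touching any bytes, which is one plausible reason why short, unequal strings can be dismissed cheaply.

```haskell
module Main (main) where

import qualified Data.ByteString as B
import Data.Word (Word8)

-- Fold one ISO-8859-1 byte to lower case.
-- 'A'..'Z' (0x41-0x5A) and 0xC0-0xDE map to the byte 0x20 above them,
-- except 0xD7 (the multiplication sign), which has no case.
foldLatin1 :: Word8 -> Word8
foldLatin1 w
  | w >= 0x41 && w <= 0x5A              = w + 0x20
  | w >= 0xC0 && w <= 0xDE && w /= 0xD7 = w + 0x20
  | otherwise                           = w

-- Hypothetical helper: reject on length first, then compare folded bytes.
-- 'and' stops at the first mismatching pair.
ciEqLatin1 :: B.ByteString -> B.ByteString -> Bool
ciEqLatin1 a b =
  B.length a == B.length b
    && and (B.zipWith (\x y -> foldLatin1 x == foldLatin1 y) a b)

main :: IO ()
main = do
  -- "href" vs "HREF" as raw bytes
  print (ciEqLatin1 (B.pack [0x68, 0x72, 0x65, 0x66]) (B.pack [0x48, 0x52, 0x45, 0x46]))  -- True
  -- upper-case vs lower-case E-acute in Latin-1
  print (ciEqLatin1 (B.pack [0xC9]) (B.pack [0xE9]))  -- True
  -- multiplication sign vs division sign: distinct, not a case pair
  print (ciEqLatin1 (B.pack [0xD7]) (B.pack [0xF7]))  -- False
```

Because the fold works on single bytes, this approach is only correct when the input really is 8-bit encoded; UTF-8 text outside the ASCII range would be compared incorrectly.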

There is also a real-world benchmark that compares different algorithms while looking for links in an HTML file with Text.HTML.TagSoup. This benchmark involves a lot of work other than string comparison, so the differences between algorithms are slim, but usually measurable. Build and run:

$ cabal build bench-tagsoup
...

$ curl -s 'https://hackage.haskell.org/packages/names' > sample/hackage-names.html
$ dist/build/bench-tagsoup/bench-tagsoup < sample/hackage-names.html
...

Testing

It would be quite involved to build a perfectly comprehensive testing module, but the test-basics executable tests multiple cases against all supported data types.

Sample

Here is a sample:

{-# LANGUAGE OverloadedStrings #-}

module Main ( main ) where

import           Data.List
import           Data.CaseInsensitive
import qualified Data.ByteString.Char8 as BS

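-- Read "Last, First" lines from standard input and print them
-- sorted case-insensitively.
main :: IO ()
main = do
    stdin <- BS.getContents
    let sorted_names = map join_name $ sortBy caseInsensitiveCompare $ map split_name $ BS.lines stdin
    mapM_ BS.putStrLn sorted_names

-- Split "Last, First" into (last, first), dropping the ", " separator.
split_name name = (last,BS.drop 2 first)
    where (last,first) = BS.span (/= ',') name

-- Rebuild "Last, First" from the pair.
join_name (last,first) = BS.concat [ last , ", " , first ]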

Try it with:

$ cabal build readme-sample
...

$ dist/build/readme-sample/readme-sample < sample/declaration-signers.txt