The case-insensitive-match package

Tags: benchmark, bsd3, library, program, text

A way to do case-insensitive string matching and comparison with less overhead and more speed. The Data.CaseInsensitive.Eq module offers simplified syntax and optimized instances for ByteString, String and Text. In particular, the ByteString implementation assumes ISO-8859-1 (8-bit) encoding and benchmarks significantly faster than other implementations.
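
To make the ISO-8859-1 assumption concrete, here is a minimal sketch of what case folding looks like when every byte is one Latin-1 character. It is not the package's implementation: `foldLatin1` and `eqLatin1` are hypothetical names, and the sketch works on plain lists of `Word8` from base rather than on ByteString.

```haskell
import Data.Word (Word8)

-- Hypothetical helper (not part of the package's API):
-- lower-case a single byte, treating it as an ISO-8859-1 character.
foldLatin1 :: Word8 -> Word8
foldLatin1 w
  | w >= 0x41 && w <= 0x5A              = w + 0x20  -- 'A'..'Z'
  | w >= 0xC0 && w <= 0xDE && w /= 0xD7 = w + 0x20  -- upper-case accented letters, skipping the multiplication sign (0xD7)
  | otherwise                           = w

-- Hypothetical helper: case-insensitive equality on raw bytes.
-- List equality already fails on a length mismatch.
eqLatin1 :: [Word8] -> [Word8] -> Bool
eqLatin1 a b = map foldLatin1 a == map foldLatin1 b
```

Because each byte folds independently, no decoding step is needed. The same shortcut would give wrong answers on UTF-8 input, where a non-ASCII character spans several bytes.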


Change log CHANGELOG
Dependencies base (==4.*), bytestring, case-insensitive-match, text [details]
License BSD3
Copyright (c) 2016 Michael Hatfield
Author Michael Hatfield
Category Text
Home page
Bug tracker
Source repository head: git clone git:// -b master
this: git clone git:// -b master(tag
Uploaded Thu Aug 4 03:44:46 UTC 2016 by mikehat
Distributions NixOS:
Executables readme-example
Downloads 124 total (9 in the last 30 days)




Readme for case-insensitive-match-


Here is a simplified library for matching and comparing strings in a case-insensitive manner. The only dependencies are base, bytestring and text.

Usage is simple:

-- normal string comparison
"href"    /=  "HREF"
"apples"  /=  "oranges"
"Smith"   <   "jones"

-- case-insensitive comparison
"href"    ^== "HREF"
"apples"  ^/= "oranges"
"jones"   ^<  "Smith"

-- sorting some data structure
get_names p = (last_name p,first_name p)
sortBy (caseInsensitiveComparing get_names) people
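
The sorting snippet above leaves `people`, `last_name` and `first_name` undefined. As a rough, self-contained illustration of the ordering it aims for, the sketch below uses only base: it folds each field with `toLower` and compares the results. This is a stand-in, not the package's algorithm, and `Person`, `ciKey`, `sortPeople` and the sample data are all hypothetical.

```haskell
import Data.Char (toLower)
import Data.List (sortBy)
import Data.Ord (comparing)

-- Hypothetical record standing in for the elements of `people`.
data Person = Person { lastName :: String, firstName :: String }
  deriving (Eq, Show)

-- Base-only stand-in for a case-insensitive key: fold to lower case.
ciKey :: String -> String
ciKey = map toLower

-- Sort by (last name, first name), ignoring case in both fields.
sortPeople :: [Person] -> [Person]
sortPeople = sortBy (comparing key)
  where key p = (ciKey (lastName p), ciKey (firstName p))

-- Hypothetical sample data with mixed capitalisation.
people :: [Person]
people =
  [ Person "smith" "Anne"
  , Person "Jones" "bob"
  , Person "jones" "Alice"
  ]
```

With this data, `sortPeople people` places "jones Alice" before "Jones bob", since the two last names tie and the first names decide. A plain `comparing` on the raw fields would put "Jones" first because upper-case letters sort before lower-case ones.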


The benchmarks are fairly comprehensive, offering comparisons with other algorithms, including the case-insensitive package and simple case folding using the base package. Check the source code before running the bench-others executable with no arguments, or you'll end up with a long series of 360 benchmark tests. You'll want something like bench-others -m glob ByteString/Short/*/*, which runs only 36 benchmarks. The hierarchy is <data-type>/<string-length>/<match-type>/<algorithm>. As usual, performance comparisons depend heavily on the use case, but for matching shorter strings that are often unequal, this algorithm is clearly fastest.

There is also a real-world benchmark that compares different algorithms while looking for links in an HTML file with Text.HTML.TagSoup. This benchmark involves a lot of work other than string comparison, so the differences between algorithms are slim, but usually measurable. Build and run:

$ cabal build bench-tagsoup

$ curl -s '' > sample/hackage-names.html
$ dist/build/bench-tagsoup/bench-tagsoup < sample/hackage-names.html


It would be quite involved to build a perfectly comprehensive testing module, but the test-basics executable tests multiple cases against all supported data types.


Here is a sample:

{-# LANGUAGE OverloadedStrings #-}

module Main ( main ) where

import           Data.List
import           Data.CaseInsensitive
import qualified Data.ByteString.Char8 as BS

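main :: IO ()
main = do
    input <- BS.getContents
    -- split each "Last, First" line, sort case-insensitively, then rejoin
    let sorted_names = map join_name $ sortBy caseInsensitiveCompare $ map split_name $ BS.lines input
    mapM_ BS.putStrLn sorted_names

-- "Last, First" -> ("Last","First"); drop 2 skips the ", " separator
split_name :: BS.ByteString -> (BS.ByteString, BS.ByteString)
split_name name = (lastName, BS.drop 2 firstName)
    where (lastName, firstName) = BS.span (/= ',') name

-- ("Last","First") -> "Last, First"
join_name :: (BS.ByteString, BS.ByteString) -> BS.ByteString
join_name (lastName, firstName) = BS.concat [ lastName , ", " , firstName ]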

Try it with:

$ cabal build readme-example

$ dist/build/readme-example/readme-example < sample/declaration-signers.txt