case-insensitive-match: A simplified, faster way to do case-insensitive matching.

[ bsd3, library, program, text ]

A way to do case-insensitive string matching and comparison with less overhead and more speed. The Data.CaseInsensitive.Eq module offers simplified syntax and optimized instances for ByteString, String and Text. In particular, the ByteString implementation assumes ISO-8859-1 (8-bit) encoding and benchmarks significantly faster than other implementations.


Modules


Data
- Data.CaseInsensitive
  - Data.CaseInsensitive.Eq
  - Data.CaseInsensitive.Ord

Downloads

case-insensitive-match-0.1.1.0.tar.gz [browse] (Cabal source package)
Package description (as included in the package)

Maintainer's Corner

Package maintainers

mikehat


Versions [RSS]	0.1.0.0, 0.1.1.0
Change log	CHANGELOG
Dependencies	base (>=4 && <5), bytestring, case-insensitive-match, text [details]
License	BSD-3-Clause
Copyright	(c) 2016 Michael Hatfield
Author	Michael Hatfield
Maintainer	github@michael-hatfield.com
Uploaded	by mikehat at 2016-08-04T03:44:46Z
Category	Text
Home page	https://github.com/mikehat/case-insensitive-match
Bug tracker	https://github.com/mikehat/case-insensitive-match
Source repo	head: git clone git://github.com/mikehat/case-insensitive-match.git -b master; this: git clone git://github.com/mikehat/case-insensitive-match.git -b master (tag 0.1.1.0)
Distributions
Executables	readme-example
Downloads	1818 total (6 in the last 30 days)
Status	Docs uploaded by user Build status unknown [no reports yet]

Readme for case-insensitive-match-0.1.1.0


case-insensitive-match 0.1.1.0

Here is a simplified library for matching and comparing strings in a case-insensitive manner. The only dependencies are base, bytestring and text.

Usage is simple

-- normal string comparison
"href"    /=  "HREF"
"apples"  /=  "oranges"
"Smith"   <   "jones"

-- case-insensitive comparison
"href"    ^== "HREF"
"apples"  ^/= "oranges"
"jones"   ^<  "Smith"

-- sorting some data structure
get_names p = (last_name p,first_name p)
sortBy (caseInsensitiveComparing get_names) people
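
The package description says the ByteString implementation assumes ISO-8859-1 encoding. As a rough illustration of what case-insensitive matching means under that assumption, the sketch below folds single bytes to lower case and compares lists of bytes. It uses only base. The names foldLatin1 and eqLatin1CI are hypothetical and exist only for this example; they are not part of the library's API, and the library's actual implementation may well differ.

```haskell
module Main (main) where

import Data.Word (Word8)

-- Hypothetical helper: fold one ISO-8859-1 byte to its lower-case form.
foldLatin1 :: Word8 -> Word8
foldLatin1 w
  | w >= 0x41 && w <= 0x5A = w + 0x20              -- ASCII 'A'..'Z'
  | w >= 0xC0 && w <= 0xDE && w /= 0xD7 = w + 0x20 -- 'À'..'Þ', skipping '×'
  | otherwise = w

-- Hypothetical helper: compare two byte strings after folding every byte.
eqLatin1CI :: [Word8] -> [Word8] -> Bool
eqLatin1CI a b = map foldLatin1 a == map foldLatin1 b

main :: IO ()
main = do
  -- "HREF" vs "href"
  print (eqLatin1CI [0x48, 0x52, 0x45, 0x46] [0x68, 0x72, 0x65, 0x66])
  -- 'É' vs 'é'
  print (eqLatin1CI [0xC9] [0xE9])
  -- '×' vs '÷' are distinct symbols, not a case pair
  print (eqLatin1CI [0xD7] [0xF7])
```

Because every character is a single byte in this encoding, folding never changes the length of the input, which is one reason an 8-bit assumption keeps the comparison simple.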

Benchmarks

The benchmarks are fairly comprehensive, offering comparisons with other algorithms, including the case-insensitive package and simple case folding using the base package. Simply running the bench-others executable launches a long series of 360 benchmarks, so check the source code first. You'll want something like bench-others -m glob ByteString/Short/*/*, which runs only 36 benchmarks. The hierarchy is <data-type>/<string-length>/<match-type>/<algorithm>. As usual, performance comparisons depend heavily on the use case, but for matching shorter strings that are often unequal, this library's algorithm is clearly the fastest in these benchmarks.
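
For readers unfamiliar with the "simple case folding using the base package" baseline mentioned above, here is a minimal sketch of what such an approach could look like. It is an assumption about the general technique, not a copy of the benchmark source; eqFold and compareFold are hypothetical names.

```haskell
module Main (main) where

import Data.Char (toLower)
import Data.List (sortBy)
import Data.Ord (comparing)

-- Hypothetical baseline: lower-case both strings, then compare for equality.
eqFold :: String -> String -> Bool
eqFold a b = map toLower a == map toLower b

-- Hypothetical baseline: order two strings by their lower-cased forms.
compareFold :: String -> String -> Ordering
compareFold = comparing (map toLower)

main :: IO ()
main = do
  print (eqFold "href" "HREF")
  print (compareFold "jones" "Smith")
  print (sortBy compareFold ["smith", "Jones", "adams"])
```

This style maps toLower over each input before comparing. That extra work is plausibly part of the overhead a specialized implementation tries to avoid.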

There is also a real-world benchmark that compares different algorithms while looking for links in an HTML file with Text.HTML.TagSoup. This benchmark involves a lot of work other than string comparison, so the differences between algorithms are slim, but usually measurable. Build and run:

$ cabal build bench-tagsoup
...

$ curl -s 'https://hackage.haskell.org/packages/names' > sample/hackage-names.html
$ dist/build/bench-tagsoup/bench-tagsoup < sample/hackage-names.html
...

Testing

It would be quite involved to build a perfectly comprehensive testing module, but the test-basics executable tests multiple cases against all supported data types.

Sample

Here is a sample:

{-# LANGUAGE OverloadedStrings #-}

module Main ( main ) where

import           Data.List
import           Data.CaseInsensitive
import qualified Data.ByteString.Char8 as BS

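-- Read "Last, First" lines from standard input and print them sorted
-- case-insensitively by last name, then first name.
main :: IO ()
main = do
    stdin <- BS.getContents
    let sorted_names = map join_name $ sortBy caseInsensitiveCompare $ map split_name $ BS.lines stdin
    mapM_ BS.putStrLn sorted_names

-- Split "Last, First" at the comma; drop 2 removes the ", " separator.
split_name :: BS.ByteString -> (BS.ByteString, BS.ByteString)
split_name name = (last,BS.drop 2 first)
    where (last,first) = BS.span (/= ',') name

-- Rebuild the original "Last, First" form.
join_name :: (BS.ByteString, BS.ByteString) -> BS.ByteString
join_name (last,first) = BS.concat [ last , ", " , first ]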

Try it with:

$ cabal build readme-example
...

$ dist/build/readme-example/readme-example < sample/declaration-signers.txt