unicode-transforms: Unicode normalization

[ bsd3, data, library, text, unicode ]

Fast Unicode 12.1.0 normalization in Haskell (NFC, NFKC, NFD, NFKD).

Versions 0.2.0, 0.2.1, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.3.6
Change log Changelog.md
Dependencies base (>=4.7 && <5), bitarray (>=0.0.1 && <0.1), bytestring (>=0.9 && <0.11), text (>=1.1.1 && <1.3)
License BSD-3-Clause
Copyright 2016-2017 Harendra Kumar, 2014–2015 Antonio Nikishaev
Author Harendra Kumar
Maintainer harendra.kumar@gmail.com
Category Data, Text, Unicode
Home page http://github.com/harendra-kumar/unicode-transforms
Bug tracker https://github.com/harendra-kumar/unicode-transforms/issues
Source repo head: git clone https://github.com/harendra-kumar/unicode-transforms
Uploaded by harendra at Fri Jun 14 05:00:50 UTC 2019
Distributions Arch:0.3.6, Debian:0.3.4, Fedora:0.3.6, LTSHaskell:0.3.6, NixOS:0.3.6, Stackage:0.3.6, openSUSE:0.3.6
Executables chart
Downloads 12394 total (733 in the last 30 days)
Status Hackage Matrix CI
Docs available [build log]
Last success reported on 2019-06-14 [all 1 reports]





Developer build


Use bench-show to compare benchmarks


Use text-icu for benchmark and test comparisons


Use llvm backend (faster) for compilation


Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.



Readme for unicode-transforms-0.3.6


Unicode Transforms

Fast Unicode 12.1.0 normalization in Haskell (NFC, NFKC, NFD, NFKD).

What is normalization?

Unicode characters with adornments (e.g. Á) can be represented in two different forms: as a single composed character (U+00C1 = Á) or as a sequence of decomposed characters (U+0041 (A) followed by U+0301 (combining acute accent) = Á). The two forms are encoded as different byte sequences, but to a human reader they look exactly the same.
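The difference between the two forms can be seen without any normalization library. The following is a minimal sketch using only base; the names composed, decomposed and codePoints are made up for this illustration and are not part of unicode-transforms.

```haskell
import Data.Char (ord, toUpper)
import Numeric (showHex)

-- Composed form: the single code point U+00C1.
composed :: String
composed = "\193"

-- Decomposed form: U+0041 followed by U+0301 (combining acute accent).
decomposed :: String
decomposed = "\65\769"

-- Render each code point of a string in U+XXXX notation.
-- (Hypothetical helper, for illustration only.)
codePoints :: String -> [String]
codePoints = map fmt
  where
    fmt c = "U+" ++ pad (map toUpper (showHex (ord c) ""))
    pad s = replicate (4 - length s) '0' ++ s

main :: IO ()
main = do
  print (codePoints composed)      -- ["U+00C1"]
  print (codePoints decomposed)    -- ["U+0041","U+0301"]
  print (composed == decomposed)   -- False: plain comparison sees different code points
```

Both strings render as Á, yet a plain equality check reports them as different because it compares code points one by one.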

A plain byte comparison may report that two strings are different even though they are equivalent. Before comparing two strings for equivalence, we need to convert both of them to the same normalized form using the Unicode Character Database. For example:

>> :set -XOverloadedStrings
>> import Data.Text.Normalize
>> normalize NFC "\193" == normalize NFC "\65\769"
True
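The same comparison can be wrapped into small helpers in a compiled program. This is a sketch that assumes the normalize function and the NormalizationMode constructors exported by Data.Text.Normalize, as used in the example above; canonicallyEqual and compatEqual are hypothetical names introduced here and are not provided by the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Normalize (NormalizationMode (..), normalize)

-- Canonical equivalence: compare the NFC forms of both strings.
-- (Hypothetical helper, not part of unicode-transforms.)
canonicallyEqual :: Text -> Text -> Bool
canonicallyEqual a b = normalize NFC a == normalize NFC b

-- Compatibility equivalence: compare the NFKC forms, which also fold
-- compatibility characters such as ligatures.
-- (Hypothetical helper, not part of unicode-transforms.)
compatEqual :: Text -> Text -> Bool
compatEqual a b = normalize NFKC a == normalize NFKC b

main :: IO ()
main = do
  -- U+00C1 versus U+0041 U+0301: canonically equivalent.
  print (canonicallyEqual "\193" "\65\769")   -- True
  -- NFD splits the composed character into two code points.
  print (T.length (normalize NFD "\193"))     -- 2
  -- U+FB01 (the "fi" ligature) is only compatibility-equivalent to "fi".
  print (canonicallyEqual "\64257" "fi")      -- False
  print (compatEqual "\64257" "fi")           -- True
```

NFC and NFD are enough when only canonical equivalence matters; NFKC and NFKD additionally replace compatibility characters, which changes the text more aggressively, so which form to compare under depends on the application.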


Please use https://github.com/harendra-kumar/unicode-transforms to raise issues or send pull requests.