unicode-transforms: Unicode normalization

[ bsd3, data, library, text, unicode ]

Fast Unicode 9.0 normalization in Haskell (NFC, NFKC, NFD, NFKD).


Versions 0.1.0.1, 0.2.0, 0.2.1, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4
Change log Changelog.md
Dependencies base (>=4.7 && <5), bitarray (>=0.0.1 && <0.1), bytestring (>=0.9 && <0.11), text (>=1.1.1 && <1.3)
License BSD-3-Clause
Copyright 2016-2017 Harendra Kumar, 2014–2015 Antonio Nikishaev
Author Harendra Kumar
Maintainer harendra.kumar@gmail.com
Category Data, Text, Unicode
Home page http://github.com/harendra-kumar/unicode-transforms
Bug tracker https://github.com/harendra-kumar/unicode-transforms/issues
Source repo head: git clone https://github.com/harendra-kumar/unicode-transforms
Uploaded by harendra at Thu Apr 5 10:17:29 UTC 2018
Distributions Arch:0.3.4, LTSHaskell:0.3.4, NixOS:0.3.4, Stackage:0.3.4, openSUSE:0.3.4
Downloads 4702 total (171 in the last 30 days)
Status Docs available
Last success reported on 2018-04-14

Flags

Name      Description                                      Default   Type
dev       Developer build                                  Disabled  Manual
has-icu   Use text-icu for benchmark and test comparisons  Disabled  Manual
has-llvm  Use llvm backend (faster) for compilation        Disabled  Manual

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.

Readme for unicode-transforms-0.3.4

Unicode Transforms

Fast Unicode 9.0 normalization in Haskell (NFC, NFKC, NFD, NFKD).

What is normalization?

Unicode characters with diacritics (e.g. Á) can be represented in two different forms: as a single precomposed character (U+00C1 = Á) or as a base character followed by a combining mark (U+0041 (A) + U+0301 (combining acute accent) = Á). The two forms are encoded as different byte sequences, but to a human reader they look exactly the same.
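To make the two representations concrete, here is a minimal sketch that prints the code points of each form. It uses only `base`. The names `composed` and `decomposed` are illustrative and are not part of this library.

```haskell
import Data.Char (ord)

-- Precomposed form: the single code point U+00C1.
-- (Illustrative name, not part of unicode-transforms.)
composed :: String
composed = "\193"

-- Decomposed form: U+0041 (A) followed by U+0301 (combining acute accent).
-- (Illustrative name, not part of unicode-transforms.)
decomposed :: String
decomposed = "\65\769"

main :: IO ()
main = do
  print (map ord composed)        -- [193]
  print (map ord decomposed)      -- [65,769]
  print (composed == decomposed)  -- False
```

Both strings render as Á, yet a direct comparison treats them as different because the underlying code point sequences differ.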

A plain byte-wise comparison may report two strings as different even though they are equivalent. To compare them for equivalence, both strings must first be converted to the same normalized form, using data from the Unicode Character Database. For example:

>> :set -XOverloadedStrings
>> import Data.Text.Normalize
>> normalize NFC "\193" == normalize NFC "\65\769"
True
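The same comparison can be wrapped in a small standalone program. This is a sketch, assuming the `text` and `unicode-transforms` packages are available. `canonicallyEqual` is a hypothetical helper written for this example, not a function exported by the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Normalize (NormalizationMode (NFC, NFD), normalize)

-- Hypothetical helper (not part of the library): normalize both
-- sides to NFC, then compare the results.
canonicallyEqual :: Text -> Text -> Bool
canonicallyEqual a b = normalize NFC a == normalize NFC b

main :: IO ()
main = do
  print ("\193" == ("\65\769" :: Text))       -- False: raw comparison
  print (canonicallyEqual "\193" "\65\769")   -- True: comparison after NFC
  print (T.length (normalize NFD "\193"))     -- 2: A plus combining accent
  print (T.length (normalize NFC "\65\769"))  -- 1: single precomposed character
```

Any of the four forms would work for the comparison, provided both sides use the same one. NFC is chosen here only because it matches the example above.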

Contributing

Please use https://github.com/harendra-kumar/unicode-transforms to raise issues, or send pull requests.