unicode-transforms: Unicode normalization

Tags: bsd3, data, library, text, unicode

Fast Unicode 9.0 normalization in Haskell (NFC, NFKC, NFD, NFKD).



Flags (manual)

Name       Description                                        Default
dev        Developer build                                    Disabled
has-icu    Use text-icu for benchmark and test comparisons    Disabled
has-llvm   Use the LLVM backend (faster) for compilation      Disabled

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.


Versions 0.1.0.1, 0.2.0, 0.2.1, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.3.6, 0.3.7, 0.3.7.1, 0.3.8, 0.4.0, 0.4.0.1
Change log Changelog.md
Dependencies base (>=4.7 && <5), bitarray (>=0.0.1 && <0.1), bytestring (>=0.9 && <0.11), text (>=1.1.1 && <1.3)
License BSD-3-Clause
Copyright 2016-2017 Harendra Kumar, 2014–2015 Antonio Nikishaev
Author Harendra Kumar
Maintainer harendra.kumar@gmail.com
Category Data, Text, Unicode
Home page http://github.com/harendra-kumar/unicode-transforms
Bug tracker https://github.com/harendra-kumar/unicode-transforms/issues
Source repo head: git clone https://github.com/harendra-kumar/unicode-transforms
Uploaded by harendra at 2018-04-05T10:17:29Z
Distributions Arch:0.4.0.1, Debian:0.3.6, Fedora:0.4.0.1, LTSHaskell:0.4.0.1, NixOS:0.4.0.1, Stackage:0.4.0.1, openSUSE:0.4.0.1
Reverse Dependencies 11 direct, 181 indirect
Downloads 37252 total (210 in the last 30 days)
Status Docs available
Last success reported on 2018-04-14

Readme for unicode-transforms-0.3.4


Unicode Transforms

Fast Unicode 9.0 normalization in Haskell (NFC, NFKC, NFD, NFKD).

What is normalization?

Unicode characters with diacritics (e.g. Á) can be represented in two different forms: as a single precomposed character (U+00C1 = Á) or as a sequence of decomposed characters, a base letter followed by a combining mark (U+0041 LATIN CAPITAL LETTER A, U+0301 COMBINING ACUTE ACCENT = Á). The two forms are encoded as different byte sequences, yet to a human reader they look exactly the same.
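To make the difference concrete, here is a minimal sketch that uses only the text and bytestring packages (not unicode-transforms itself) to inspect both forms of Á. The names `composed`, `decomposed`, and `utf8Bytes` are illustrative and are not part of this library.

```haskell
-- Sketch only: assumes the text and bytestring packages are available.
import qualified Data.ByteString as B
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- U+00C1 LATIN CAPITAL LETTER A WITH ACUTE: one precomposed code point.
composed :: Text
composed = T.pack "\x00C1"

-- U+0041 LATIN CAPITAL LETTER A followed by U+0301 COMBINING ACUTE ACCENT.
decomposed :: Text
decomposed = T.pack "\x0041\x0301"

-- Hypothetical helper: the UTF-8 encoding of a Text as a list of bytes.
utf8Bytes :: Text -> [Word8]
utf8Bytes = B.unpack . TE.encodeUtf8

main :: IO ()
main = do
  print (T.length composed, T.length decomposed) -- (1,2) code points
  print (utf8Bytes composed)                     -- [195,129]
  print (utf8Bytes decomposed)                   -- [65,204,129]
  print (composed == decomposed)                 -- False
```

Both values render as Á, but they differ in code point count and in their encoded bytes, so ordinary `==` treats them as different strings.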

A plain byte-by-byte comparison will report two such strings as different even though they are equivalent. To compare them for equivalence, both strings must first be converted to the same normalized form, using the data in the Unicode Character Database. For example:

>> :set -XOverloadedStrings
>> import Data.Text.Normalize
>> normalize NFC "\193" == normalize NFC "\65\769"
True
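The comparison above can be wrapped in a small helper. The following is a sketch, assuming the unicode-transforms and text packages are installed. Only `normalize` and the `NFC`/`NFD`/`NFKC`/`NFKD` modes come from `Data.Text.Normalize`; the helpers `canonicallyEquivalent`, `compatiblyEquivalent`, and `codePoints` are hypothetical names introduced for illustration. It also shows how the compatibility forms (NFKC, NFKD) differ from the canonical forms (NFC, NFD) on U+FB01 LATIN SMALL LIGATURE FI.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: assumes the unicode-transforms and text packages.
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Normalize -- provides normalize and the NFC/NFD/NFKC/NFKD modes
import Text.Printf (printf)

composed, decomposed, ligature :: Text
composed = "\193"      -- U+00C1
decomposed = "\65\769" -- U+0041 U+0301
ligature = "\xFB01"    -- U+FB01 LATIN SMALL LIGATURE FI

-- Hypothetical helper: canonical equivalence, tested by comparing NFD forms.
canonicallyEquivalent :: Text -> Text -> Bool
canonicallyEquivalent a b = normalize NFD a == normalize NFD b

-- Hypothetical helper: the looser compatibility equivalence (NFKD forms).
compatiblyEquivalent :: Text -> Text -> Bool
compatiblyEquivalent a b = normalize NFKD a == normalize NFKD b

-- Hypothetical helper: render each character as U+XXXX.
codePoints :: Text -> [String]
codePoints = map (\c -> printf "U+%04X" (fromEnum c)) . T.unpack

main :: IO ()
main = do
  print (composed == decomposed)                    -- False
  print (canonicallyEquivalent composed decomposed) -- True
  print (canonicallyEquivalent ligature "fi")       -- False
  print (compatiblyEquivalent ligature "fi")        -- True
  -- How each normalization form treats the ligature.
  mapM_ showForm [("NFD", NFD), ("NFC", NFC), ("NFKD", NFKD), ("NFKC", NFKC)]
  where
    showForm (name, mode) =
      putStrLn (name ++ ": " ++ unwords (codePoints (normalize mode ligature)))
```

The canonical forms leave the ligature as U+FB01, while NFKD and NFKC replace it with U+0066 U+0069 ("fi"). Comparing NFC forms instead of NFD would give the same canonical-equivalence answer; the K forms additionally fold compatibility characters, which can suit search-style matching but discards distinctions that NFC and NFD preserve.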

Contributing

Please raise issues or send pull requests at https://github.com/harendra-kumar/unicode-transforms.