unicode-transforms: Unicode normalization

[ bsd3, data, library, text, unicode ]

Fast Unicode 8.0 normalization in Haskell (NFC, NFKC, NFD, NFKD).

Modules

Data
- ByteString
  - UTF8
    - Data.ByteString.UTF8.Normalize
- Text
  - Data.Text.Normalize
- Unicode
  - Data.Unicode.Types

Flags

Manual Flags

Name	Description	Default
dev	Developer build	Disabled
has-icu	Use text-icu for benchmark and test comparisons	Disabled
has-llvm	Use llvm backend (faster) for compilation	Disabled

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.

Downloads

unicode-transforms-0.2.0.tar.gz [browse] (Cabal source package)
Package description (revised from the package)

Note: This package has metadata revisions in the cabal description newer than included in the tarball. To unpack the package including the revisions, use 'cabal get'.

Maintainer's Corner

Package maintainers

Bodigrim, harendra, adithyaov, wismill

Versions [RSS]	0.1.0.1, 0.2.0, 0.2.1, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.3.6, 0.3.7, 0.3.7.1, 0.3.8, 0.4.0, 0.4.0.1 (info)
Change log	Changelog.md
Dependencies	base (>=4.7 && <5), bitarray (>=0.0.1 && <0.1), bytestring (>=0.9 && <0.11), text (>=1.1.1 && <1.3) [details]
Tested with	ghc ==7.8.4, ghc ==7.10.3, ghc ==8.0.1
License	BSD-3-Clause
Copyright	2016 Harendra Kumar, 2014–2015 Antonio Nikishaev
Author	Harendra Kumar
Maintainer	harendra.kumar@gmail.com
Uploaded	by harendra at 2016-10-23T18:20:34Z
Revised	Revision 1 made by harendra at 2016-11-10T01:30:47Z
Category	Data, Text, Unicode
Home page	http://github.com/harendra-kumar/unicode-transforms
Bug tracker	https://github.com/harendra-kumar/unicode-transforms/issues
Source repo	head: git clone https://github.com/harendra-kumar/unicode-transforms
Distributions	Arch:0.4.0.1, Debian:0.3.6, Fedora:0.4.0.1, LTSHaskell:0.4.0.1, NixOS:0.4.0.1, Stackage:0.4.0.1, openSUSE:0.4.0.1
Reverse Dependencies	11 direct, 187 indirect [details]
Downloads	40126 total (44 in the last 30 days)
Rating	(no votes yet) [estimated by Bayesian average]
Status	Docs uploaded by user Build status unknown [no reports yet]

Readme for unicode-transforms-0.2.0

Unicode Transforms

Fast Unicode 8.0 normalization in Haskell (NFC, NFKC, NFD, NFKD).

What is normalization?

Unicode characters with diacritics (e.g. Á) can be represented in two different forms: as a single composed character (U+00C1 = Á) or as a sequence of decomposed characters (U+0041 (A) followed by U+0301 (combining acute accent) = Á). The two forms are encoded as different byte sequences, but to a human reader they look exactly the same.
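To make the difference concrete, here is a minimal sketch that uses only the `base` library, not this package. The names `composed`, `decomposed`, and `codePoints` are illustrative assumptions and not part of any API.

```haskell
import Data.Char (ord, toUpper)
import Numeric (showHex)

-- Composed form: the single code point U+00C1.
composed :: String
composed = "\193"

-- Decomposed form: U+0041 (A) followed by U+0301 (combining acute accent).
decomposed :: String
decomposed = "\65\769"

-- Render each code point of a string as "U+XXXX" for inspection.
-- (Illustrative helper, padded to at least four hex digits.)
codePoints :: String -> [String]
codePoints = map fmt
  where
    fmt c = "U+" ++ pad (showHex (ord c) "")
    pad s = replicate (4 - length s) '0' ++ map toUpper s

main :: IO ()
main = do
  print (codePoints composed)     -- ["U+00C1"]
  print (codePoints decomposed)   -- ["U+0041","U+0301"]
  print (composed == decomposed)  -- False: the raw sequences differ
```

The two strings render identically, yet a plain equality check treats them as different because it compares code points one by one.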

A plain byte-by-byte comparison may report that two strings are different even though they are canonically equivalent. To compare strings for equivalence, we first need to convert both to the same normalized form using the Unicode Character Database. For example:

>> :set -XOverloadedStrings
>> import Data.Text.Normalize
>> normalize NFC "\193" == normalize NFC "\65\769"
True
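The same comparison can be wrapped in a small helper for use in a compiled program. This is a sketch rather than part of the package: `canonicallyEqual` is a hypothetical name, and the code assumes the `text` and `unicode-transforms` packages are available and that `Data.Text.Normalize` exports `normalize` and the `NFC` constructor, as the session above suggests.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Data.Text.Normalize (NormalizationMode (NFC), normalize)

-- Hypothetical helper (not provided by unicode-transforms):
-- two texts are treated as equal if their NFC forms are identical.
canonicallyEqual :: Text -> Text -> Bool
canonicallyEqual a b = normalize NFC a == normalize NFC b

main :: IO ()
main = do
  let composed   = "\193"    :: Text  -- U+00C1
      decomposed = "\65\769" :: Text  -- U+0041 U+0301
  print (composed == decomposed)               -- raw comparison: False
  print (canonicallyEqual composed decomposed) -- normalized comparison: True
```

Normalizing both sides to the same form (NFC here) is what matters; comparing one normalized string against an unnormalized one would not be reliable.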

Contributing

Please use https://github.com/harendra-kumar/unicode-transforms to raise issues or send pull requests.