# Missing Features

List of unicode transforms that are not available in this package.

## Casemapping and Casefolding
The `text` package already provides proper unicode casemapping and casefolding
operations. This package does not aim to expose these though the implementation
is available.

## Available in utf8proc but not exposed
The following additional features are available but not exposed via an API. If
you need any of those they can be exposed quickly, please raise an issue or
send a pull request.

* Boundary Analysis (No locale specific handling)
* NLF sequence conversion
* Stripping certain character classes
* Lumping certain characters

## Available only in text-icu

`text-icu` is a full featured implementation of unicode operations via bindings
to the `icu` libraries. If you do not mind a dependency on the `icu` libraries
(separately installed) or need a comprehensive set of unicode operations then
`text-icu` will be a better choice.

The following features provided by `text-icu` are missing in this package:
* Normalization checks
* FCD normalization for collation
* String collation
* Iteration
* Regular expressions

# Haskell Unicode Landscape

Unicode functionality in Haskell is fragmented across various packages.  The
most comprehensive functionality is provided by `text-icu` which is based on
the `icu` C++ libraries.

* [text-icu](https://stackage.org/lts/package/text-icu)

## Basic

* [base](https://www.stackage.org/lts/package/base) Data.Char module
* [charset](https://www.stackage.org/lts/package/charset) Fast unicode character sets

## Unicode Character Database
* [unicode-properties](https://hackage.haskell.org/package/unicode-properties) Unicode 3.2.0 character properties
* [hxt-charproperties](http://www.stackage.org/lts/package/hxt-charproperties) Character properties and classes for XML and Unicode
* [unicode-names](http://hackage.haskell.org/package/unicode-names) Unicode 3.2.0 character names
* [unicode](https://hackage.haskell.org/package/unicode) Construct and transform unicode characters

## Unicode Strings
### ByteStrings (UTF8)
* [utf8-string](https://www.stackage.org/lts/package/utf8-string) Support for reading and writing UTF8 Strings
* [utf8-light](https://www.stackage.org/lts/package/utf8-light) Lightweight UTF8 handling
* [hxt-unicode](https://www.stackage.org/lts/package/hxt-unicode) Unicode en-/decoding functions for utf8, iso-latin-\* and other encodings
### Text (UTF16)
* [text](https://www.stackage.org/lts/package/text) An efficient packed Unicode text type
* [text-normal](https://hackage.haskell.org/package/text-normal) Data types for Unicode-normalized text - depends on text-icu

# Thoughts on package structuring

In my opinion, it will be good to consolidate all native haskell packages into
a standard module structure under a minimum number of packages and evolve
those. The following structure in three layers should be enough to cover
unicode handling:

1. **_unicode-properties_**: A single package for character database with
scripts to update it based on unicode standard database updates.
2. **_unicode-transforms_**: A lightweight native Haskell package for basic unicode
string transforms (normalization, case folding etc.) based on unicode-chars.
Not a replacement for text-icu.
3. **_utf8-string_**: A single UTF8 bytestring package including a normalized
string representation like text-normal
4. **_text_**: Existing text package (UTF16 representation). Include normalized
text (text-normal) in the text package based on the native Haskell
unicode-transforms package

# Unicode resources

* [Unicode Character Database](http://www.unicode.org/Public/UCD/latest/ucd)