unicode-collation: Haskell implementation of the Unicode Collation Algorithm

Tags: bsd2, library, text

This library provides a pure Haskell implementation of the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/. It is not as fully-featured or as performant as text-icu, but it avoids a dependency on a large C library. Locale-specific tailorings are also provided.



Flags

Automatic Flags
Name        Description                                                                          Default
doctests    Run doctests as part of test suite. Use with: --write-ghc-environment-files=always.  Disabled
executable  Build the unicode-collate executable.                                                Disabled

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.

Versions 0.1, 0.1.1, 0.1.2, 0.1.3, 0.1.3.1, 0.1.3.2, 0.1.3.3, 0.1.3.4, 0.1.3.5, 0.1.3.6
Change log CHANGELOG.md
Dependencies base (>=4.9 && <4.16), binary, bytestring, bytestring-lexing (>=0.5 && <0.6), containers, filepath, parsec, template-haskell, text (>=1.2 && <1.3), th-lift-instances, unicode-collation, unicode-transforms (>=0.3.7.1)
License BSD-2-Clause
Copyright 2021 John MacFarlane
Author John MacFarlane
Maintainer John MacFarlane <jgm@berkeley.edu>
Category Text
Home page https://github.com/jgm/unicode-collation
Bug tracker https://github.com/jgm/unicode-collation/issues
Source repo head: git clone https://github.com/jgm/unicode-collation.git
Uploaded by JohnMacFarlane at 2021-04-18T18:32:18Z
Distributions Arch:0.1.3.6, Fedora:0.1.3.5, LTSHaskell:0.1.3.6, NixOS:0.1.3.6, Stackage:0.1.3.6, openSUSE:0.1.3.6
Reverse Dependencies 4 direct, 171 indirect
Executables unicode-collate
Downloads 18258 total (460 in the last 30 days)
Status Docs uploaded by user
Build status unknown [no reports yet]

Readme for unicode-collation-0.1.1


unicode-collation


Haskell implementation of the Unicode Collation Algorithm.

Motivation

Previously there was no way to do correct Unicode collation (sorting) in Haskell without depending on the C library icu and the barely maintained Haskell wrapper text-icu. This library offers a pure Haskell solution.
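
To illustrate the difference between plain code-point ordering and collation, here is a minimal sketch. It assumes that the Text.Collate module exports collate and rootCollator, as in the package's Haddock documentation for the 0.1.3.x releases; this README does not itself confirm those names for version 0.1.1. The sample words and the helper names byCodePoint and byRootCollation are made up for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.List (sort, sortBy)
import Data.Text (Text)
import qualified Data.Text.IO as T
import System.IO (hSetEncoding, stdout, utf8)
-- Assumption: these names are exported by Text.Collate (see lead-in).
import Text.Collate (collate, rootCollator)

-- Sample words chosen only for illustration.
sampleWords :: [Text]
sampleWords = ["zebra", "Zebra", "éclair", "apple"]

-- Ordering given by the Ord instance for Text (code-point order).
byCodePoint :: [Text] -> [Text]
byCodePoint = sort

-- Ordering under the root collation, with no locale tailoring.
byRootCollation :: [Text] -> [Text]
byRootCollation = sortBy (collate rootCollator)

main :: IO ()
main = do
  hSetEncoding stdout utf8  -- avoid encoding errors when printing "é"
  T.putStrLn "Code-point order:"
  mapM_ T.putStrLn (byCodePoint sampleWords)
  T.putStrLn "Root collation order:"
  mapM_ T.putStrLn (byRootCollation sampleWords)
```

Code-point order puts "Zebra" first and "éclair" last, because uppercase ASCII letters precede lowercase ones and "é" has a higher code point than "z". Under the root collation, accents and case only break ties, so the expected order is apple, éclair, zebra, Zebra.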

Conformance

The library passes UCA conformance tests (except for tests involving unmatched surrogates and a few Tibetan characters, which seem to be changed in unexpected ways by Text.pack or normalization).

Localized collations have not been tested extensively.

Performance

  sort a list of 10000 random Texts: OK (2.19s)
    8.2 ms ± 599 μs,  27 MB allocated, 911 KB copied
  sort same list with text-icu:      OK (2.04s)
    2.0 ms ± 112 μs, 7.1 MB allocated, 147 KB copied

Localized collations

The following localized collations are available. For languages not listed here, the root collation is used.

af
ar
as
az
be
bn
ca
cs
cu
cy
da
de-AT-u-co-phonebk
de-u-co-phonebk
dsb
ee
eo
es
es-u-co-trad
et
fa
fi
fi-u-co-phonebk
fil
fo
fr-CA
gu
ha
haw
he
hi
hr
hu
hy
ig
is
ja
kk
kl
kn
ko
kok
lkt
ln
lt
lv
mk
ml
mr
mt
nb
nn
nso
om
or
pa
pl
ro
sa
se
si
si-u-co-dict
sk
sl
sq
sr
sv
sv-u-co-reformed
ta
te
th
tn
to
tr
ug-Cyrl
uk
ur
vi
vo
wae
wo
yo
zh
zh-u-co-big5han
zh-u-co-gb2312
zh-u-co-pinyin
zh-u-co-stroke
zh-u-co-zhuyin

Collation reordering (e.g. [reorder Latn Kana Hani]) is not supported.
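
As a sketch of how a localized collation might be selected, the example below compares the root collation with the Swedish tailoring (sv, from the list above), in which "ö" sorts after "z". It assumes that Text.Collate exports Collator, collate, collatorFor, and rootCollator, and that the Lang argument of collatorFor can be written as a string literal under OverloadedStrings, as in the 0.1.3.x Haddock documentation; neither point is confirmed by this README. The word list and the helper sortWithCollator are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.List (sortBy)
import Data.Text (Text)
-- Assumption: these names are exported by Text.Collate (see lead-in).
import Text.Collate (Collator, collate, collatorFor, rootCollator)

-- Collator for Swedish; assumes the language tag can be given as a
-- string literal via OverloadedStrings.
swedish :: Collator
swedish = collatorFor "sv"

-- Sample words chosen only for illustration.
sampleWords :: [Text]
sampleWords = ["zebra", "öl", "oas"]

-- Hypothetical helper: sort a list of Texts with the given collator.
sortWithCollator :: Collator -> [Text] -> [Text]
sortWithCollator c = sortBy (collate c)

main :: IO ()
main = do
  -- print shows non-ASCII characters as escapes, so no encoding setup is needed.
  print (sortWithCollator rootCollator sampleWords)
  print (sortWithCollator swedish sampleWords)
```

With the root collation, "ö" is treated as an accented "o", so "öl" falls between "oas" and "zebra". With the Swedish tailoring, "öl" should move to the end of the list.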

Data files

Version 13.0.0 of the Unicode data is used: http://www.unicode.org/Public/UCA/13.0.0/

Locale-specific tailorings are derived from the Perl module Unicode::Collate: https://cpan.metacpan.org/authors/id/S/SA/SADAHIRO/Unicode-Collate-1.29.tar.gz

Executable

The package includes an executable component, unicode-collate, which may be used for testing and for collating in scripts. To build it, enable the executable flag. For usage instructions, run unicode-collate --help.
