unicode-data-names: Unicode character names and aliases

[ apache, data, library, text, unicode ]

unicode-data-names provides Haskell APIs to access the Unicode character names and aliases from the Unicode character database (UCD).

The Haskell data structures are generated programmatically from the UCD files. The latest Unicode version supported by this library is 15.1.0.


Modules

  • Unicode
    • Char
    • Internal
      • Char
        • Unicode.Internal.Char.Label
        • Names
          • Unicode.Internal.Char.Names.Version
        • UnicodeData
          • Unicode.Internal.Char.UnicodeData.DerivedName
          • Unicode.Internal.Char.UnicodeData.NameAliases

Flags

Manual Flags

Name              Description                                                                   Default
has-text          Expose an API using the text package                                          Disabled
has-bytestring    Expose an API using the bytestring package                                    Disabled
dev-has-icu       Use ICU for test and benchmark. Intended for development on the repository.   Disabled
export-all-chars  Build the export-all-chars executable                                         Disabled

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.

Versions: 0.1.0, 0.2.0, 0.3.0, 0.4.0
Change log: Changelog.md
Dependencies: base (>=4.7 && <4.21), ghc-prim (>=0.3.1 && <1.0), unicode-data (>=0.6 && <0.7)
License: Apache-2.0
Copyright: 2022 Composewell Technologies and Contributors
Author: Composewell Technologies and Contributors
Maintainer: dev@wismill.eu
Category: Data, Text, Unicode
Home page: http://github.com/composewell/unicode-data
Bug tracker: https://github.com/composewell/unicode-data/issues
Source repo: head: git clone https://github.com/composewell/unicode-data
Uploaded: by wismill at 2024-07-03T14:37:24Z
Executables: export-all-chars
Downloads: 165 total (38 in the last 30 days)
Status: Docs available
Last success reported on 2024-07-03

Readme for unicode-data-names-0.4.0

README

unicode-data-names provides Haskell APIs to efficiently access the Unicode character names and aliases from the Unicode character database.

There are three APIs:

  • String API: enabled by default.
  • ByteString API: enabled via the package flag has-bytestring.
  • Text API: enabled via the package flag has-text.

The Haskell data structures are generated programmatically from the Unicode character database (UCD) files. The latest Unicode version supported by this library is 15.1.0.

See the Haddock documentation for the API reference.

Comparing with ICU

We can compare the implementation against ICU. This requires working with the source repository, as we need the internal package icu.

Warning: This requires an ICU version that supports exactly the same Unicode version as this library.

cabal run -O2 --flag dev-has-icu unicode-data-names:tests -- -m ICU

Comparing with Python

To check the Unicode implementation in Haskell, we compare its results with those obtained from Python.

Warning: This requires a Python version that supports exactly the same Unicode version as this library.

cabal run -O2 -f "export-all-chars" -v0 export-all-chars > ./test/all_chars.csv
python3 ./test/check.py -v ./test/all_chars.csv
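
The idea behind this comparison can be illustrated with a minimal Python sketch. It is not the repository's check.py: the function names and the two-column (hexadecimal code point, name) layout are assumptions made for illustration, and the real format of all_chars.csv may differ. It relies only on the standard unicodedata module, first checking which Unicode version the interpreter ships, then comparing exported names against Python's own.

```python
import csv
import io
import unicodedata

# Unicode version targeted by this release of unicode-data-names (from the README).
EXPECTED_UNICODE_VERSION = "15.1.0"


def python_matches_expected_version() -> bool:
    """Return True if this interpreter's Unicode tables match the expected version."""
    return unicodedata.unidata_version == EXPECTED_UNICODE_VERSION


def find_name_mismatches(rows):
    """Compare (hex code point, name) pairs against Python's unicodedata.

    The two-column layout is an assumption made for this sketch; it is not
    necessarily the format of ./test/all_chars.csv.
    """
    mismatches = []
    for hex_code_point, exported_name in rows:
        char = chr(int(hex_code_point, 16))
        # unicodedata.name raises ValueError for unnamed characters unless a
        # default is given; use "" so unnamed characters compare as empty.
        python_name = unicodedata.name(char, "")
        if python_name != exported_name:
            mismatches.append((hex_code_point, exported_name, python_name))
    return mismatches


if __name__ == "__main__":
    if not python_matches_expected_version():
        print(f"warning: Python uses Unicode {unicodedata.unidata_version}, "
              f"expected {EXPECTED_UNICODE_VERSION}")
    # Hypothetical sample rows standing in for an exported CSV file.
    sample = io.StringIO("0041,LATIN CAPITAL LETTER A\n03BB,GREEK SMALL LETTER LAMDA\n")
    print(find_name_mismatches(csv.reader(sample)))
```

Checking unicodedata.unidata_version first matters because a character added or renamed in a newer Unicode version would otherwise show up as a spurious mismatch.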

Licensing

unicode-data-names is an open source project available under the liberal Apache-2.0 license.

Contributing

As an open project, we welcome contributions.