Data.Text.ICU.Converter
Portability: GHC | Stability: experimental | Maintainer: bos@serpentine.com
|
Description |
Character set conversion functions for Unicode, implemented as
bindings to the International Components for Unicode (ICU)
libraries.
|
|
Character set conversion
|
|
|
Character set converter type. Note: this structure is not
thread safe. It is not safe to use a value of this type
simultaneously from multiple threads.
|
|
Basic functions
|
|
|
Create a Converter with the name of a coded character set
specified as a string. The actual name will be resolved with the
alias file using a case-insensitive string comparison that ignores
leading zeroes and all non-alphanumeric characters. E.g., the
names "UTF8", "utf-8", "u*T@f08" and "Utf 8" are
all equivalent (see also compareNames). If an empty string is
passed for the converter name, it will create one with the
getDefaultName return value.
A converter name may contain options like a locale specification to
control the specific behavior of the newly instantiated converter.
The meaning of the options depends on the particular converter. If
an option is not defined for or recognized by a given converter,
then it is ignored.
Options are appended to the converter name string, with a comma
between the name and the first option and also between adjacent
options.
If the alias is ambiguous, then the preferred converter is used.
The conversion behavior and names can vary between platforms. ICU
may convert some characters differently from other
platforms. Details on this topic are in the ICU User's Guide at
http://icu-project.org/userguide/conversion.html. Aliases
starting with a "cp" prefix have no specific meaning other than
being aliases that start with the letters "cp". Please do not
attach any meaning to these aliases.
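The comma-separated option syntax described above can be sketched as a small helper that builds the name string before it is handed to the converter-opening function. This is only an illustration: withOptions is a hypothetical helper, not part of this module, and the option strings "locale=ja" and "version=1" are assumed examples whose meaning depends on the particular converter.

```haskell
import Data.List (intercalate)

-- Hypothetical helper (not part of this module): append options to a
-- converter name, with a comma between the name and the first option
-- and between adjacent options.
withOptions :: String -> [String] -> String
withOptions name opts = intercalate "," (name : opts)

-- Illustrative value only; the options shown are assumptions and are
-- ignored by converters that do not recognize them.
exampleName :: String
exampleName = withOptions "ISO-2022" ["locale=ja", "version=1"]
```

With no options the helper returns the name unchanged, so it can be used uniformly whether or not options are supplied.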
|
|
|
Convert the Unicode string into a codepage string using the given
converter.
|
|
|
Convert the codepage string into a Unicode string using the given
converter.
|
|
Converter metadata
|
|
|
Gets the internal, canonical name of the converter.
|
|
|
Determines whether the converter uses fallback mappings or not.
This flag has restrictions; see setFallback.
|
|
|
Sets whether the converter uses fallback mappings. Regardless
of this flag, the converter will always use fallbacks from Unicode
Private Use code points, as well as reverse fallbacks (to Unicode).
For details see ".ucm File Format" in the Conversion Data chapter
of the ICU User Guide:
http://www.icu-project.org/userguide/conversion-data.html#ucmformat
|
|
|
Indicates whether the converter contains ambiguous mappings of
the same character or not.
|
|
Functions for controlling global behavior
|
|
|
Returns the current default converter name. If you want to open
a default converter, you do not need to use this function. It is
faster to pass the empty string to open the default converter.
|
|
|
Sets the current default converter name. If this function needs
to be called, it should be called during application
initialization. Most of the time, the result of getDefaultName
or of open with an empty string argument is sufficient for your
application.
Note: this function is not thread safe. Do not call this
function when any ICU function is being used from more than one
thread!
|
|
Miscellaneous functions
|
|
|
Do a fuzzy compare of two converter/alias names. The comparison
is case-insensitive, ignores leading zeroes that precede further
digits, and ignores all characters other than letters and digits.
Thus the strings "UTF-8", "utf_8", "u*T@f08" and
"Utf 8" are exactly equivalent. See section 1.4, Charset Alias
Matching, in Unicode Technical Standard #22 at
http://www.unicode.org/reports/tr22/
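To make the matching rule concrete, here is a minimal pure-Haskell sketch of the alias-matching steps as given in UTS #22 (keep only ASCII letters and digits, lower-case them, then drop each zero that is not preceded by a digit). The names normalizeName and sameName are hypothetical and are not part of this module; the sketch does not call ICU and may differ from ICU's own comparison in edge cases, such as a name consisting of a lone zero.

```haskell
import Data.Char (isAlphaNum, isAscii, isDigit, toLower)

-- Hypothetical helper: reduce a charset name to its matching key,
-- following the UTS #22 alias-matching steps.
normalizeName :: String -> String
normalizeName = dropZeros False . map toLower . filter keep
  where
    -- Step 1: keep only ASCII letters and digits.
    keep c = isAscii c && isAlphaNum c

    -- Step 3: scanning left to right, drop each '0' that is not
    -- preceded by a (kept) digit.
    dropZeros _ [] = []
    dropZeros afterDigit (c:cs)
      | c == '0' && not afterDigit = dropZeros False cs
      | otherwise                  = c : dropZeros (isDigit c) cs

-- Hypothetical helper: two names match if their keys are equal.
sameName :: String -> String -> Bool
sameName a b = normalizeName a == normalizeName b
```

Loaded in GHCi, all four spellings from the description reduce to the key "utf8", while a name such as "utf-80" keeps its trailing zero and does not match.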
|
|
|
Return the aliases for a given converter or alias name.
|
|
Metadata
|
|
|
A list of the canonical names of all available converters.
|
|
|
The list of supported standard names.
|
|