| Portability | GHC | 
|---|---|
| Stability | experimental | 
| Maintainer | bos@serpentine.com | 
| Safe Haskell | None | 
Data.Text.ICU.Convert
Description
Character set conversion functions for Unicode, implemented as bindings to the International Components for Unicode (ICU) libraries.
- data Converter
 - open :: String -> Maybe Bool -> IO Converter
 - fromUnicode :: Converter -> Text -> ByteString
 - toUnicode :: Converter -> ByteString -> Text
 - getName :: Converter -> String
 - usesFallback :: Converter -> Bool
 - isAmbiguous :: Converter -> Bool
 - getDefaultName :: IO String
 - setDefaultName :: String -> IO ()
 - compareNames :: String -> String -> Ordering
 - aliases :: String -> [String]
 - converterNames :: [String]
 - standardNames :: [String]
 
Character set conversion
Character set converter type. Note: this structure is not thread safe. It is not safe to use a value of this type simultaneously from multiple threads.
Basic functions
Arguments
| :: String | Name of the converter to use.  | 
| -> Maybe Bool | Whether to use fallback mappings (see | 
| -> IO Converter | 
Create a Converter with the name of a coded character set specified as a string. The actual name is resolved with the alias file using a case-insensitive string comparison that ignores leading zeroes and all non-alphanumeric characters. For example, the names "UTF8", "utf-8", "u*T@f08" and "Utf 8" are all equivalent (see also compareNames). If an empty string is passed for the converter name, the converter is created with the name returned by getDefaultName.
A converter name may contain options like a locale specification to control the specific behavior of the newly instantiated converter. The meaning of the options depends on the particular converter. If an option is not defined for or recognized by a given converter, then it is ignored.
Options are appended to the converter name string, with a comma between the name and the first option and also between adjacent options.
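The comma-separated layout described above can be sketched with a small pure helper. `withOptions` is a hypothetical name, not part of this module, and the option strings `locale=ja` and `version=1` are only illustrative; whether a converter recognizes them depends on that converter.

```haskell
import Data.List (intercalate)

-- Hypothetical helper (not part of Data.Text.ICU.Convert): join a converter
-- name and its options with commas, one comma before the first option and
-- one between adjacent options.
withOptions :: String -> [String] -> String
withOptions name opts = intercalate "," (name : opts)

main :: IO ()
main = do
  -- Illustrative option strings; unrecognized options are ignored by ICU.
  putStrLn (withOptions "ISO-2022" ["locale=ja", "version=1"])
  -- With no options the name is returned unchanged.
  putStrLn (withOptions "UTF-8" [])
```

The resulting string is what would be passed as the first argument of open.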
If the alias is ambiguous, then the preferred converter is used.
The conversion behavior and names can vary between platforms. ICU may convert some characters differently from other platforms. Details on this topic are in the ICU User's Guide at http://icu-project.org/userguide/conversion.html. Aliases starting with a "cp" prefix have no specific meaning other than that each is an alias starting with the letters "cp". Please do not associate any meaning with these aliases.
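As a minimal sketch of the name resolution described for open, the following program opens a converter under each of the equivalent spellings quoted above and prints the name that getName reports. It assumes the text-icu package is installed and that the local ICU build provides a UTF-8 converter; passing Nothing leaves the fallback setting at the converter's default.

```haskell
import Control.Monad (forM_)
import Data.Text.ICU.Convert (getName, open)

main :: IO ()
main =
  -- The four spellings are the ones the documentation lists as equivalent.
  forM_ ["UTF8", "utf-8", "u*T@f08", "Utf 8"] $ \spelling -> do
    conv <- open spelling Nothing
    putStrLn (show spelling ++ " resolved to " ++ getName conv)
```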
fromUnicode :: Converter -> Text -> ByteString
Convert the Unicode string into a codepage string using the given converter.
toUnicode :: Converter -> ByteString -> Text
Convert the codepage string into a Unicode string using the given converter.
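A short round trip may make the two directions concrete. This is a sketch, assuming the text-icu, text and bytestring packages and an ICU build that knows the name "ISO-8859-1"; `roundTrips` is a hypothetical helper introduced only for this example. In ISO-8859-1 the character é (U+00E9) is the single byte 0xE9.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.ICU.Convert (fromUnicode, open, toUnicode)

-- Hypothetical helper: encode a Text with the named converter, decode the
-- bytes again, and report whether the original text came back.
roundTrips :: String -> T.Text -> IO Bool
roundTrips name original = do
  conv <- open name Nothing
  let bytes = fromUnicode conv original
  pure (toUnicode conv bytes == original)

main :: IO ()
main = do
  conv <- open "ISO-8859-1" Nothing
  let original = T.pack "caf\233" -- "café"; \233 is U+00E9
  -- Show the encoded bytes as a list of numeric values.
  print (B.unpack (fromUnicode conv original))
  ok <- roundTrips "ISO-8859-1" original
  putStrLn ("round trip preserved the text: " ++ show ok)
```

Each converter is used from a single thread here, in line with the thread-safety note on Converter.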
Converter metadata
usesFallback :: Converter -> Bool
Determines whether the converter uses fallback mappings or not. This flag has restrictions. Regardless of this flag, the converter will always use fallbacks from Unicode Private Use code points, as well as reverse fallbacks (to Unicode). For details see ".ucm File Format" in the Conversion Data chapter of the ICU User Guide: http://www.icu-project.org/userguide/conversion-data.html#ucmformat
isAmbiguous :: Converter -> Bool
Indicates whether the converter contains ambiguous mappings of the same character or not.
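The two metadata queries can be tried on a freshly opened converter. The sketch below assumes text-icu is installed and that the names "ISO-8859-1" and "UTF-8" are available in the local ICU build; `describe` is a hypothetical helper. The printed values depend on the ICU data in use, so none are shown here.

```haskell
import Data.Text.ICU.Convert (getName, isAmbiguous, open, usesFallback)

-- Hypothetical helper: open a converter with the given fallback request
-- and print what its metadata functions report.
describe :: String -> Maybe Bool -> IO ()
describe name fallback = do
  conv <- open name fallback
  putStrLn $
    getName conv
      ++ ": usesFallback=" ++ show (usesFallback conv)
      ++ ", isAmbiguous=" ++ show (isAmbiguous conv)

main :: IO ()
main = do
  describe "ISO-8859-1" Nothing      -- keep the converter's default setting
  describe "ISO-8859-1" (Just True)  -- explicitly request fallback mappings
  describe "UTF-8" (Just False)      -- explicitly decline fallback mappings
```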
Functions for controlling global behavior
setDefaultName :: String -> IO ()
Sets the current default converter name. If this function needs to be called, it should be called during application initialization. Most of the time, the result of getDefaultName, or of open with an empty string argument, is sufficient for your application.
Note: this function is not thread safe. Do not call this function when any ICU function is being used from more than one thread!
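A minimal sketch of the recommended pattern: set the default once at the start of main, before any other thread touches ICU, then rely on it. It assumes text-icu is installed and that "UTF-8" is an acceptable default for the application.

```haskell
import Data.Text.ICU.Convert (getDefaultName, getName, open, setDefaultName)

main :: IO ()
main = do
  -- Application initialization: no other threads are using ICU yet.
  setDefaultName "UTF-8"
  name <- getDefaultName
  putStrLn ("default converter name: " ++ name)
  -- An empty name asks open to use the default converter name.
  conv <- open "" Nothing
  putStrLn ("open \"\" produced: " ++ getName conv)
```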
Miscellaneous functions
compareNames :: String -> String -> Ordering
Do a fuzzy compare of two converter/alias names.  The comparison
 is case-insensitive, ignores leading zeroes if they are not
 followed by further digits, and ignores all but letters and digits.
 Thus the strings "UTF-8", "utf_8", "u*T@f08" and
 "Utf 8" are exactly equivalent.  See section 1.4, Charset Alias
 Matching in Unicode Technical Standard #22 at
 http://www.unicode.org/reports/tr22/
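The equivalence stated above can be checked directly. This sketch assumes text-icu is installed; it compares each spelling from the text against "UTF-8" and then compares two names that are expected to differ, printing the Ordering in each case.

```haskell
import Data.Text.ICU.Convert (compareNames)

main :: IO ()
main = do
  -- Spellings the documentation describes as exactly equivalent.
  let spellings = ["UTF-8", "utf_8", "u*T@f08", "Utf 8"]
  mapM_
    (\s -> putStrLn (show s ++ " vs \"UTF-8\": " ++ show (compareNames "UTF-8" s)))
    spellings
  -- Two different converter names, for contrast.
  putStrLn ("\"UTF-8\" vs \"UTF-16\": " ++ show (compareNames "UTF-8" "UTF-16"))
```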
Metadata
converterNames :: [String]
A list of the canonical names of all available converters.
standardNames :: [String]
The list of supported standard names.
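To see what a particular ICU installation offers, the lists can simply be printed. This is a sketch assuming text-icu is installed; the contents vary with the ICU version and data files, so no output is shown. aliases appears in the synopsis with type String -> [String]; that it returns the known aliases of the given name is an assumption based on its name and signature.

```haskell
import Data.Text.ICU.Convert (aliases, converterNames, standardNames)

main :: IO ()
main = do
  putStrLn ("converters available: " ++ show (length converterNames))
  -- Show only the first few canonical names to keep the output short.
  mapM_ putStrLn (take 5 converterNames)
  putStrLn ("standard names: " ++ show standardNames)
  -- Assumed behavior: the aliases registered for the given converter name.
  putStrLn ("aliases of UTF-8: " ++ show (aliases "UTF-8"))
```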