|
Text.XML.HXT.DOM.Unicode | Portability | portable | Stability | experimental | Maintainer | Uwe Schmidt (uwe@fh-wedel.de) |
|
|
|
|
|
Description |
Unicode and UTF-8 Conversion Functions
|
|
Synopsis |
|
|
|
|
Unicode Type declarations
|
|
|
Unicode characters are represented by the Char type.
This requires a compiler that supports the full Unicode character range
(e.g. ghc, but not hugs).
|
|
|
the type for Unicode strings
|
|
|
UTF-8 characters are represented by the Char type
|
|
|
UTF-8 strings are implemented as Haskell strings
|
|
|
|
|
Decoding function that returns a pair of the result string and a list of decoding errors
|
|
|
Decoding function where decoding errors are interleaved with decoded characters
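The two result shapes described above can be sketched as follows. The type names, `collectErrors` and the toy decoder `asciiOnly` are hypothetical and chosen for illustration only; they are not taken from this module's signatures.

```haskell
import Data.Either (partitionEithers)

-- Hypothetical type names, assumed for illustration only.
-- Shape 1: decoded string plus a separate list of error messages.
type DecodeWithErrorList = String -> (String, [String])

-- Shape 2: errors interleaved with decoded characters, in input order.
type DecodeInterleaved = String -> [Either String Char]

-- Any interleaved decoder can be turned into the pair form by
-- separating the Left (error) and Right (character) entries.
collectErrors :: DecodeInterleaved -> DecodeWithErrorList
collectErrors decode input = (chars, errs)
  where
    (errs, chars) = partitionEithers (decode input)

-- A toy decoder that accepts only 7-bit characters, for demonstration.
asciiOnly :: DecodeInterleaved
asciiOnly = map check
  where
    check c
      | c < '\128' = Right c
      | otherwise  = Left ("non-ASCII byte: " ++ show (fromEnum c))
```

The interleaved form keeps the position of each error, which the pair form loses; the pair form is more convenient when only the decoded text is needed.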
|
|
XML char predicates
|
|
|
checking for valid XML characters
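As a rough illustration, a validity test following the `Char` production of the XML 1.0 recommendation might look like this. The name `isXmlCharSketch` is hypothetical and the code is a sketch, not necessarily how this module implements the check.

```haskell
import Data.Char (ord)

-- Hypothetical name; a sketch of the XML 1.0 "Char" production:
--   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
isXmlCharSketch :: Char -> Bool
isXmlCharSketch c =
     i == 0x9 || i == 0xA || i == 0xD
  || (i >= 0x20    && i <= 0xD7FF)
  || (i >= 0xE000  && i <= 0xFFFD)
  || (i >= 0x10000 && i <= 0x10FFFF)
  where
    i = ord c
```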
|
|
|
test for a legal latin1 XML char
|
|
|
checking for XML space character: \n, \r, \t and " "
|
|
|
checking for XML 1.1 space character: in addition to the XML 1.0 space characters, 0x85 and 0x2028 are accepted
see also : isXmlSpaceChar
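The two whitespace tests can be sketched as below. The names `isSpace10` and `isSpace11` are hypothetical; only the character sets are taken from the descriptions above.

```haskell
-- Hypothetical names; sketches of the two whitespace tests described above.

-- XML 1.0 white space: space, tab, line feed, carriage return.
isSpace10 :: Char -> Bool
isSpace10 c = c `elem` " \t\n\r"

-- The XML 1.1 variant additionally accepts NEL (0x85) and
-- LINE SEPARATOR (0x2028).
isSpace11 :: Char -> Bool
isSpace11 c = isSpace10 c || c == '\x85' || c == '\x2028'
```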
|
|
|
checking for XML name character
|
|
|
checking for XML name start character
see also : isXmlNameChar
|
|
|
checking for XML NCName character: no ":" allowed
see also : isXmlNameChar
|
|
|
checking for XML NCName start character: no ":" allowed
see also : isXmlNameChar, isXmlNCNameChar
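The relationship between name characters and NCName characters can be sketched as follows. This is an ASCII-only approximation made to keep the example short: the full XML definition also admits many non-ASCII letters, digits, combining characters and extenders. All function names here are hypothetical.

```haskell
import Data.Char (isAsciiUpper, isAsciiLower, isDigit)

-- ASCII-only approximation (an assumption made to keep the sketch short);
-- the full XML definition also admits many non-ASCII characters.
nameStartCharAscii :: Char -> Bool
nameStartCharAscii c = isAsciiUpper c || isAsciiLower c || c == '_' || c == ':'

nameCharAscii :: Char -> Bool
nameCharAscii c = nameStartCharAscii c || isDigit c || c == '-' || c == '.'

-- NCName variants: the same tests with the colon excluded.
ncNameStartCharAscii :: Char -> Bool
ncNameStartCharAscii c = c /= ':' && nameStartCharAscii c

ncNameCharAscii :: Char -> Bool
ncNameCharAscii c = c /= ':' && nameCharAscii c

-- A string is an NCName (under this approximation) if it is non-empty,
-- starts with a start character and continues with NCName characters.
isNCNameAscii :: String -> Bool
isNCNameAscii []       = False
isNCNameAscii (c : cs) = ncNameStartCharAscii c && all ncNameCharAscii cs
```

Excluding the colon matters for namespace-aware processing, where the colon separates prefix and local part and neither part may contain one.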
|
|
|
checking for XML public id character
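For illustration, a test following the `PubidChar` production of the XML 1.0 recommendation could be written like this. The name `isPubidCharSketch` is hypothetical; this is a sketch, not necessarily the module's implementation.

```haskell
import Data.Char (isAsciiUpper, isAsciiLower, isDigit)

-- Hypothetical name; sketch of the XML 1.0 "PubidChar" production:
--   #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
isPubidCharSketch :: Char -> Bool
isPubidCharSketch c =
     c `elem` " \r\n"
  || isAsciiUpper c || isAsciiLower c || isDigit c
  || c `elem` "-'()+,./:=?;!*#@$_%"
```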
|
|
|
checking for XML letter
|
|
|
checking for XML base character
|
|
|
checking for XML ideographic character
|
|
|
checking for XML combining character
|
|
|
checking for XML digit
|
|
|
checking for XML extender
|
|
|
checking for XML control or permanently discouraged char
see Errata to XML1.0 (http://www.w3.org/XML/xml-V10-2e-errata) No 46
Document authors are encouraged to avoid compatibility characters,
as defined in section 6.8 of [Unicode] (see also D21 in section 3.6 of [Unicode3]).
The characters defined in the following ranges are also discouraged.
They are either control characters or permanently undefined Unicode characters:
|
|
UTF-8 and Unicode conversion functions
|
|
|
UTF-8 to Unicode conversion with deletion of leading byte order mark, as described in XML standard F.1
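A simplified decoder of this kind might look as follows. It assumes the input is a String whose characters each hold one byte (0 to 255). The name `decodeUtf8Sketch` is hypothetical, and the sketch deliberately omits checks a production decoder needs: overlong encodings and surrogate code points are not rejected.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr, ord)

-- Hypothetical name. Input is a String whose Chars each hold one byte (0..255).
-- Simplifications (assumptions): overlong forms and surrogates are not rejected,
-- and each undecodable lead byte is reported and skipped individually.
decodeUtf8Sketch :: String -> [Either String Char]
decodeUtf8Sketch = dropBom . go . map ord
  where
    go [] = []
    go (b : bs)
      | b < 0x80           = Right (chr b) : go bs
      | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F)
      | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F)
      | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07)
      | otherwise          = bad
      where
        bad = Left ("illegal UTF-8 byte " ++ show b) : go bs
        -- n continuation bytes are expected after the lead byte
        multi n lead
          | length conts == n && all isCont conts && cp <= 0x10FFFF
                      = Right (chr cp) : go rest
          | otherwise = bad
          where
            (conts, rest) = splitAt n bs
            cp = foldl (\acc x -> (acc `shiftL` 6) .|. (x .&. 0x3F)) lead conts
    isCont x = x .&. 0xC0 == 0x80

    -- XML appendix F.1: a leading byte order mark is not part of the document.
    dropBom (Right '\xFEFF' : rest) = rest
    dropBom xs                      = xs
```

Only a byte order mark at the very start of the input is dropped; U+FEFF elsewhere in the text is passed through unchanged.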
|
|
|
|
|
code conversion from latin1 to Unicode
|
|
|
UCS-2 to UTF-8 conversion with byte order mark analysis
|
|
|
UCS-2 big endian to Unicode conversion
|
|
|
UCS-2 little endian to Unicode conversion
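The difference between the two byte orders can be sketched as below, assuming the input is a String of byte-valued characters (0 to 255). The function names are hypothetical, and a trailing odd byte is silently dropped in this sketch rather than reported.

```haskell
import Data.Char (chr, ord)

-- Hypothetical names. Input Chars are assumed to be bytes (0..255);
-- a trailing odd byte is silently dropped in this sketch.

-- Big endian: the high byte comes first.
ucs2BigEndian :: String -> String
ucs2BigEndian (hi : lo : rest) = chr (ord hi * 256 + ord lo) : ucs2BigEndian rest
ucs2BigEndian _                = []

-- Little endian: the low byte comes first.
ucs2LittleEndian :: String -> String
ucs2LittleEndian (lo : hi : rest) = chr (ord lo + ord hi * 256) : ucs2LittleEndian rest
ucs2LittleEndian _                = []
```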
|
|
|
UTF-16 big endian to UTF-8 conversion with removal of byte order mark
|
|
|
UTF-16 little endian to UTF-8 conversion with removal of byte order mark
|
|
|
conversion from Unicode (Char) to a UTF8 encoded string.
|
|
|
conversion from Unicode strings (UString) to UTF8 encoded strings.
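The per-character and per-string conversions can be illustrated with the following sketch. The names `encodeCharUtf8` and `encodeStringUtf8` are hypothetical; each character of the result holds one byte of the UTF-8 encoding.

```haskell
import Data.Bits ((.&.), (.|.), shiftR)
import Data.Char (chr, ord)

-- Hypothetical names. Each Char of the result holds one byte (0..255).
encodeCharUtf8 :: Char -> String
encodeCharUtf8 c
  | i < 0x80    = [c]
  | i < 0x800   = map chr [0xC0 .|. (i `shiftR` 6), cont 0]
  | i < 0x10000 = map chr [0xE0 .|. (i `shiftR` 12), cont 6, cont 0]
  | otherwise   = map chr [0xF0 .|. (i `shiftR` 18), cont 12, cont 6, cont 0]
  where
    i = ord c
    -- continuation byte: 10xxxxxx carrying six bits starting at bit n
    cont n = 0x80 .|. ((i `shiftR` n) .&. 0x3F)

-- Encoding a whole string is the concatenation of the per-character results.
encodeStringUtf8 :: String -> String
encodeStringUtf8 = concatMap encodeCharUtf8
```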
|
|
|
substitute all Unicode characters that are not legal 1-byte
UTF-8 XML characters with a character reference.
This function can be used to translate all text nodes and
attribute values into pure ascii.
see also : unicodeToLatin1
|
|
|
substitute all Unicode characters that are not legal latin1
XML characters with a character reference.
This function can be used to translate all text nodes and
attribute values into ISO latin1.
see also : unicodeToXmlEntity
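Both substitutions follow the same pattern and differ only in the code point limit. The sketch below uses hypothetical names and a simplification: it compares against the limit only and does not test whether a character is a legal XML character, and it leaves markup characters such as `<` and `&` untouched.

```haskell
import Data.Char (ord)

-- Hypothetical helper: replace every character at or above the given
-- code point limit by a decimal character reference.
-- Simplification: legality as an XML character is not checked, and
-- markup characters such as '<' and '&' are left untouched.
escapeAbove :: Int -> String -> String
escapeAbove limit = concatMap subst
  where
    subst c
      | ord c < limit = [c]
      | otherwise     = "&#" ++ show (ord c) ++ ";"

-- Pure ASCII output: everything from 128 upwards becomes a reference.
toAsciiWithRefs :: String -> String
toAsciiWithRefs = escapeAbove 128

-- Latin-1 output: only characters from 256 upwards become references.
toLatin1WithRefs :: String -> String
toLatin1WithRefs = escapeAbove 256
```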
|
|
|
removes all non ascii chars, may be used to transform
a document into a pure ascii representation by removing
all non ascii chars from tag and attribute names
see also : unicodeRemoveNoneLatin1, unicodeToXmlEntity
|
|
|
removes all non latin1 chars, may be used to transform
a document into a pure latin1 representation by removing
all non latin1 chars from tag and attribute names
see also : unicodeRemoveNoneAscii, unicodeToLatin1
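Removal, as opposed to substitution by character references, amounts to a filter on the code point. A minimal sketch with hypothetical names:

```haskell
import Data.Char (ord)

-- Hypothetical helper: keep only characters below the given code point limit.
removeAbove :: Int -> String -> String
removeAbove limit = filter ((< limit) . ord)

-- Keep only 7-bit ASCII characters.
stripNonAscii :: String -> String
stripNonAscii = removeAbove 128

-- Keep only characters in the Latin-1 range.
stripNonLatin1 :: String -> String
stripNonLatin1 = removeAbove 256
```

Unlike the character reference substitution, this loses information, which is why it is suited to names, where character references are not allowed.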
|
|
|
convert a Unicode code point into an XML character reference.
see also : intToCharRefHex
|
|
|
convert a Unicode code point into an XML hexadecimal character reference.
see also: intToCharRef
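The two reference forms can be sketched as follows. The names `charRefDec` and `charRefHex` are hypothetical stand-ins, and the exact output of the module's own functions (for example the case of hexadecimal digits) may differ. `showHex` produces lower case digits and fails on negative input.

```haskell
import Numeric (showHex)

-- Hypothetical names; sketches of decimal and hexadecimal character references.

-- Decimal form, e.g. "&#8364;".
charRefDec :: Int -> String
charRefDec i = "&#" ++ show i ++ ";"

-- Hexadecimal form with lower case digits; the argument must be non-negative.
charRefHex :: Int -> String
charRefHex i = "&#x" ++ showHex i ";"
```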
|
|
|
the lookup function for selecting the decoding function
|
|
|
the lookup function for selecting the decoding function
|
|
|
the lookup function for selecting the encoding function
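A lookup of this kind is typically a table from encoding names to conversion functions. The sketch below is an assumption about the general shape only: the table contents, the names `encodingTable` and `lookupEncoding`, and the use of `id` as a placeholder converter are all hypothetical.

```haskell
import Data.Char (toUpper)

-- Hypothetical table: encoding names (upper case) paired with converters.
-- 'id' stands in for real conversion functions in this sketch.
encodingTable :: [(String, String -> String)]
encodingTable =
  [ ("UTF-8",      id)
  , ("ISO-8859-1", id)
  , ("US-ASCII",   id)
  ]

-- Case-insensitive lookup; Nothing signals an unsupported encoding.
lookupEncoding :: String -> Maybe (String -> String)
lookupEncoding name = lookup (map toUpper name) encodingTable
```

Returning a `Maybe` lets the caller decide how to report an unknown encoding name instead of failing inside the lookup.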
|
|
|
White Space (XML Standard 2.3) and
end of line handling (2.11)
#x0D and #x0D#x0A are mapped to #x0A
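The end of line mapping can be sketched as a single pass over the input. The name `normalizeNewlines` is hypothetical.

```haskell
-- Hypothetical name; maps #x0D#x0A and a lone #x0D to #x0A.
normalizeNewlines :: String -> String
normalizeNewlines ('\r' : '\n' : rest) = '\n' : normalizeNewlines rest
normalizeNewlines ('\r' : rest)        = '\n' : normalizeNewlines rest
normalizeNewlines (c : rest)           = c : normalizeNewlines rest
normalizeNewlines []                   = []
```

The two-character pattern must come first, otherwise a carriage return followed by a line feed would yield two line feeds.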
|
|
|
|
Produced by Haddock version 2.3.0 |