Encode-0.7: Encoding character data
Encode
Portability: portable
Stability: provisional
Maintainer: otakar.smrz mff.cuni.cz
Contents
Classes
Types
Methods
Description

The Haskell analogue of the Encode module in Perl: http://search.cpan.org/dist/Encode/

Encode.Arabic, Encode.Mapper, Encode.Unicode

Synopsis
class Encoding e where
    encode :: e -> [UPoint] -> [Char]
    decode :: e -> [Char] -> [UPoint]
data UPoint
encode :: Encoding e => e -> [UPoint] -> [Char]
decode :: Encoding e => e -> [Char] -> [UPoint]
Classes
class Encoding e where

Encodings are represented as distinct datatypes that are instances of the Encoding class, which defines two essential methods:

encode
turning a list of 'internal code points' into a String, and
decode
converting a String back into a list of code points.

Developing a new encoding means writing a new module with a structure similar to this:

    module MyEncModule (MyEncType (..)) where

    import Encode

    data MyEncType = MyEncName | MyEncAlias deriving (Enum, Show)

    instance Encoding MyEncType where

        -- 'data' is a reserved word in Haskell, so the arguments
        -- are named 'points' and 'chars' instead
        encode enc points = show points     -- your choices ...

        decode enc chars = map (toEnum . fromEnum) chars
 

Encode.Unicode.UTF8 is one concrete implementation that realizes and illustrates this template. Encode.Arabic.Buckwalter implements symmetric recoding using finite maps, and Encode.Arabic.ArabTeX makes use of monadic parsing and the FunParsing library.
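
The template above can be filled in as follows. This is only a minimal sketch, not code from the library: Latin1 is a hypothetical encoding type, and UPoint and Encoding are local stand-ins written from the description on this page so that the example compiles without the Encode package. The derived Eq and Show instances and the choice of '?' as a replacement character are assumptions made for the illustration.

```haskell
-- Stand-in for the library's code point type (assumed shape:
-- newtype UPoint = UPoint Int, with an Enum instance).
newtype UPoint = UPoint Int deriving (Eq, Show)

instance Enum UPoint where
    toEnum = UPoint
    fromEnum (UPoint n) = n

-- Stand-in for the library's class, copied from the synopsis.
class Encoding e where
    encode :: e -> [UPoint] -> [Char]
    decode :: e -> [Char] -> [UPoint]

-- Hypothetical encoding type, introduced only for this sketch.
data Latin1 = Latin1 deriving (Enum, Show)

instance Encoding Latin1 where
    -- Code points outside 0..255 have no Latin-1 form here;
    -- they are replaced with '?' (an arbitrary choice).
    encode _ = map toChar
      where
        toChar p
            | n >= 0 && n <= 255 = toEnum n
            | otherwise          = '?'
          where
            n = fromEnum p

    -- Every Char maps to the code point with the same number.
    decode _ = map (toEnum . fromEnum)
```

With the real library, the stand-in definitions would presumably be replaced by import Encode, leaving only the data declaration and the instance.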

Methods
encode :: e -> [UPoint] -> [Char]
decode :: e -> [Char] -> [UPoint]
Types
data UPoint

The datatype introduced for the internal representation of Unicode code points is currently defined as newtype UPoint = UPoint Int. The shift from characters (Char) to code points (UPoint) is intentional: Unicode support in Haskell is not yet fully implemented, and code points are in any case different entities from characters. Since UPoint is an instance of the Enum class, its constructor and destructor functions are available as toEnum and fromEnum, respectively.
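
The use of toEnum and fromEnum as constructor and destructor can be sketched as below. The UPoint definition is a local stand-in mirroring the one quoted above; the derived Eq and Show instances and the helper names fromChar and codeOf are assumptions made for this example, not part of the library.

```haskell
-- Stand-in mirroring the definition quoted in the text.
newtype UPoint = UPoint Int deriving (Eq, Show)

instance Enum UPoint where
    toEnum = UPoint
    fromEnum (UPoint n) = n

-- Hypothetical helper: Char -> UPoint, going through Int,
-- since both types are instances of Enum.
fromChar :: Char -> UPoint
fromChar = toEnum . fromEnum

-- Hypothetical helper: the "destructor", UPoint -> Int.
codeOf :: UPoint -> Int
codeOf = fromEnum
```

Because the conversion goes through Int, the same composition toEnum . fromEnum works in either direction between Char and UPoint, which is what the decode line of the module template relies on.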

The UPoint datatype should be the transfer point on the way from one encoding to another, not the terminal stop. To produce output, use the encode method systematically rather than show, even if show may temporarily produce somewhat appealing results.
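
Using UPoint as a transfer point amounts to composing decode for the source encoding with encode for the target one. The sketch below illustrates this under stated assumptions: UPoint and Encoding are local stand-ins, Plain and Shifted are toy encodings invented for the example, and recode is a hypothetical helper that this page does not document as part of the library.

```haskell
-- Stand-ins for the library's type and class.
newtype UPoint = UPoint Int deriving (Eq, Show)

instance Enum UPoint where
    toEnum = UPoint
    fromEnum (UPoint n) = n

class Encoding e where
    encode :: e -> [UPoint] -> [Char]
    decode :: e -> [Char] -> [UPoint]

-- Toy encoding: each Char stands for the code point of the same number.
data Plain = Plain

instance Encoding Plain where
    encode _ = map (toEnum . fromEnum)
    decode _ = map (toEnum . fromEnum)

-- Toy encoding: each code point is written as the next Char.
data Shifted = Shifted

instance Encoding Shifted where
    encode _ = map (toEnum . (+ 1) . fromEnum)
    decode _ = map (toEnum . subtract 1 . fromEnum)

-- Hypothetical helper: recode a String from one encoding into
-- another, with [UPoint] as the intermediate representation.
recode :: (Encoding a, Encoding b) => a -> b -> [Char] -> [Char]
recode from to = encode to . decode from
```

Here recode Plain Shifted "abc" yields "bcd", and recode Shifted Plain "bcd" restores "abc". By contrast, show (decode Plain "a") yields "[UPoint 97]" with the derived Show instance assumed in this sketch, which is a debugging view of the code points rather than encoded text.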

Methods
encode :: Encoding e => e -> [UPoint] -> [Char]
decode :: Encoding e => e -> [Char] -> [UPoint]