wumpus-core-0.42.0: Pure Haskell PostScript and SVG generation.

Portability: GHC
Stability: unstable
Maintainer: stephen.tetley@gmail.com

Wumpus.Core.Text.Base

Description

Extended character code handling.

Wumpus uses an escaping style derived from SVG to embed character codes and PostScript glyph names in regular strings.

 "regular ascii text &#38; more ascii text"

i.e. character codes are delimited by &# on the left and ; on the right.

Glyph names are delimited by & on the left and ; on the right.

 "regular ascii text &ampersand; more ascii text"

Note that glyph names should always correspond to PostScript glyph names, not SVG / HTML glyph names.

In Wumpus both glyph names and character codes can be embedded in strings (e.g. &egrave; or &#232;), although glyph names are preferred for PostScript (see below).

Character codes can also be expressed as octal or hexadecimal numbers:

 myst&#0o350;re
 myst&#0xE8;re
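
The escape syntax described above can be sketched as a small tokenizer. This is only an illustration of the conventions, not the parser Wumpus-Core uses: the names Token, tokenize, escape, number and fullParse are hypothetical, and the sketch assumes glyph names are purely alphanumeric and that malformed escapes are kept as literal text.

```haskell
import Data.Char (isAlphaNum, isDigit)
import Numeric (readHex, readOct)

-- Hypothetical token type for this sketch; not the library's EscapedChar.
data Token
  = Plain Char     -- a regular character
  | Code Int       -- a numeric character code, e.g. &#232;
  | Glyph String   -- a PostScript glyph name, e.g. &egrave;
  deriving (Eq, Show)

-- Split a string into tokens. An escape runs from '&' to the next ';'.
-- Assumption: a malformed escape is left in the output as literal text.
tokenize :: String -> [Token]
tokenize [] = []
tokenize ('&' : rest) =
  case break (== ';') rest of
    (body, ';' : more) | Just t <- escape body -> t : tokenize more
    _ -> Plain '&' : tokenize rest
tokenize (c : cs) = Plain c : tokenize cs

-- Interpret the text between '&' and ';'.
escape :: String -> Maybe Token
escape ('#' : digits) = Code <$> number digits
escape name
  | not (null name) && all isAlphaNum name = Just (Glyph name)
  | otherwise = Nothing

-- Decimal by default; a 0o prefix selects octal, 0x selects hexadecimal.
number :: String -> Maybe Int
number ('0' : 'o' : ds) = fullParse (readOct ds)
number ('0' : 'x' : ds) = fullParse (readHex ds)
number ds
  | not (null ds) && all isDigit ds = Just (read ds)
  | otherwise = Nothing

-- Accept a parse only if it consumed the whole input.
fullParse :: [(Int, String)] -> Maybe Int
fullParse [(n, "")] = Just n
fullParse _ = Nothing
```

With these definitions, the decimal, octal and hexadecimal spellings of the same code all yield the same Code token, while a glyph name yields a Glyph token.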

In the generated PostScript, Wumpus uses the character name, e.g.:

 (myst) show /egrave glyphshow (re) show

The generated SVG uses the numeric code, e.g.:

 myst&#232;re
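
The two output forms can be sketched side by side. This is a rough illustration, not Wumpus-Core's actual renderer: the type Ch and the functions toPostScript and toSvg are hypothetical, each escaped character is assumed to carry both its code and its glyph name already, and PostScript string escaping of parentheses and backslashes is deliberately left out.

```haskell
-- Hypothetical character type for this sketch: an escaped character
-- carries both its numeric code and its PostScript glyph name.
data Ch
  = Plain Char
  | Named Int String

-- Runs of plain characters become "(text) show"; each named character
-- becomes "/name glyphshow". Assumption: the plain text contains no
-- parentheses or backslashes, which would need escaping in PostScript.
toPostScript :: [Ch] -> String
toPostScript = unwords . go
  where
    go [] = []
    go (Named _ name : rest) = ("/" ++ name ++ " glyphshow") : go rest
    go cs =
      let (plain, rest) = span isPlain cs
      in ("(" ++ [c | Plain c <- plain] ++ ") show") : go rest

    isPlain (Plain _) = True
    isPlain _ = False

-- Plain characters pass through; named characters become decimal escapes.
toSvg :: [Ch] -> String
toSvg = concatMap render
  where
    render (Plain c) = [c]
    render (Named code _) = "&#" ++ show code ++ ";"

-- The running example: "myst", then egrave (code 232), then "re".
mystere :: [Ch]
mystere = map Plain "myst" ++ [Named 232 "egrave"] ++ map Plain "re"
```

Applied to mystere, toPostScript gives the show / glyphshow sequence and toSvg gives the string with a decimal escape, matching the two examples above.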

Unless you are generating only SVG, you should favour glyph names rather than code points, as they are unambiguously interpreted by Wumpus. The meaning of a character code depends on the encoding of the font used to render the text. Standard fonts (e.g. Helvetica, Times, Courier) use the Standard Encoding, which has some differences from the common Latin1 character set.

Unfortunately, if a glyph is not present in a font, it cannot be rendered in PostScript. Wumpus-Core is oblivious to the contents of fonts: it does not warn about missing glyphs or attempt to substitute them.

Documentation

data EscapedText

Internal string representation for Wumpus-Core.

EscapedText is a list of characters, where each character may be either a regular character, an integer representing a Unicode code-point or a PostScript glyph name.

data EscapedChar

Internal character representation for Wumpus-Core.

An EscapedChar may be either a regular character, an integer representing a Unicode code-point or a PostScript glyph name.

type EncodingVector = IntMap String

EncodingVector - a map from code point to PostScript glyph name.
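
As an illustration of how such a map might be consulted, the sketch below builds a three-entry vector and looks up glyph names by code point. The names sampleVector and glyphName are hypothetical and not part of Wumpus-Core, and the entries are a tiny Latin1 fragment chosen to match the examples in this module, not a complete encoding.

```haskell
import Data.IntMap (IntMap)
import qualified Data.IntMap as IntMap

-- Same shape as the type synonym documented above.
type EncodingVector = IntMap String

-- A tiny illustrative fragment (Latin1 code points), not a full encoding.
sampleVector :: EncodingVector
sampleVector = IntMap.fromList
  [ (0x26, "ampersand")
  , (0xE8, "egrave")
  , (0xE9, "eacute")
  ]

-- Look up the glyph name for a code point; Nothing if the vector has
-- no entry for that code.
glyphName :: EncodingVector -> Int -> Maybe String
glyphName ev code = IntMap.lookup code ev
```

A lookup that returns Nothing is the caller's problem to handle; the sketch makes no attempt at substitution.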

escapeString :: String -> EscapedText

The input to escapeString is regular text interspersed with escaped glyph names or decimal character codes. Escaping in the input string should follow the SVG convention: an escape sequence starts with & (ampersand) for glyph names or &# (ampersand hash) for char codes, and ends with ; (semicolon).

Escaped characters are output to PostScript as their respective glyph names:

 /egrave glyphshow

Escaped characters are output to SVG as an escaped decimal, e.g.:

 &#232;

Note - for SVG output, Wumpus automatically escapes characters where the char code is above 128. This is the convention used by the Text.XHtml library.
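
The automatic escaping in the note above can be approximated as follows. This is a sketch under assumptions, not the library's code: svgEscape is a hypothetical name, the threshold is taken literally from the note as "strictly greater than 128", and XML-special characters such as < and & are not handled.

```haskell
import Data.Char (ord)

-- Replace every character whose code is above 128 with a decimal
-- character reference; leave all other characters untouched.
-- Assumption: "above 128" is read as strictly greater than 128.
svgEscape :: String -> String
svgEscape = concatMap esc
  where
    esc c
      | ord c > 128 = "&#" ++ show (ord c) ++ ";"
      | otherwise = [c]
```

Plain ASCII text passes through unchanged, so the function can be applied to a whole string without first checking whether it needs escaping.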

wrapEscChar :: EscapedChar -> EscapedText

Build an EscapedText from a single EscapedChar.

destrEscapedText :: ([EscapedChar] -> a) -> EscapedText -> a

Destructor for EscapedText.

textLength :: EscapedText -> Int

Get the character count of an EscapedText string.
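
The relationship between the wrapper, the destructor and the length function can be modelled with a toy representation. Everything below is hypothetical (DemoChar, DemoText and the demo functions are stand-ins, and the constructor names are invented); it only assumes what the descriptions above state, namely that the text is a list of characters and that each escaped character counts as one.

```haskell
-- Hypothetical stand-in for EscapedChar.
data DemoChar
  = Lit Char        -- a regular character
  | CodePoint Int   -- a numeric character code
  | Name String     -- a PostScript glyph name

-- Hypothetical stand-in for EscapedText: a wrapped list of characters.
newtype DemoText = DemoText [DemoChar]

-- Counterpart of wrapEscChar: a one-character text.
wrapDemoChar :: DemoChar -> DemoText
wrapDemoChar c = DemoText [c]

-- Counterpart of destrEscapedText: hand the underlying list to a consumer.
destrDemoText :: ([DemoChar] -> a) -> DemoText -> a
destrDemoText f (DemoText cs) = f cs

-- Counterpart of textLength: an escaped character counts once, however
-- many characters its escape sequence took in the input string.
demoLength :: DemoText -> Int
demoLength = destrDemoText length

-- "myst", then the glyph egrave, then "re": seven characters in total.
example :: DemoText
example = DemoText (map Lit "myst" ++ [Name "egrave"] ++ map Lit "re")
```

In this model the length function is just the destructor applied to length, which is one reason a destructor of this shape is convenient.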