HaXmlSource codeContentsIndex
Text.XML.HaXml.Lex
Contents
Entry points to the lexer
Token types
Description

You don't normally need to use this Lex module directly - it is called automatically by the parser. (This interface is only exposed for debugging purposes.)

This is a hand-written lexer for tokenising the text of an XML document so that it is ready for parsing. It attaches position information in (line,column) format to every token. The main entry point is xmlLex. A secondary entry point, xmlReLex, is provided for when the parser needs to stuff a string back onto the front of the text and re-tokenise it (typically when expanding macros).

As one would expect, the lexer is essentially a small finite state machine.

Synopsis
xmlLex :: String -> String -> [Token]
xmlReLex :: Posn -> String -> [Token]
reLexEntityValue :: (String -> Maybe String) -> Posn -> String -> [Token]
type Token = (Posn, TokenT)
data TokenT
= TokCommentOpen
| TokCommentClose
| TokPIOpen
| TokPIClose
| TokSectionOpen
| TokSectionClose
| TokSection Section
| TokSpecialOpen
| TokSpecial Special
| TokEndOpen
| TokEndClose
| TokAnyOpen
| TokAnyClose
| TokSqOpen
| TokSqClose
| TokEqual
| TokQuery
| TokStar
| TokPlus
| TokAmp
| TokSemi
| TokHash
| TokBraOpen
| TokBraClose
| TokPipe
| TokPercent
| TokComma
| TokQuote
| TokName String
| TokFreeText String
| TokNull
| TokError String
data Special
= DOCTYPEx
| ELEMENTx
| ATTLISTx
| ENTITYx
| NOTATIONx
data Section
= CDATAx
| INCLUDEx
| IGNOREx
Entry points to the lexer
xmlLex :: String -> String -> [Token]Source
The first argument to xmlLex is the filename (used for source positions, especially in error messages), and the second is the string content of the XML file.
xmlReLex :: Posn -> String -> [Token]Source
xmlReLex is used when the parser expands a macro (PE reference). The expansion of the macro must be re-lexed as if for the first time.
reLexEntityValue :: (String -> Maybe String) -> Posn -> String -> [Token]Source
reLexEntityValue is used solely within parsing an entityvalue. Normally, a PERef is logically separated from its surroundings by whitespace. But in an entityvalue, a PERef can be juxtaposed to an identifier, so the expansion forms a new identifier. Thus the need to rescan the whole text for possible PERefs.
Token types
type Token = (Posn, TokenT)Source
All tokens are paired up with a source position. Lexical errors are passed back as a special TokenT value.
data TokenT Source
The basic token type.
Constructors
TokCommentOpen<!--
TokCommentClose
  • ->
TokPIOpen<?
TokPIClose?>
TokSectionOpen<![
TokSectionClose]]>
TokSection SectionCDATA INCLUDE IGNORE etc
TokSpecialOpen<!
TokSpecial SpecialDOCTYPE ELEMENT ATTLIST etc
TokEndOpen</
TokEndClose/>
TokAnyOpen<
TokAnyClose
TokSqOpen[
TokSqClose]
TokEqual=
TokQuery?
TokStar*
TokPlus+
TokAmp&
TokSemi;
TokHash#
TokBraOpen(
TokBraClose)
TokPipe|
TokPercent%
TokComma,
TokQuote'' or ""
TokName Stringbegins with letter, no spaces
TokFreeText Stringany character data
TokNullfake token
TokError Stringlexical error
show/hide Instances
Eq TokenT
Show TokenT
data Special Source
Constructors
DOCTYPEx
ELEMENTx
ATTLISTx
ENTITYx
NOTATIONx
show/hide Instances
data Section Source
Constructors
CDATAx
INCLUDEx
IGNOREx
show/hide Instances
Produced by Haddock version 0.8