Text.XML.HaXml.Lex

Description

You don't normally need to use this Lex module directly; it is called automatically by the parser. (This interface is exposed only for debugging purposes.)

This is a hand-written lexer that tokenises the text of an XML document so that it is ready for parsing. It attaches position information in (line, column) format to every token. The main entry point is xmlLex. A secondary entry point, xmlReLex, is provided for when the parser needs to push a string back onto the front of the text and re-tokenise it (typically when expanding macros, i.e. parameter-entity references).

As one would expect, the lexer is essentially a small finite state machine.
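To illustrate the finite-state idea, here is a much smaller machine. It is not HaXml's lexer: Posn and a handful of TokenT constructors are re-declared locally to mirror the types documented in this module, and lexSketch, lexText, lexTag, addChar and addStr are hypothetical names. The sketch only recognises tags, names and free text. It has two states (outside markup, inside a tag) and threads a (line, column) position through every transition, reporting errors as Left values.

```haskell
import Data.Char (isAlpha, isAlphaNum)

-- Local mirror of the documented Posn: filename, line, column, inclusion point.
data Posn = Pn String !Int !Int (Maybe Posn) deriving (Eq, Show)

-- A small subset of the documented TokenT constructors (illustration only).
data TokenT
  = TokEndOpen          -- "</"
  | TokEndClose         -- "/>"
  | TokAnyOpen          -- "<"
  | TokAnyClose         -- ">"
  | TokName String      -- a tag name
  | TokFreeText String  -- character data between tags
  deriving (Eq, Show)

type Token = Either String (Posn, TokenT)

-- Advance a position past one character: a newline starts the next line at column 1.
addChar :: Char -> Posn -> Posn
addChar '\n' (Pn f l _ i) = Pn f (l + 1) 1 i
addChar _    (Pn f l c i) = Pn f l (c + 1) i

-- Advance a position past a whole string.
addStr :: String -> Posn -> Posn
addStr s p = foldl (flip addChar) p s

-- State 1: outside markup, collecting free text until the next '<'.
lexText :: Posn -> String -> [Token]
lexText _ [] = []
lexText p ('<' : '/' : rest) = Right (p, TokEndOpen) : lexTag (addStr "</" p) rest
lexText p ('<' : rest)       = Right (p, TokAnyOpen) : lexTag (addChar '<' p) rest
lexText p s =
  let (txt, rest) = break (== '<') s
  in Right (p, TokFreeText txt) : lexText (addStr txt p) rest

-- State 2: inside a tag, skipping blanks and reading names until '>' or "/>".
lexTag :: Posn -> String -> [Token]
lexTag p [] = [Left ("end of input inside a tag at " ++ show p)]
lexTag p ('/' : '>' : rest) = Right (p, TokEndClose) : lexText (addStr "/>" p) rest
lexTag p ('>' : rest)       = Right (p, TokAnyClose) : lexText (addChar '>' p) rest
lexTag p s@(c : rest)
  | c `elem` " \t\r\n" = lexTag (addChar c p) rest
  | isAlpha c =
      let (nm, rest') = span isAlphaNum s
      in Right (p, TokName nm) : lexTag (addStr nm p) rest'
  | otherwise = [Left ("unexpected " ++ show c ++ " at " ++ show p)]

-- Shaped like xmlLex: a filename, then the content, starting at line 1, column 1.
lexSketch :: String -> String -> [Token]
lexSketch file = lexText (Pn file 1 1 Nothing)
```

For example, lexSketch "t.xml" "<a>hi</a>" yields TokAnyOpen, TokName "a", TokAnyClose, TokFreeText "hi", TokEndOpen, TokName "a", TokAnyClose, with the free text positioned at line 1, column 4.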

Entry points to the lexer

xmlLex :: String -> String -> [Token]

The first argument to xmlLex is the filename (used for source positions, especially in error messages), and the second is the string content of the XML file.

xmlReLex :: Posn -> String -> [Token]

xmlReLex is used when the parser expands a macro (a parameter-entity, or PE, reference). The expansion of the macro must be re-lexed as if for the first time, starting from the given source position.
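One way to picture the two entry points is as a single lexing engine started from two kinds of position. In this sketch a hypothetical stamp function stands in for the real state machine and merely pairs each character with its position; lexFile is shaped like xmlLex and reLexAt like xmlReLex. Treating the Posn argument of xmlReLex as the position at which the expansion begins is an assumption made for illustration.

```haskell
-- Local mirror of the documented Posn type.
data Posn = Pn String !Int !Int (Maybe Posn) deriving (Eq, Show)

-- Stand-in for the lexer's core: pair every character with its position,
-- moving to column 1 of the next line after a newline.
stamp :: Posn -> String -> [(Posn, Char)]
stamp _ [] = []
stamp p@(Pn f l c i) (ch : rest) = (p, ch) : stamp next rest
  where
    next | ch == '\n' = Pn f (l + 1) 1 i
         | otherwise  = Pn f l (c + 1) i

-- Shaped like xmlLex: a fresh file starts at line 1, column 1.
lexFile :: String -> String -> [(Posn, Char)]
lexFile file = stamp (Pn file 1 1 Nothing)

-- Shaped like xmlReLex: resume from a caller-supplied position,
-- e.g. where a PE reference was found.
reLexAt :: Posn -> String -> [(Posn, Char)]
reLexAt = stamp
```

The design point is that re-lexing needs no separate machinery: only the starting position differs, so positions reported for an expansion stay anchored to where the reference occurred.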

posInNewCxt :: String -> Maybe Posn -> Posn

posInNewCxt name pos creates a new source position from an old one. It is used when opening a new file (e.g. a DTD inclusion) to denote the start of the named file, while retaining the stacked information that it was included from the old pos.
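A minimal sketch of such a function, assuming that a new context starts at line 1, column 1 (the text above does not state the starting line and column). posInNewCxtSketch is a hypothetical name and Posn is a local mirror of the documented type.

```haskell
-- Local mirror of the documented Posn type.
data Posn = Pn String !Int !Int (Maybe Posn) deriving (Eq, Show)

-- Sketch: a newly opened file starts at line 1, column 1 and records
-- the position (if any) from which it was included.
posInNewCxtSketch :: String -> Maybe Posn -> Posn
posInNewCxtSketch name outer = Pn name 1 1 outer

-- Example: an external DTD referenced at line 2, column 10 of doc.xml.
dtdStart :: Posn
dtdStart = posInNewCxtSketch "doc.dtd" (Just (Pn "doc.xml" 2 10 Nothing))
```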

Token and position types

type Token = Either String (Posn, TokenT)

Every token is paired with a source position. Lexical errors are passed back as Left values carrying an error message.
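Because errors arrive as Left values inside the same stream, a consumer can choose how to treat them. The sketch below shows two options using standard base functions: partitionEithers collects all errors and all tokens separately, while sequence stops at the first error. Posn, TokenT (cut down to two constructors) and Token are local mirrors; splitTokens and firstError are hypothetical names, not part of HaXml.

```haskell
import Data.Either (partitionEithers)

-- Local mirrors of the documented types (TokenT heavily cut down).
data Posn = Pn String !Int !Int (Maybe Posn) deriving (Eq, Show)
data TokenT = TokName String | TokFreeText String deriving (Eq, Show)
type Token = Either String (Posn, TokenT)

-- Collect every lexical error message and every well-formed token.
splitTokens :: [Token] -> ([String], [(Posn, TokenT)])
splitTokens = partitionEithers

-- Stop at the first lexical error, or return all tokens if there is none.
firstError :: [Token] -> Either String [(Posn, TokenT)]
firstError = sequence
```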

data Posn

Source positions contain a filename, line, column, and an inclusion point, which is itself another source position, recursively.

Constructors

Pn String !Int !Int (Maybe Posn)

Instances

Eq Posn
Show Posn
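Since the inclusion point is itself a Posn, an error position can be unwound into the full chain of files that led to it. The sketch below does that with hypothetical helpers inclusionChain, showPos and describePosn over a local mirror of Posn; the "file:line:col" and "included from" format is an illustrative choice, not HaXml's Show output.

```haskell
-- Local mirror of the documented Posn type.
data Posn = Pn String !Int !Int (Maybe Posn) deriving (Eq, Show)

-- Walk the inclusion points outward, innermost position first.
inclusionChain :: Posn -> [Posn]
inclusionChain p@(Pn _ _ _ Nothing)      = [p]
inclusionChain p@(Pn _ _ _ (Just outer)) = p : inclusionChain outer

-- Render a single position as "file:line:col".
showPos :: Posn -> String
showPos (Pn f l c _) = f ++ ":" ++ show l ++ ":" ++ show c

-- Render a position together with its whole inclusion history.
describePosn :: Posn -> String
describePosn p = case map showPos (inclusionChain p) of
  (here : outer) -> here ++ concatMap ("\n  included from " ++) outer
  []             -> ""
```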

data TokenT

The basic token type.

Constructors

TokCommentOpen	<!--
TokCommentClose	-->
TokPIOpen	<?
TokPIClose	?>
TokSectionOpen	<![
TokSectionClose	]]>
TokSection Section	CDATA INCLUDE IGNORE etc
TokSpecialOpen	<!
TokSpecial Special	DOCTYPE ELEMENT ATTLIST etc
TokEndOpen	</
TokEndClose	/>
TokAnyOpen	<
TokAnyClose	>
TokSqOpen	[
TokSqClose	]
TokEqual	=
TokQuery	?
TokStar	*
TokPlus	+
TokAmp	&
TokSemi	;
TokHash	#
TokBraOpen	(
TokBraClose	)
TokPipe	|
TokPercent	%
TokComma	,
TokQuote	' or "
TokName String	begins with letter, no spaces
TokFreeText String	any character data
TokNull	fake token

Instances

Eq TokenT
Show TokenT
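The constructor table can be read as a mapping from tokens back to source text. The sketch below mirrors most of TokenT locally and renders each token with a hypothetical render helper, not part of HaXml. TokSection and TokSpecial are omitted to keep the block short, TokQuote is omitted because it has no field recording which quote character was seen, and TokNull has no source text.

```haskell
-- Local mirror of most TokenT constructors (see the lead-in for omissions).
data TokenT
  = TokCommentOpen | TokCommentClose
  | TokPIOpen | TokPIClose
  | TokSectionOpen | TokSectionClose
  | TokSpecialOpen
  | TokEndOpen | TokEndClose
  | TokAnyOpen | TokAnyClose
  | TokSqOpen | TokSqClose
  | TokEqual | TokQuery | TokStar | TokPlus
  | TokAmp | TokSemi | TokHash
  | TokBraOpen | TokBraClose
  | TokPipe | TokPercent | TokComma
  | TokName String
  | TokFreeText String
  deriving (Eq, Show)

-- The source text each token stands for, per the constructor table.
render :: TokenT -> String
render t = case t of
  TokCommentOpen  -> "<!--"
  TokCommentClose -> "-->"
  TokPIOpen       -> "<?"
  TokPIClose      -> "?>"
  TokSectionOpen  -> "<!["
  TokSectionClose -> "]]>"
  TokSpecialOpen  -> "<!"
  TokEndOpen      -> "</"
  TokEndClose     -> "/>"
  TokAnyOpen      -> "<"
  TokAnyClose     -> ">"
  TokSqOpen       -> "["
  TokSqClose      -> "]"
  TokEqual        -> "="
  TokQuery        -> "?"
  TokStar         -> "*"
  TokPlus         -> "+"
  TokAmp          -> "&"
  TokSemi         -> ";"
  TokHash         -> "#"
  TokBraOpen      -> "("
  TokBraClose     -> ")"
  TokPipe         -> "|"
  TokPercent      -> "%"
  TokComma        -> ","
  TokName n       -> n
  TokFreeText s   -> s
```

For instance, concatMap render over TokAnyOpen, TokName "a", TokAnyClose, TokFreeText "hi", TokEndOpen, TokName "a", TokAnyClose gives back "<a>hi</a>". This is not an exact inverse in general: assuming whitespace between tokens inside markup is skipped rather than tokenised, that whitespace is not recovered.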

data Special

Constructors

DOCTYPEx
ELEMENTx
ATTLISTx
ENTITYx
NOTATIONx

Instances

Eq Special
Show Special
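The Special constructors correspond to the declaration keywords that can follow `<!`. A sketch of that recognition step, using a local mirror of Special and hypothetical helpers specialKeyword and afterSpecialOpen (HaXml's own lexer code is not reproduced here):

```haskell
-- Local mirror of the documented Special type.
data Special = DOCTYPEx | ELEMENTx | ATTLISTx | ENTITYx | NOTATIONx
  deriving (Eq, Show)

-- Map a declaration keyword to its Special constructor.
specialKeyword :: String -> Maybe Special
specialKeyword w = lookup w
  [ ("DOCTYPE", DOCTYPEx), ("ELEMENT", ELEMENTx), ("ATTLIST", ATTLISTx)
  , ("ENTITY", ENTITYx), ("NOTATION", NOTATIONx) ]

-- Given the text just after "<!", read the upper-case keyword and
-- return it together with the remaining input.
afterSpecialOpen :: String -> Maybe (Special, String)
afterSpecialOpen s = do
  let (w, rest) = span (`elem` ['A' .. 'Z']) s
  k <- specialKeyword w
  pure (k, rest)
```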

data Section

Constructors

CDATAx
INCLUDEx
IGNOREx

Instances

Eq Section
Show Section
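The Section constructors name the keywords that can follow `<![`. In XML 1.0, a CDATA section must be written exactly `<![CDATA[`, whereas conditional sections permit optional whitespace around INCLUDE or IGNORE before the opening `[`. The sketch below follows those two rules using a local mirror of Section and a hypothetical sectionKeyword helper. It does not handle a parameter-entity reference standing in place of the keyword.

```haskell
import Data.Char (isAlpha, isSpace)
import Data.List (stripPrefix)

-- Local mirror of the documented Section type.
data Section = CDATAx | INCLUDEx | IGNOREx deriving (Eq, Show)

-- Given the text just after "<![", recognise the section keyword and
-- return it with the text following the opening '['.
sectionKeyword :: String -> Maybe (Section, String)
sectionKeyword s
  -- CDATA sections must be written exactly "<![CDATA[".
  | Just rest <- stripPrefix "CDATA[" s = Just (CDATAx, rest)
  -- Conditional sections allow optional whitespace around the keyword.
  | otherwise =
      let (w, rest) = span isAlpha (dropWhile isSpace s)
      in case (lookup w conditional, dropWhile isSpace rest) of
           (Just k, '[' : body) -> Just (k, body)
           _                    -> Nothing
  where
    conditional = [("INCLUDE", INCLUDEx), ("IGNORE", IGNOREx)]
```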