pandoc-1.10.1: Conversion between markup formats

Portabilityportable
Stabilityalpha
MaintainerJohn MacFarlane <jgm@berkeley.edu>
Safe HaskellNone

Text.Pandoc.Parsing

Contents

Description

A utility library with parsers used in pandoc readers.

Synopsis

Documentation

(>>~) :: Monad m => m a -> m b -> m aSource

Like >>, but returns the operation on the left. (Suggested by Tillmann Rendel on Haskell-cafe list.)

anyLine :: Parser [Char] st [Char]Source

Parse any line of text

many1Till :: Parser [tok] st a -> Parser [tok] st end -> Parser [tok] st [a]Source

Like manyTill, but reads at least one item.

notFollowedBy' :: Show b => Parser [a] st b -> Parser [a] st ()Source

A more general form of notFollowedBy. This one allows any type of parser to be specified, and succeeds only if that parser fails. It does not consume any input.

oneOfStrings :: [String] -> Parser [Char] st StringSource

Parses one of a list of strings. If the list contains two strings one of which is a prefix of the other, the longer string will be matched if possible.

oneOfStringsCI :: [String] -> Parser [Char] st StringSource

Parses one of a list of strings (tried in order), case insensitive.

spaceChar :: Parser [Char] st CharSource

Parses a space or tab.

nonspaceChar :: Parser [Char] st CharSource

Parses a nonspace, nonnewline character.

skipSpaces :: Parser [Char] st ()Source

Skips zero or more spaces or tabs.

blankline :: Parser [Char] st CharSource

Skips zero or more spaces or tabs, then reads a newline.

blanklines :: Parser [Char] st [Char]Source

Parses one or more blank lines and returns a string of newlines.

enclosedSource

Arguments

:: Parser [Char] st t

start parser

-> Parser [Char] st end

end parser

-> Parser [Char] st a

content parser (to be used repeatedly)

-> Parser [Char] st [a] 

Parses material enclosed between start and end parsers.

stringAnyCase :: [Char] -> Parser [Char] st StringSource

Parse string, case insensitive.

parseFromString :: Parser [tok] st a -> [tok] -> Parser [tok] st aSource

Parse contents of str using parser and return result.

lineClump :: Parser [Char] st StringSource

Parse raw line block up to and including blank lines.

charsInBalanced :: Char -> Char -> Parser [Char] st Char -> Parser [Char] st StringSource

Parse a string of characters between an open character and a close character, including text between balanced pairs of open and close, which must be different. For example, charsInBalanced '(' ')' anyChar will parse (hello (there)) and return hello (there).

romanNumeralSource

Arguments

:: Bool

Uppercase if true

-> Parser [Char] st Int 

Parses a roman numeral (uppercase or lowercase), returns number.

emailAddress :: Parser [Char] st (String, String)Source

Parses an email address; returns original and corresponding escaped mailto: URI.

uri :: Parser [Char] st (String, String)Source

Parses a URI. Returns pair of original and URI-escaped version.

withHorizDisplacementSource

Arguments

:: Parser [Char] st a

Parser to apply

-> Parser [Char] st (a, Int)

(result, displacement)

Applies a parser, returns tuple of its results and its horizontal displacement (the difference between the source column at the end and the source column at the beginning). Vertical displacement (source row) is ignored.

withRaw :: Parser [Char] st a -> Parser [Char] st (a, [Char])Source

Applies a parser and returns the raw string that was parsed, along with the value produced by the parser.

escapedSource

Arguments

:: Parser [Char] st Char

Parser for character to escape

-> Parser [Char] st Char 

Parses backslash, then applies character parser.

characterReference :: Parser [Char] st CharSource

Parse character entity.

anyOrderedListMarker :: Parser [Char] ParserState ListAttributesSource

Parses an ordered list marker and returns list attributes.

orderedListMarker :: ListNumberStyle -> ListNumberDelim -> Parser [Char] ParserState IntSource

Parses an ordered list marker with a given style and delimiter, returns number.

charRef :: Parser [Char] st InlineSource

Parses a character reference and returns a Str element.

lineBlockLines :: Parser [Char] st [String]Source

Parses an RST-style line block and returns a list of strings.

tableWith :: Parser [Char] ParserState ([[Block]], [Alignment], [Int]) -> ([Int] -> Parser [Char] ParserState [[Block]]) -> Parser [Char] ParserState sep -> Parser [Char] ParserState end -> Parser [Char] ParserState BlockSource

Parse a table using headerParser, rowParser, lineParser, and footerParser.

gridTableWithSource

Arguments

:: Parser [Char] ParserState [Block]

Block list parser

-> Bool

Headerless table

-> Parser [Char] ParserState Block 

readWithSource

Arguments

:: Parser [t] ParserState a

parser

-> ParserState

initial state

-> [t]

input

-> a 

Parse a string with a given parser and state.

testStringWith :: Show a => Parser [Char] ParserState a -> String -> IO ()Source

Parse a string with parser (for testing).

guardEnabled :: Extension -> Parser s ParserState ()Source

Succeed only if the extension is enabled.

guardDisabled :: Extension -> Parser s ParserState ()Source

Succeed only if the extension is disabled.

data ParserState Source

Parsing options.

Constructors

ParserState 

Fields

stateOptions :: ReaderOptions

User options

stateParserContext :: ParserContext

Inside list?

stateQuoteContext :: QuoteContext

Inside quoted environment?

stateAllowLinks :: Bool

Allow parsing of links

stateMaxNestingLevel :: Int

Max # of nested Strong/Emph

stateLastStrPos :: Maybe SourcePos

Position after last str parsed

stateKeys :: KeyTable

List of reference keys (with fallbacks)

stateSubstitutions :: SubstTable

List of substitution references

stateNotes :: NoteTable

List of notes (raw bodies)

stateNotes' :: NoteTable'

List of notes (parsed bodies)

stateTitle :: [Inline]

Title of document

stateAuthors :: [[Inline]]

Authors of document

stateDate :: [Inline]

Date of document

stateHeaderTable :: [HeaderType]

Ordered list of header types used

stateHeaders :: [[Inline]]

List of headers (used for implicit ref links)

stateIdentifiers :: [String]

List of header identifiers used

stateNextExample :: Int

Number of next example

stateExamples :: Map String Int

Map from example labels to numbers

stateHasChapters :: Bool

True if chapter encountered

stateMacros :: [Macro]

List of macros defined so far

stateRstDefaultRole :: String

Current rST default interpreted text role

stateWarnings :: [String]

Warnings generated by the parser

Instances

data HeaderType Source

Constructors

SingleHeader Char

Single line of characters underneath

DoubleHeader Char

Lines of characters above and below

data ParserContext Source

Constructors

ListItemState

Used when running parser on list item contents

NullState

Default state

data QuoteContext Source

Constructors

InSingleQuote

Used when parsing inside single quotes

InDoubleQuote

Used when parsing inside double quotes

NoQuote

Used when not parsing inside quotes

newtype Key Source

Constructors

Key String 

Instances

macro :: Parser [Char] ParserState BlocksSource

Parse a newcommand or renewcommand macro definition.

applyMacros' :: String -> Parser [Char] ParserState StringSource

Apply current macros to string.

type Parser t s = Parsec t sSource

newtype F a Source

Constructors

F 

Fields

unF :: Reader ParserState a
 

Instances

Monad F 
Functor F 
Monoid a => Monoid (F a) 

runF :: F a -> ParserState -> aSource

asksF :: (ParserState -> a) -> F aSource

Re-exports from Text.Pandoc.Parsec

runParser :: Stream s Identity t => Parsec s u a -> u -> SourceName -> s -> Either ParseError a

The most general way to run a parser over the Identity monad. runParser p state filePath input runs parser p on the input list of tokens input, obtained from source filePath with the initial user state st. The filePath is only used in error messages and may be the empty string. Returns either a ParseError (Left) or a value of type a (Right).

  parseFromFile p fname
    = do{ input <- readFile fname
        ; return (runParser p () fname input)
        }

parse :: Stream s Identity t => Parsec s () a -> SourceName -> s -> Either ParseError a

parse p filePath input runs a parser p over Identity without user state. The filePath is only used in error messages and may be the empty string. Returns either a ParseError (Left) or a value of type a (Right).

  main    = case (parse numbers "" "11, 2, 43") of
             Left err  -> print err
             Right xs  -> print (sum xs)

  numbers = commaSep integer

anyToken :: (Stream s m t, Show t) => ParsecT s u m t

The parser anyToken accepts any kind of token. It is for example used to implement eof. Returns the accepted token.

getInput :: Monad m => ParsecT s u m s

Returns the current input

setInput :: Monad m => s -> ParsecT s u m ()

setInput input continues parsing with input. The getInput and setInput functions can for example be used to deal with #include files.

unexpected :: Stream s m t => String -> ParsecT s u m a

The parser unexpected msg always fails with an unexpected error message msg without consuming any input.

The parsers fail, (<?>) and unexpected are the three parsers used to generate error messages. Of these, only (<?>) is commonly used. For an example of the use of unexpected, see the definition of notFollowedBy.

char :: Stream s m Char => Char -> ParsecT s u m Char

char c parses a single character c. Returns the parsed character (i.e. c).

  semiColon  = char ';'

letter :: Stream s m Char => ParsecT s u m Char

Parses a letter (an upper case or lower case character). Returns the parsed character.

digit :: Stream s m Char => ParsecT s u m Char

Parses a digit. Returns the parsed character.

alphaNum :: Stream s m Char => ParsecT s u m Char

Parses a letter or digit (a character between '0' and '9'). Returns the parsed character.

skipMany :: ParsecT s u m a -> ParsecT s u m ()

skipMany p applies the parser p zero or more times, skipping its result.

  spaces  = skipMany space

skipMany1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m ()

skipMany1 p applies the parser p one or more times, skipping its result.

spaces :: Stream s m Char => ParsecT s u m ()

Skips zero or more white space characters. See also skipMany.

space :: Stream s m Char => ParsecT s u m Char

Parses a white space character (any character which satisfies isSpace) Returns the parsed character.

anyChar :: Stream s m Char => ParsecT s u m Char

This parser succeeds for any character. Returns the parsed character.

satisfy :: Stream s m Char => (Char -> Bool) -> ParsecT s u m Char

The parser satisfy f succeeds for any character for which the supplied function f returns True. Returns the character that is actually parsed.

newline :: Stream s m Char => ParsecT s u m Char

Parses a newline character ('\n'). Returns a newline character.

string :: Stream s m Char => String -> ParsecT s u m String

string s parses a sequence of characters given by s. Returns the parsed string (i.e. s).

  divOrMod    =   string "div" 
              <|> string "mod"

count :: Stream s m t => Int -> ParsecT s u m a -> ParsecT s u m [a]

count n p parses n occurrences of p. If n is smaller or equal to zero, the parser equals to return []. Returns a list of n values returned by p.

eof :: (Stream s m t, Show t) => ParsecT s u m ()

This parser only succeeds at the end of the input. This is not a primitive parser but it is defined using notFollowedBy.

  eof  = notFollowedBy anyToken <?> "end of input"

noneOf :: Stream s m Char => [Char] -> ParsecT s u m Char

As the dual of oneOf, noneOf cs succeeds if the current character not in the supplied list of characters cs. Returns the parsed character.

  consonant = noneOf "aeiou"

oneOf :: Stream s m Char => [Char] -> ParsecT s u m Char

oneOf cs succeeds if the current character is in the supplied list of characters cs. Returns the parsed character. See also satisfy.

   vowel  = oneOf "aeiou"

lookAhead :: Stream s m t => ParsecT s u m a -> ParsecT s u m a

lookAhead p parses p without consuming any input.

If p fails and consumes some input, so does lookAhead. Combine with try if this is undesirable.

notFollowedBy :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()

notFollowedBy p only succeeds when parser p fails. This parser does not consume any input. This parser can be used to implement the 'longest match' rule. For example, when recognizing keywords (for example let), we want to make sure that a keyword is not followed by a legal identifier character, in which case the keyword is actually an identifier (for example lets). We can program this behaviour as follows:

  keywordLet  = try (do{ string "let"
                       ; notFollowedBy alphaNum
                       })

many :: ParsecT s u m a -> ParsecT s u m [a]

many p applies the parser p zero or more times. Returns a list of the returned values of p.

  identifier  = do{ c  <- letter
                  ; cs <- many (alphaNum <|> char '_')
                  ; return (c:cs)
                  }

many1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m [a]

many1 p applies the parser p one or more times. Returns a list of the returned values of p.

  word  = many1 letter

manyTill :: Stream s m t => ParsecT s u m a -> ParsecT s u m end -> ParsecT s u m [a]

manyTill p end applies parser p zero or more times until parser end succeeds. Returns the list of values returned by p. This parser can be used to scan comments:

  simpleComment   = do{ string "<!--"
                      ; manyTill anyChar (try (string "-->"))
                      }

Note the overlapping parsers anyChar and string "-->", and therefore the use of the try combinator.

(<|>) :: ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a

This combinator implements choice. The parser p <|> q first applies p. If it succeeds, the value of p is returned. If p fails without consuming any input, parser q is tried. This combinator is defined equal to the mplus member of the MonadPlus class and the (<|>) member of Alternative.

The parser is called predictive since q is only tried when parser p didn't consume any input (i.e.. the look ahead is 1). This non-backtracking behaviour allows for both an efficient implementation of the parser combinators and the generation of good error messages.

(<?>) :: ParsecT s u m a -> String -> ParsecT s u m a

The parser p ? msg behaves as parser p, but whenever the parser p fails without consuming any input, it replaces expect error messages with the expect error message msg.

This is normally used at the end of a set alternatives where we want to return an error message in terms of a higher level construct rather than returning all possible characters. For example, if the expr parser from the try example would fail, the error message is: '...: expecting expression'. Without the (<?>) combinator, the message would be like '...: expecting "let" or letter', which is less friendly.

choice :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m a

choice ps tries to apply the parsers in the list ps in order, until one of them succeeds. Returns the value of the succeeding parser.

try :: ParsecT s u m a -> ParsecT s u m a

The parser try p behaves like parser p, except that it pretends that it hasn't consumed any input when an error occurs.

This combinator is used whenever arbitrary look ahead is needed. Since it pretends that it hasn't consumed any input when p fails, the (<|>) combinator will try its second alternative even when the first parser failed while consuming input.

The try combinator can for example be used to distinguish identifiers and reserved words. Both reserved words and identifiers are a sequence of letters. Whenever we expect a certain reserved word where we can also expect an identifier we have to use the try combinator. Suppose we write:

  expr        = letExpr <|> identifier <?> "expression"

  letExpr     = do{ string "let"; ... }
  identifier  = many1 letter

If the user writes "lexical", the parser fails with: unexpected 'x', expecting 't' in "let". Indeed, since the (<|>) combinator only tries alternatives when the first alternative hasn't consumed input, the identifier parser is never tried (because the prefix "le" of the string "let" parser is already consumed). The right behaviour can be obtained by adding the try combinator:

  expr        = letExpr <|> identifier <?> "expression"

  letExpr     = do{ try (string "let"); ... }
  identifier  = many1 letter

sepBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]

sepBy p sep parses zero or more occurrences of p, separated by sep. Returns a list of values returned by p.

  commaSep p  = p `sepBy` (symbol ",")

sepBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]

sepBy1 p sep parses one or more occurrences of p, separated by sep. Returns a list of values returned by p.

sepEndBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]

sepEndBy p sep parses zero or more occurrences of p, separated and optionally ended by sep, ie. haskell style statements. Returns a list of values returned by p.

  haskellStatements  = haskellStatement `sepEndBy` semi

sepEndBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]

sepEndBy1 p sep parses one or more occurrences of p, separated and optionally ended by sep. Returns a list of values returned by p.

endBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]

endBy p sep parses zero or more occurrences of p, seperated and ended by sep. Returns a list of values returned by p.

   cStatements  = cStatement `endBy` semi

endBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]

endBy1 p sep parses one or more occurrences of p, seperated and ended by sep. Returns a list of values returned by p.

option :: Stream s m t => a -> ParsecT s u m a -> ParsecT s u m a

option x p tries to apply parser p. If p fails without consuming input, it returns the value x, otherwise the value returned by p.

  priority  = option 0 (do{ d <- digit
                          ; return (digitToInt d) 
                          })

optional :: Stream s m t => ParsecT s u m a -> ParsecT s u m ()

optional p tries to apply parser p. It will parse p or nothing. It only fails if p fails after consuming input. It discards the result of p.

optionMaybe :: Stream s m t => ParsecT s u m a -> ParsecT s u m (Maybe a)

optionMaybe p tries to apply parser p. If p fails without consuming input, it return Nothing, otherwise it returns Just the value returned by p.

getState :: Monad m => ParsecT s u m u

Returns the current user state.

setState :: Monad m => u -> ParsecT s u m ()

An alias for putState for backwards compatibility.

updateState :: Monad m => (u -> u) -> ParsecT s u m ()

An alias for modifyState for backwards compatibility.

data SourcePos

The abstract data type SourcePos represents source positions. It contains the name of the source (i.e. file name), a line number and a column number. SourcePos is an instance of the Show, Eq and Ord class.

getPosition :: Monad m => ParsecT s u m SourcePos

Returns the current source position. See also SourcePos.

setPosition :: Monad m => SourcePos -> ParsecT s u m ()

setPosition pos sets the current source position to pos.

sourceColumn :: SourcePos -> Column

Extracts the column number from a source position.

sourceLine :: SourcePos -> Line

Extracts the line number from a source position.

newPos :: SourceName -> Line -> Column -> SourcePos

Create a new SourcePos with the given source name, line number and column number.

token

Arguments

:: Stream s Identity t 
=> (t -> String)

Token pretty-printing function.

-> (t -> SourcePos)

Computes the position of a token.

-> (t -> Maybe a)

Matching function for the token to parse.

-> Parsec s u a 

The parser token showTok posFromTok testTok accepts a token t with result x when the function testTok t returns Just x. The source position of the t should be returned by posFromTok t and the token can be shown using showTok t.

This combinator is expressed in terms of tokenPrim. It is used to accept user defined token streams. For example, suppose that we have a stream of basic tokens tupled with source positions. We can than define a parser that accepts single tokens as:

  mytoken x
    = token showTok posFromTok testTok
    where
      showTok (pos,t)     = show t
      posFromTok (pos,t)  = pos
      testTok (pos,t)     = if x == t then Just t else Nothing