Safe Haskell	None
Language	Haskell2010

ListT.HTMLParser

Contents

Parsers
Combinators

Synopsis

Documentation

data Parser m a Source #

A backtracking HTML-tokens stream parser.

Instances

Monad m => MonadError Error (Parser m) Source #
Methods throwError :: Error -> Parser m a # catchError :: Parser m a -> (Error -> Parser m a) -> Parser m a #
Monad m => Monad (Parser m) Source #
Methods (>>=) :: Parser m a -> (a -> Parser m b) -> Parser m b # (>>) :: Parser m a -> Parser m b -> Parser m b # return :: a -> Parser m a # fail :: String -> Parser m a #
Monad m => Functor (Parser m) Source #
Methods fmap :: (a -> b) -> Parser m a -> Parser m b # (<$) :: a -> Parser m b -> Parser m a #
Monad m => Applicative (Parser m) Source #
Methods pure :: a -> Parser m a # (<>) :: Parser m (a -> b) -> Parser m a -> Parser m b # (>) :: Parser m a -> Parser m b -> Parser m b # (<*) :: Parser m a -> Parser m b -> Parser m a #
Monad m => Alternative (Parser m) Source #
Methods empty :: Parser m a # (<\|>) :: Parser m a -> Parser m a -> Parser m a # some :: Parser m a -> Parser m [a] # many :: Parser m a -> Parser m [a] #
Monad m => MonadPlus (Parser m) Source #
Methods mzero :: Parser m a # mplus :: Parser m a -> Parser m a -> Parser m a #

type Error = Maybe ErrorDetails Source #

A possibly detailed parser error. When mzero or empty is used, an error value of Nothing is produced.

data ErrorDetails Source #

Constructors

ErrorDetails_Message Text	A text message
ErrorDetails_UnexpectedToken	Unexpected token
ErrorDetails_EOI	End of input

Instances

Eq ErrorDetails Source #
Methods (==) :: ErrorDetails -> ErrorDetails -> Bool # (/=) :: ErrorDetails -> ErrorDetails -> Bool #
Show ErrorDetails Source #
Methods showsPrec :: Int -> ErrorDetails -> ShowS # show :: ErrorDetails -> String # showList :: [ErrorDetails] -> ShowS #
Monad m => MonadError Error (Parser m) Source #
Methods throwError :: Error -> Parser m a # catchError :: Parser m a -> (Error -> Parser m a) -> Parser m a #

run :: Monad m => Parser m a -> ListT m Token -> m (Either Error a) Source #

Run a parser on a stream of HTML tokens, consuming only as many as needed.

Parsers

eoi :: Monad m => Parser m () Source #

End of input.

token :: Monad m => Parser m Token Source #

A token with HTML entities decoded and with spaces filtered out.

rawToken :: Monad m => Parser m Token Source #

An HTML token as it is: without HTML-decoding and ignoring of spaces.

space :: Monad m => Parser m Text Source #

A text token, which is completely composed of characters, which satisfy the isSpace predicate.

openingTag :: Monad m => Parser m OpeningTag Source #

An opening tag with HTML entities in values decoded.

closingTag :: Monad m => Parser m Identifier Source #

A closing tag.

text :: Monad m => Parser m Text Source #

A text between tags with HTML entities decoded.

comment :: Monad m => Parser m Text Source #

Contents of a comment.

html :: Monad m => Parser m Builder Source #

The auto-repaired textual HTML representation of an HTML-tree node.

Useful for consuming HTML-formatted snippets.

E.g., when the following parser:

openingTag *> html

is run against the following HTML snippet:

<ul>
  <li>I'm not your friend, <b>buddy</b>!</li>
  <li>I'm not your buddy, <b>guy</b>!</li>
  <li>He's not your guy, <b>friend</b>!</li>
  <li>I'm not your friend, <b>buddy</b>!</li>
</ul>

it'll produce the following text builder value:

<li>I'm not your friend, <b>buddy</b>!</li>

If you want to consume all children of a node, it's recommended to use properHTML in combination with many or many1. For details consult the docs on properHTML.

This parser is smart and handles and repairs broken HTML:

It repairs unclosed tags, interpreting them as closed singletons. E.g.,   will be consumed as  .
It handles orphan closing tags by ignoring them. E.g. it'll consume the input <a></a> as <a></a>.

properHTML :: Monad m => Parser m Builder Source #

Same as html, but fails if the input begins with an orphan closing tag. I.e., the input "</a>" will make this parser fail.

This parser is particularly useful for consuming all children in the current context. E.g., running the following parser:

openingTag *> (mconcat <$> many properHTML)

on the following input:

<ul>
  <li>I'm not your friend, <b>buddy</b>!</li>
  <li>I'm not your buddy, <b>guy</b>!</li>
  <li>He's not your guy, <b>friend</b>!</li>
  <li>I'm not your friend, <b>buddy</b>!</li>
</ul>

will produce a merged text builder, which consists of the following nodes:

  <li>I'm not your friend, <b>buddy</b>!</li>
  <li>I'm not your buddy, <b>guy</b>!</li>
  <li>He's not your guy, <b>friend</b>!</li>
  <li>I'm not your friend, <b>buddy</b>!</li>

Notice that unlike with html, it's safe to assume that it will not consume the following closing </ul> tag, because it does not begin a valid HTML-tree node.

Notice also that despite failing in case of the first broken token, this parser handles the broken tokens in other cases the same way as html.

xmlNode :: Monad m => Parser m Node Source #

Works the same way as properHTML, but constructs an XML-tree.

Combinators

many1 :: Monad m => Parser m a -> Parser m [a] Source #

Apply a parser at least one time.

manyTill :: Monad m => Parser m a -> Parser m b -> Parser m ([a], b) Source #

Apply a parser multiple times until another parser is satisfied. Returns results of both parsers.

skipTill :: Monad m => Parser m a -> Parser m a Source #

Skip any tokens until the provided parser is satisfied.

total :: Monad m => Parser m a -> Parser m a Source #

Greedily consume all the input until the end, while running the provided parser. Same as:

theParser <* eoi