list-t-html-parser-0.4.0.0: Streaming HTML parser

Safe HaskellNone
LanguageHaskell2010

ListT.HTMLParser

Contents

Synopsis

Documentation

data Parser m a Source

A backtracking HTML-tokens stream parser.

Instances

type Error = Maybe ErrorDetails Source

A possibly detailed parser error. When mzero or empty is used, an error value of Nothing is produced.

run :: Monad m => Parser m a -> ListT m Token -> m (Either Error a) Source

Run a parser on a stream of HTML tokens, consuming only as many as needed.

Parsers

eoi :: Monad m => Parser m () Source

End of input.

token :: Monad m => Parser m Token Source

A token with HTML entities decoded and with spaces filtered out.

rawToken :: Monad m => Parser m Token Source

An HTML token as it is: without HTML-decoding and ignoring of spaces.

space :: Monad m => Parser m Text Source

A text token, which is completely composed of characters, which satisfy the isSpace predicate.

openingTag :: Monad m => Parser m OpeningTag Source

An opening tag with HTML entities in values decoded.

closingTag :: Monad m => Parser m Identifier Source

A closing tag.

text :: Monad m => Parser m Text Source

A text between tags with HTML entities decoded.

comment :: Monad m => Parser m Text Source

Contents of a comment.

html :: Monad m => Parser m Builder Source

The auto-repaired textual HTML representation of an HTML-tree node.

Useful for consuming HTML-formatted snippets.

E.g., when the following parser:

openingTag *> html

is run against the following HTML snippet:

<ul>
  <li>I'm not your friend, <b>buddy</b>!</li>
  <li>I'm not your buddy, <b>guy</b>!</li>
  <li>He's not your guy, <b>friend</b>!</li>
  <li>I'm not your friend, <b>buddy</b>!</li>
</ul>

it'll produce the following text builder value:

<li>I'm not your friend, <b>buddy</b>!</li>

If you want to consume all children of a node, it's recommended to use properHTML in combination with many or many1. For details consult the docs on properHTML.

This parser is smart and handles and repairs broken HTML:

  • It repairs unclosed tags, interpreting them as closed singletons. E.g., <br> will be consumed as <br/>.
  • It handles orphan closing tags by ignoring them. E.g. it'll consume the input <a></b></a> as <a></a>.

properHTML :: Monad m => Parser m Builder Source

Same as html, but fails if the input begins with an orphan closing tag. I.e., the input "</a><b></b>" will make this parser fail.

This parser is particularly useful for consuming all children in the current context. E.g., running the following parser:

openingTag *> (mconcat <$> many properHTML)

on the following input:

<ul>
  <li>I'm not your friend, <b>buddy</b>!</li>
  <li>I'm not your buddy, <b>guy</b>!</li>
  <li>He's not your guy, <b>friend</b>!</li>
  <li>I'm not your friend, <b>buddy</b>!</li>
</ul>

will produce a merged text builder, which consists of the following nodes:

  <li>I'm not your friend, <b>buddy</b>!</li>
  <li>I'm not your buddy, <b>guy</b>!</li>
  <li>He's not your guy, <b>friend</b>!</li>
  <li>I'm not your friend, <b>buddy</b>!</li>

Notice that unlike with html, it's safe to assume that it will not consume the following closing </ul> tag, because it does not begin a valid HTML-tree node.

Notice also that despite failing in case of the first broken token, this parser handles the broken tokens in other cases the same way as html.

Combinators

many1 :: Monad m => Parser m a -> Parser m [a] Source

Apply a parser at least one time.

manyTill :: Monad m => Parser m a -> Parser m b -> Parser m ([a], b) Source

Apply a parser multiple times until another parser is satisfied. Returns results of both parsers.

skipTill :: Monad m => Parser m a -> Parser m a Source

Skip any tokens until the provided parser is satisfied.

total :: Monad m => Parser m a -> Parser m a Source

Greedily consume all the input until the end, while running the provided parser. Same as:

theParser <* eoi