list-t-html-parser-0.4.2: Streaming HTML parser

Safe HaskellNone
LanguageHaskell2010

ListT.HTMLParser

Contents

Synopsis

Documentation

data Parser m a Source #

A backtracking HTML-tokens stream parser.

Instances

Monad m => MonadError Error (Parser m) Source # 

Methods

throwError :: Error -> Parser m a #

catchError :: Parser m a -> (Error -> Parser m a) -> Parser m a #

Monad m => Monad (Parser m) Source # 

Methods

(>>=) :: Parser m a -> (a -> Parser m b) -> Parser m b #

(>>) :: Parser m a -> Parser m b -> Parser m b #

return :: a -> Parser m a #

fail :: String -> Parser m a #

Monad m => Functor (Parser m) Source # 

Methods

fmap :: (a -> b) -> Parser m a -> Parser m b #

(<$) :: a -> Parser m b -> Parser m a #

Monad m => Applicative (Parser m) Source # 

Methods

pure :: a -> Parser m a #

(<*>) :: Parser m (a -> b) -> Parser m a -> Parser m b #

(*>) :: Parser m a -> Parser m b -> Parser m b #

(<*) :: Parser m a -> Parser m b -> Parser m a #

Monad m => Alternative (Parser m) Source # 

Methods

empty :: Parser m a #

(<|>) :: Parser m a -> Parser m a -> Parser m a #

some :: Parser m a -> Parser m [a] #

many :: Parser m a -> Parser m [a] #

Monad m => MonadPlus (Parser m) Source # 

Methods

mzero :: Parser m a #

mplus :: Parser m a -> Parser m a -> Parser m a #

type Error = Maybe ErrorDetails Source #

A possibly detailed parser error. When mzero or empty is used, an error value of Nothing is produced.

run :: Monad m => Parser m a -> ListT m Token -> m (Either Error a) Source #

Run a parser on a stream of HTML tokens, consuming only as many as needed.

Parsers

eoi :: Monad m => Parser m () Source #

End of input.

token :: Monad m => Parser m Token Source #

A token with HTML entities decoded and with spaces filtered out.

rawToken :: Monad m => Parser m Token Source #

An HTML token as it is: without HTML-decoding and ignoring of spaces.

space :: Monad m => Parser m Text Source #

A text token, which is completely composed of characters, which satisfy the isSpace predicate.

openingTag :: Monad m => Parser m OpeningTag Source #

An opening tag with HTML entities in values decoded.

closingTag :: Monad m => Parser m Identifier Source #

A closing tag.

text :: Monad m => Parser m Text Source #

A text between tags with HTML entities decoded.

comment :: Monad m => Parser m Text Source #

Contents of a comment.

html :: Monad m => Parser m Builder Source #

The auto-repaired textual HTML representation of an HTML-tree node.

Useful for consuming HTML-formatted snippets.

E.g., when the following parser:

openingTag *> html

is run against the following HTML snippet:

<ul>
  <li>I'm not your friend, <b>buddy</b>!</li>
  <li>I'm not your buddy, <b>guy</b>!</li>
  <li>He's not your guy, <b>friend</b>!</li>
  <li>I'm not your friend, <b>buddy</b>!</li>
</ul>

it'll produce the following text builder value:

<li>I'm not your friend, <b>buddy</b>!</li>

If you want to consume all children of a node, it's recommended to use properHTML in combination with many or many1. For details consult the docs on properHTML.

This parser is smart and handles and repairs broken HTML:

  • It repairs unclosed tags, interpreting them as closed singletons. E.g., <br> will be consumed as <br/>.
  • It handles orphan closing tags by ignoring them. E.g. it'll consume the input <a></b></a> as <a></a>.

properHTML :: Monad m => Parser m Builder Source #

Same as html, but fails if the input begins with an orphan closing tag. I.e., the input "</a><b></b>" will make this parser fail.

This parser is particularly useful for consuming all children in the current context. E.g., running the following parser:

openingTag *> (mconcat <$> many properHTML)

on the following input:

<ul>
  <li>I'm not your friend, <b>buddy</b>!</li>
  <li>I'm not your buddy, <b>guy</b>!</li>
  <li>He's not your guy, <b>friend</b>!</li>
  <li>I'm not your friend, <b>buddy</b>!</li>
</ul>

will produce a merged text builder, which consists of the following nodes:

  <li>I'm not your friend, <b>buddy</b>!</li>
  <li>I'm not your buddy, <b>guy</b>!</li>
  <li>He's not your guy, <b>friend</b>!</li>
  <li>I'm not your friend, <b>buddy</b>!</li>

Notice that unlike with html, it's safe to assume that it will not consume the following closing </ul> tag, because it does not begin a valid HTML-tree node.

Notice also that despite failing in case of the first broken token, this parser handles the broken tokens in other cases the same way as html.

xmlNode :: Monad m => Parser m Node Source #

Works the same way as properHTML, but constructs an XML-tree.

Combinators

many1 :: Monad m => Parser m a -> Parser m [a] Source #

Apply a parser at least one time.

manyTill :: Monad m => Parser m a -> Parser m b -> Parser m ([a], b) Source #

Apply a parser multiple times until another parser is satisfied. Returns results of both parsers.

skipTill :: Monad m => Parser m a -> Parser m a Source #

Skip any tokens until the provided parser is satisfied.

total :: Monad m => Parser m a -> Parser m a Source #

Greedily consume all the input until the end, while running the provided parser. Same as:

theParser <* eoi