tagchup-0.4.1: alternative package for processing of tag soups

Safe HaskellSafe
LanguageHaskell98

Text.HTML.Tagchup.Process

Synopsis

Documentation

type Encoded = String #

should be [Word8]

decodeAdaptive :: (Attribute name, Tag name) => (Encoding -> Encoded -> String) -> [T name [T]] -> State (Encoded -> String) [T name String] Source #

Selects a decoder dynamically according to xml-encoding and meta-http-equiv tags. The ?xml tag should only appear at the beginning of a document, but we respect it at every occurence.

import qualified Text.XML.HXT.DOM.Unicode as Unicode
evalDecodeAdaptive .
decodeAdaptive
   (maybe Unicode.latin1ToUnicode (fst.) .
    Unicode.getDecodingFct)

decodeTagAdaptive :: (Attribute name, Tag name) => (Encoding -> Encoded -> String) -> T name [T] -> State (Encoded -> String) (T name String) Source #

decodeTagAdaptive decoderSelector tag generates a state monad, with a decoder as state. It decodes encoding specific byte sequences using the current decoder and XML references using a fixed table.

getXMLEncoding :: (Tag name, Attribute name) => [T name String] -> Maybe String Source #

Check whether the first tag is an xml processing instruction tag and return the value of its encoding attribute.

findMetaEncoding :: (Tag name, Attribute name) => [T name String] -> Maybe String Source #

Rather the same as wraxml:HTML.Tree.findMetaEncoding

getMetaHTTPHeaders :: (Tag name, Attribute name) => [T name string] -> [(string, string)] Source #

Extract META tags which contain HTTP-EQUIV attribute and present these values like HTTP headers.

getHeadTags :: (Tag name, Attribute name) => [T name string] -> [T name string] Source #

partAttrs :: Tag name => (Name name -> Bool) -> (([T name string], [T name string]) -> ([T name string], [T name string])) -> [T name string] -> [T name string] Source #

Modify attributes and tags of certain parts. For limitations, see parts.

parts :: Tag name => (Name name -> Bool) -> [T name string] -> [Either ((Name name, [T name string]), [T name string]) [T name string]] Source #

Extract parts from the tag soup that are enclosed in corresponding open and close tags. If a close tag is missing, the soup end is considered as end of the part. However nested tags are not supported, e.g. in <a><a></a></a> the second <a> is considered to be enclosed in the first <a> and the first </a>, and the second </a> is ignored.

takeBeforeMatchingClose :: Eq name => Name name -> [T name string] -> [T name string] Source #

Take all tags until the one that matches the opening tag. The matching closing tag is not included in the list. The list must begin with the according opening tag. Nesting of the considered tag is respected, but the nesting of other tags is ignored.

takeUntilMatchingClose :: Eq name => Name name -> [T name string] -> Maybe [T name string] Source #

This is like takeBeforeMatchingClose but the matching close tag is included in the result.