tagchup-0.4.0.6: alternative package for processing of tag soups

Safe Haskell: Safe
Language: Haskell98

Text.HTML.Tagchup.Process

Documentation

type Encoded = String

Should be [Word8].

decodeAdaptive :: (Attribute name, Tag name) => (Encoding -> Encoded -> String) -> [T name [T]] -> State (Encoded -> String) [T name String]

Selects a decoder dynamically according to xml-encoding and meta-http-equiv tags. The ?xml tag should only appear at the beginning of a document, but we respect it at every occurrence.

import qualified Text.XML.HXT.DOM.Unicode as Unicode
evalDecodeAdaptive .
decodeAdaptive
   (maybe Unicode.latin1ToUnicode (fst.) .
    Unicode.getDecodingFct)

decodeTagAdaptive :: (Attribute name, Tag name) => (Encoding -> Encoded -> String) -> T name [T] -> State (Encoded -> String) (T name String)

decodeTagAdaptive decoderSelector tag yields a State action whose state is the current decoder. It decodes encoding-specific byte sequences using the current decoder, and XML references using a fixed table.
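The idea of carrying the decoder as state can be illustrated with a minimal, self-contained sketch. It does not use the library's types: Tag, Decoder, selectDecoder and decodeAll are hypothetical names, the "rot1" encoding is a toy invented for the example, and Data.List.mapAccumL stands in for the State monad.

```haskell
import Data.Char (chr, ord, toLower)
import Data.List (mapAccumL)

-- Hypothetical, simplified stand-ins for the library's types.
type Encoding = String
type Encoded  = String            -- bytes stored as Chars, as in this module
type Decoder  = Encoded -> String

data Tag
  = Declaration Encoding          -- models an encoding declaration tag
  | Text String
  deriving (Eq, Show)

-- Toy decoder selector: "rot1" shifts every character down by one,
-- any other encoding name falls back to the identity decoder.
selectDecoder :: Encoding -> Decoder
selectDecoder enc = case map toLower enc of
  "rot1" -> map (chr . subtract 1 . ord)
  _      -> id

-- Thread the current decoder through the soup: a declaration replaces
-- the decoder (at every occurrence), text is decoded with the current one.
decodeAll :: (Encoding -> Decoder) -> [Tag] -> [Tag]
decodeAll select = snd . mapAccumL step id
  where
    step _   t@(Declaration enc) = (select enc, t)
    step dec (Text s)            = (dec, Text (dec s))
```

Text that precedes any declaration is passed through unchanged here, because the initial decoder in this sketch is id; which initial decoder suits a real document is a separate choice.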

getXMLEncoding :: (Tag name, Attribute name) => [T name String] -> Maybe String

Check whether the first tag is an XML processing instruction tag and return the value of its encoding attribute.
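A minimal sketch of this check on a simplified tag type may make the behaviour concrete. The Tag type and the name xmlEncoding are hypothetical; the library's own T type is richer.

```haskell
-- Hypothetical, simplified tag type for illustration only.
data Tag
  = ProcInstr String [(String, String)]   -- <?name attr="value" ...?>
  | Open String [(String, String)]
  | Close String
  | Text String
  deriving (Eq, Show)

-- Inspect only the first tag: if it is an <?xml ...?> instruction,
-- return its encoding attribute, if present.
xmlEncoding :: [Tag] -> Maybe String
xmlEncoding (ProcInstr "xml" attrs : _) = lookup "encoding" attrs
xmlEncoding _                           = Nothing
```

In this sketch an ?xml instruction that is not the first tag yields Nothing, matching the "first tag" wording above.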

findMetaEncoding :: (Tag name, Attribute name) => [T name String] -> Maybe String

Much the same as wraxml:HTML.Tree.findMetaEncoding.

getMetaHTTPHeaders :: (Tag name, Attribute name) => [T name string] -> [(string, string)]

Extract META tags that contain an HTTP-EQUIV attribute and present their values like HTTP headers.
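As a rough illustration, the following self-contained sketch performs a similar extraction on a simplified tag type. The Tag type and the names lookupAttr and metaHeaders are hypothetical, and pairing HTTP-EQUIV with the CONTENT attribute is an assumption based on how such META tags are usually written, not something stated above.

```haskell
import Data.Char (toLower)
import Data.Maybe (mapMaybe)

-- Hypothetical, simplified tag type for illustration only.
data Tag
  = Open String [(String, String)]
  | Close String
  | Text String
  deriving (Eq, Show)

-- Case-insensitive attribute lookup, since tag soups vary in case.
lookupAttr :: String -> [(String, String)] -> Maybe String
lookupAttr name attrs =
  lookup (map toLower name) [ (map toLower k, v) | (k, v) <- attrs ]

-- Collect (header name, header value) pairs from META tags that carry
-- both an HTTP-EQUIV and a CONTENT attribute (assumed pairing).
metaHeaders :: [Tag] -> [(String, String)]
metaHeaders = mapMaybe header
  where
    header (Open name attrs)
      | map toLower name == "meta" =
          (,) <$> lookupAttr "http-equiv" attrs <*> lookupAttr "content" attrs
    header _ = Nothing
```

META tags without an HTTP-EQUIV attribute, such as name/content pairs, are skipped by this sketch.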

getHeadTags :: (Tag name, Attribute name) => [T name string] -> [T name string]

partAttrs :: Tag name => (Name name -> Bool) -> (([T name string], [T name string]) -> ([T name string], [T name string])) -> [T name string] -> [T name string]

Modify attributes and tags of certain parts. For limitations, see parts.

parts :: Tag name => (Name name -> Bool) -> [T name string] -> [Either ((Name name, [T name string]), [T name string]) [T name string]]

Extract parts from the tag soup that are enclosed in corresponding open and close tags. If a close tag is missing, the end of the soup is considered the end of the part. However, nested tags are not supported: e.g. in <a><a></a></a> the second <a> is considered to be enclosed in the first <a> and the first </a>, and the second </a> is ignored.
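The non-nesting behaviour can be reproduced with a small, self-contained sketch on a simplified tag type. Tag, Attr, Chunk and splitParts are hypothetical names, and this is not the library's implementation. As a simplification, empty runs of outside tags are dropped, and a stray close tag is left among the outside tags rather than reported specially.

```haskell
-- Hypothetical, simplified types for illustration only.
type Attr = (String, String)

data Tag
  = Open String [Attr]
  | Close String
  | Text String
  deriving (Eq, Show)

-- Left: a part (tag name, attributes, enclosed tags).
-- Right: a run of tags outside any part.
type Chunk = Either ((String, [Attr]), [Tag]) [Tag]

splitParts :: (String -> Bool) -> [Tag] -> [Chunk]
splitParts wanted = go
  where
    go [] = []
    go ts =
      case break isWantedOpen ts of
        (outside, Open name attrs : rest) ->
          -- A part runs up to the FIRST matching close tag (no nesting),
          -- or to the end of the soup if there is none.
          let (inside, after) = break (== Close name) rest
          in  outsideChunk outside
                ++ Left ((name, attrs), inside) : go (drop 1 after)
        (outside, _) -> outsideChunk outside

    isWantedOpen (Open name _) = wanted name
    isWantedOpen _             = False

    outsideChunk [] = []
    outsideChunk xs = [Right xs]
```

For the soup <a><a></a></a>, this sketch yields one part for the first <a> that contains the second <a> as an ordinary tag, followed by an outside chunk holding the second </a>.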