Stability: moving towards stable
This module is for extracting information out of unstructured HTML code, sometimes known as tag-soup. This is for situations where the author of the HTML is not cooperating with the person trying to extract the information, but is also not trying to hide the information.
- data Tag
- type Attribute = (String, String)
- parseTags :: String -> [Tag]
- module Data.Html.Download
- (~==) :: Tag -> Tag -> Bool
- (~/=) :: Tag -> Tag -> Bool
- isTagOpen :: Tag -> Bool
- isTagClose :: Tag -> Bool
- isTagText :: Tag -> Bool
- fromTagText :: Tag -> String
- fromAttrib :: String -> Tag -> String
- isTagOpenName :: String -> Tag -> Bool
- isTagCloseName :: String -> Tag -> Bool
- sections :: (a -> Bool) -> [a] -> [[a]]
- partitions :: (a -> Bool) -> [a] -> [[a]]
Data structures and parsing
data Tag
- TagOpen String [Attribute]: An open tag with its name and list of attributes.
- TagClose String: A closing tag.
- TagText String: A text node, guaranteed not to be the empty string.
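To make the three constructors concrete, the sketch below redeclares a local copy of the type so that it runs without the library installed, then writes out by hand the tag list that a fragment such as `<a href="x.html">hello</a>` would correspond to. The `deriving` clause and the names `example` and `textOf` are assumptions made for illustration and are not part of this module.

```haskell
-- Local stand-ins mirroring the declarations in the synopsis;
-- the real definitions come from the module itself.
type Attribute = (String, String)

data Tag
  = TagOpen String [Attribute]
  | TagClose String
  | TagText String
  deriving (Show, Eq)

-- Hand-written tag list for: <a href="x.html">hello</a>
example :: [Tag]
example =
  [ TagOpen "a" [("href", "x.html")]
  , TagText "hello"
  , TagClose "a"
  ]

-- Hypothetical helper: collect the contents of every text node.
textOf :: [Tag] -> [String]
textOf tags = [s | TagText s <- tags]

main :: IO ()
main = print (textOf example)
```

Keeping a document as a flat list rather than a tree means unbalanced or stray tags cause no trouble: they are simply further elements of the list.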
parseTags :: String -> [Tag]
Parse an HTML document to a list of Tag values. Escape characters are expanded automatically.
(~==) :: Tag -> Tag -> Bool
Performs an inexact match. The first argument is the tag being tested and the second is the pattern to match against. A blank string in the pattern is considered to match anything. For example:
(TagText "test" ~== TagText ""    ) == True
(TagText "test" ~== TagText "test") == True
(TagText "test" ~== TagText "soup") == False
For TagOpen, attributes missing from the right-hand tag are allowed.
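The matching rules described here can be sketched as a small standalone function. This is not the module's implementation: `matches` and `blankOr` are hypothetical names, the `Tag` type is a local copy, and the attribute rule (every attribute listed in the pattern must appear verbatim in the tested tag) is an assumed reading of "missing attributes on the right are allowed".

```haskell
-- Local stand-ins mirroring the declarations in the synopsis.
type Attribute = (String, String)

data Tag
  = TagOpen String [Attribute]
  | TagClose String
  | TagText String
  deriving (Show, Eq)

-- A blank pattern string matches any actual string.
blankOr :: String -> String -> Bool
blankOr pat actual = null pat || pat == actual

-- Hypothetical inexact match: first argument is the tag under test,
-- second argument is the pattern.
matches :: Tag -> Tag -> Bool
matches (TagText s)        (TagText p)         = blankOr p s
matches (TagClose n)       (TagClose p)        = blankOr p n
matches (TagOpen n attrs)  (TagOpen p wanted)  =
  blankOr p n && all (`elem` attrs) wanted   -- assumed attribute rule
matches _                  _                   = False

main :: IO ()
main = do
  print (TagText "test" `matches` TagText "")
  print (TagText "test" `matches` TagText "soup")
  print (TagOpen "a" [("href", "x"), ("class", "y")]
           `matches` TagOpen "a" [("href", "x")])
```

The match is deliberately asymmetric: the pattern may say less than the tested tag, but never more, so a pattern listing an attribute the tested tag lacks does not match.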
sections :: (a -> Bool) -> [a] -> [[a]]
This function takes a predicate and a list, and returns every suffix of the list whose first item satisfies the predicate.
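As an illustration of this behaviour, here is a minimal sketch that reads the description as "every suffix of the list whose first element satisfies the predicate". The name `sectionsSketch` is hypothetical and the definition is a re-implementation for demonstration, not necessarily how the module defines it.

```haskell
import Data.List (tails)

-- Keep each non-empty suffix of the input whose head satisfies p.
-- The pattern s@(x:_) silently skips the empty suffix produced by tails.
sectionsSketch :: (a -> Bool) -> [a] -> [[a]]
sectionsSketch p xs = [s | s@(x:_) <- tails xs, p x]

main :: IO ()
main = print (sectionsSketch even [1, 2, 3, 4, 5 :: Int])
-- prints [[2,3,4,5],[4,5]]
```

Applied to a tag list with a predicate that recognises a particular open tag, each result starts at one occurrence of that tag and runs to the end of the document, so the usual next step is to take only the first few elements of each section.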