Portability | portable |
---|---|
Stability | moving towards stable |
Maintainer | http://www.cs.york.ac.uk/~ndm/ |
This module is for extracting information out of unstructured HTML code, sometimes known as tag-soup. This is for situations where the author of the HTML is not cooperating with the person trying to extract the information, but is also not trying to hide the information.
The standard practice is to parse a String to Tags using parseTags, then operate upon them to extract the necessary information.
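The second half of that workflow can be sketched as follows. This is only an illustration under stated assumptions: the `Tag` constructors below are a hypothetical stand-in for the module's own type, `exampleTags` is a hand-written list of the kind `parseTags` might plausibly return for `<a href="/x">home</a><b>bold</b>`, and `textAfterOpen` is a helper invented for this example rather than part of the module.

```haskell
import Data.List (tails)

-- Hypothetical stand-in for the module's Tag type; the constructors are assumed.
type Attribute = (String, String)

data Tag
    = TagOpen String [Attribute]
    | TagClose String
    | TagText String
    deriving (Eq, Show)

-- Hand-written stand-in for the output of parseTags (assumed shape).
exampleTags :: [Tag]
exampleTags =
    [ TagOpen "a" [("href", "/x")]
    , TagText "home"
    , TagClose "a"
    , TagOpen "b" []
    , TagText "bold"
    , TagClose "b"
    ]

-- Illustrative helper: collect the text that immediately follows each
-- opening tag with the given name. Suffixes that do not start with the
-- pattern "open tag, then text" are skipped by the list comprehension.
textAfterOpen :: String -> [Tag] -> [String]
textAfterOpen name tags =
    [ txt | (TagOpen n _ : TagText txt : _) <- tails tags, n == name ]
```

With these definitions, `textAfterOpen "a" exampleTags` evaluates to `["home"]` and `textAfterOpen "b" exampleTags` to `["bold"]`.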
- data Tag
- type Attribute = (String, String)
- parseTags :: String -> [Tag]
- module Data.Html.Download
- (~==) :: Tag -> Tag -> Bool
- (~/=) :: Tag -> Tag -> Bool
- isTagOpen :: Tag -> Bool
- isTagClose :: Tag -> Bool
- isTagText :: Tag -> Bool
- fromTagText :: Tag -> String
- fromAttrib :: String -> Tag -> String
- isTagOpenName :: String -> Tag -> Bool
- isTagCloseName :: String -> Tag -> Bool
- sections :: (a -> Bool) -> [a] -> [[a]]
- partitions :: (a -> Bool) -> [a] -> [[a]]
Data structures and parsing
parseTags :: String -> [Tag]
Parse an HTML document to a list of Tags.
Automatically expands out escape characters.
module Data.Html.Download
Tag Combinators
(~==) :: Tag -> Tag -> Bool
Performs an inexact match. The first item should be the thing to match; the second item is the pattern to match it against. If the second item is a blank string, that is considered to match anything. For example:
(TagText "test" ~== TagText ""    ) == True
(TagText "test" ~== TagText "test") == True
(TagText "test" ~== TagText "soup") == False
For TagOpen, missing attributes on the right are allowed.
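The matching rules described above can be modelled with a small sketch. It is not the library's source: the `Tag` constructors are a hypothetical stand-in, `matchesSketch` is an invented name, and two points are assumptions, namely that a blank name also acts as a wildcard for open and close tags, and that every attribute listed on the right must appear unchanged on the left.

```haskell
-- Hypothetical stand-in for the module's Tag type; the constructors are assumed.
type Attribute = (String, String)

data Tag
    = TagOpen String [Attribute]
    | TagClose String
    | TagText String
    deriving (Eq, Show)

-- matchesSketch actual pattern: an illustrative model of (~==).
-- The first argument is the tag being examined, the second is the pattern.
matchesSketch :: Tag -> Tag -> Bool
-- A blank pattern string matches any text.
matchesSketch (TagText actual) (TagText pat) = null pat || pat == actual
-- Assumption: the blank-string wildcard applies to tag names as well.
matchesSketch (TagClose actual) (TagClose pat) = null pat || pat == actual
-- The pattern may omit attributes, but each one it lists must be present.
matchesSketch (TagOpen actual attrs) (TagOpen pat wanted) =
    (null pat || pat == actual) && all (`elem` attrs) wanted
-- Tags of different kinds never match.
matchesSketch _ _ = False
```

Under this model, `matchesSketch (TagOpen "a" [("href", "x"), ("id", "y")]) (TagOpen "a" [("href", "x")])` is `True`, while swapping the two arguments gives `False`, because the pattern would then demand an `id` attribute that the examined tag lacks.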
fromAttrib :: String -> Tag -> String
isTagOpenName :: String -> Tag -> Bool
isTagCloseName :: String -> Tag -> Bool
sections :: (a -> Bool) -> [a] -> [[a]]
This function takes a list and returns every suffix of it whose first item satisfies the predicate.
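One way to read this description is as a filter over the suffixes of the list. The sketch below is written under that assumption; `sectionsSketch` is an illustrative name and is not claimed to be the module's actual definition.

```haskell
import Data.List (tails)

-- Illustrative sketch: keep every non-empty suffix whose first element
-- satisfies the predicate. The pattern ys@(y:_) skips the empty suffix.
sectionsSketch :: (a -> Bool) -> [a] -> [[a]]
sectionsSketch p xs = [ ys | ys@(y:_) <- tails xs, p y ]
```

For example, `sectionsSketch (== 'a') "banana"` evaluates to `["anana", "ana", "a"]`. Note that the results overlap: each later section is contained in the earlier ones.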
partitions :: (a -> Bool) -> [a] -> [[a]]
This function is similar to sections, but splits the list so that no element appears in more than one partition.
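A non-overlapping split of this kind might be sketched as below. `partitionsSketch` is an illustrative name, not the module's definition, and it assumes that elements before the first match are discarded and that each partition runs up to, but not including, the next matching element.

```haskell
-- Illustrative sketch: each partition starts at an element satisfying the
-- predicate and ends just before the next such element.
partitionsSketch :: (a -> Bool) -> [a] -> [[a]]
partitionsSketch p xs =
    case dropWhile (not . p) xs of          -- skip to the next match
        []         -> []                    -- no match left: finished
        (y : rest) ->
            let (body, remaining) = break p rest   -- stop before the next match
            in (y : body) : partitionsSketch p remaining
```

For example, `partitionsSketch (== 'a') "banana"` evaluates to `["an", "an", "a"]`: the leading `'b'` is dropped, and every remaining element lands in exactly one partition.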