Safe Haskell | None |
---|---|
Language | Haskell2010 |
A library for making extraction of information from parsed XML easier.
Example
Suppose you have an XML file of books like this:
    <?xml version="1.0"?>
    <library>
      <book id="1" isbn="23234-1">
        <author>John Doe</author>
        <title>Some book</title>
      </book>
      <book id="2">
        <author>You</author>
        <title>The Great Event</title>
      </book>
      ...
    </library>
And a data type for a book:
    data Book = Book
      { bookId        :: Int
      , isbn          :: Maybe String
      , author, title :: String
      }
You can parse the XML file into a generic tree structure using parseXMLDoc (from the xml package), then extract information from the tree using this library.
    library = element "library" $ children $ many book

    book = element "book" $ do
      i <- attribAs "id" integer
      s <- optional (attrib "isbn")
      children $ do
        a <- element "author" $ contents $ text
        t <- element "title" $ contents $ text
        return $ Book { bookId = i, author = a, title = t, isbn = s }

    extractLibrary :: Element -> Either ExtractionErr [Book]
    extractLibrary = extractDocContents library
Note: The Control.Applicative module contains some useful combinators like optional, many and <|>.
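To see the pieces working together, here is a minimal end-to-end sketch that parses a document and extracts only the book titles. It assumes the combinators are exported from a module named Text.XML.Light.Extractors and that parseXMLDoc comes from Text.XML.Light in the xml package. The names sampleXml, titles and extractTitles are made up for this illustration.

```haskell
import Control.Applicative (many)
import Text.XML.Light (parseXMLDoc)
-- Assumption: this is the module that exports the combinators documented here.
import Text.XML.Light.Extractors

-- Hypothetical sample input, modelled on the library file above.
sampleXml :: String
sampleXml = unlines
  [ "<?xml version=\"1.0\"?>"
  , "<library>"
  , "  <book id=\"1\" isbn=\"23234-1\">"
  , "    <author>John Doe</author>"
  , "    <title>Some book</title>"
  , "  </book>"
  , "  <book id=\"2\">"
  , "    <author>You</author>"
  , "    <title>The Great Event</title>"
  , "  </book>"
  , "</library>"
  ]

-- Extract the title of every book; the author is read and discarded.
titles :: ContentsExtractor [String]
titles = element "library" $ children $ many $
  element "book" $ children $ do
    _ <- element "author" $ contents text
    element "title" $ contents text

-- Nothing if the input is not well-formed XML or does not match 'titles'.
extractTitles :: String -> Maybe [String]
extractTitles src = do
  doc <- parseXMLDoc src
  either (const Nothing) Just (extractDocContents titles doc)

main :: IO ()
main = print (extractTitles sampleXml)
```

Failures are collapsed to Nothing here only to keep the sketch short. In real code you would probably keep the Left value, since ExtractionErr describes what went wrong.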
- type Path = [String]
- data Err
- data ExtractionErr = ExtractionErr {}
- data ElementExtractor a
- extractElement :: ElementExtractor a -> Element -> Either ExtractionErr a
- attrib :: String -> ElementExtractor String
- attribAs :: String -> (String -> Either Err a) -> ElementExtractor a
- children :: ContentsExtractor a -> ElementExtractor a
- contents :: ContentsExtractor a -> ElementExtractor a
- data ContentsExtractor a
- extractContents :: ContentsExtractor a -> [Content] -> Either ExtractionErr a
- extractDocContents :: ContentsExtractor a -> Element -> Either ExtractionErr a
- element :: String -> ElementExtractor a -> ContentsExtractor a
- text :: ContentsExtractor String
- textAs :: (String -> Either Err a) -> ContentsExtractor a
- eoc :: ContentsExtractor ()
- only :: ContentsExtractor a -> ContentsExtractor a
Errors
Extraction errors.
Element extraction
data ElementExtractor a
extractElement :: ElementExtractor a -> Element -> Either ExtractionErr a
extractElement p element runs the extractor p on element.
attrib :: String -> ElementExtractor String
attrib name extracts the value of the attribute called name.
attribAs :: String -> (String -> Either Err a) -> ElementExtractor a
attribAs name f extracts the value of the attribute called name and runs it through the conversion/validation function f.
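As an illustration of what a conversion/validation function can look like, here is a small sketch of a validator that accepts only positive whole numbers. The constructors of Err are not shown on this page, so the sketch uses String as a stand-in error type. To pass such a function to attribAs (or textAs, which takes a function of the same shape), the Left value would have to be turned into the library's Err. The name positiveInt is hypothetical.

```haskell
import Text.Read (readMaybe)

-- Hypothetical validator: accept only positive whole numbers.
-- String stands in for the library's Err type, which is not shown here.
positiveInt :: String -> Either String Int
positiveInt s = case readMaybe s of
  Nothing -> Left ("not a number: " ++ show s)
  Just n
    | n > 0     -> Right n
    | otherwise -> Left ("must be positive: " ++ show n)

main :: IO ()
main = mapM_ (print . positiveInt) ["42", "-1", "abc"]
```

Returning Left for bad input, rather than throwing, lets the extractor report the failure as an ordinary extraction error.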
children :: ContentsExtractor a -> ElementExtractor a
children p extracts only the child elements with p.
contents :: ContentsExtractor a -> ElementExtractor a
contents p extracts the contents with p.
Contents extraction
data ContentsExtractor a
extractContents :: ContentsExtractor a -> [Content] -> Either ExtractionErr a
extractContents p contents extracts the contents with p.
extractDocContents :: ContentsExtractor a -> Element -> Either ExtractionErr a
parseXMLDoc produces a single Element. Such an element can be extracted using this function.
element :: String -> ElementExtractor a -> ContentsExtractor a
element name p extracts an element called name with p.
text :: ContentsExtractor String
Extracts text.
textAs :: (String -> Either Err a) -> ContentsExtractor a
Extracts text and runs it through a conversion function.
eoc :: ContentsExtractor ()
Succeeds only when there is no more content.
only :: ContentsExtractor a -> ContentsExtractor a
only p fails if there is more content than what p extracts.
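The relationship between eoc and only can be pictured with a toy model. The sketch below is not the library's implementation. It defines a tiny stand-in extractor over a list of strings, using only base, and assumes that only p behaves like p followed by eoc, which is one natural reading of the descriptions above. All names (Toy, item, eocToy, onlyToy) are hypothetical.

```haskell
import Control.Applicative (Alternative (..))

-- A toy extractor: consumes a prefix of the input and returns the rest.
newtype Toy a = Toy { runToy :: [String] -> Maybe (a, [String]) }

instance Functor Toy where
  fmap f (Toy g) = Toy $ \xs -> case g xs of
    Nothing      -> Nothing
    Just (a, ys) -> Just (f a, ys)

instance Applicative Toy where
  pure a = Toy $ \xs -> Just (a, xs)
  Toy f <*> Toy g = Toy $ \xs -> case f xs of
    Nothing      -> Nothing
    Just (h, ys) -> case g ys of
      Nothing      -> Nothing
      Just (a, zs) -> Just (h a, zs)

instance Alternative Toy where
  empty = Toy (const Nothing)
  Toy f <|> Toy g = Toy $ \xs -> case f xs of
    Nothing -> g xs
    r       -> r

-- Consume exactly one item.
item :: Toy String
item = Toy $ \xs -> case xs of
  (y : ys) -> Just (y, ys)
  []       -> Nothing

-- Toy counterpart of eoc: succeed only when nothing is left.
eocToy :: Toy ()
eocToy = Toy $ \xs -> if null xs then Just ((), xs) else Nothing

-- Toy counterpart of only: run p, then insist that nothing is left.
onlyToy :: Toy a -> Toy a
onlyToy p = p <* eocToy

main :: IO ()
main = do
  print (fst <$> runToy item ["a", "b"])                  -- Just "a" (leftover "b" is ignored)
  print (fst <$> runToy (onlyToy item) ["a", "b"])        -- Nothing (leftover "b" is rejected)
  print (fst <$> runToy (onlyToy (many item)) ["a", "b"]) -- Just ["a","b"]
```

In this model, wrapping an extractor in onlyToy turns silently ignored trailing content into a failure, which is the behaviour the description of only suggests.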