Safe Haskell | None |
---|---|
Language | Haskell2010 |
Functions to extract data from parsed XML.
Example
Suppose you have an XML file of books like this:

    <?xml version="1.0"?>
    <library>
      <book id="1" isbn="23234-1">
        <author>John Doe</author>
        <title>Some book</title>
      </book>
      <book id="2">
        <author>You</author>
        <title>The Great Event</title>
      </book>
      ...
    </library>

And a data type for a book:

    data Book = Book
      { bookId        :: Int
      , isbn          :: Maybe String
      , author, title :: String
      }

You can parse the XML file into a generic tree structure using parseXMLDoc from the xml package.
Using this library, you can define extractors that extract Books from the generic tree:
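
    import Control.Applicative (many, optional)
    import Text.XML.Light (Element)
    import Text.XML.Light.Extractors

    -- A single <book> element: attributes first, then the child elements.
    book :: ContentsExtractor Book
    book = element "book" $ do
      i <- attribAs "id" integer
      s <- optional (attrib "isbn")
      children $ do
        a <- element "author" $ contents $ text
        t <- element "title" $ contents $ text
        return Book { bookId = i, author = a, title = t, isbn = s }

    -- A <library> root whose children must all be books.
    library :: ContentsExtractor [Book]
    library = element "library" $ children $ only $ many book

    extractLibrary :: Element -> Either ExtractionErr [Book]
    extractLibrary = extractDocContents library
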
Notes
- The only combinator can be used to extract contents exhaustively: it fails if any content is left over.
- The Control.Applicative module contains some useful combinators like optional, many and <|>.
- The Text.XML.Light.Extractors.ShowErr module contains some predefined functions to convert error values to strings.
- type Path = [String]
- data Err
- = ErrExpectContent { }
- | ErrExpectAttrib { }
- | ErrAttribValue { }
- | ErrEnd { }
- | ErrNull { }
- | ErrMsg String
- data ExtractionErr = ExtractionErr {}
- data ElementExtractor a
- extractElement :: ElementExtractor a -> Element -> Either ExtractionErr a
- attrib :: String -> ElementExtractor String
- attribAs :: String -> (String -> Either String a) -> ElementExtractor a
- children :: ContentsExtractor a -> ElementExtractor a
- contents :: ContentsExtractor a -> ElementExtractor a
- data ContentsExtractor a
- extractContents :: ContentsExtractor a -> [Content] -> Either ExtractionErr a
- extractDocContents :: ContentsExtractor a -> Element -> Either ExtractionErr a
- element :: String -> ElementExtractor a -> ContentsExtractor a
- text :: ContentsExtractor String
- textAs :: (String -> Either Err a) -> ContentsExtractor a
- choice :: [ContentsExtractor a] -> ContentsExtractor a
- anyContent :: ContentsExtractor Content
- eoc :: ContentsExtractor ()
- only :: ContentsExtractor a -> ContentsExtractor a
- showExtractionErr :: ExtractionErr -> String
- eitherMessageOrValue :: Either ExtractionErr a -> Either String a
- integer :: (Integral a, Read a) => String -> Either String a
- float :: (Floating a, Read a) => String -> Either String a
Errors
type Path = [String]
Location for some content.
For now it is a reversed list of content indices (starting at 1) and element names. This may change to something less "stringly typed".
data Err
Extraction errors.
Constructor | Description |
---|---|
ErrExpectContent | Some expected content is missing |
ErrExpectAttrib | An expected attribute is missing |
ErrAttribValue | An attribute value was invalid |
ErrEnd | Expected end of contents |
ErrNull | Unexpected end of contents |
ErrMsg String | |
Element extraction
data ElementExtractor a
extractElement :: ElementExtractor a -> Element -> Either ExtractionErr a
extractElement p element extracts element with p.
attrib :: String -> ElementExtractor String
attrib name extracts the value of attribute name.
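The following sketch combines extractElement and attrib, reading a required attribute and, via optional from Control.Applicative, an attribute that may be absent. It assumes the xml and xml-extractors packages are installed and that the modules are named Text.XML.Light and Text.XML.Light.Extractors; bookAttribs and attribsOf are hypothetical helpers, not part of the library.

```haskell
import Control.Applicative (optional)
import Text.XML.Light (Element, parseXMLDoc)
import Text.XML.Light.Extractors

-- Hypothetical helper: read the required "id" attribute and the
-- optional "isbn" attribute of an element.
bookAttribs :: Element -> Either ExtractionErr (String, Maybe String)
bookAttribs = extractElement $ do
  i <- attrib "id"
  s <- optional (attrib "isbn")
  return (i, s)

-- Hypothetical helper: parse a document and run bookAttribs on its root,
-- discarding error details for brevity.
attribsOf :: String -> Maybe (String, Maybe String)
attribsOf src = do
  root <- parseXMLDoc src
  either (const Nothing) Just (bookAttribs root)
```

A missing "isbn" yields Nothing for that field, while a missing "id" makes the whole extraction fail.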
attribAs :: String -> (String -> Either String a) -> ElementExtractor a
attribAs name f extracts the value of attribute name and runs it through the conversion/validation function f.
The conversion function takes a string with the value and returns either a description of the expected format of the value (Left) or the converted value (Right).
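A conversion function for attribAs is an ordinary pure function, so it can be written and tested on its own. The sketch below is a hypothetical yesNo converter (its name and accepted spellings are illustrative assumptions, not part of the library); it could be passed as attribAs "available" yesNo, where "available" is likewise a made-up attribute name.

```haskell
import Data.Char (toLower)

-- Hypothetical conversion function for attribAs: accepts "yes" or "no",
-- ignoring case. On failure it returns a description of the expected
-- format, which is what attribAs expects on the Left side.
yesNo :: String -> Either String Bool
yesNo s = case map toLower s of
  "yes" -> Right True
  "no"  -> Right False
  _     -> Left "\"yes\" or \"no\""
```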
children :: ContentsExtractor a -> ElementExtractor a
children p extracts only the child elements, using p.
contents :: ContentsExtractor a -> ElementExtractor a
contents p extracts the element's contents with p.
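To contrast children and contents, the sketch below runs the same inner extractor both ways on a <p> element that holds text before a <b/> child. It assumes the xml and xml-extractors packages (modules Text.XML.Light and Text.XML.Light.Extractors) and that children hands p the child elements only, as its description states; viaChildren, viaContents and succeedsOn are hypothetical names.

```haskell
import Text.XML.Light (parseXMLDoc)
import Text.XML.Light.Extractors

-- Both extractors expect exactly one empty <b/> inside <p>. The result is ()
-- because only success or failure matters here.
viaChildren, viaContents :: ContentsExtractor ()
viaChildren = element "p" $ children $ only $ element "b" (return ())
viaContents = element "p" $ contents $ only $ element "b" (return ())

-- Hypothetical helper: True when extractor p succeeds on the document root.
succeedsOn :: ContentsExtractor () -> String -> Bool
succeedsOn p src = case parseXMLDoc src of
  Nothing   -> False
  Just root -> either (const False) (const True) (extractDocContents p root)
```

On "<p>hello<b/></p>" the children version should succeed, because the text node is not passed to p, while the contents version meets the text first and fails.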
Contents extraction
data ContentsExtractor a
extractContents :: ContentsExtractor a -> [Content] -> Either ExtractionErr a
extractContents p contents extracts the contents with p.
extractDocContents :: ContentsExtractor a -> Element -> Either ExtractionErr a
Parsing a document with parseXMLDoc produces a single Element. Such an element can be extracted using this function, so p should expect that one root element (for example, element "library" ...).
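The typical pipeline is parseXMLDoc followed by extractDocContents. This sketch assumes the xml and xml-extractors packages (modules Text.XML.Light and Text.XML.Light.Extractors); titleOf and bookTitle are hypothetical names, and both a parse failure and an extraction failure are collapsed into Nothing for brevity.

```haskell
import Text.XML.Light (parseXMLDoc)
import Text.XML.Light.Extractors

-- Extractor for a <book> root element with exactly one <title> child.
titleOf :: ContentsExtractor String
titleOf = element "book" $ children $ only $ element "title" $ contents text

-- Hypothetical helper: parse the document, then extract from its root.
bookTitle :: String -> Maybe String
bookTitle src = do
  root <- parseXMLDoc src
  either (const Nothing) Just (extractDocContents titleOf root)
```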
element :: String -> ElementExtractor a -> ContentsExtractor a
element name p extracts an element named name with p.
text :: ContentsExtractor String
Extracts text.
textAs :: (String -> Either Err a) -> ContentsExtractor a
Extracts text and applies a conversion function to it. Unlike attribAs, the conversion function reports failure as an Err value rather than a String.
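Because textAs expects the error side to be an Err, a converter for it builds an Err value, here with the ErrMsg constructor. This is a sketch assuming the xml and xml-extractors packages (modules Text.XML.Light and Text.XML.Light.Extractors); readInt and intContent are hypothetical names, and parseXML is used to turn a text fragment into a content list.

```haskell
import Text.XML.Light (parseXML)
import Text.XML.Light.Extractors

-- Hypothetical conversion function for textAs: the whole text must be an Int.
readInt :: String -> Either Err Int
readInt s = case reads s of
  [(n, "")] -> Right n
  _         -> Left (ErrMsg ("not an integer: " ++ s))

-- Hypothetical helper: run textAs readInt on a parsed fragment.
intContent :: String -> Maybe Int
intContent src =
  either (const Nothing) Just (extractContents (textAs readInt) (parseXML src))
```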
choice :: [ContentsExtractor a] -> ContentsExtractor a
Extracts with the first extractor in the list that matches.
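choice is useful when the next piece of content may take one of several shapes. The sketch below assumes the xml and xml-extractors packages (modules Text.XML.Light and Text.XML.Light.Extractors), and that a failed element name match lets choice move on to the next alternative; Item, item and itemOf, as well as the book/film/title names, are hypothetical.

```haskell
import Text.XML.Light (parseXML)
import Text.XML.Light.Extractors

-- Hypothetical item type: either a book or a film, each with a title.
data Item = BookItem String | FilmItem String
  deriving (Eq, Show)

-- Try a <book> element first, then a <film> element.
item :: ContentsExtractor Item
item = choice
  [ element "book" (BookItem <$> attrib "title")
  , element "film" (FilmItem <$> attrib "title")
  ]

-- Hypothetical helper: extract one item from a parsed fragment.
itemOf :: String -> Maybe Item
itemOf src = either (const Nothing) Just (extractContents item (parseXML src))
```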
anyContent :: ContentsExtractor Content
Extracts one Content item.
eoc :: ContentsExtractor ()
Succeeds only when there is no more content.
only :: ContentsExtractor a -> ContentsExtractor a
only p fails if there is more content than p extracts.

    only p = p <* eoc
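The difference between p and only p shows up when content is left over after p. This sketch assumes the xml and xml-extractors packages (modules Text.XML.Light and Text.XML.Light.Extractors) and that extractContents by itself does not require all content to be consumed, which is what the definition of only implies; justA and accepts are hypothetical names.

```haskell
import Text.XML.Light (parseXML)
import Text.XML.Light.Extractors

-- Accept one empty <a/> element; the value is () since only success matters.
justA :: ContentsExtractor ()
justA = element "a" (return ())

-- Hypothetical helper: True when the extractor succeeds on the fragment.
accepts :: ContentsExtractor () -> String -> Bool
accepts p src = either (const False) (const True) (extractContents p (parseXML src))
```

Here justA accepts "<a/><b/>" and ignores the trailing <b/>, whereas only justA rejects it because eoc finds content left over.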
Utils
showExtractionErr :: ExtractionErr -> String
Converts an extraction error to a multi-line string message.
Paths are shown according to showPath.
eitherMessageOrValue :: Either ExtractionErr a -> Either String a
Convenience function that converts extraction errors to string messages using showExtractionErr.

    eitherMessageOrValue = either (Left . showExtractionErr) Right
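At the boundary of a program it is often enough to report a readable message instead of inspecting the error value. This sketch assumes the xml and xml-extractors packages (modules Text.XML.Light and Text.XML.Light.Extractors); greetingTarget and the greeting/to names are hypothetical.

```haskell
import Text.XML.Light (parseXML)
import Text.XML.Light.Extractors

-- Hypothetical helper: extract the "to" attribute of a single <greeting>
-- element, turning any extraction error into a multi-line message.
greetingTarget :: String -> Either String String
greetingTarget src =
  eitherMessageOrValue
    (extractContents (only (element "greeting" (attrib "to"))) (parseXML src))
```

A caller can then print the Left message directly, for example with either putStrLn putStrLn.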