xml-extractors: Wrapper over xml to extract data from parsed XML

Safe Haskell: None




Functions to extract data from parsed XML.


Suppose you have an XML file of books like this:

<?xml version="1.0"?>
<library>
  <book id="1" isbn="23234-1">
    <author>John Doe</author>
    <title>Some book</title>
  </book>
  <book id="2">
    <title>The Great Event</title>
  </book>
</library>

And a data type for a book:

data Book = Book { bookId        :: Int
                 , isbn          :: Maybe String
                 , author, title :: String
                 }

You can parse the XML file into a generic tree structure using parseXMLDoc from the xml package.

Using this library one can define extractors to extract data from the generic tree.

   library = element "library" $ children $ only $ many book

   book = element "book" $ do
            i <- attribAs "id" integer
            s <- optional (attrib "isbn")
            children $ do
              a <- element "author" $ contents $ text
              t <- element "title" $ contents $ text
              return $ Book { bookId = i, author = a, title = t, isbn = s }

   extractLibrary :: Element -> Either ExtractionErr [Book]
   extractLibrary = extractDocContents library
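The pieces above can be assembled into one small module. This is a sketch under stated assumptions, not a definitive program: the module name Text.XML.Light.Extractors is assumed (it is not stated on this page), parseLibrary and sampleXml are hypothetical names introduced for illustration, the second author in the sample data is made up, and a deriving clause is added to Book so that results can be printed and compared.

```haskell
import Control.Applicative (many, optional)
import Text.XML.Light (parseXMLDoc)
-- Assumption: the functions documented on this page live in this module.
import Text.XML.Light.Extractors

data Book = Book
  { bookId        :: Int
  , isbn          :: Maybe String
  , author, title :: String
  } deriving (Eq, Show)

-- Extractors as in the example above; types are left to inference.
library = element "library" $ children $ only $ many book

book = element "book" $ do
  i <- attribAs "id" integer
  s <- optional (attrib "isbn")
  children $ do
    a <- element "author" $ contents text
    t <- element "title" $ contents text
    return Book { bookId = i, author = a, title = t, isbn = s }

-- Hypothetical sample input; every book carries an author here.
sampleXml :: String
sampleXml = unlines
  [ "<?xml version=\"1.0\"?>"
  , "<library>"
  , "  <book id=\"1\" isbn=\"23234-1\">"
  , "    <author>John Doe</author>"
  , "    <title>Some book</title>"
  , "  </book>"
  , "  <book id=\"2\">"
  , "    <author>Jane Roe</author>"
  , "    <title>The Great Event</title>"
  , "  </book>"
  , "</library>"
  ]

-- Hypothetical helper: parse a document, then extract, reporting
-- failures as plain strings.
parseLibrary :: String -> Either String [Book]
parseLibrary src =
  case parseXMLDoc src of
    Nothing  -> Left "input is not well-formed XML"
    Just doc -> eitherMessageOrValue (extractDocContents library doc)
```

Calling parseLibrary sampleXml should yield a Right with both books, while input that is not XML takes the Left branch before any extractor runs.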




type Path = [String]

Location for some content.

For now it is a reversed list of content indices (starting at 1) and element names. This may change to something less "stringly typed".
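To illustrate what "reversed" means here, the following sketch stores the innermost location first and reverses the list for display. The example entries and the renderPath helper are hypothetical; the exact entries the library records, and the format used by its own path rendering, may differ.

```haskell
import Data.List (intercalate)

-- Local stand-in with the same shape as the library's Path.
type Path = [String]

-- Hypothetical path: innermost location first, document root last.
examplePath :: Path
examplePath = ["title", "2", "book", "1", "library"]

-- Hypothetical renderer: reverse to root-first order and join with '/'.
renderPath :: Path -> String
renderPath = intercalate "/" . reverse
```

Keeping the list reversed makes it cheap to push a new location onto the front while descending into the document.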

data Err

Extraction errors.



Some expected content is missing


An expected attribute is missing


expectedAttrib :: String

name of expected attribute

atElement :: Element

element with missing attribute


An attribute value was bad


expectedValue :: String

description of expected value

foundValue :: String

the value found

atElement :: Element

element with the bad attribute value


Expected end of contents


Unexpected end of contents

ErrMsg String 


data ExtractionErr

Error with a context.




err :: Err
context :: Path

Element extraction

extractElement :: ElementExtractor a -> Element -> Either ExtractionErr a

extractElement p element extracts element with p.

attrib :: String -> ElementExtractor String

attrib name extracts the value of attribute name.

attribAs :: String -> (String -> Either String a) -> ElementExtractor a

attribAs name f extracts the value of attribute name and runs it through a conversion/validation function.

The conversion function takes a string with the value and returns either a description of the expected format of the value or the converted value.
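A conversion function of this shape needs nothing from the library itself. The sketch below defines a hypothetical yesNo converter; the name and the accepted spellings are assumptions made for illustration.

```haskell
import Data.Char (toLower)

-- Hypothetical converter for attribute values such as "yes" or "No".
-- On failure, Left carries a description of the expected format.
yesNo :: String -> Either String Bool
yesNo s =
  case map toLower s of
    "yes" -> Right True
    "no"  -> Right False
    _     -> Left "yes or no"
```

It could then be passed as the second argument, for example attribAs "available" yesNo, where the attribute name is likewise hypothetical.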

children :: ContentsExtractor a -> ElementExtractor a

children p extracts only the child elements with p.

contents :: ContentsExtractor a -> ElementExtractor a

contents p extracts the contents with p.

Contents extraction

extractContents :: ContentsExtractor a -> [Content] -> Either ExtractionErr a

extractContents p contents extracts the contents with p.

extractDocContents :: ContentsExtractor a -> Element -> Either ExtractionErr a

Using parseXMLDoc produces a single Element. Such an element can be extracted using this function.

element :: String -> ElementExtractor a -> ContentsExtractor a

element name p extracts a name element with p.

textAs :: (String -> Either Err a) -> ContentsExtractor a

Extracts text and passes it through a conversion function.

choice :: [ContentsExtractor a] -> ContentsExtractor a

Extracts with the first extractor that matches.

eoc :: ContentsExtractor ()

Succeeds only when there is no more content.

only :: ContentsExtractor a -> ContentsExtractor a

only p fails if there is more content than p extracts.

only p = p <* eoc
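The effect of p <* eoc can be seen in a toy model. The Toy type below is not the library's ContentsExtractor; it is a hypothetical stand-in that consumes a list of strings, written only to show how sequencing an end-of-content check after p rejects leftover input while keeping p's result.

```haskell
-- Toy extractor over a list of tokens; a stand-in, not the library's type.
newtype Toy a = Toy { runToy :: [String] -> Either String (a, [String]) }

instance Functor Toy where
  fmap f (Toy g) = Toy $ \ts -> fmap (\(a, rest) -> (f a, rest)) (g ts)

instance Applicative Toy where
  pure a = Toy $ \ts -> Right (a, ts)
  Toy pf <*> Toy pa = Toy $ \ts -> do
    (f, rest)  <- pf ts
    (a, rest') <- pa rest
    return (f a, rest')

-- Consumes exactly one token.
item :: Toy String
item = Toy $ \ts -> case ts of
  []       -> Left "unexpected end of contents"
  (t:rest) -> Right (t, rest)

-- Succeeds only when no tokens remain.
eocToy :: Toy ()
eocToy = Toy $ \ts -> case ts of
  [] -> Right ((), [])
  _  -> Left "expected end of contents"

-- Same shape as the definition above: run p, then demand the end.
onlyToy :: Toy a -> Toy a
onlyToy p = p <* eocToy

-- Runs an extractor and discards the remaining tokens.
run :: Toy a -> [String] -> Either String a
run p = fmap fst . runToy p
```

In this model, run item succeeds on a two-token input and ignores the second token, whereas run (onlyToy item) fails on the same input because a token is left over.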


showExtractionErr :: ExtractionErr -> String

Converts an extraction error to a multi-line string message.

Paths are shown according to showPath.

eitherMessageOrValue :: Either ExtractionErr a -> Either String a

Convenience function to convert extraction errors to string messages using showExtractionErr.

eitherMessageOrValue = either (Left . showExtractionErr) Right

integer :: (Integral a, Read a) => String -> Either String a

Reads an integer value or returns Left "integer" if the read fails.

float :: (Floating a, Read a) => String -> Either String a

Reads a floating point value or returns Left "float" if the read fails.
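Readers with these signatures can be written on top of readMaybe from base. The sketch below is one possible way to do it and is not taken from the library's source; readAs, integerSketch and floatSketch are hypothetical names.

```haskell
import Text.Read (readMaybe)

-- Hypothetical helper: try to read the whole string, or report the
-- description of the expected format.
readAs :: Read a => String -> String -> Either String a
readAs description s = maybe (Left description) Right (readMaybe s)

-- Same signatures as integer and float above, built on readAs.
integerSketch :: (Integral a, Read a) => String -> Either String a
integerSketch = readAs "integer"

floatSketch :: (Floating a, Read a) => String -> Either String a
floatSketch = readAs "float"
```

Because the result type is polymorphic, the caller picks the concrete type, either by annotation or, as in the book example, through the record field that receives the value.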