xml-extractors: Extension to the xml package to extract data from parsed XML

Functions to extract data from parsed XML.


Suppose you have an XML file of books like this:

<?xml version="1.0"?>
<library>
  <book id="1" isbn="23234-1">
    <author>John Doe</author>
    <title>Some book</title>
  </book>
  <book id="2">
    <title>The Great Event</title>
  </book>
</library>

And a data type for a book:

data Book = Book { bookId        :: Int
                 , isbn          :: Maybe String
                 , author, title :: String
                 }

You can parse the XML file into a generic tree structure using parseXMLDoc from the xml package.

Using this library, you can then define extractors that extract Books from the generic tree.

   book = element "book" $ do
            i <- attribAs "id" integer
            s <- optional (attrib "isbn")
            children $ do
              a <- element "author" $ contents $ text
              t <- element "title" $ contents $ text
              return Book { bookId = i, author = a, title = t, isbn = s }

   library = element "library" $ children $ only $ many book

   extractLibrary :: Element -> Either ExtractionErr [Book]
   extractLibrary = extractDocContents library


  • The only combinator can be used to exhaustively extract contents.
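
Putting the pieces above together, a complete module might look like the following sketch. It assumes the extractors live in the module Text.XML.Light.Extractors and that text is exported from it, as the example above implies. parseLibrary is a hypothetical helper name used for illustration, not part of the library.

```haskell
import Control.Applicative (many, optional)
import Text.XML.Light (parseXMLDoc)
import Text.XML.Light.Extractors  -- assumed module name

data Book = Book
  { bookId        :: Int
  , isbn          :: Maybe String
  , author, title :: String
  } deriving (Show, Eq)

-- Extracts one <book> element: attributes first, then child elements.
book :: ContentsExtractor Book
book = element "book" $ do
  i <- attribAs "id" integer
  s <- optional (attrib "isbn")
  children $ do
    a <- element "author" $ contents text
    t <- element "title" $ contents text
    return Book { bookId = i, author = a, title = t, isbn = s }

-- Extracts every <book> inside <library>; `only` rejects leftover children.
library :: ContentsExtractor [Book]
library = element "library" $ children $ only $ many book

-- Hypothetical helper: parse a document string and extract its books,
-- reporting both failure modes as a plain message.
parseLibrary :: String -> Either String [Book]
parseLibrary src =
  case parseXMLDoc src of
    Nothing   -> Left "no root element found"
    Just root -> eitherMessageOrValue (extractDocContents library root)
```

Because book requires an author element, a book without one makes the whole extraction return a Left with an error message rather than a partial list.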



type Path = [String]

Location for some content.

For now it is a reversed list of content indices (starting at 1) and element names. This may change to something less "stringly typed".

data Err

Extraction errors.

  • Some expected content is missing
  • An expected attribute is missing
  • An attribute value was bad
  • Expected end of contents
  • Unexpected end of contents
  • ErrMsg String


instance Show Err

showsPrec :: Int -> Err -> ShowS

show :: Err -> String

showList :: [Err] -> ShowS

Element extraction

extractElement :: ElementExtractor a -> Element -> Either ExtractionErr a

extractElement p element extracts element with p.

attrib :: String -> ElementExtractor String

attrib name extracts the value of attribute name.

attribAs :: String -> (String -> Either String a) -> ElementExtractor a

attribAs name f extracts the value of attribute name and runs it through a conversion/validation function.

The conversion function takes a string with the value and returns either a description of the expected format of the value or the converted value.
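
A conversion function of this shape can be written with nothing but base. The sketch below is a hypothetical boolean converter (not part of the library): Left carries a description of the expected format, Right the converted value.

```haskell
import Data.Char (toLower)

-- Hypothetical converter in the shape attribAs expects.
-- Accepts "true" or "false" in any letter case.
boolean :: String -> Either String Bool
boolean s = case map toLower s of
  "true"  -> Right True
  "false" -> Right False
  _       -> Left "boolean (true or false)"
```

It could then be passed as the second argument, for example attribAs "available" boolean, where the attribute name "available" is made up for illustration.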

children :: ContentsExtractor a -> ElementExtractor a

children p extracts only child elements with p.

contents :: ContentsExtractor a -> ElementExtractor a

contents p extracts contents with p.

Contents extraction

extractContents :: ContentsExtractor a -> [Content] -> Either ExtractionErr a

extractContents p contents extracts the contents with p.

extractDocContents :: ContentsExtractor a -> Element -> Either ExtractionErr a

parseXMLDoc produces a single Element. Use this function to run a contents extractor on such an element.

element :: String -> ElementExtractor a -> ContentsExtractor a

element name p extracts a name element with p.

textAs :: (String -> Either Err a) -> ContentsExtractor a

Extracts text and applies a conversion function to it.

choice :: [ContentsExtractor a] -> ContentsExtractor a

Extracts with the first extractor in the list that matches.
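
As an illustration, choice can combine extractors for alternative element names. This is a sketch under assumptions: the Item type, the "dvd" element, and the extractItem helper are hypothetical, the module name Text.XML.Light.Extractors is assumed, and text is taken from the introductory example.

```haskell
import Text.XML.Light (parseXML)
import Text.XML.Light.Extractors  -- assumed module name

-- Hypothetical item type for illustration.
data Item = BookTitle String | DvdTitle String
  deriving (Show, Eq)

-- Tries the <book> extractor first, then the <dvd> extractor.
item :: ContentsExtractor Item
item = choice
  [ BookTitle <$> element "book" (contents text)
  , DvdTitle  <$> element "dvd"  (contents text)
  ]

-- Hypothetical helper: run the extractor over contents parsed from a string.
extractItem :: String -> Either String Item
extractItem = eitherMessageOrValue . extractContents item . parseXML
```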

eoc :: ContentsExtractor ()

Succeeds only when there is no more content.

only :: ContentsExtractor a -> ContentsExtractor a

only p fails if any content remains after p has extracted its part.

only p = p <* eoc


showExtractionErr :: ExtractionErr -> String

Converts an extraction error to a multi-line string message.

Paths are shown according to showPath.

eitherMessageOrValue :: Either ExtractionErr a -> Either String a

Convenience function to convert extraction errors to string messages using showExtractionErr.

eitherMessageOrValue = either (Left . showExtractionErr) Right

integer :: (Integral a, Read a) => String -> Either String a

Reads an integer value, or returns Left "integer" if the read fails.

float :: (Floating a, Read a) => String -> Either String a

Reads a floating point value, or returns Left "float" if the read fails.
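
For readers writing their own converters, the behaviour described for integer and float can be approximated with readMaybe from base. The names integerSketch and floatSketch are hypothetical, and this is not necessarily how the library implements them.

```haskell
import Text.Read (readMaybe)

-- Sketch of an integer converter: Left names the expected format.
integerSketch :: (Integral a, Read a) => String -> Either String a
integerSketch s = maybe (Left "integer") Right (readMaybe s)

-- Sketch of a floating point converter, built the same way.
floatSketch :: (Floating a, Read a) => String -> Either String a
floatSketch s = maybe (Left "float") Right (readMaybe s)
```

The result type is chosen by the caller, for example integerSketch "42" :: Either String Int.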