xml-extractors-0.4.0.1: Extension to the xml package to extract data from parsed xml

Safe Haskell: None
Language: Haskell2010

Text.XML.Light.Extractors

Description

Functions to extract data from parsed XML.

Example

Suppose you have an XML file of books like this:

<?xml version="1.0"?>
<library>
  <book id="1" isbn="23234-1">
    <author>John Doe</author>
    <title>Some book</title>
  </book>
  <book id="2">
    <author>You</author>
    <title>The Great Event</title>
  </book>
  ...
</library>

And a data type for a book:

data Book = Book { bookId        :: Int
                 , isbn          :: Maybe String
                 , author, title :: String
                 }

You can parse the XML file into a generic tree structure using parseXMLDoc from the xml package.

Using this library one can define extractors to extract Books from the generic tree.

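   -- many and optional come from Control.Applicative; Element from Text.XML.Light.

   -- Extracts one <book> element: attributes first, then child elements.
   book :: ContentsExtractor Book
   book = element "book" $ do
            i <- attribAs "id" integer
            s <- optional (attrib "isbn")
            children $ do
              a <- element "author" $ contents text
              t <- element "title" $ contents text
              return Book { bookId = i, author = a, title = t, isbn = s }

   -- A <library> may contain nothing but <book> elements.
   library :: ContentsExtractor [Book]
   library = element "library" $ children $ only $ many book

   extractLibrary :: Element -> Either ExtractionErr [Book]
   extractLibrary = extractDocContents library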

Errors

type Path = [String]

Location for some content.

For now it is a reversed list of content indices (starting at 1) and element names. This may change to something less "stringly typed".
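As a rough illustration of the reversed representation, the sketch below turns a path into a root-first string. The name renderPath and the sample values are hypothetical and not part of the library; the library's own rendering may look different.

```haskell
import Data.List (intercalate)

-- Mirrors the documented synonym; redefined here so the sketch is self-contained.
type Path = [String]

-- Hypothetical helper (not part of the library): a Path is stored innermost
-- first, so it is reversed before joining to get a root-first location.
renderPath :: Path -> String
renderPath = intercalate "/" . reverse
```

Under these assumptions, a path stored as ["title", "book", "library"] would render as library/book/title.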

data Err

Extraction errors.

Constructors

ErrExpectContent

Some expected content is missing

ErrExpectAttrib

An expected attribute is missing

Fields

expectedAttrib :: String

name of expected attribute

atElement :: Element

element with missing attribute

ErrAttribValue

An attribute value was bad

Fields

expectedValue :: String

description of expected value

foundValue :: String

the value found

atElement :: Element

element with the bad attribute value

ErrEnd

Expected end of contents

ErrNull

Unexpected end of contents

ErrMsg String 

data ExtractionErr

Error with a context.

Constructors

ExtractionErr 

Fields

err :: Err
 
context :: Path
 

Element extraction

extractElement :: ElementExtractor a -> Element -> Either ExtractionErr a

extractElement p element extracts element with p.

attrib :: String -> ElementExtractor String

attrib name extracts the value of attribute name.

attribAs :: String -> (String -> Either String a) -> ElementExtractor a

attribAs name f extracts the value of attribute name and runs it through a conversion/validation function.

The conversion function takes a string with the value and returns either a description of the expected format of the value or the converted value.
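A conversion function of this shape can be written by hand. The following is a minimal sketch; the Format type, the format function and the attribute values are all hypothetical and serve only as illustration.

```haskell
-- Hypothetical target type for an attribute value (not from the library).
data Format = Hardcover | Paperback
  deriving (Eq, Show)

-- Conversion function in the shape attribAs expects: Left carries a
-- description of the expected value, Right carries the converted value.
format :: String -> Either String Format
format "hardcover" = Right Hardcover
format "paperback" = Right Paperback
format _           = Left "hardcover or paperback"
```

With the library in scope, such a function could presumably be passed as attribAs "format" format, where the "format" attribute is again an assumption.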

children :: ContentsExtractor a -> ElementExtractor a

children p extracts only the child elements with p.

contents :: ContentsExtractor a -> ElementExtractor a

contents p extracts the contents with p.

Contents extraction

extractContents :: ContentsExtractor a -> [Content] -> Either ExtractionErr a

extractContents p contents extracts the contents with p.

extractDocContents :: ContentsExtractor a -> Element -> Either ExtractionErr a

parseXMLDoc produces a single Element. This function runs a contents extractor on such an element.
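A minimal end-to-end sketch, assuming the xml and xml-extractors packages are installed and that text is exported as used in the Example above. The names titles and titlesFrom, the element names and the error message are hypothetical.

```haskell
import Control.Applicative (many)
import Text.XML.Light (parseXMLDoc)
import Text.XML.Light.Extractors

-- Hypothetical extractor: the text of every <title> directly under <titles>.
titles :: ContentsExtractor [String]
titles = element "titles" $ children $ only $ many title
  where
    title = element "title" $ contents text

-- Parse a document from a String, then run the extractor on its root element.
-- Extraction errors are converted to messages with eitherMessageOrValue.
titlesFrom :: String -> Either String [String]
titlesFrom src =
  case parseXMLDoc src of
    Nothing  -> Left "no root element found"
    Just doc -> eitherMessageOrValue (extractDocContents titles doc)
```

Returning Either String keeps the parse failure and the extraction failure in one error channel, which is convenient for small programs; larger programs may prefer to keep ExtractionErr intact.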

element :: String -> ElementExtractor a -> ContentsExtractor a

element name p extracts a name element with p.

textAs :: (String -> Either Err a) -> ContentsExtractor a

Extracts text and applies a conversion function to it.

choice :: [ContentsExtractor a] -> ContentsExtractor a

Extracts with the first extractor in the list that matches.

eoc :: ContentsExtractor ()

Succeeds only when there is no more content.

only :: ContentsExtractor a -> ContentsExtractor a

only p fails if there is more content than p extracts.

only p = p <* eoc
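To see why p <* eoc rejects leftover content, here is a toy model that uses only base. The Toy type and the functions item, eocToy and onlyToy are hypothetical stand-ins and are not the library's ContentsExtractor; they only mimic the "consume, then require the end" idea.

```haskell
-- Toy extractor over a list of tokens: returns a value and the unconsumed rest.
newtype Toy a = Toy { runToy :: [String] -> Maybe (a, [String]) }

instance Functor Toy where
  fmap f (Toy p) = Toy $ \ts -> fmap (\(a, rest) -> (f a, rest)) (p ts)

instance Applicative Toy where
  pure a = Toy $ \ts -> Just (a, ts)
  Toy pf <*> Toy pa = Toy $ \ts ->
    case pf ts of
      Nothing        -> Nothing
      Just (f, rest) -> fmap (\(a, rest') -> (f a, rest')) (pa rest)

-- Consumes exactly one token, failing on empty input.
item :: Toy String
item = Toy $ \ts ->
  case ts of
    (t : rest) -> Just (t, rest)
    []         -> Nothing

-- Succeeds only when no tokens are left.
eocToy :: Toy ()
eocToy = Toy $ \ts -> if null ts then Just ((), []) else Nothing

-- Same shape as the documented definition: run p, then require the end.
onlyToy :: Toy a -> Toy a
onlyToy p = p <* eocToy
```

In this model, runToy (onlyToy item) succeeds on a single token and fails as soon as a second token remains, while plain item would succeed and leave the second token unconsumed.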

Utils

showExtractionErr :: ExtractionErr -> String

Converts an extraction error to a multi-line string message.

Paths are shown according to showPath.

eitherMessageOrValue :: Either ExtractionErr a -> Either String a

Convenience function to convert extraction errors to string messages using showExtractionErr.

eitherMessageOrValue = either (Left . showExtractionErr) Right

integer :: (Integral a, Read a) => String -> Either String a

Reads an integer value, or returns Left "integer" if the read fails.

float :: (Floating a, Read a) => String -> Either String a

Reads a floating point value, or returns Left "float" if the read fails.
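Readers with this behaviour can be approximated with readMaybe from base. This is only a sketch under that assumption; the names integerSketch and floatSketch are hypothetical, and the library's own integer and float may be implemented differently.

```haskell
import Text.Read (readMaybe)

-- Sketch of an integer reader: Left names the expected kind of value.
integerSketch :: (Integral a, Read a) => String -> Either String a
integerSketch = maybe (Left "integer") Right . readMaybe

-- Sketch of a floating point reader with the same error convention.
floatSketch :: (Floating a, Read a) => String -> Either String a
floatSketch = maybe (Left "float") Right . readMaybe
```

Because the result type is polymorphic, the caller usually fixes it through a record field or a type annotation, as bookId :: Int does in the Example.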