The hexpat package
This package provides a general-purpose Haskell XML library using Expat to
do its parsing (http://expat.sourceforge.net/ - a fast stream-oriented XML
parser written in C). It is extensible to any string type, with
Text provided out of the box.
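"Extensible to any string type" usually means library code is written once against a type class, and each string representation supplies an instance. The minimal sketch below shows that pattern. The class `XmlString`, its methods and `dropBlankText` are hypothetical names used only for illustration; they are not hexpat's actual class or API. It assumes only `base` and `bytestring` and can be loaded in GHCi.

```haskell
{-# LANGUAGE FlexibleInstances #-}
-- Hypothetical string abstraction (NOT hexpat's real class): generic code is
-- written against XmlString, and each string type provides an instance.
import qualified Data.ByteString.Char8 as BC
import Data.Char (isSpace)

class XmlString s where
  fromChars :: String -> s          -- build a value from a list of Chars
  toChars   :: s -> String          -- convert back to a list of Chars
  isBlank   :: s -> Bool            -- True if the text is only whitespace
  isBlank = all isSpace . toChars   -- default implementation, via String

-- Plain Haskell String.
instance XmlString String where
  fromChars = id
  toChars   = id

-- Strict ByteString through the Char8 interface. Char8 truncates characters
-- above U+00FF, so a production instance would encode to UTF-8 instead.
instance XmlString BC.ByteString where
  fromChars = BC.pack
  toChars   = BC.unpack
  isBlank   = BC.all isSpace        -- override: avoids unpacking to a String

-- Generic code that works unchanged for every instance, e.g. dropping
-- whitespace-only text nodes that sit between elements.
dropBlankText :: XmlString s => [s] -> [s]
dropBlankText = filter (not . isBlank)
```

For example, `dropBlankText ["a", "  ", "b"] :: [String]` evaluates to `["a","b"]`, and the same function works on a list of strict ByteStrings without any change.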
Basic usage: parsing a tree (Tree) and formatting a tree (Format). Other features: helpers for processing XML trees (Proc), trees annotated with XML source locations (Annotated), extended XML trees with comments, processing instructions, etc. (Extended), XML cursors (Cursor), SAX-style parsing (SAX), and access to the low-level interface in case speed is paramount (Internal.IO).
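As a rough picture of the kind of data the Tree and Format modules deal with, the sketch below defines a hypothetical, simplified node type (an element with a name, attributes and children, or a text node), a naive formatter and a small processing helper. `Node`, `formatNode` and `textContent` are illustrative names only; hexpat's own tree types in Text.XML.Expat.Tree are parameterized over the string type, and its formatter is richer than this.

```haskell
-- Hypothetical, simplified XML tree (NOT hexpat's actual Node type):
-- an element has a name, attributes and children; text nodes hold strings.
data Node = Element String [(String, String)] [Node]
          | Text String
          deriving (Eq, Show)

-- Escape characters that are special in XML text and in
-- double-quoted attribute values.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Render a tree as XML text; childless elements use the self-closing form.
formatNode :: Node -> String
formatNode (Text t) = escape t
formatNode (Element name attrs kids) =
    "<" ++ name ++ concatMap attr attrs ++ close kids
  where
    attr (k, v) = " " ++ k ++ "=\"" ++ escape v ++ "\""
    close [] = "/>"
    close cs = ">" ++ concatMap formatNode cs ++ "</" ++ name ++ ">"

-- A tiny processing helper: all text content in document order.
textContent :: Node -> String
textContent (Text t)           = t
textContent (Element _ _ kids) = concatMap textContent kids
```

Here `formatNode (Element "br" [] [])` gives `"<br/>"`, and text is escaped on output, so `Text "1 < 2"` is rendered as `1 &lt; 2`.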
The design goals are speed, speed, speed, interface simplicity and modularity.
For an introduction and examples, see the Text.XML.Expat.Tree module. For benchmarks, see http://haskell.org/haskellwiki/Hexpat/
If you want to do interactive I/O, an obvious option is to use lazy parsing with one of the lazy I/O functions such as hGetContents. However, this can be problematic in some applications because lazy I/O doesn't handle I/O errors properly and gives no guarantee of timely resource cleanup. In these cases, chunked I/O is a better approach: take a look at the hexpat-iteratee package.
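The chunked approach can be sketched with plain `bytestring` I/O: read fixed-size strict chunks with `hGetSome` inside `withBinaryFile`, so the handle is closed as soon as the loop finishes or throws, rather than whenever a lazy result happens to be fully consumed. This is not the hexpat-iteratee API; `foldFileChunks` is a hypothetical helper, and in a real program the step function would hand each chunk to an incremental parser.

```haskell
import qualified Data.ByteString as B
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Hypothetical helper: fold a step function over a file's contents, one
-- strict chunk of at most chunkSize bytes at a time (chunkSize should be
-- positive). withBinaryFile closes the handle when the fold returns or an
-- exception escapes, which lazy hGetContents cannot promise.
foldFileChunks :: Int -> (acc -> B.ByteString -> acc) -> acc -> FilePath -> IO acc
foldFileChunks chunkSize step z path = withBinaryFile path ReadMode (go z)
  where
    go acc h = do
      chunk <- B.hGetSome h chunkSize   -- an empty chunk signals end of file
      if B.null chunk
        then return acc
        else let acc' = step acc chunk
             in acc' `seq` go acc' h    -- force the accumulator each step
```

For instance, `foldFileChunks 4096 (\n c -> n + B.length c) 0 "doc.xml"` counts the bytes in a file while holding at most one 4 KB chunk at a time. Forcing the accumulator with `seq` keeps a long file from building up a chain of unevaluated thunks.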
Internal.IO is filed under Internal because it is low-level and most users won't need it. The other Internal modules are re-exported by Annotated, Tree and Extended, so you won't need to import them directly.
Credits to Iavor Diatchki and the
xml (XML.Light) package for Proc and Cursor.
Thanks to the many contributors.
INSTALLATION: Unix install requires an OS package that provides the Expat C library and its headers (the package name varies by distribution).
On Mac OS X, Expat comes with Apple's optional X11 package, or you can install it from source.
To install on Windows, first install the Windows binary that's available from
http://expat.sourceforge.net/, then type (assuming you're using v2.0.1):
cabal install hexpat --extra-lib-dirs="C:\Program Files\Expat 2.0.1\Bin" --extra-include-dirs="C:\Program Files\Expat 2.0.1\Source\Lib"
Make sure libexpat.dll can be found in your system PATH (or copy it into your executable's directory).
BOUND VS. UNBOUND THREADS: GHC (at least the 6.12.X versions) will spawn OS threads if you make a safe FFI call that calls back into Haskell from an unbound thread. This can get out of control in a busy application. To avoid this, since version 0.19.1 hexpat delegates processing to a single worker thread if the calling thread is not bound. This means that hexpat currently won't exploit multiple cores very well. It also means that hexpat may be more efficient on threads spawned with forkOS (which gives you a bound thread) than on threads spawned with forkIO.
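One way to act on this note without restructuring a program around forkOS is to run parsing code on a bound thread explicitly. The sketch below uses only Control.Concurrent from base: `runInBoundThread` runs an action on a bound thread, creating one temporarily if the caller is unbound. `onBoundThread` and `threadKind` are hypothetical helper names, and whether this actually helps hexpat in a given application is an assumption to measure, not a guarantee. Bound threads require linking with `-threaded`; without it, the helper simply runs the action directly.

```haskell
import Control.Concurrent
  (isCurrentThreadBound, rtsSupportsBoundThreads, runInBoundThread)

-- Hypothetical helper: run an action on a bound thread when the RTS
-- supports bound threads (programs linked with -threaded); otherwise run
-- it directly, since runInBoundThread fails on the non-threaded RTS.
-- runInBoundThread reuses the current thread if it is already bound.
onBoundThread :: IO a -> IO a
onBoundThread act
  | rtsSupportsBoundThreads = runInBoundThread act
  | otherwise               = act

-- Hypothetical helper: report whether the calling thread is bound, e.g.
-- for logging which threads reach the parser.
threadKind :: IO String
threadKind = do
  bound <- isCurrentThreadBound
  return (if bound then "bound" else "unbound")
```

A call site would then wrap its parsing work as `onBoundThread (...)`. Note that the main thread of a `-threaded` program is already bound, so the wrapper matters mostly for code running in threads started with forkIO.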
ChangeLog:
0.15: changes intended to fix a rare "error: a C finalizer called back into Haskell." that seemed to happen only on GHC 6.12.X
0.15.1: fix broken Annotated parse
0.16: switch from mtl to transformers
0.17: fix mapNodeContainer and rename some things
0.18: rename defaultEncoding to overrideEncoding
0.18.3: formatG and indent were demanding list items more than once (inefficient in chunked processing)
0.19: add Extended.hs
0.19.1: fix a memory leak introduced in 0.19; delegate parsing to a bound thread if the calling thread is unbound (see note above)
Versions: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.15.1, 0.16, 0.17, 0.18, 0.18.1, 0.18.2, 0.18.3, 0.19, 0.19.1, 0.19.2, 0.19.3, 0.19.4, 0.19.5, 0.19.6, 0.19.7, 0.19.8, 0.19.9, 0.19.10, 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.20.5, 0.20.6, 0.20.7, 0.20.8, 0.20.9, 0.20.10, 0.20.11, 0.20.12, 0.20.13
Dependencies: base (>=3 && <5), bytestring, containers, deepseq (==1.1.*), extensible-exceptions (==0.1.*), List (==0.4.*), text (>=0.5 && <0.9), transformers, utf8-string (==0.3.*)
Copyright: (c) 2009 Doug Beardsley, (c) 2009-2010 Stephen Blackheath <http://blacksapphire.com/antispam/>, (c) 2009 Gregory Collins, (c) 2008 Evan Martin, (c) 2009 Matthew Pocock, (c) 2007-2009 Galois Inc., (c) 2010 Kevin Jardine
Author: Stephen Blackheath [blackh] (the primary author), Doug Beardsley, Gregory Collins, Evan Martin (who started the project), Matthew Pocock [drdozer], Kevin Jardine
Source repo: head: darcs get http://code.haskell.org/hexpat/
Uploaded: Thu Sep 2 13:52:54 UTC 2010 by StephenBlackheath
Distributions: FreeBSD: 0.20.9, LTSHaskell: 0.20.13, NixOS: 0.20.13, Stackage: 0.20.13, openSUSE: 0.20.13
Downloads: 35977 total (276 in the last 30 days)
Status: Docs uploaded by user; build status unknown (no reports yet)