polyparse
polyparse is a collection of parser combinator libraries in
Haskell. It is distributed as a package, but you are likely to use only
one of the included modules at any one time - they are generally
alternatives to each other, as well as an alternative to other
widely-used parser libraries available elsewhere.
- Text.ParserCombinators.HuttonMeijer The most venerable of all
monadic parser combinator libraries, this version dates from 1996.
Originally distributed with Gofer, then Hugs, as ParseLib. It uses
the idea of "failure as a list of successes" to give multiple
possible parses through backtracking. (But in practice, almost
nobody wants any parse except the first complete one.)
- Text.ParserCombinators.HuttonMeijerWallace The
Hutton/Meijer combinators, extended to take an arbitrary token type
as input (not just characters), plus a running state (e.g. to
collect a symbol table, or macros), plus some facilities for
simple error-reporting.
- Text.ParserCombinators.Poly
The name Poly comes from the arbitrary token type. Thus, you can
write your own lexer if you wish, rather than needing to encode
lexical analysis within the parser itself. This is a fresh
set of combinators, improving on the HuttonMeijer variety by
keeping only a single success, not a list of them. This is
more space-efficient, whilst still permitting backtracking.
Error-handling is also much improved: there are essentially two
kinds of failure, soft and hard. Soft failure just means that the
current parse did not work out, but another parse might be OK.
Hard failure means that no parse will succeed, because we have
already passed a point of commitment. Thus you can give far more
accurate error messages to the user, including multi-layered
locations.
- Text.ParserCombinators.PolyState is just like Poly, except it
adds an arbitrary running state parameter.
- Text.ParserCombinators.PolyLazy is just like Poly, except it
does not return explicit failures. Instead, an exception is raised.
Thus, it can return a partial parse result, before a full parse is
complete. The word partial indicates that, having committed to
return some outer data constructor, we might later discover some
parse error further inside, so the value will be partial, as in
incomplete: containing bottom. However, if you are confident that
the input is error-free, then you will gain hugely in
space-efficiency - essentially you can stream-process your parsed
data-structure within very small constant space. This is especially
useful for large structures like e.g. XML trees.
- Text.ParserCombinators.PolyStateLazy combines PolyState and
PolyLazy.
- Text.Parse The Text.Read class from Haskell'98 is widely
recognised to have many problems. It is inefficient. If a read
fails, there is no indication of why. Worst of all, a read failure
crashes your whole program! Text.Parse is a proposed replacement
for the Read class. It defines a new class, Parse, with methods
that return an explicit notification of errors, through the Either
type. It also defines a number of useful helper functions to
enable the construction of parsers for textual representations of
Haskell data structures, e.g. named fields. Unsurprisingly,
Text.Parse is really just a specialisation of the Poly combinators
for String input, and the entire Poly API is also re-exported.
The DrIFT
tool can derive instances of the Parse class for you
automatically. (Use the syntax {-! derive : Parse !-})
All the Poly* variations share the same basic API, so it is easy
to switch from one set to another, when you discover you need an extra
facility, just by changing a single import.
Detailed documentation of the polyparse APIs
is generated automatically by Haddock directly from the source code.
In general, you can just add an import of the relevant module to your
source code, and everything should just compile OK. However, if the
package is not 'exposed' (in ghc-pkg terminology), then you might need
to use a command-line option --package polyparse at compile
time.
I wrote some motivation for Text.Parse (including simple examples)
on my blog a while back. Here is the posting.
User-contributed documentation for polyparse is on the Haskell wiki at:
http://haskell.org/haskellwiki/Polyparse
Please edit the wiki if you discover any nice tricks!
Known problems:
- No problems currently known.
- Report bugs to Malcolm.Wallace@cs.york.ac.uk
- Even better, fix any bugs you find, and then darcs send a patch.
Development version:
darcs get
http://www.cs.york.ac.uk/fp/darcs/polyparse
Released version:
polyparse-1.0, release date 2007.01.26
By HTTP:
.tar.gz,
.zip.
By FTP:
ftp://ftp.cs.york.ac.uk/pub/haskell/polyparse/
To install polyparse, you must have a Haskell compiler: ghc-6.2
or later, and/or nhc98-1.16/hmake-3.06 or later, and/or
Hugs98 (Sept 2003) or later. For more recent compilers,
use the standard Cabal method of installation:
runhaskell Setup.hs configure [--prefix=...] [--buildwith=...]
runhaskell Setup.hs build
runhaskell Setup.hs install
For older compilers, use:
sh configure [--prefix=...] [--buildwith=...]
make
make install
to configure, build, and install polyparse as a package for your
compiler(s). If you don't use the --prefix option, you may need write
permission on the library installation directories of your compiler(s).
Afterwards, to gain access to the polyparse libraries, you only need to
add the option -package polyparse to your compiler commandline
(no option required for Hugs).
Version 1.00 is the first release of polyparse as a separate package.
It was previously part of the HaXml suite. HaXml continues to use
polyparse, but polyparse will be useful more widely. If you are looking
for examples of the usage of polyparse, the implementations of
Text.XML.HaXml.Parse and Text.XML.HaXml.XmlContent are two good places
to look.
Complete Changelog
We are interested in hearing your feedback on these parser combinators -
suggestions for improvements, comments, criticisms, bug reports. Please mail
Licence: The library is Free and Open Source Software,
i.e., the bits we wrote are copyright to us, but freely licensed
for your use, modification, and re-distribution, provided you don't
restrict anyone else's use of it. The polyparse library is distributed
under the GNU Lesser General Public Licence (LGPL) - see file
LICENCE-LGPL for more details. We allow one
special exception to the LGPL - see COPYRIGHT.
(If you don't like any of these licensing conditions, please contact us
to discuss your requirements.)
-
Parser combinators have a long history in Haskell. The first(?) monadic
combinator tutorial was introduced by Hutton and Meijer in 1994, and the
accompanying library was distributed with Gofer (a precursor to Hugs),
and known simply as ParseLib. That library lives on here as
Text.ParserCombinators.HuttonMeijer.
- Niklas Rojemo's combinators.
The parser combinators developed and used in the implementation of the
nhc98 compiler are monadic and space-efficient. However, they do not
use the standard do-notation, because in fact they are more general than
the standard monad category.
- Daan Leijen's parsec.
The parsec library is widely used, since it is distributed with ghc.
Its combinators are fairly robust, but you need to place explicit
backtracking into your parsers, using the try operator. This
can be tricky.
- Doaitse Swierstra's UU_Parse.
An all-singing, all-dancing parsing library. Deeply sophisticated.
- Koen Claessen's ReadP.
This is a different proposed replacement for the standard Haskell'98
Read class. It is a whole lot more efficient that Read, but because it
is also API-compatible with Read, that unfortunately means it suffers
from not giving good error messages. Also, it requires rank-2 types,
which is a non-Haskell'98 extension.