parcom-lib: A simple parser-combinator library, a bit like Parsec but without the frills

Safe Haskell: Safe-Inferred




A parser-combinator library.

The primary goal in writing Parcom was to facilitate parsing Unicode string data from various source streams, including raw ByteStrings. While Attoparsec can parse ByteStrings, it sacrifices some convenience for performance, and using it to parse textual data is not as comfortable as I would like. Parsec handles textual data much better, but it needs the input to be converted to Unicode first for this to work nicely. Nonetheless, Parcom's interface is heavily inspired by both Parsec and Attoparsec.

Parcom supports String, ByteString (lazy and strict) and Text (lazy and strict) as input types out of the box. By implementing one or more of the typeclasses in Stream, you can extend Parcom to work on other input types as well.



Getting Started

Parcom being a parser combinator library, the usual approach is to use predefined atomic parsers (defined in Text.Parcom.Prim and re-exported here for convenience) and combine them using predefined combinators (defined in Text.Parcom.Combinators). Anyone with prior exposure to Parsec or Attoparsec should be familiar with the concept. Here's an example that parses a value, which can be a positive integer literal or NULL:

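 import Data.Char (isSpace) -- isSpace is used by nullLiteral

 -- A value is either a positive integer literal or NULL.
 myParser :: Parcom String Char (Maybe Int)
 myParser = intLiteral <|> nullLiteral <?> "value (integer or NULL)"

 -- A nonzero leading digit followed by any number of digits.
 intLiteral :: Parcom String Char (Maybe Int)
 intLiteral = do
       x <- oneOf ['1'..'9']
       xs <- many (oneOf ['0'..'9'])
       return $ Just $ read (x:xs)

 -- The keyword NULL, provided no non-space character follows it directly.
 nullLiteral :: Parcom String Char (Maybe Int)
 nullLiteral = do
       _ <- tokens "NULL"
       notFollowedBy (satisfy (not . isSpace))
       return Nothing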

Such a parser can then be run against some input using parse or parseT, the monadic equivalent.

 main = do
      src <- getContents
      let parseResult = parse myParser "<STDIN>" src
      case parseResult of
          Left err -> do
              putStrLn "Sorry, there has been an error, namely:"
              print err
          Right (Just i) -> putStrLn $ "Found an integer value: " ++ show i
          Right Nothing -> putStrLn "Found NULL"


As you build more complex parsers, you may encounter situations where a parser fails after having consumed some input already. Combining such a parser with other alternatives will yield undesired results: the parser fails, but it will not push the input it has already consumed back onto the input stream. To fix this, use the try primitive, which modifies a parser such that when it fails, it undoes any input consumption it may have caused.
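
The effect described above can be illustrated with a toy model that uses nothing but base. This is a sketch of the general idea only, not Parcom's implementation: the type P and the functions literal, orElse and tryP are hypothetical names standing in loosely for a Parcom parser, tokens, <|> and try, and the assumption that a failed literal match keeps its partial consumption is made purely for the sake of the illustration.

```haskell
-- A toy parser: given the input, return a result (or Nothing on failure)
-- together with whatever input is left. On failure, the leftover input
-- reflects everything consumed before the failure occurred.
newtype P a = P { runP :: String -> (Maybe a, String) }

-- Match a literal string character by character. On a mismatch, the
-- characters matched so far stay consumed (assumed for this sketch).
literal :: String -> P String
literal s = P (go s)
  where
    go [] rest = (Just s, rest)
    go (c:cs) (r:rs) | c == r = go cs rs
    go _ rest = (Nothing, rest)

-- Alternative: if the first parser fails, run the second one on whatever
-- input the first parser left behind.
orElse :: P a -> P a -> P a
orElse (P p) (P q) = P $ \inp -> case p inp of
    (Nothing, rest) -> q rest
    ok              -> ok

-- Backtracking wrapper: on failure, restore the original input.
tryP :: P a -> P a
tryP (P p) = P $ \inp -> case p inp of
    (Nothing, _) -> (Nothing, inp)
    ok           -> ok

-- "NUMBER" matches "NU" and then fails; "NULL" is then tried on "LL".
withoutTry :: (Maybe String, String)
withoutTry = runP (literal "NUMBER" `orElse` literal "NULL") "NULL"
-- (Nothing,"LL")

-- With tryP, the failed attempt is rewound and "NULL" sees the full input.
withTry :: (Maybe String, String)
withTry = runP (tryP (literal "NUMBER") `orElse` literal "NULL") "NULL"
-- (Just "NULL","")
```

In this model the first alternative fails after eating "NU", so the second alternative never sees the full input unless the first is wrapped in the backtracking combinator. Wrapping only the alternatives that can fail after consuming input keeps the amount of re-scanned input small.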

Input types other than String

To support input from Text or ByteStrings, import one of the following modules:

Parsing textual data

Parcom provides two primitives for textual data: char and string. While textual data can be extracted from input streams that are textual already (e.g. String or Text) using the normal token-based primitives (token, tokens, and prefix), doing so for ByteStrings isn't trivial. The textual-data