inchworm: Simple parser combinators for lexical analysis.

[ library, mit, parsing ]

Parser combinator framework specialized to lexical analysis. Tokens are specified via simple fold functions, and source location handling is baked in. Comes with matchers for standard lexemes like integers, comments, and Haskell-style strings with escape handling. No dependencies other than the Haskell base library. If you want to parse expressions instead of tokens then try the parsec or attoparsec packages, which have more general-purpose combinators.


Versions 1.0.0.1, 1.0.1.1, 1.0.2.1, 1.0.2.2, 1.0.2.3, 1.0.2.4, 1.1.1.1, 1.1.1.2
Change log Changelog.md
Dependencies base (>=4.8 && <4.13)
License MIT
Author The Inchworm Development Team
Maintainer Ben Lippmeier <benl@ouroborus.net>
Category Parsing
Home page https://github.com/discus-lang/inchworm
Source repo head: git clone https://github.com/discus-lang/inchworm.git
Uploaded by BenLippmeier at Wed Jan 2 02:15:59 UTC 2019
Distributions NixOS:1.1.1.2
Downloads 2172 total (188 in the last 30 days)
Rating (no votes yet)
Status Hackage Matrix CI
Docs available [build log]
Last success reported on 2019-01-02 [all 1 reports]



Readme for inchworm-1.1.1.1


Inchworm

Inchworm is a simple parser combinator framework specialized to lexical analysis. Tokens are specified via simple fold functions, and source location handling is baked in.

If you want to parse expressions instead of performing lexical analysis then try the parsec or attoparsec packages, which have more general-purpose combinators.

Matchers for standard tokens like comments and strings are in the Text.Lexer.Inchworm.Char module.

No dependencies other than the Haskell base library.
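To illustrate the fold-function idea behind matchers like munchWord, here is a minimal self-contained sketch. This is not inchworm's internal implementation, just an illustration of the technique: the predicate receives each character's index along with the character itself, so the first character of a token can be constrained differently from the rest. The names munchWordLike and matchVar are hypothetical, introduced only for this sketch.

```haskell
import qualified Data.Char as Char

-- Consume the longest prefix whose characters all satisfy an
-- index-aware predicate; fail if the match would be empty.
-- (A sketch of the idea only, not inchworm's actual internals.)
munchWordLike :: (Int -> Char -> Bool) -> String -> Maybe (String, String)
munchWordLike p cs
 = let  (word, remain) = go 0 cs
   in   if null word then Nothing else Just (word, remain)
 where
        go _  []            = ([], [])
        go ix (c : rest)
         | p ix c           = let (w, r) = go (ix + 1) rest
                              in  (c : w, r)
         | otherwise        = ([], c : rest)

-- A variable name starts with a lowercase letter, followed by
-- alphabetic characters -- the same predicate shape used with
-- munchWord in the example below.
matchVar :: String -> Maybe (String, String)
matchVar
 = munchWordLike (\ix c -> if ix == 0 then Char.isLower c
                                      else Char.isAlpha c)
```

For example, matchVar "foo bar" yields Just ("foo", " bar"), while matchVar "Foo" yields Nothing because the first character fails the lowercase test.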

Minimal example

The following code demonstrates how to perform lexical analysis of a simple LISP-like language. We use two separate name classes: one for variables that start with a lowercase letter, and one for constructors that start with an uppercase letter.

Integers are scanned using the scanInteger function from the Text.Lexer.Inchworm.Char module.

The result of scanStringIO includes the leftover input characters that could not be consumed. In a real lexer you should check that this remainder is empty to ensure there has not been a lexical error.

import Text.Lexer.Inchworm.Char
import qualified Data.Char as Char

-- | A source token.
data Token 
        = KBra | KKet | KVar String | KCon String | KInt Integer
        deriving Show

-- | A thing with attached location information.
data Located a
        = Located FilePath (Range Location) a
        deriving Show

-- | Scanner for a lispy language.
scanner :: FilePath
        -> Scanner IO Location [Char] (Located Token)
scanner fileName
 = skip Char.isSpace
 $ alts [ fmap (stamp id)   $ accept '(' KBra
        , fmap (stamp id)   $ accept ')' KKet
        , fmap (stamp KInt) $ scanInteger 
        , fmap (stamp KVar)
          $ munchWord (\ix c -> if ix == 0 then Char.isLower c
                                           else Char.isAlpha c) 
        , fmap (stamp KCon) 
          $ munchWord (\ix c -> if ix == 0 then Char.isUpper c
                                           else Char.isAlpha c)
        ]
 where  -- Stamp a token with source location information.
        stamp k (range, t) 
          = Located fileName range (k t)

main :: IO ()
main 
 = do   let fileName = "Source.lispy"
        let source   = "(some (Lispy like) 26 Program 93 (for you))"
        toks    <- scanStringIO source (scanner fileName)
        print toks
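As noted above, a real lexer should check the leftover input. Assuming the scanner result has the shape (tokens, final location, leftover input), as returned by scanStringIO, the check can be sketched as follows. The name checkComplete is hypothetical, introduced only for this sketch.

```haskell
-- Accept the scan result only if the whole input was consumed.
-- Assumes the triple shape (tokens, finalLoc, leftover) produced
-- by scanStringIO; checkComplete is a hypothetical helper.
checkComplete :: ([tok], loc, String) -> Either String [tok]
checkComplete (toks, _, rest)
 | null rest = Right toks
 | otherwise = Left ("lexical error near: " ++ take 20 rest)
```

In main this would replace the bare print with a case split on the Either, reporting the offending input fragment on failure.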