Extensible Haskell front-end for the Programatica project
What is in this directory (base/parse2)
This directory contains a new version of the base language lexer and
parser. There are two main differences from the old version.
Changes to the parser
The new parser produces a variant of the abstract syntax in which every
identifier is paired with its position in the source file. This
variant of the syntax is defined in module PosSyntax. It
reuses the names of all types from the plain abstract syntax, to
minimize the number of changes required in the grammar file. The type
used for identifiers with source position is called SN
and is defined in module SourceNames.
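To illustrate the idea of pairing identifiers with source positions, here is a minimal sketch. Only the name SN comes from the text above. The position type SrcPos, its fields, and the helpers getName and getPos are hypothetical names chosen for this example, and the actual definitions in SourceNames may well differ.

```haskell
-- Hypothetical sketch; the real definitions live in module SourceNames.

-- A source position as a line and a column (assumed representation).
data SrcPos = SrcPos { srcLine :: Int, srcColumn :: Int }
  deriving (Eq, Ord, Show)

-- An identifier of type i paired with the position where it occurred.
data SN i = SN i SrcPos
  deriving (Eq, Show)

-- Project out the plain identifier, discarding the position.
getName :: SN i -> i
getName (SN i _) = i

-- Project out the position, e.g. for use in an error message.
getPos :: SN i -> SrcPos
getPos (SN _ p) = p

-- Renaming an identifier keeps its position.
instance Functor SN where
  fmap f (SN i p) = SN (f i) p
```

Because the wrapper is parameterised over the identifier type, a syntax variant along these lines can keep the type names of the plain abstract syntax and only change what is stored at identifier positions.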
Changes to the lexer
(See also the slides from the talk "A Lexer for Haskell in Haskell".)
Instead of being handwritten in Haskell, the new lexer is generated from a
lexical syntax specification. The advantages of this approach are that
- it allows the lexer to be implemented in a modular way, closely
resembling the specification in the Haskell report, so it should be
much easier to verify that the implementation agrees with what the
Haskell report specifies. (The old lexer was buggy.)
- it should be much easier to adapt the implementation to changes in
the Haskell report (and there have been subtle changes in the lexical
syntax in every new version of the Haskell report, I believe).
(The old lexer was outdated.)
- it is still as efficient as the old handwritten, monolithic lexer.
The specification is expressed in Haskell, using simple regular
expression combinators, and then compiled to a DFA
using standard textbook algorithms. The regular expression compiler is
implemented in Haskell, and the DFAs it generates are output in the form
of Haskell source code.
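As a rough illustration of what a lexical specification written with regular expression combinators can look like, here is a small sketch. All names in it (RegExp, oneOf, |||, +++, many, some, matches) are hypothetical and are not taken from LexerSpec or LexerGen. Note also that this sketch does not generate a DFA: it matches directly using Brzozowski derivatives, a different and simpler technique in which each successive derivative plays the role of a DFA state. The varid rule is simplified and does not exclude reserved words.

```haskell
-- Hypothetical combinators; not the project's actual API.
data RegExp
  = Empty                 -- matches no string at all
  | Eps                   -- matches only the empty string
  | Sym (Char -> Bool)    -- matches one character satisfying the predicate
  | Alt RegExp RegExp     -- choice
  | Seq RegExp RegExp     -- sequencing
  | Star RegExp           -- zero or more repetitions

infixr 2 |||
infixr 3 +++

(|||), (+++) :: RegExp -> RegExp -> RegExp
(|||) = Alt
(+++) = Seq

oneOf :: String -> RegExp
oneOf cs = Sym (`elem` cs)

many, some :: RegExp -> RegExp
many = Star
some r = r +++ many r

-- Does the expression accept the empty string?
nullable :: RegExp -> Bool
nullable Empty     = False
nullable Eps       = True
nullable (Sym _)   = False
nullable (Alt r s) = nullable r || nullable s
nullable (Seq r s) = nullable r && nullable s
nullable (Star _)  = True

-- The derivative: what remains to be matched after consuming character c.
deriv :: Char -> RegExp -> RegExp
deriv _ Empty     = Empty
deriv _ Eps       = Empty
deriv c (Sym p)   = if p c then Eps else Empty
deriv c (Alt r s) = Alt (deriv c r) (deriv c s)
deriv c (Seq r s)
  | nullable r    = Alt (Seq (deriv c r) s) (deriv c s)
  | otherwise     = Seq (deriv c r) s
deriv c (Star r)  = Seq (deriv c r) (Star r)

-- Whole-string match: take derivatives character by character.
matches :: RegExp -> String -> Bool
matches r = nullable . foldl (flip deriv) r

-- A fragment of a lexical specification, written to resemble the report.
digit, small, large, decimal, varid :: RegExp
digit   = oneOf ['0' .. '9']
small   = oneOf ('_' : ['a' .. 'z'])
large   = oneOf ['A' .. 'Z']
decimal = some digit
varid   = small +++ many (small ||| large ||| digit ||| oneOf "'")
```

With these definitions, `matches varid "foo'"` evaluates to True and `matches varid "Foo"` to False. A generator in the style described above would instead precompute the reachable states and emit them as Haskell source, so that no regular expression manipulation happens at lexing time.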
Two pieces of handwritten code accompany the automatically generated
code:
- A small function to recognize nested comments, since these can't be
described using regular expressions.
- Functions to implement Haskell's layout convention.
The structure of the implementation of these closely follows the
specifications in the Haskell report (appendix B.3).
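Nested comments need a counter for the nesting depth, which is why a regular expression cannot describe them. A minimal sketch of such a recognizer follows. The function name nestedComment and its interface are assumptions made for this example and are not taken from the Lexer directory.

```haskell
-- Hypothetical sketch of a nested comment recognizer.
-- The input is the text immediately after an opening "{-".
-- On success, returns the consumed text (up to and including the
-- matching "-}") together with the remaining input.
-- Returns Nothing if the comment is not terminated.
nestedComment :: String -> Maybe (String, String)
nestedComment = go 1 ""
  where
    go :: Int -> String -> String -> Maybe (String, String)
    go 0 acc rest         = Just (reverse acc, rest)         -- all levels closed
    go _ _   []           = Nothing                          -- ran out of input
    go n acc ('{':'-':cs) = go (n + 1) ('-' : '{' : acc) cs  -- one level deeper
    go n acc ('-':'}':cs) = go (n - 1) ('}' : '-' : acc) cs  -- one level closed
    go n acc (c:cs)       = go n (c : acc) cs                -- ordinary character
```

For example, `nestedComment " a {- b -} c -} rest"` yields `Just (" a {- b -} c -}", " rest")`: the inner comment raises the depth to 2, and only the second "-}" closes the outer comment.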
Subdirectories
Lexer
- Handwritten and automatically generated code for the lexer.
LexerGen
- The regular expression compiler.
LexerSpec
- The lexical syntax specification for Haskell 98 (based on appendix B.2).
Parser
- The Happy parser for the context-free grammar (based on appendix B.4).