Copyright	(c) Sebastian Tee 2023
License	MIT
Safe Haskell	Safe-Inferred
Language	Haskell2010

Hlex

Contents

Example
Types
- Exceptions
Functions

Description

Tools needed to create a Lexer from a lexical Grammar.

Synopsis

type Grammar token = [GrammarRule token]
data GrammarRule token
  = Skip String
  | Tokenize String (String -> token)
  | JustToken String token
  | Error String String
type Lexer token = String -> Either LexException [token]
data LexException
  = UnmatchedException Int Int String
  | MatchedException Int Int String String
hlex :: Grammar token -> Lexer token

Example

Here is an example module for a simple language.

  module ExampleLang
       ( MyToken(..) -- Export the language's tokens and the lexer
       , myLexer
       ) where

  import Hlex

  data MyToken = Str String   -- String literal token (quotes stripped)
               | Ident String -- String identifier token
               | Number Float -- Number token and numeric value
               | Assign       -- Assignment operator token
               deriving (Show)

  myGrammar :: Grammar MyToken
  myGrammar = [ Error "\"[^\"]*\n" "Can't have a new line in a string"       -- Return a MatchedException when a newline occurs in a string
              , Tokenize "\"[^\"]*\"" (Str . init . tail)                   -- Keep the string's contents and strip the enclosing quotes
              , JustToken "=" Assign                                        -- The "=" operator becomes the Assign token
              , Tokenize "[a-zA-Z]+" (\match -> Ident match)                -- Identifier token holding its name
              , Tokenize "[0-9]+(\\.[0-9]+)?" (\match -> Number (read match)) -- Number token with the parsed numeric value stored as a Float
              , Skip "[ \n\r\t]+"                                           -- Skip whitespace
              ]

  myLexer :: Lexer MyToken
  myLexer = hlex myGrammar -- hlex turns a Grammar into a Lexer

Here is the lexer being used on a simple program.

>>> myLexer "x = 1.2"
Right [Ident "x",Assign,Number 1.2]

Here is the lexer being used on a program containing a string with a newline in it.

>>> myLexer "x = \"a\nb\""
Left (MatchedException 1 5 "\"a\n" "Can't have a new line in a string")

The lexer returns an Either: Right holds the list of MyTokens produced when the program is lexed successfully, while Left holds a LexException describing what went wrong.
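Because a Lexer returns an ordinary Either, a caller can handle both outcomes with a plain case expression. The sketch below is self-contained, so it redeclares LexException and MyToken locally to mirror the definitions documented on this page; in real code you would import them from Hlex and ExampleLang instead. The summarise function is a hypothetical helper, not part of Hlex.

```haskell
-- Local mirrors of the documented types so this sketch compiles on its own.
-- In real code, import LexException from Hlex and MyToken from ExampleLang.
data LexException
  = UnmatchedException Int Int String
  | MatchedException Int Int String String
  deriving (Show, Eq)

data MyToken = Str String | Ident String | Number Float | Assign
  deriving (Show, Eq)

-- Hypothetical helper: turn a lexer result into a one-line report.
summarise :: Show token => Either LexException [token] -> String
summarise result = case result of
  Right tokens ->
    "lexed " ++ show (length tokens) ++ " tokens: " ++ show tokens
  Left (UnmatchedException line col text) ->
    "no rule matches " ++ show text ++ " at " ++ show line ++ ":" ++ show col
  Left (MatchedException line col text msg) ->
    msg ++ " at " ++ show line ++ ":" ++ show col ++ " (matched " ++ show text ++ ")"
```

Passing the two results shown above through summarise yields a token count for the first and the error message with its position for the second.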

Types

type Grammar token = [GrammarRule token]

Lexical grammar made up of GrammarRules.

The order is important: the Lexer applies the GrammarRules in the order they are listed.
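Because a Grammar is just a list, a larger grammar can be assembled from smaller ones with ordinary list functions such as (++), and the position of each piece in the final list decides the order in which its rules are applied. The sketch mirrors GrammarRule and Grammar locally so it compiles without Hlex; the names stringRules, wordRules, combined and ruleRegex are hypothetical, and MyToken is a cut-down copy of the example's token type.

```haskell
-- Local mirror of the documented GrammarRule type and Grammar synonym.
-- In real code, import these from Hlex instead.
data GrammarRule token
  = Skip String
  | Tokenize String (String -> token)
  | JustToken String token
  | Error String String

type Grammar token = [GrammarRule token]

-- Cut-down token type for this sketch.
data MyToken = Str String | Ident String | Assign

-- Hypothetical helper: the regular expression carried by any rule.
ruleRegex :: GrammarRule token -> String
ruleRegex (Skip re) = re
ruleRegex (Tokenize re _) = re
ruleRegex (JustToken re _) = re
ruleRegex (Error re _) = re

-- String handling: the Error rule is listed before the Tokenize rule,
-- following the order used in the example grammar.
stringRules :: Grammar MyToken
stringRules =
  [ Error "\"[^\"]*\n" "Can't have a new line in a string"
  , Tokenize "\"[^\"]*\"" (Str . init . tail)
  ]

wordRules :: Grammar MyToken
wordRules =
  [ JustToken "=" Assign
  , Tokenize "[a-zA-Z]+" Ident
  , Skip "[ \n\r\t]+"
  ]

-- (++) keeps every rule of stringRules ahead of every rule of wordRules.
combined :: Grammar MyToken
combined = stringRules ++ wordRules
```

Swapping the operands of (++) would change which rules the Lexer tries first, so the concatenation order carries the same meaning as the order of a hand-written list.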

data GrammarRule token

These are the individual rules that make up a Grammar.

Each rule takes a POSIX regular expression and states what to do with text that matches it: skip it, convert it to a token, or report an error.

Constructors

Skip	Skips over any matches.
  Fields: String, the regular expression.
Tokenize	Converts each match into a token by applying the given function to the matched string.
  Fields: String, the regular expression; (String -> token), the function that converts the matched string into a token.
JustToken	Converts each match into the given token.
  Fields: String, the regular expression; token, the token to produce.
Error	Returns an error with the given message when a match occurs.
  Fields: String, the regular expression; String, the error message.
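The constructor descriptions above can be read as a small decision made for every match. The following sketch encodes that reading as a pure function over a local mirror of GrammarRule; it is an illustrative model only, not Hlex's implementation, and the name onMatch and its result type are hypothetical. Under this reading, JustToken re t behaves like Tokenize re (const t).

```haskell
-- Local mirror of the documented GrammarRule type.
data GrammarRule token
  = Skip String
  | Tokenize String (String -> token)
  | JustToken String token
  | Error String String

-- Hypothetical model of what a rule does with the text it matched:
--   Nothing          -> the match is skipped
--   Just (Right tok) -> the match becomes a token
--   Just (Left msg)  -> lexing fails with this error message
onMatch :: GrammarRule token -> String -> Maybe (Either String token)
onMatch (Skip _) _ = Nothing
onMatch (Tokenize _ f) matched = Just (Right (f matched))
onMatch (JustToken _ tok) _ = Just (Right tok)
onMatch (Error _ msg) _ = Just (Left msg)
```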

type Lexer token = String -> Either LexException [token]

Converts a string into a list of tokens. If the string does not conform to the Lexer's Grammar, a LexException is returned instead, wrapped in Left.
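Since Lexer token is only a synonym for a function type, ordinary function composition and Functor tools apply to it. As a sketch, the hypothetical mapTokens below post-processes every token a lexer produces while passing a Left through unchanged. LexException and Lexer are mirrored locally, and wordsLexer is a made-up stand-in for a lexer built with hlex.

```haskell
-- Local mirrors of the documented types.
data LexException
  = UnmatchedException Int Int String
  | MatchedException Int Int String String
  deriving (Show, Eq)

type Lexer token = String -> Either LexException [token]

-- Hypothetical helper: apply f to every token; errors pass through untouched.
mapTokens :: (a -> b) -> Lexer a -> Lexer b
mapTokens f lexer = fmap (map f) . lexer

-- Made-up stand-in lexer for single-line input: splits on whitespace and
-- rejects the first '!' it finds, reporting its 1-based column.
wordsLexer :: Lexer String
wordsLexer input = case break (== '!') input of
  (before, '!' : _) -> Left (UnmatchedException 1 (length before + 1) "!")
  _ -> Right (words input)
```

For example, mapTokens length wordsLexer turns each word into its length, and still reports the same UnmatchedException when the input contains a '!'.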

Exceptions

data LexException

Exception returned when a Lexer encounters an error while lexing a string.

Constructors

UnmatchedException	Returned when a substring cannot be matched by any GrammarRule.
  Fields: Int, the line number of the substring that couldn't be lexed; Int, its column; String, the substring that couldn't be lexed.
MatchedException	Returned when a match is found for an Error GrammarRule.
  Fields: Int, the line number of the matched string; Int, its column; String, the matched string; String, the error message.
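The line and column fields are enough to point a user at the offending text. The hypothetical pointAt below prints the source line and a caret under the reported column, assuming both numbers are 1-based as in the example output (MatchedException 1 5 for a quote in the fifth column of the first line). LexException is mirrored locally.

```haskell
-- Local mirror of the documented LexException type.
data LexException
  = UnmatchedException Int Int String
  | MatchedException Int Int String String
  deriving (Show, Eq)

-- Hypothetical helper: the (line, column) carried by either constructor.
position :: LexException -> (Int, Int)
position (UnmatchedException line col _) = (line, col)
position (MatchedException line col _ _) = (line, col)

-- Show the offending source line with a caret under the reported column.
-- Assumes line and column are both 1-based.
pointAt :: String -> LexException -> String
pointAt source err =
  let (line, col) = position err
      sourceLine = case drop (line - 1) (lines source) of
        (l : _) -> l
        [] -> ""
  in sourceLine ++ "\n" ++ replicate (col - 1) ' ' ++ "^"
```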

Instances

Read LexException (defined in Hlex)
  readsPrec :: Int -> ReadS LexException
  readList :: ReadS [LexException]
  readPrec :: ReadPrec LexException
  readListPrec :: ReadPrec [LexException]
Show LexException (defined in Hlex)
  showsPrec :: Int -> LexException -> ShowS
  show :: LexException -> String
  showList :: [LexException] -> ShowS
Eq LexException (defined in Hlex)
  (==) :: LexException -> LexException -> Bool
  (/=) :: LexException -> LexException -> Bool
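Together, the Read, Show and Eq instances make exceptions easy to compare in tests and to store as text. Assuming Hlex's instances behave like GHC's derived ones, show followed by read gives back an equal value. The sketch checks this on a local mirror with derived instances; roundTrips is a hypothetical helper.

```haskell
-- Local mirror with derived instances; Hlex's own instances are assumed
-- to behave the same way.
data LexException
  = UnmatchedException Int Int String
  | MatchedException Int Int String String
  deriving (Read, Show, Eq)

-- Hypothetical helper: does reading the shown text give back the same value?
roundTrips :: LexException -> Bool
roundTrips e = read (show e) == e
```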

Functions

hlex :: Grammar token -> Lexer token

Takes a given Grammar and turns it into a Lexer.