hlex-1.0.0: Simple Lexer Creation
Copyright(c) Sebastian Tee 2023
LicenseMIT
Safe HaskellSafe-Inferred
LanguageHaskell2010

Hlex

Description

Tools needed to create a Lexer from a lexical Grammar.

Synopsis

Example

Here is an example module for a simple language.

  module ExampleLang
       ( MyToken(..) -- Export the language's tokens and the lexer
       , myLexer
       ) where

  import Hlex

  data MyToken = Ident String -- String identifier token
               | Number Float -- Number token and numeric value
               | Assign       -- Assignment operator token
               deriving(Show)

  myGrammar :: Grammar MyToken
  myGrammar = [ Error ""[^"]*n" "Can't have a new line in a string"        -- Return Exception when a new line occurs in a string
              , Tokenize ""[^"]*"" $ Str . init . tail                     -- Encode string and strip the containing quotes
              , JustToken "=" Assign                                       -- "=" Operator becomes the assign token
              , Tokenize "[a-zA-Z]+" (match -> Ident match)                -- Identifier token with string
              , Tokenize "[0-9]+(\.[0-9]+)?" (match -> Number (read match) -- Number token with the parsed numeric value stored as a Float
              , Skip "[ \n\r\t]+"                                          -- Skip whitespace
              ]

  myLexer :: Lexer MyToken
  myLexer = hlex myGrammar -- hlex turns a Grammar into a Lexer

Here is the lexer being used on a simple program.

>>> lexer "x = 1.2"
Right [Ident "x", Assign, Number 1.2]

Here is the lexer being used on an program with a syntax error.

>>> lexer "x = \"a\nb\""
Left (MatchedException 1 5 "\"a\n" "Can't have a new line in a string")

The lexer uses Either. Right means the lexer successfully parsed the program to a list of MyTokens. If Left was returned it would be a LexException.

Types

type Grammar token = [GrammarRule token] Source #

Lexical grammar made up of GrammarRules.

The order is important. The Lexer will apply each GrammarRule rule in the order listed.

data GrammarRule token Source #

These are the individual rules that make up a Grammar.

Takes a POSIX regular expression then converts it to a token or skips it.

Constructors

Skip

Skips over any matches.

Fields

Tokenize

Takes a function that converts the matched string to a token.

Fields

  • String

    Regular expression.

  • (String -> token)

    Function that converts the matched string into a token.

JustToken

Converts any regular expression matches to a given token.

Fields

  • String

    Regular expression.

  • token

    Given token.

Error

Returns an error with a message when a match occurs.

Fields

type Lexer token = String -> Either LexException [token] Source #

Converts a string into a list of tokens. If the string does not follow the Lexer's Grammar a LexException will be returned.

Exceptions

data LexException Source #

Exception thrown when a Lexer encounters an error when lexxing a string.

Constructors

UnmatchedException

Exception thrown when a substring cannot be matched.

Fields

  • Int

    The line number where the substring that couldn't be lexed is located.

  • Int

    The column where the substring that couldn't be lexed is located.

  • String

    The subtring that couldn't be lexed.

MatchedException

Exception thrown when a macth is found on the Error GrammarRule.

Fields

  • Int

    The line number where the matched string is located.

  • Int

    The column where the matched string is located.

  • String

    The matched string.

  • String

    Error message.

Functions

hlex :: Grammar token -> Lexer token Source #

Takes a given Grammar and turns it into a Lexer.