gigaparsec-0.3.0.0: Refreshed parsec-style library for compatibility with Scala parsley
License: BSD-3-Clause
Maintainer: Jamie Willis, Gigaparsec Maintainers
Stability: stable
Safe Haskell: Safe
Language: Haskell2010

Text.Gigaparsec.Errors.TokenExtractors

Description

This module contains implementations of token extractors that can be used in the Text.Gigaparsec.Errors.ErrorBuilder to decide how to extract unexpected tokens from the residual input left over from a parse error.

These are common strategies, and one of them is likely to provide what is needed. They are all careful to handle unprintable characters and whitespace sensibly, and account for Unicode codepoints that are wider than a single 16-bit character.

Since: 0.2.5.0

Documentation

data Token Source #

This type represents an extracted token returned by unexpectedToken in ErrorBuilder.

There is deliberately no analogue for EndOfInput because we guarantee that non-empty residual input is provided to token extraction.

Since: 0.2.5.0

Constructors

Raw

This is a token that is directly extracted from the residual input itself.

Fields

  • !String

    the characters extracted verbatim from the input.

Named

This is a token that has been given a name, and is treated like a labelled item.

Fields

  • !String

    the description of the token.

  • !Word

    the amount of residual input this token ate.
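
The shape of this type can be sketched in plain Haskell (a paraphrase of the constructors documented above, not the library source):

```haskell
-- Sketch of the Token type described above; the real definition lives in
-- Text.Gigaparsec.Errors.TokenExtractors.
data Token
  = Raw !String          -- characters taken verbatim from the residual input
  | Named !String !Word  -- a description, and how much input the token ate
  deriving (Eq, Show)
```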

type TokenExtractor Source #

Arguments

 = NonEmpty Char

the remaining input, cs, at point of failure.

-> Word

the input the parser tried to read when it failed (this is not guaranteed to be smaller than the length of cs, but is guaranteed to be greater than 0).

-> Bool

whether this error was generated as part of "lexing", or by a wider parser (see markAsToken).

-> Token

a token extracted from cs that will be used as part of the unexpected message.

Type alias for token extractors, matches the shape of unexpectedToken.

Since: 0.2.5.0
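
As a standalone sketch, the alias and a hand-rolled extractor written against it might look like this (the Token definition is repeated from above; firstCharExtractor is a hypothetical example, not part of the library):

```haskell
import Data.List.NonEmpty (NonEmpty(..))

-- Sketch of the Token shape from above (not the library source).
data Token = Raw !String | Named !String !Word deriving (Eq, Show)

-- Sketch of the alias: residual input, demanded width, lexing flag -> token.
type TokenExtractor = NonEmpty Char -> Word -> Bool -> Token

-- A toy extractor: always report the first character of the residual
-- input, as a named token one position wide.
firstCharExtractor :: TokenExtractor
firstCharExtractor (c :| _) _demanded _lexing = Named (show c) 1
```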

tillNextWhitespace Source #

Arguments

:: Bool

should the extractor cap the token to the amount of input the parser demanded?

-> (Char -> Bool)

what counts as a space character

-> TokenExtractor 

This extractor provides an implementation for unexpectedToken: it will construct a token that extends to the next available whitespace in the remaining input. It can be configured to restrict this token to the minimum of the next whitespace or however much input the parser demanded.

In the case of unprintable characters or whitespace, this extractor will favour reporting a more meaningful name.

Since: 0.2.5.0
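
The behaviour can be approximated in a simplified standalone sketch (the real extractor also renames whitespace and unprintable characters, which is omitted here):

```haskell
import Data.Char (isSpace)
import Data.List.NonEmpty (NonEmpty(..))

data Token = Raw !String | Named !String !Word deriving (Eq, Show)
type TokenExtractor = NonEmpty Char -> Word -> Bool -> Token

-- Simplified sketch of tillNextWhitespace: take characters up to the first
-- "space" (by the supplied predicate), optionally capped at the amount of
-- input the parser demanded.
tillNextWhitespaceSketch :: Bool -> (Char -> Bool) -> TokenExtractor
tillNextWhitespaceSketch cap isSpc (c :| cs) demanded _lexing =
  let tok = takeWhile (not . isSpc) (c : cs)
  in Raw (if cap then take (fromIntegral demanded) tok else tok)
```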

singleChar :: TokenExtractor Source #

This extractor provides an implementation for unexpectedToken: it will unconditionally report the first character in the remaining input as the problematic token.

In the case of unprintable characters or whitespace, this extractor will favour reporting a more meaningful name.

Since: 0.2.5.0
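
A simplified sketch of this behaviour, including the preference for naming whitespace and unprintables (the exact names the library uses may differ):

```haskell
import Data.Char (isPrint, isSpace)
import Data.List.NonEmpty (NonEmpty(..))

data Token = Raw !String | Named !String !Word deriving (Eq, Show)
type TokenExtractor = NonEmpty Char -> Word -> Bool -> Token

-- Simplified sketch of singleChar: always report the first character,
-- preferring a readable name for whitespace and unprintable characters.
singleCharSketch :: TokenExtractor
singleCharSketch (c :| _) _demanded _lexing
  | c == '\n'       = Named "newline" 1
  | isSpace c       = Named "space" 1
  | not (isPrint c) = Named (show c) 1
  | otherwise       = Raw [c]
```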

matchParserDemand :: TokenExtractor Source #

This extractor provides an implementation for unexpectedToken: it will make a token as wide as the amount of input the parser tried to consume when it failed.

In the case of unprintable characters or whitespace, this extractor will favour reporting a more meaningful name.

Since: 0.2.5.0
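
In a simplified sketch, the token is exactly as wide as the demand, clipped to the available input (since the demand is not guaranteed to be smaller than the residual input):

```haskell
import Data.List.NonEmpty (NonEmpty(..))

data Token = Raw !String | Named !String !Word deriving (Eq, Show)
type TokenExtractor = NonEmpty Char -> Word -> Bool -> Token

-- Simplified sketch of matchParserDemand: take as many characters as the
-- parser tried to consume when it failed, clipped to what is available.
matchParserDemandSketch :: TokenExtractor
matchParserDemandSketch (c :| cs) demanded _lexing =
  Raw (take (fromIntegral demanded) (c : cs))
```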

lexToken Source #

Arguments

:: [Parsec String]

The tokens that should be recognised by this extractor: each parser should return the intended name of the token exactly as it should appear in the Named token.

This should include a whitespace parser for "unexpected whitespace". However, with the exception of the whitespace parser, these tokens should not consume trailing (and certainly not leading) whitespace: if using definitions from Text.Gigaparsec.Token.Lexer functionality, the nonlexeme versions of the tokens should be used.

-> TokenExtractor

If the parser failed during the parsing of a token, this function extracts the problematic item from the remaining input.

-> TokenExtractor 

This extractor provides an implementation for unexpectedToken: it will try to parse the residual input to identify a valid lexical token to report.

When parsing a grammar that has a dedicated lexical distinction, it is nice to be able to report problematic tokens relevant to that grammar as opposed to generic input lifted straight from the input stream. The easiest way of doing this would be having a pre-lexing pass and parsing based on tokens, but this is deliberately not how Parsley is designed. Instead, this extractor can attempt to parse the remaining input to identify a token on demand.

If the lexicalError flag of the unexpectedToken function is not set, which would indicate a problem within a token reported by a classical lexer and not the parser, the extractor will try to parse each of the provided tokens in turn: whichever of these tokens matches the most input will be reported as the problematic one, with an earlier token arbitrating ties (lexTokenWithSelect can alter which is chosen). For best effect, these tokens should not consume whitespace (which would otherwise be included at the end of the token!): this means that, if using the Lexer, the functionality in nonlexeme should be used. If none of the given tokens can be parsed, the input until the next valid parsable token (or end of input) is returned as a Raw.

If lexicalError is true, then the given token extractor will be used instead to extract a default token.

Since: 0.2.5.0
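
The longest-match selection logic can be illustrated without real parsers by modelling each token as a simple matcher on the residual input. Matcher, keyword, and lexTokenSketch below are toy stand-ins for illustration only, not the library's implementation (which runs actual Parsec String parsers and falls back to a Raw up to the next parsable token):

```haskell
import Data.Maybe (mapMaybe)

data Token = Raw !String | Named !String !Word deriving (Eq, Show)

-- Toy stand-in for a `Parsec String` token parser: given the input, maybe
-- return the token's name and how many characters it would consume.
type Matcher = String -> Maybe (String, Word)

-- A matcher for a literal keyword, reporting a fixed name.
keyword :: String -> String -> Matcher
keyword name kw input
  | take (length kw) input == kw = Just (name, fromIntegral (length kw))
  | otherwise                    = Nothing

-- Sketch of lexToken's non-lexical branch: run every matcher over the
-- residual input and keep the longest match, with earlier matchers
-- winning ties. Here the fallback simply returns the whole input as Raw.
lexTokenSketch :: [Matcher] -> String -> Token
lexTokenSketch ms input = case mapMaybe ($ input) ms of
  []   -> Raw input
  hits -> uncurry Named (foldr1 longerOrEarlier hits)
  where longerOrEarlier a b = if snd b > snd a then b else a
```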

lexTokenWithSelect Source #

Arguments

:: (NonEmpty (String, Word) -> (String, Word))

If the extractor is successful in identifying tokens that can be parsed from the residual input, this function will select one of them to report back.

-> [Parsec String]

The tokens that should be recognised by this extractor: each parser should return the intended name of the token exactly as it should appear in the Named token.

This should include a whitespace parser for "unexpected whitespace". However, with the exception of the whitespace parser, these tokens should not consume trailing (and certainly not leading) whitespace: if using definitions from Text.Gigaparsec.Token.Lexer functionality, the nonlexeme versions of the tokens should be used.

-> TokenExtractor

If the parser failed during the parsing of a token, this function extracts the problematic item from the remaining input.

-> TokenExtractor 

This extractor provides an implementation for unexpectedToken: it will try to parse the residual input to identify a valid lexical token to report.

When parsing a grammar that has a dedicated lexical distinction, it is nice to be able to report problematic tokens relevant to that grammar as opposed to generic input lifted straight from the input stream. The easiest way of doing this would be having a pre-lexing pass and parsing based on tokens, but this is deliberately not how Parsley is designed. Instead, this extractor can attempt to parse the remaining input to identify a token on demand.

If the lexicalError flag of the unexpectedToken function is not set, which would indicate a problem within a token reported by a classical lexer and not the parser, the extractor will try to parse each of the provided tokens in turn: the given function is used to select which is returned. For best effect, these tokens should not consume whitespace (which would otherwise be included at the end of the token!): this means that, if using the Lexer, the functionality in nonlexeme should be used. If none of the given tokens can be parsed, the input until the next valid parsable token (or end of input) is returned as a Raw.

If lexicalError is true, then the given token extractor will be used instead to extract a default token.

Since: 0.2.5.0
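
A hedged example of a selection function suitable for the first argument of lexTokenWithSelect: selectWidest (a hypothetical name) prefers the widest candidate token, breaking ties toward the earliest one, which mirrors plain lexToken's default choice:

```haskell
import Data.List.NonEmpty (NonEmpty(..))

-- Example selection function: pick the candidate that consumed the most
-- input; on ties, keep the candidate that appeared earliest in the list.
selectWidest :: NonEmpty (String, Word) -> (String, Word)
selectWidest (t :| ts) = foldl pick t ts
  where pick best cand = if snd cand > snd best then cand else best
```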