tokenizer-0.1.0.0: Check uniqueness and tokenize safely
Copyright    (c) Lev Dvorkin 2022
License      MIT
Maintainer   lev_135@mail.ru
Stability    Experimental
Safe Haskell None
Language     Haskell2010

Text.Tokenizer.Split

Description

This module provides a simple tokenizing algorithm.

Synopsis

Documentation

data TokenizeMap k c Source #

Auxiliary structure for tokenizing. Should be treated as an opaque type: build it with makeTokenizeMap and combine maps via the Semigroup instance.

Constructors

TokenizeMap 

Fields

Instances

Instances details
(Show c, Show k) => Show (TokenizeMap k c) Source # 
Instance details

Defined in Text.Tokenizer.Split

Methods

showsPrec :: Int -> TokenizeMap k c -> ShowS #

show :: TokenizeMap k c -> String #

showList :: [TokenizeMap k c] -> ShowS #

Ord c => Semigroup (TokenizeMap k c) Source # 
Instance details

Defined in Text.Tokenizer.Split

Methods

(<>) :: TokenizeMap k c -> TokenizeMap k c -> TokenizeMap k c #

sconcat :: NonEmpty (TokenizeMap k c) -> TokenizeMap k c #

stimes :: Integral b => b -> TokenizeMap k c -> TokenizeMap k c #

Ord c => Monoid (TokenizeMap k c) Source # 
Instance details

Defined in Text.Tokenizer.Split

Methods

mempty :: TokenizeMap k c #

mappend :: TokenizeMap k c -> TokenizeMap k c -> TokenizeMap k c #

mconcat :: [TokenizeMap k c] -> TokenizeMap k c #

singleTokMap :: Ord c => Token k c -> TokenizeMap k c Source #

Make a TokenizeMap containing a single token

insert :: Ord c => Token k c -> TokenizeMap k c -> TokenizeMap k c Source #

Insert a Token into a TokenizeMap

makeTokenizeMap :: Ord c => [Token k c] -> TokenizeMap k c Source #

Create the auxiliary map for tokenizing. Should be called once, during initialization
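The three construction functions and the Semigroup/Monoid instances are meant to fit together. Assuming tok1, tok2 :: Token k c built elsewhere (the Token constructors live outside this module), the following law-style pseudocode sketches the expected equivalences; it is not verified against the implementation, and TokenizeMap exposes no Eq instance, so read "==" as "denotes the same map":

```
makeTokenizeMap [tok1, tok2]
  == insert tok1 (singleTokMap tok2)
  == singleTokMap tok1 <> singleTokMap tok2
  == mconcat [singleTokMap tok1, singleTokMap tok2]
```

In practice this means a map can be built incrementally (insert, <>) or in one pass (makeTokenizeMap), whichever fits the caller.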

data TokenizeError k c Source #

Error during tokenizing

Wherever the type [(k, [c])] is used, it stores a list of pairs of a token's name and the part of the string matched by that token

Constructors

NoWayTokenize 

Fields

  • Int

    Position of the first character that cannot be tokenized

  • [(k, [c])]

    The part of the string that was successfully tokenized (the longest among all attempts)

TwoWaysTokenize 

Fields

  • Int

    Length of uniquely tokenized prefix

  • [(k, [c])]

    First way of tokenizing

  • [(k, [c])]

    Second way of tokenizing

Instances

Instances details
(Eq k, Eq c) => Eq (TokenizeError k c) Source # 
Instance details

Defined in Text.Tokenizer.Split

Methods

(==) :: TokenizeError k c -> TokenizeError k c -> Bool #

(/=) :: TokenizeError k c -> TokenizeError k c -> Bool #

(Show k, Show c) => Show (TokenizeError k c) Source # 
Instance details

Defined in Text.Tokenizer.Split

tokenize :: forall k c. Ord c => TokenizeMap k c -> [c] -> Either (TokenizeError k c) [(k, [c])] Source #

Split a list of symbols into tokens.
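Since Token's concrete representation is defined elsewhere in the package, the following self-contained sketch models the documented contract of tokenize with plain (name, body) pairs standing in for Token k c. The names partials, covered, and tokenizeNaive are hypothetical, and the error payloads follow the field descriptions above; this is an illustrative model, not the package's implementation.

```haskell
import Data.List (isPrefixOf, maximumBy)
import Data.Ord (comparing)

-- Stand-in mirroring the documented TokenizeError constructors.
data TokenizeError k c
  = NoWayTokenize Int [(k, [c])]                -- position + longest partial tokenization
  | TwoWaysTokenize Int [(k, [c])] [(k, [c])]   -- prefix length + two distinct ways
  deriving (Eq, Show)

-- Every partial tokenization of the input, paired with the untokenized rest.
partials :: Eq c => [(k, [c])] -> [c] -> [([(k, [c])], [c])]
partials toks cs =
  ([], cs) :
  [ ((k, body) : rest, leftover)
  | (k, body) <- toks
  , not (null body)                 -- avoid looping on empty token bodies
  , body `isPrefixOf` cs
  , (rest, leftover) <- partials toks (drop (length body) cs)
  ]

-- Number of symbols covered by a (partial) tokenization.
covered :: [(k, [c])] -> Int
covered = sum . map (length . snd)

tokenizeNaive
  :: (Eq k, Eq c)
  => [(k, [c])] -> [c] -> Either (TokenizeError k c) [(k, [c])]
tokenizeNaive toks cs =
  case [way | (way, []) <- partials toks cs] of
    [way] -> Right way
    []    ->
      -- No complete tokenization: report the longest partial attempt.
      let best = fst (maximumBy (comparing (covered . fst)) (partials toks cs))
      in Left (NoWayTokenize (covered best) best)
    (w1 : w2 : _) ->
      -- "Length of uniquely tokenized prefix" is measured here in symbols of
      -- the longest common token prefix; the package's convention may differ.
      let common = length (takeWhile id (zipWith (==) w1 w2))
      in Left (TwoWaysTokenize (covered (take common w1)) w1 w2)
```

With tokens [("a", "a"), ("aa", "aa"), ("b", "b")], the string "ab" tokenizes uniquely, "ac" fails after the one-symbol prefix "a", and "aab" is ambiguous between a·a·b and aa·b.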