= Fixity fixups The point of this code is to be able to take a table of fixity information for unary and binary operators, then adjust an ast to match these fixities. The standard way of handling this is handling fixities at the parsing stage. For the SQL parser, this is difficult because there is lots of weird syntax for operators (such as prefix and postfix multiple keyword operators, between, etc.). An alterative idea which is used in some places is to parse the tree regarding all the operators to have the same precedence and left associativity, then correct the fixity in a pass over the ast after parsing. Would also like to use this to fix the fixity for the join trees, and set operations, after parsing them. TODO: anything else? Approach Really not sure how to get this correct. So: lots of testing Basic testing idea: create an expression, then write down manually how the expression should parse with correct fixity. Can write down the expression in concrete syntax, and the correct fixity version using parens. Then can parse the expression, fix it, parse the fixed expression, remove the parens and compare them to make sure they are equal. Second layer of testing. For each source expression parsed, run it through a generator which will generate every version of that tree by choosing all possibilities of fixities on a token by token basis. This will ensure the fixity fixer is robust. An alternative approach is to guarantee the parser will produce trees where all the fixities are known (e.g. unary operators always bind tighter than binary, binary are all left associative, prefix unary bind tighter than postfix. This way, the fix code can make some assumptions and have less code. We will stick with the full general version which is more robust. Another testing approach is to parse the tree with our non fixity respecting parser then fix it, and also parse it with a fixity respecting expression parser, and check the results are the same. This is difficult with the parsec build expression parser which doesn't handle nested unary operators, so have to find or write another build expression parser. We can test the fixer with simple operators (single symbol prefix, postfix and binary ops) and then use it on the complex sql ast trees. Can also try to generate trees ala quickcheck/smallcheck, then check them with the fixer and the build expression parser. generate a tree: start with a term then roll dice: add a prefix add a postfix do nothing then roll dice add a binary op for the second arg, recurse the algo algorithm: consider possible cases: binop with two binops args binop with prefix on left binop with postfix on right postfix with prefix inside prefix with postfix inside postfix with binop inside prefix with binop inside write a function to deal with each case and try to compose Tasks: write unary op tests: on each other, and with binary ops figure out how to generate trees do the step one tests (write the fixity with parens) check out parsers expression parser see if can generate trees using smallcheck try to test these trees against expression parser otherwise, generate tree, generate variations, check fixity always produces same result todo: 1. more tests for unary operators with each other 2. moving unary operators inside and outside binary operators: have to think about how this will work in general case 3. ways to generate lots of tests and check them -> what about creating a parser which parses to a list of all possible parses with different fixities for each operator it sees? 4. ambiguous fixity cases - need position annotation to do these nicely 5. real sql: how to work with a variety of ast nodes 6. plug into simple-sql-parser 7. refactor the simple-sql-parser parsing code 8. simple-sql-parser todo for sqream: add other dml, dialects, procedural? 9. testing idea: write big expressions with explicit parens everywhere parse this remove the parens pretty print, then parse and fixfixity to see if same then generate all variations of tree as if the fixities are different and then fixfixity to check it restores the original write fixity tests write code to do the fixing add error cases: put it in the either monad to report these check the descend then: move to real sql different abstract representations of binops, etc. what is the best way to deal with this? typeclass? conversion to and from a generic tree? can the binops be fixed on their own (precedence and assocativity) and then the prefix and postfix ops in separate passes what about a pass which puts the tree into canonical form: all left associative, all unary ops tight as possible? then the fixer can be easier? > {-# LANGUAGE DeriveDataTypeable,TupleSections #-} > import Data.Data > import Text.Parsec.String (Parser) > import Text.Parsec (try) > import Text.Parsec.Char > import Text.Parsec.Combinator > import Text.Parsec (parse,ParseError) > import Control.Applicative ((<|>),many) -- ((<**>),(<$>),(<*), (*>),(<*>), (<$), (<|>), many) > --import qualified Text.Parsec.String.Expr as E > import Control.Monad > --import Data.List (intercalate) > import Data.Maybe () > --import qualified Test.HUnit as H > --import FunctionsAndTypesForParsing > import Debug.Trace > import Text.Show.Pretty > import Data.List > import Control.Applicative > import qualified Test.Tasty as T > import qualified Test.Tasty.HUnit as H > data Expr = BinOp Expr String Expr > | PrefOp String Expr > | PostOp String Expr > | Iden String > | Lit String > | App String [Expr] > | Parens Expr > deriving (Eq,Show,Data,Typeable) -------- quick parser > parensValue :: Parser Expr > parensValue = Parens <$> parens valueExpr > idenApp :: Parser Expr > idenApp = try $ do > i <- identifier > guard (i `notElem` ["not", "and", "or", "is"]) > choice [do > args <- parens (commaSep valueExpr) > return $ App i args > ,return $ Iden i > ] > lit :: Parser Expr > lit = stringLit <|> numLit > where > stringLit = Lit <$> lexeme (char '\'' *> manyTill anyChar (char '\'')) > numLit = do > x <- lexeme (many1 digit) > let y :: Integer > y = read x > return $ Lit $ show y > prefOp :: Parser Expr > prefOp = sym <|> kw > where > sym = do > let prefOps = ["+", "-"] > s <- choice $ map symbol prefOps > v <- term > return $ PrefOp s v > kw = do > let prefOps = ["not"] > i <- identifier > guard (i `elem` prefOps) > v <- term > return $ PrefOp i v > postOp :: Parser (Expr -> Expr) > postOp = try $ do > let kws = ["is null"] > kwsp = map (\a -> try $ do > let x :: [String] > x = words a > mapM_ keyword_ x > return $ PostOp a > ) kws > choice kwsp > binOp :: Parser (Expr -> Expr -> Expr) > binOp = symbolBinOp <|> kwBinOp > where > symbolBinOp = do > let binOps = ["+", "-", "*", "/"] > s <- choice $ map symbol binOps > return $ \a b -> BinOp a s b > kwBinOp = do > let kwBinOps = ["and", "or"] > i <- identifier > guard (i `elem` kwBinOps) > return $ \a b -> BinOp a i b > term :: Parser Expr > term = (parensValue > <|> try prefOp > <|> idenApp > <|> lit) > postOp > -- () :: Parser a -> Parser (a -> a) -> Parser a > -- p q = p <**> option id q > () :: Parser a -> Parser (a -> a) -> Parser a > p q = foldr ($) <$> p <*> (reverse <$> many q) > valueExpr :: Parser Expr > valueExpr = chainl1 term binOp > parens :: Parser a -> Parser a > parens = between openParen closeParen > openParen :: Parser Char > openParen = lexeme $ char '(' > closeParen :: Parser Char > closeParen = lexeme $ char ')' > symbol :: String -> Parser String > symbol s = try $ lexeme $ do > u <- many1 (oneOf "<>=+-^%/*!|") > guard (s == u) > return s > identifier :: Parser String > identifier = lexeme ((:) <$> firstChar <*> many nonFirstChar) > where > firstChar = letter <|> char '_' > nonFirstChar = digit <|> firstChar > keyword :: String -> Parser String > keyword k = try $ do > i <- identifier > guard (i == k) > return k > keyword_ :: String -> Parser () > keyword_ = void . keyword > whitespace :: Parser () > whitespace = > choice [simpleWhitespace *> whitespace > ,lineComment *> whitespace > ,blockComment *> whitespace > ,return ()] > where > lineComment = try (string "--") > *> manyTill anyChar (void (char '\n') <|> eof) > blockComment = try (string "/*") > *> manyTill anyChar (try $ string "*/") > simpleWhitespace = void $ many1 (oneOf " \t\n") > lexeme :: Parser a -> Parser a > lexeme p = p <* whitespace > comma :: Parser Char > comma = lexeme $ char ',' > commaSep :: Parser a -> Parser [a] > commaSep = (`sepBy` comma) > parseExpr :: String -> Either ParseError Expr > parseExpr = parse (whitespace *> valueExpr <* eof) "" -------------- > data Assoc = AssocLeft | AssocRight | AssocNone deriving (Eq,Show) > type Fixities = [(String, (Int, Assoc))] > fixFixity :: Fixities -> Expr -> Expr > fixFixity fixities = fixBinOpPrecedence . fixBinOpAssociativity . fixNestedPrefPostPrec > where > fixBinOpAssociativity e = case e of > BinOp a op b -> > let a' = fixBinOpAssociativity a > b' = fixBinOpAssociativity b > def = BinOp a' op b' > in case (a',b') of > -- both > -- a1 op1 a2 op b1 op2 b2 > (BinOp a1 op1 a2 > ,BinOp b1 op2 b2) > | Just (_p,opa) <- lookupFixity op > , Just (_p,op1a) <- lookupFixity op1 > , Just (_p,op2a) <- lookupFixity op2 > -> case (opa, op1a, op2a) of > (AssocRight, AssocRight, AssocRight) -> > BinOp a1 op1 (BinOp a2 op (BinOp b1 op2 b2)) > (AssocLeft, AssocLeft, AssocLeft) -> > BinOp (BinOp (BinOp a1 op1 a2) op b1) op2 b2 > --todo: other cases > _ -> def > -- just left side > (BinOp a1 op1 a2, _) > -- a1 op1 a2 op b' > | Just (_p,opa) <- lookupFixity op > , Just (_p,op1a) <- lookupFixity op1 > -> case (opa, op1a) of > (AssocRight, AssocRight) -> > BinOp a1 op1 (BinOp a2 op b') > (AssocLeft, AssocLeft) -> > BinOp (BinOp a1 op1 a2) op b' > _ -> def > -- just right side > (_, BinOp b1 op2 b2) > -- e op b1 op2 b2 > | Just (_p,opa) <- lookupFixity op > , Just (_p,op2a) <- lookupFixity op2 > -> case (opa, op2a) of > (AssocRight, AssocRight) -> > BinOp a' op (BinOp b1 op2 b2) > (AssocLeft, AssocLeft) -> > BinOp (BinOp a' op b1) op2 b2 > _ -> def > _ -> def > _ -> e > fixBinOpPrecedence e = case e of > BinOp a op b -> > let a' = fixBinOpPrecedence a > b' = fixBinOpPrecedence b > def = BinOp a' op b' > in case (a',b') of > -- both > -- a1 op1 a2 op b1 op2 b2 > -- all equal > -- p > or < p1 == p2 > -- p == p1 < or > p2 > (BinOp a1 op1 a2 > ,BinOp b1 op2 b2) > | Just (p,_opa) <- lookupFixity op > , Just (p1,_op1a) <- lookupFixity op1 > , Just (p2,_op2a) <- lookupFixity op2 > -> case () of > -- _ | trace ("both prec " ++ show (p,p1,p2)) False -> undefined > _ | p == p1 && p1 == p2 -> def > _ | p > p1 && p1 == p2 -> BinOp a1 op1 b' > _ | p < p1 && p1 == p2 -> BinOp (BinOp a1 op1 a2) op b' > _ | p == p1 && p2 > p1 -> BinOp a' op (BinOp b1 op2 b2) > _ | p == p1 && p2 < p1 -> def -- todo > _ | otherwise -> def > -- just left side > (BinOp a1 op1 a2, _) > -- a1 op1 a2 op b' > | Just (p,_opa) <- lookupFixity op > , Just (p1,_op1a) <- lookupFixity op1 > -> case () of > -- _ | trace ("left prec " ++ show (p,p1)) False -> undefined > _ | p < p1 -> {-trace "b1" $ -}BinOp (BinOp a1 op1 a2) op b' > | p > p1 -> {-trace "b2" $ -}BinOp a1 op1 (BinOp a2 op b') > | otherwise -> def > -- just right side > (_, BinOp b1 op2 b2) > -- a' op b1 op2 b2 > | Just (p,_opa) <- lookupFixity op > , Just (p2,_op1a) <- lookupFixity op2 > -> case () of > -- _ | trace ("right prec " ++ show (p,p2)) False -> undefined > _ | p > p2 -> {-trace "b1" $ -}BinOp (BinOp a' op b1) op2 b2 > | p < p2 -> {-trace "b2" $ -}BinOp a' op (BinOp b1 op2 b2) > | otherwise -> {-trace "def" $ -} def > _ -> def > _ -> e > fixNestedPrefPostPrec e = case e of > PrefOp op a -> > let a' = fixNestedPrefPostPrec a > in case a' of > PostOp op1 b | Just (p,_) <- lookupFixity op > , Just (p1,_) <- lookupFixity op1 > , p > p1 -> PostOp op1 (PrefOp op b) > _ -> PrefOp op a' > PostOp op a -> > let a' = fixNestedPrefPostPrec a > in case a' of > PrefOp op1 b | Just (p,_) <- lookupFixity op > , Just (p1,_) <- lookupFixity op1 > , p > p1 -> PrefOp op1 (PostOp op b) > _ -> PostOp op a' > _ -> e > lookupFixity :: String -> Maybe (Int,Assoc) > lookupFixity s = maybe (trace ("didn't find " ++ s ++ "\n" ++ ppShow fixities) Nothing) > Just $ lookup s fixities > sqlFixity :: [(String, (Int, Assoc))] > sqlFixity = [(".", (13, AssocLeft)) > ,("[]", (12, AssocNone)) unary + - todo: split the fixity table into prefix, binary and postfix todo: don't have explicit precedence numbers in the table?? > ,("^", (10, AssocNone))] > ++ m ["*", "/", "%"] (9, AssocLeft) > ++ m ["+","-"] (8, AssocLeft) > ++ m ["<", ">", "=", "<=", ">=", "<>"] (4, AssocNone) > ++ [("is null", (3, AssocNone)) > ,("not", (2, AssocRight)) > ,("and", (1, AssocLeft)) > ,("or", (0, AssocLeft))] > where > m l a = map (,a) l ------- some simple parser tests > data Test = Group String [Test] > | ParserTest String Expr > | FixityTest Fixities Expr Expr > parserTests :: Test > parserTests = Group "parserTests" $ map (uncurry ParserTest) $ > [("a", Iden "a") > ,("'test'", Lit "test") > ,("34", Lit "34") > ,("f()", App "f" []) > ,("f(3)", App "f" [Lit "3"]) > ,("(7)", Parens (Lit "7")) > ,("a + 3", BinOp (Iden "a") "+" (Lit "3")) > ,("1 + 2 + 3", BinOp (BinOp (Lit "1") "+" (Lit "2")) "+" (Lit "3")) > ,("a or b", BinOp (Iden "a") "or" (Iden "b")) > ,("-1", PrefOp "-" (Lit "1")) > ,("not a", PrefOp "not" (Iden "a")) > ,("not not a", PrefOp "not" (PrefOp "not" (Iden "a"))) > ,("a is null", PostOp "is null" (Iden "a")) > ,("a is null is null", PostOp "is null" (PostOp "is null" (Iden "a"))) > ,("-a+3", BinOp (PrefOp "-" (Iden "a")) "+" (Lit "3")) > ,("a is null and b is null", BinOp (PostOp "is null" (Iden "a")) > "and" > (PostOp "is null" (Iden "b"))) > ] > makeParserTest :: String -> Expr -> T.TestTree > makeParserTest s e = H.testCase s $ do > let a = parseExpr s > if (Right e == a) > then putStrLn $ s ++ " OK" > else putStrLn $ "bad parse " ++ s ++ " " ++ show a ------ fixity checks test cases: > fixityTests :: Test > fixityTests = Group "fixityTests" $ > map (\(f,s,e) -> FixityTest f s e) $ > [ 2 bin ops wrong associativity left + null versions > (sqlFixity > ,i "a" `plus` (i "b" `plus` i "c") > ,(i "a" `plus` i "b") `plus` i "c") > ,(sqlFixity > ,(i "a" `plus` i "b") `plus` i "c" > ,(i "a" `plus` i "b") `plus` i "c") 2 bin ops wrong associativity right > ,(timesRight > ,i "a" `times` (i "b" `times` i "c") > ,i "a" `times` (i "b" `times` i "c")) > ,(timesRight > ,(i "a" `times` i "b") `times` i "c" > ,i "a" `times` (i "b" `times` i "c")) 2 bin ops wrong precedence left > ,(sqlFixity > ,i "a" `plus` (i "b" `times` i "c") > ,i "a" `plus` (i "b" `times` i "c")) > ,(sqlFixity > ,(i "a" `plus` i "b") `times` i "c" > ,i "a" `plus` (i "b" `times` i "c")) 2 bin ops wrong precedence right > ,(sqlFixity > ,(i "a" `times` i "b") `plus` i "c" > ,(i "a" `times` i "b") `plus` i "c") > ,(sqlFixity > ,i "a" `times` (i "b" `plus` i "c") > ,(i "a" `times` i "b") `plus` i "c") a + b * c + d a * b + c * d check all variations > ] ++ > (let t = (i "a" `plus` i "b") > `times` > (i "c" `plus` i "d") > trs = generateTrees $ splitTree t > in [(sqlFixity, x > ,i "a" `plus` (i "b" `times` i "c") > `plus` i "d") > | x <- trs]) > ++ > (let t = (i "a" `times` i "b") > `plus` > (i "c" `times` i "d") > trs = generateTrees $ splitTree t > in [(sqlFixity, x > ,(i "a" `times` i "b") > `plus` > (i "c" `times` i "d")) > | x <- trs]) > ++ [ prefix then postfix wrong precedence > ([("+", (9, AssocNone)) > ,("is null", (3, AssocNone))] > ,PrefOp "+" (PostOp "is null" (i "a")) > ,PostOp "is null" (PrefOp "+" (i "a"))) > ,([("+", (9, AssocNone)) > ,("is null", (3, AssocNone))] > ,PostOp "is null" (PrefOp "+" (i "a")) > ,PostOp "is null" (PrefOp "+" (i "a"))) > ,([("+", (3, AssocNone)) > ,("is null", (9, AssocNone))] > ,PrefOp "+" (PostOp "is null" (i "a")) > ,PrefOp "+" (PostOp "is null" (i "a"))) > ,([("+", (3, AssocNone)) > ,("is null", (9, AssocNone))] > ,PostOp "is null" (PrefOp "+" (i "a")) > ,PrefOp "+" (PostOp "is null" (i "a"))) 3-way unary operator movement: take a starting point and generate variations postfix on first arg of binop (cannot move) make sure precedence wants it to move prefix on second arg of binop (cannot move) prefix on binop, precedence wrong postfix on binop precedence wrong prefix on first arg of binop, precedence wrong postfix on second arg of binop, precedence wrong ambiguous fixity tests sanity check: parens stops rearrangement check nesting 1 + f(expr) > ] > where > plus a b = BinOp a "+" b > times a b = BinOp a "*" b > i a = Iden a > timesRight = [("*", (9, AssocRight))] testCase > makeFixityTest :: Fixities -> Expr -> Expr -> T.TestTree > makeFixityTest fs s e = H.testCase (show s) $ do > let s' = fixFixity fs s > H.assertEqual "" s' e > {-if (s' == e) > then putStrLn $ show s ++ " OK" > else putStrLn $ "ERROR\nstart: " ++ show s ++ "\nfixed: " ++ show s' ++ "\nshould be: " ++ show e-} > tests :: Test > tests = Group "Tests" [parserTests, fixityTests] > makeTest :: Test -> T.TestTree > makeTest (Group n ts) = T.testGroup n $ map makeTest ts > makeTest (ParserTest s e) = makeParserTest s e > makeTest (FixityTest f s e) = makeFixityTest f s e -------- > tests :: T.TestTree > tests = T.testGroup "Tests" $ map makeFixityTest fixityTests > main :: IO () > main = T.defaultMain $ makeTest tests > {-do > mapM_ checkTest tests > mapM_ checkFixity fixityTests > let plus a b = BinOp a "+" b > times a b = BinOp a "*" b > i a = Iden a > let t = (i "a" `plus` i "b") > `times` > (i "c" `plus` i "d") > spl = splitTree t > trs = generateTrees spl > --putStrLn $ "\nSplit\n" > --putStrLn $ ppShow (fst spl, length $ snd spl) > --putStrLn $ show $ length trs > --putStrLn $ "\nTrees\n" > --putStrLn $ intercalate "\n" $ map show trs > return ()-} generating trees 1. tree -> list val op val op val op ... (has to be two lists? generate variations: pick numbers from 0 to n - 1 (n is the number of ops) choose the op at this position to be the root recurse on the two sides > splitTree :: Expr -> ([Expr], [Expr->Expr->Expr]) > splitTree (BinOp a op b) = let (x,y) = splitTree a > (z,w) = splitTree b > in (x++z, y++ [\a b -> BinOp a op b] ++ w) > splitTree x = ([x],[]) > generateTrees :: ([Expr], [Expr->Expr->Expr]) -> [Expr] > generateTrees (es,ops) | length es /= length ops + 1 = > error $ "mismatch in lengths " ++ show (length es, length ops) > ++"\n" ++ ppShow es ++ "\n" > generateTrees ([a,b], [op]) = [op a b] > generateTrees ([a], []) = [a] > generateTrees (vs, ops) = > let n = length ops > in --trace ("generating " ++ show (length vs, n) ++ "trees\n") $ > concat $ flip map [0..n-1] $ \m -> > let (v1,v2) = splitAt (m + 1) vs > (ops1,op':ops2) = splitAt m ops > r = [op' t u | t <- generateTrees (v1,ops1) > , u <- generateTrees (v2,ops2)] > in -- trace ("generated " ++ show (length r) ++ " trees") > r > generateTrees ([],[]) = []