{-
Copyright 2009 Jake Wheat

This file contains the attr and sem definitions, which do the type
checking, etc.

A lot of the haskell code has been moved into other files:
TypeCheckingH, AstUtils.lhs. It is intended that only small amounts of
code (i.e. one-liners) appear inline in this file, and larger bits go
in AstUtils.lhs. These are only divided because the attribute grammar
system uses a custom syntax with a custom preprocessor. These
guidelines aren't followed very well.

The current type checking approach doesn't quite match how SQL
works. The main problem is that you can e.g. exec create table
statements inside a function. This is something that the type checker
will probably not be able to deal with for a while, if ever. (Will
need hooks into postgresql to do this properly, which might not be
impossible...).

Once most of the type checking is working, all the code and
documentation will be overhauled quite a lot. Alternatively put, this
code is in need of better documentation and organisation, and serious
refactoring.

An unholy mixture of Maybe, Either, Either/Error Monad and
TypeCheckFailed is used to do error handling.

One of the common patterns when type checking a node is:

first check the types of any subnodes which it depends on. If any are
TypeCheckFailed, then this node's type is TypeCheckFailed and we stop
there.

otherwise, calculate the type of this node, or get an error if there
is a problem. This is put into loc.tpe, which is Either TypeError
Type. Then some common code takes this value and sets the node type to
the type, or to TypeCheckFailed if tpe is a Left, and if it is a Left
it also adds a type error annotation.

================================================================================

= main attributes used

Here are the main attributes used in the type checking:

env is used to chain the environments up and down the tree, to allow
access to the catalog information, and to store the in-scope
identifier names and types, e.g. inside a select expression.

annotatedTree is used to create a copy of the ast with the type,
etc. annotations.

-}

ATTR AllNodes Root ExpressionRoot [ env : Environment
                                    lib : LocalIdentifierBindings || annotatedTree : SELF
                                                           -- workaround: avoid bogus cycle
                                                           -- detections search for this,
                                                           -- replace with annotated tree
                                                           -- and uuagc can't tell that
                                                           -- there aren't any real cycles
                                                           originalTree : SELF]

INCLUDE "TypeChecking/Misc.ag"
INCLUDE "TypeChecking/Expressions.ag"
INCLUDE "TypeChecking/Statements.ag"

{-

old stuff:

================================================================================

= static tests

Try to use a list of message data types to hold all sorts of
information which works its way out to the top level where the client
code gets it. Want to have the lists concatenated together
automatically from subnodes to parent node, and then to be able to add
extra messages to this list at each node also.

Problem 1: can't have two sem statements for the same node type which
both add messages, and then have the messages combined to provide the
final message list attribute value for that node. You want this so
that e.g. different sorts of checks can appear in different sections.
Workaround is, instead of having each check in its own section, to
combine them all into one SEM. Can work around this without too much
difficulty by using local attributes. Not sure if something more
clever would be an improvement.

Problem 2: no shorthand to combine what the default rule for messages
would be and then add a bit extra - so if you want all the children
messages, plus possibly an extra message or two, you have to write out
the default rule in full explicitly. Can get round this by writing out
loads of code, which is error prone.
Don't know if there is a better way

================================================================================

= inloop testing

inloop - use to check continue, exit, and other commands that can only
appear inside loops (for, while, loop)

the only nodes that really need this attribute are the ones which can
contain statements

The inloop test is the only thing which uses the messages atm. It
shouldn't, at some point inloop testing will become part of the type
checking.

This is just some example code, will probably do something a lot more
heavy weight like symbolic interpretation - want to do all sorts of
loop, return, nullability, etc. analysis.

-}

{-
ATTR AllNodes Root ExpressionRoot [||messages USE {++} {[]} : {[Message]}]

ATTR AllNodes [ inLoop: Bool||]

SEM Root
    | Root statements.inLoop = False

SEM ExpressionRoot
    | ExpressionRoot expr.inLoop = False

-- set the inloop stuff which nests, it's reset inside a create
-- function statement, in case you have a create function inside a
-- loop, seems unlikely you'd do this though

SEM Statement
    | ForSelectStatement ForIntegerStatement WhileStatement sts.inLoop = True
    | CreateFunction body.inLoop = False

-- now we can check when we hit a continue statement if it is in the
-- right context

SEM Statement
    | ContinueStatement lhs.messages = if not @lhs.inLoop
                                         then [Error ContinueNotInLoop]
                                         else []

-}

{-
================================================================================

= notes and todo

containment guide for select expressions:
combineselect          2 selects
insert                 ?select
createtableas          1 select
createview             1 select
return query           1 select
forselect              1 select
select->subselect      select
expression->exists     select
            scalarsubquery select
            inselect   select

containment guide for statements:
forselect              [statement]
forinteger             [statement]
while                  [statement]
casestatement          [[statement]]
if                     [[statement]]
createfunction->fnbody [Statement]

TODO
some non type-check checks:
check plpgsql only in plpgsql function
orderby in top level select
  only
copy followed immediately by copydata iff stdin, copydata only follows
  copy from stdin
count args to raise, etc., check same number as placeholders in string
no natural with onexpr in joins
typename -> setof (& fix parsing), what else like this?
expressions: positionalarg in function, window function only in select
  list top level

review all ast checks, and see if we can also catch them during
parsing (e.g. typeName parses setof, but this should only be allowed
for a function return, and we can make this a parse error when parsing
from source code rather than checking a generated ast). This needs
judgement to say whether a parse error is better than a check error. I
think for setof it is, but e.g. a continue not in a loop (which could
be caught during parsing) works better as a check error, looking at
the error message the user will get. This might be wrong, haven't
thought too carefully about it yet.

TODO: canonicalize ast process. As part of type checking, produce a
canonicalized ast in which:
all implicit casts appear explicitly in the ast (maybe distinguished
  from explicit casts?)
all names are fully qualified
all types use canonical names
literal values and selectors are in one form (use row style?)
nodes are tagged with types
what else?
Canonical form is only defined for type consistent asts.

This canonical form should pretty print and parse back to the same
form, and type check correctly.

-}
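
{-
illustrative sketch (not part of the grammar):

The common pattern described in the header comment (check the subnode
types, stop if any of them is TypeCheckFailed, otherwise compute an
Either TypeError Type and let shared code turn that into a node type
plus an optional error annotation) might look roughly like this in
plain haskell. This is only a sketch under assumptions: the Type and
TypeError definitions are cut-down stand-ins, and checkDeps, plusType
and annotate are hypothetical names, not the functions in
TypeCheckingH or AstUtils.lhs.

```haskell
-- Hypothetical, cut-down stand-ins for the real types.
data Type = ScalarType String
          | TypeCheckFailed
          deriving (Eq, Show)

data TypeError = NoMatchingOperator String [Type]
               deriving (Eq, Show)

-- Step 1: if any subnode this node depends on has already failed,
-- give up early (Nothing) so that no further error is reported.
checkDeps :: [Type] -> Maybe [Type]
checkDeps ts
  | TypeCheckFailed `elem` ts = Nothing
  | otherwise                 = Just ts

-- Step 2: a node-specific calculation, playing the role of loc.tpe.
-- Hypothetical rule: "+" is only defined on two int4 arguments.
plusType :: [Type] -> Either TypeError Type
plusType [ScalarType "int4", ScalarType "int4"] = Right (ScalarType "int4")
plusType args = Left (NoMatchingOperator "+" args)

-- Step 3: the shared code. Returns the node type together with the
-- type error annotations to attach to the node.
annotate :: [Type] -> ([Type] -> Either TypeError Type) -> (Type, [TypeError])
annotate subTypes calc =
  case checkDeps subTypes of
    Nothing -> (TypeCheckFailed, [])      -- a subnode failed: no new error
    Just ts -> case calc ts of
      Right t -> (t, [])                  -- success: no annotation
      Left e  -> (TypeCheckFailed, [e])   -- failure: annotate this node
```

With these definitions, annotate [TypeCheckFailed, ScalarType "int4"]
plusType gives (TypeCheckFailed, []), so an error deep in the tree is
reported once at the node where it happened rather than at every
ancestor.
-}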