{-
Copyright 2009 Jake Wheat

This file contains the attr and sem definitions, which do the type
checking, etc.

A lot of the haskell code has been moved into AstUtils.lhs, it is
intended that only small amounts of code appear (i.e. one-liners)
inline in this file, and larger bits go in AstUtils.lhs. These are
only divided because the attribute grammar system uses a custom syntax
with a custom preprocessor. These guidelines aren't followed very well.

The current type checking approach doesn't quite match how SQL
works. The main problem is that you can e.g. exec create table
statements inside a function. This is something that the type checker
will probably not be able to deal with for a while, if ever. (Will need
hooks into postgresql to do this properly, which might not be
impossible...).

The main current limitation is that the ddl statements aren't passed
on in the scope so e.g. it doesn't type check a create table followed
by a select from that table. The support for this is nearly complete
and it should be working very soon.

Once most of the type checking is working, all the code and
documentation will be overhauled quite a lot. Alternatively put, this
code is in need of better documentation and organisation, and serious
refactoring.

================================================================================

= main attributes used

Here are the main attributes used in the type checking:

sourcePos - holds the source position used in messages, not very
accurate at the moment, just gives you the position of the first
character in the current statement

actualValue - the node itself (SELF), added to all nodes out of laziness.
We use these values in a limited number of places in the code
-}

ATTR AllNodes
  [ sourcePos: MySourcePos | | ]

ATTR AllNodes Root ExpressionRoot
  [ scope : Scope | | actualValue: SELF ]

{-
Node types

Just a hack to get started, provide a default type to each node (even
though it doesn't make sense for a lot of them), and provide default
rules for this attribute, even though this also doesn't make a lot of
sense. Will be reviewed and removed quite soon. (Some nodes will keep
the node type attribute, most will lose it.)
-}

ATTR NonListNodes Root ExpressionRoot
  [ | | nodeType USE {`setUnknown`} {UnknownType} : {Type} ]

ATTR ListNodes
  [ | | nodeType USE {`appendTypeList`} {TypeList []} : {Type} ]

{
setUnknown :: Type -> Type -> Type
setUnknown _ _ = UnknownType

appendTypeList :: Type -> Type -> Type
appendTypeList t1 (TypeList ts) = TypeList (t1:ts)
appendTypeList t1 t2 = TypeList [t1,t2]
}

{-
================================================================================

= statement info

slightly hacky, to support adding useful comments to sql source,
create a new data type which is more descriptive than Type, so
statements can expose information here. E.g. a create table can expose
the table name, a create function can expose the function prototype

This will evolve into the main annotation data type which is used to
supply information, e.g. to intersperse into some sql source, or to
get back info on an individual sql statement interactively in an ide.
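
a minimal sketch of the 'intersperse into some sql source' idea,
assuming cut-down stand-in types: Ty, Info, showTy, infoComment and
annotate are all made up for illustration and are not part of this
code base, the real thing would work on Type and StatementInfo

```haskell
import Data.List (intercalate)

-- Hypothetical, cut-down stand-ins for Type and StatementInfo.
data Ty = Scalar String
        | Composite [(String, Ty)]
          deriving (Eq, Show)

data Info = SelectI Ty
          | InsertI String Ty
          | DeleteI String
            deriving (Eq, Show)

-- Render a type in a compact sql-like form.
showTy :: Ty -> String
showTy (Scalar n) = n
showTy (Composite cs) =
  "(" ++ intercalate ", " [n ++ " " ++ showTy t | (n, t) <- cs] ++ ")"

-- One-line sql comment describing what a statement does.
infoComment :: Info -> String
infoComment (SelectI t) = "-- select returning " ++ showTy t
infoComment (InsertI tbl t) = "-- insert into " ++ tbl ++ " " ++ showTy t
infoComment (DeleteI tbl) = "-- delete from " ++ tbl

-- Put each info comment on the line above its statement.
annotate :: [(Info, String)] -> String
annotate = unlines . concatMap (\(i, sql) -> [infoComment i, sql])
```

so annotate takes the statement source paired with its info, and gives
back the source with one comment line above each statement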
-}

{
data StatementInfo = DefaultStatementInfo Type
                   | RelvarInfo CompositeDef
                   | CreateFunctionInfo FunctionPrototype
                   | SelectInfo Type
                   | InsertInfo String Type
                   | UpdateInfo String Type
                   | DeleteInfo String
                   | CreateDomainInfo String Type
                   | DropInfo [(String,String)]
                   | DropFunctionInfo [(String,[Type])]
                     deriving (Eq,Show)
}

{
--use this to make sure type errors are propagated into the statement
--infos, temporary
makeStatementInfo :: Type -> StatementInfo -> StatementInfo
makeStatementInfo ty st =
  if isError ty
    then DefaultStatementInfo ty
    else st
  where
    isError t = case t of
                  TypeError _ _ -> True
                  TypeList ts -> any isError ts
                  _ -> False
}

ATTR Statement [ | backType : Type | statementInfo : StatementInfo ]

ATTR SourcePosStatement [ | | statementInfo : StatementInfo ]

ATTR Root StatementList [ | | statementInfo : {[StatementInfo]} ]

SEM StatementList
    | Nil lhs.statementInfo = []
    | Cons lhs.statementInfo = @hd.statementInfo : @tl.statementInfo

-- don't know how to copy the nodetype to the statement info in the
-- sem statement so bounce it out then back in like this

SEM SourcePosStatement
    | Tuple
        x2.backType = @x2.nodeType

SEM Statement
    | Copy CopyData Truncate DropFunction DropSomething Assignment
      Return ReturnNext ReturnQuery Raise NullStatement Perform Execute
      ExecuteInto ForSelectStatement ForIntegerStatement WhileStatement
      ContinueStatement CaseStatement If
        lhs.statementInfo = DefaultStatementInfo @lhs.backType

{-
some helpers
-}

SEM MaybeExpression
    | Just lhs.nodeType = @just.nodeType
    | Nothing lhs.nodeType = TypeList []

ATTR StringList [ | | strings : {[String]} ]

SEM StringList
    | Cons lhs.strings = @hd : @tl.strings
    | Nil lhs.strings = []

{-
================================================================================

semantics for source positions

source positions aren't collected by the parser properly yet, so we
just get what's available and propagate that.
All the type errors should collect the source position information
properly, so when it appears in the ast nodes, we can hook it up here
and the errors should start giving accurate positions.
-}

SEM SourcePosStatement
    | Tuple
        x2.sourcePos = @x1

SEM Root
    | Root statements.sourcePos = ("",0,0)

SEM ExpressionRoot
    | ExpressionRoot expr.sourcePos = ("",0,0)

{-
================================================================================

= some basic typing

== type names

Types with type modifiers (called PrecTypeName here, to be changed),
are not supported at the moment.
-}

SEM TypeName
    | SimpleTypeName
        --this needs to work a bit better, a simpletypename can match
        --domains, composite types, etc., not just scalar types
        lhs.nodeType = lookupTypeByName @lhs.scope @lhs.sourcePos $
                         canonicalizeTypeName @tn
--    | PrecTypeName
--        lhs.nodeType = if @tn `elem` defaultTypes
--                         then ScalarType @tn
--                         else TypeError @lhs.sourcePos
--                                (UnknownTypeError @tn)
    | ArrayTypeName
        lhs.nodeType = let t = ArrayType @typ.nodeType
                       in checkErrors
                            [@typ.nodeType
                            ,checkTypeExists @lhs.scope @lhs.sourcePos t]
                            t
    | SetOfTypeName
        lhs.nodeType = checkErrors [@typ.nodeType]
                         (SetOfType @typ.nodeType)

{-
== literals
-}

SEM Expression
    | IntegerLit lhs.nodeType = typeInt
    | StringLit lhs.nodeType = UnknownStringLit
    | FloatLit lhs.nodeType = typeNumeric
    | BooleanLit lhs.nodeType = typeBool
    -- I think a null is typed like an unknown string lit
    | NullLit lhs.nodeType = UnknownStringLit

{-
================================================================================

= expressions

== cast expression
-}

SEM Expression
    | Cast lhs.nodeType = checkErrors [@expr.nodeType] @tn.nodeType

{-
== operators and functions
-}

SEM Expression
    | FunCall
        lhs.nodeType = checkErrors [@args.nodeType] $
                         typeCheckFunCall
                           @lhs.scope
                           @lhs.sourcePos
                           @funName
                           @args.nodeType

{-
== case expression

for non simple cases, we need all the when expressions to be bool, and
then to collect the types of the then parts to see if we can resolve a
common type

for simple cases, we need to check all the when parts have the same
type as the value to check against, then we collect the then parts as
above.

so, the caseexpressionlistexpressionpair items each set their node type
to a typelist with two elements, the first is a typelist of the when
expressions, and the second is the type of the then expression. These
can then be checked appropriately in the case or casesimple sem code.
-}

SEM Expression
    | Case
        lhs.nodeType = let elseThen = case @els.nodeType of
                                        TypeList [] -> []
                                        t -> [t]
                           unwrappedLists = map unwrapTypeList $ unwrapTypeList @cases.nodeType
                           whenTypes :: [Type]
                           whenTypes = concat $ map unwrapTypeList $ map head unwrappedLists
                           thenTypes :: [Type]
                           thenTypes = map (head . tail) unwrappedLists ++ elseThen
                           whensAllBool :: Type
                           whensAllBool = if any (/= typeBool) whenTypes
                                            then TypeError @lhs.sourcePos
                                                   (WrongTypes typeBool whenTypes)
                                            else TypeList []
                       in checkErrors (whenTypes ++ thenTypes ++ [whensAllBool]) $
                            resolveResultSetType
                              @lhs.scope
                              @lhs.sourcePos
                              thenTypes

SEM Expression
    | CaseSimple
        lhs.nodeType = let elseThen = case @els.nodeType of
                                        TypeList [] -> []
                                        t -> [t]
                           unwrappedLists = map unwrapTypeList $ unwrapTypeList @cases.nodeType
                           whenTypes :: [Type]
                           whenTypes = concat $ map unwrapTypeList $ map head unwrappedLists
                           thenTypes :: [Type]
                           thenTypes = map (head . tail) unwrappedLists ++ elseThen
                           checkWhenTypes = resolveResultSetType
                                              @lhs.scope
                                              @lhs.sourcePos
                                              (@value.nodeType:whenTypes)
                       in checkErrors (whenTypes ++ thenTypes ++ [checkWhenTypes]) $
                            resolveResultSetType
                              @lhs.scope
                              @lhs.sourcePos
                              thenTypes

{-
== identifiers

pull id types out of scope for identifiers
-}

SEM Expression
    | Identifier
        lhs.nodeType = let (correlationName,iden) = splitIdentifier @i
                       in scopeLookupID @lhs.scope @lhs.sourcePos correlationName iden

{
-- i think this should be alright, an identifier referenced in an
-- expression can only have zero or one dot in it.
splitIdentifier :: String -> (String,String)
splitIdentifier s = let (a,b) = span (/= '.') s
                    in if b == ""
                         then ("", a)
                         else (a,tail b)
}

SEM Expression
    | Exists
        lhs.nodeType = checkErrors [@sel.nodeType] typeBool

{-
== scalar subquery

1 col -> type of that col
2 + cols -> row type
-}

SEM Expression
    | ScalarSubQuery
        lhs.nodeType = let f = map snd $ unwrapComposite $ unwrapSetOf @sel.nodeType
                       in checkErrors [@sel.nodeType] $
                            case length f of
                              0 -> error "internal error: no columns in scalar subquery?"
                              1 -> head f
                              _ -> RowCtor f

{-
== inlist
-}

SEM Expression
    | InPredicate
        lhs.nodeType = let er = resolveResultSetType
                                  @lhs.scope
                                  @lhs.sourcePos
                                  [@expr.nodeType, @list.nodeType]
                       in checkErrors [er] typeBool

SEM InList
    | InList
        lhs.nodeType = resolveResultSetType
                         @lhs.scope
                         @lhs.sourcePos
                         $ unwrapTypeList @exprs.nodeType
    | InSelect
        lhs.nodeType = let attrs = map snd $ unwrapComposite $ unwrapSetOf $ @sel.nodeType
                       in case length attrs of
                            0 -> error "internal error - got subquery with no columns? in inselect"
                            1 -> head attrs
                            _ -> RowCtor attrs

{-
================================================================================

= basic select statements

== nodeTypes
-}

SEM Statement
    | SelectStatement
        lhs.nodeType = @ex.nodeType
        lhs.statementInfo = makeStatementInfo @lhs.backType $ SelectInfo @ex.nodeType

SEM SelectExpression
    --assume we get TypeList (TypeList (Type)) out of vll
    | Values
        lhs.nodeType = checkErrors [@vll.nodeType] $
                         typeCheckValuesExpr
                           @lhs.scope
                           @lhs.sourcePos
                           @vll.nodeType
    | Select
        lhs.nodeType = checkErrors
                         [@selTref.nodeType
                         ,@selSelectList.nodeType
                         ,@selWhere.nodeType]
                         (let t = @selSelectList.nodeType
                          in case t of
                               UnnamedCompositeType [(_,Pseudo Void)] -> Pseudo Void
                               _ -> SetOfType @selSelectList.nodeType)
    | CombineSelect
        lhs.nodeType = checkErrors [@sel1.nodeType,@sel2.nodeType] $
                         typeCheckCombineSelect
                           @lhs.scope
                           @lhs.sourcePos
                           @sel1.nodeType
                           @sel2.nodeType

SEM TableRef
    | SubTref
        lhs.nodeType = checkErrors [@sel.nodeType] $
                         unwrapSetOfComposite @sel.nodeType
        lhs.idens = [(@alias, (unwrapComposite $ unwrapSetOf @sel.nodeType, []))]
        lhs.joinIdens = []
    | TrefAlias Tref
        lhs.nodeType = fst $ getRelationType @lhs.scope @lhs.sourcePos @tbl
        lhs.joinIdens = []
    | Tref
        lhs.idens = [(@tbl, both unwrapComposite $
                              getRelationType @lhs.scope @lhs.sourcePos @tbl)]
    | TrefAlias
        lhs.idens = [(@alias, both unwrapComposite $
                                getRelationType @lhs.scope @lhs.sourcePos @tbl)]
    | TrefFun
        lhs.nodeType = getFnType @lhs.scope @lhs.sourcePos "" @fn.actualValue @fn.nodeType
        lhs.joinIdens = []
        lhs.idens = [second (\l -> (unwrapComposite l, [])) $
                       getFunIdens @lhs.scope @lhs.sourcePos "" @fn.actualValue @fn.nodeType]
    | TrefFunAlias
        lhs.nodeType = getFnType @lhs.scope @lhs.sourcePos @alias @fn.actualValue @fn.nodeType
        lhs.joinIdens = []
        lhs.idens = [second (\l -> (unwrapComposite l, [])) $
                       getFunIdens @lhs.scope @lhs.sourcePos @alias @fn.actualValue @fn.nodeType]
    | JoinedTref
        lhs.nodeType = checkErrors [@tbl.nodeType
                                   ,@tbl1.nodeType]
                         ret
                         where
                           ret = case
                                   (@nat.actualValue, @onExpr.actualValue) of
                                     (Natural, _) -> unionJoinList $
                                                     commonFieldNames @tbl.nodeType @tbl1.nodeType
                                     (_,Just (JoinUsing s)) -> unionJoinList s
                                     _ -> unionJoinList []
                           unionJoinList s = combineTableTypesWithUsingList
                                               @lhs.scope
                                               @lhs.sourcePos
                                               s
                                               @tbl.nodeType
                                               @tbl1.nodeType
        lhs.idens = @tbl.idens ++ @tbl1.idens
        lhs.joinIdens = commonFieldNames @tbl.nodeType @tbl1.nodeType

{
--returns the type of the relation, and the system columns also
getRelationType :: Scope -> MySourcePos -> String -> (Type,Type)
getRelationType scope sp tbl =
  case getAttrs scope [TableComposite, ViewComposite] tbl of
    Just ((_,_,a@(UnnamedCompositeType _))
         ,(_,_,s@(UnnamedCompositeType _)) )
         -> (a,s)
    _ -> (TypeError sp (UnrecognisedRelation tbl), TypeList [])

getFnType :: Scope -> MySourcePos -> String -> Expression -> Type -> Type
getFnType scope sp alias fnVal fnType =
  checkErrors [fnType] $ snd $ getFunIdens scope sp alias fnVal fnType

getFunIdens :: Scope -> MySourcePos -> String -> Expression -> Type -> (String, Type)
getFunIdens scope sp alias fnVal fnType =
  case fnVal of
    FunCall f _ ->
      let correlationName = if alias /= ""
                              then alias
                              else f
      in (correlationName, case fnType of
                SetOfType (CompositeType t) -> getCompositeType t
                SetOfType x -> UnnamedCompositeType [(correlationName,x)]
                y -> UnnamedCompositeType [(correlationName,y)])
    x -> ("", TypeError sp (ContextError "FunCall"))
  where
    getCompositeType t =
      case getAttrs scope [Composite
                          ,TableComposite
                          ,ViewComposite] t of
        Just ((_,_,a@(UnnamedCompositeType _)), _) -> a
        _ -> UnnamedCompositeType []

commonFieldNames t1 t2 =
  intersect (fn t1) (fn t2)
  where
    fn (UnnamedCompositeType s) = map fst s
    fn _ = []

both :: (a->b) -> (a,a) -> (b,b)
both fn (x,y) = (fn x, fn y)
}

SEM MTableRef
    | Nothing
        lhs.nodeType = TypeList []
        lhs.idens = []
        lhs.joinIdens = []
    | Just
        lhs.nodeType = @just.nodeType

SEM Where
    | Nothing
        lhs.nodeType = typeBool
    | Just
        lhs.nodeType = checkErrors [@just.nodeType]
                         (if @just.nodeType /= typeBool
                            then TypeError
                                   @lhs.sourcePos ExpressionMustBeBool
                            else typeBool)

SEM SelectItem
    | SelExp SelectItem
        lhs.nodeType = @ex.nodeType

SEM SelectItemList
    | Cons
        lhs.nodeType = foldr consComposite @tl.nodeType
                         (let (correlationName,iden) = splitIdentifier @hd.columnName
                          in if iden == "*"
                               then scopeExpandStar @lhs.scope @lhs.sourcePos correlationName
                               else [(iden, @hd.nodeType)])
    | Nil
        lhs.nodeType = UnnamedCompositeType []

SEM SelectList
    | SelectList
        lhs.nodeType = @items.nodeType

{-
== scope passing

scope flow:
current simple version:
from tref -> select list
          -> where
(so we take the identifiers and types from the tref part, and send
them into the selectlist and where parts)

1. from
2. where
3. group by
4. having
5. select
-}

ATTR TableRef [ | | idens : {[QualifiedScope]}
                    joinIdens : {[String]} ]
ATTR MTableRef [ | | idens : {[QualifiedScope]}
                     joinIdens : {[String]} ]

SEM SelectExpression
    | Select
        selSelectList.scope = scopeReplaceIds @lhs.scope @selTref.idens @selTref.joinIdens
        selWhere.scope = scopeReplaceIds @lhs.scope @selTref.idens @selTref.joinIdens

{-
== attributes

columnName is used to collect the column names that the select list
produces, it is combined into an unnamedcompositetype in
selectitemlist, which is also where star expansion happens.
-}

ATTR SelectItem [ | | columnName : String ]

{-
if the select item is just an identifier, then that column is named
after the identifier
e.g.
select a, b as c, b + c from d, gives three columns: one named a, one
named c, and one unnamed, even though only one has an alias

if the select item is a function or aggregate call at the top level,
then it is named after that function or aggregate

if it is a cast, the column is named after the target data type name
iff it is a simple type name
-}

--default value for non identifier nodes

ATTR Expression [ | | liftedColumnName USE {`(fixedValue "")`} {""}: String ]

{
fixedValue :: a -> a -> a -> a
fixedValue a _ _ = a
}

{-
override for identifier nodes, this only makes it out to the selectitem
node if the identifier is not wrapped in parens, function calls, etc.
-}

SEM Expression
    | Identifier
        lhs.liftedColumnName = @i
    | FunCall
        lhs.liftedColumnName = if isOperator @funName
                                 then ""
                                 else @funName
    | Cast
        lhs.liftedColumnName = case @tn.actualValue of
                                 SimpleTypeName tn -> tn
                                 _ -> ""

-- collect the aliases and column names for use by the selectitemlist nodes

SEM SelectItem
    | SelExp
        lhs.columnName = case @ex.liftedColumnName of
                           "" -> "?column?"
                           s -> s
    | SelectItem
        lhs.columnName = @name

{-
================================================================================

= insert
-}

SEM Statement
    | Insert
        lhs.nodeType = checkErrors
                         [checkTableExists @lhs.scope @lhs.sourcePos @table
                         ,@insData.nodeType
                         ,checkColumnConsistency
                            @lhs.scope
                            @lhs.sourcePos
                            @table
                            @targetCols.strings
                            (unwrapComposite $ unwrapSetOf @insData.nodeType)]
                         @insData.nodeType
        lhs.statementInfo = makeStatementInfo @lhs.backType $
                              InsertInfo @table $
                                UnnamedCompositeType $
                                  getColumnTypes @lhs.scope @lhs.sourcePos @table @targetCols.strings

{
checkTableExists :: Scope -> MySourcePos -> String -> Type
checkTableExists scope sp tbl =
  case getAttrs scope [TableComposite, ViewComposite] tbl of
    Just _ -> TypeList []
    _ -> TypeError sp (UnrecognisedRelation tbl)

checkColumnConsistency :: Scope -> MySourcePos -> String -> [String] -> [(String,Type)] -> Type
checkColumnConsistency scope sp tbl cols' insNameTypePairs =
  let --todo: check the cols have no duplicates
      --todo: check the missing target cols have defaults
      targetTableType = fst $ getRelationType scope sp tbl
      targetTableCols = unwrapComposite targetTableType
      --check the num cols in the insdata match the number of cols
      cols = if null cols'
               then map fst targetTableCols
               else cols'
      wrongLengthError = if length insNameTypePairs /= length cols
                           then TypeError sp WrongNumberOfColumns
                           else TypeList []
      --check the target cols appear in the target table and get their types
      nonMatchingColumns = cols \\ map fst targetTableCols
      nonMatchingErrors = case length nonMatchingColumns of
                            0 -> TypeList []
                            1 -> makeUnknownColumnError $ head nonMatchingColumns
                            _ -> TypeList $ map makeUnknownColumnError nonMatchingColumns
      targetNameTypePairs = map (\l -> (l, fromJust $ lookup l targetTableCols)) cols
      --check the types of the insdata match the column targets
      --name datatype columntype
      typeTriples = map (\((a,b),c) -> (a,b,c)) $
                    zip targetNameTypePairs $
                    map snd insNameTypePairs
      matchingTypeErrors = map (\(_,b,c) ->
                                  checkAssignmentValid scope sp c b) typeTriples
  in checkErrors [targetTableType
                 ,wrongLengthError
                 ,nonMatchingErrors
                 ,TypeList matchingTypeErrors] $ TypeList []
  where
    makeUnknownColumnError = TypeError sp . UnrecognisedIdentifier

getColumnTypes :: Scope -> MySourcePos -> String -> [String] -> [(String,Type)]
getColumnTypes scope sp tbl cols' =
  let targetTableType = fst $ getRelationType scope sp tbl
      targetTableCols = unwrapComposite targetTableType
      cols = if null cols'
               then map fst targetTableCols
               else cols'
      nonMatchingColumns = cols \\ map fst targetTableCols
      nonMatchingErrors = case length nonMatchingColumns of
                            0 -> TypeList []
                            1 -> makeUnknownColumnError $ head nonMatchingColumns
                            _ -> TypeList $ map makeUnknownColumnError nonMatchingColumns
  in map (\l -> (l, fromJust $ lookup l targetTableCols)) cols
  where
    makeUnknownColumnError = TypeError sp . UnrecognisedIdentifier
}

{-
================================================================================

= update
-}

SEM Statement
    | Update
        lhs.nodeType = checkErrors
                         [checkTableExists @lhs.scope @lhs.sourcePos @table
                         ,@whr.nodeType
                         ,@assigns.nodeType
                         ,checkColumnConsistency
                            @lhs.scope
                            @lhs.sourcePos
                            @table
                            colNames
                            colTypes]
                         @assigns.nodeType
                         where
                           colNames = map fst @assigns.pairs
                           colTypes = @assigns.pairs
        lhs.statementInfo = makeStatementInfo @lhs.backType $
                              UpdateInfo @table $
                                UnnamedCompositeType $
                                  getColumnTypes @lhs.scope @lhs.sourcePos @table $
                                    map fst @assigns.pairs

ATTR SetClauseList [ | | pairs : {[(String,Type)]} ]
ATTR SetClause [ | | pairs : {[(String,Type)]} ]

SEM SetClauseList
    | Cons lhs.pairs = @hd.pairs ++ @tl.pairs
    | Nil lhs.pairs = []

SEM SetClause
    | SetClause
        lhs.nodeType = checkErrors [@val.nodeType] $ TypeList []
        lhs.pairs = [(@att, @val.nodeType)]
    | RowSetClause
        lhs.nodeType = let atts = @atts.strings
                           types = getRowTypes @vals.nodeType
                           lengthError = if length atts /= length types
                                           then TypeError @lhs.sourcePos WrongNumberOfColumns
                                           else TypeList []
                       in checkErrors [lengthError] $ TypeList []
        lhs.pairs = zip @atts.strings $ getRowTypes @vals.nodeType

{
getRowTypes :: Type -> [Type]
getRowTypes (TypeList [(RowCtor ts)]) = ts
getRowTypes (TypeList ts) = ts
getRowTypes x = error $ "cannot get row types from " ++ show x
}

{-
================================================================================

= delete
-}

SEM Statement
    | Delete
        lhs.nodeType = checkErrors
                         [checkTableExists @lhs.scope @lhs.sourcePos @table
                         ,@whr.nodeType]
                         $ TypeList []
        lhs.statementInfo = makeStatementInfo @lhs.backType $ DeleteInfo @table

{-
================================================================================

= create table

scope needs to be modified: types, typenames, typecats, attrdefs, systemcolumns

produces a compositedef:
(name, tablecomposite, unnamedcomp [(attrname, type)])
-}

ATTR AttributeDef [ | | attrName : String ]

SEM AttributeDef
    | AttributeDef
        lhs.attrName = @name
        lhs.nodeType = @typ.nodeType

SEM AttributeDefList
    | Cons
        lhs.nodeType = checkErrors [@tl.nodeType, @hd.nodeType] $
                         consComposite (@hd.attrName, @hd.nodeType) @tl.nodeType
    | Nil
        lhs.nodeType = UnnamedCompositeType []

SEM Statement
    | CreateTable
        lhs.nodeType = @atts.nodeType
        lhs.statementInfo = makeStatementInfo @lhs.backType $
                              RelvarInfo (@name, TableComposite, @atts.nodeType)

SEM Statement
    | CreateTableAs
        lhs.statementInfo = makeStatementInfo @lhs.backType $
                              RelvarInfo (@name, TableComposite, @expr.nodeType)

{-
================================================================================

= create view
-}

SEM Statement
    | CreateView
        lhs.statementInfo = makeStatementInfo @lhs.backType $
                              RelvarInfo (@name, ViewComposite, @expr.nodeType)

{-
================================================================================

= create type
-}

ATTR TypeAttributeDef [ | | attrName : String ]

SEM TypeAttributeDef
    | TypeAttDef
        lhs.nodeType = @typ.nodeType
        lhs.attrName = @name

SEM TypeAttributeDefList
    | Cons
        lhs.nodeType = checkErrors [@tl.nodeType, @hd.nodeType] $
                         consComposite (@hd.attrName, @hd.nodeType)
                           @tl.nodeType
    | Nil
        lhs.nodeType = UnnamedCompositeType []

SEM Statement
    | CreateType
        lhs.statementInfo = makeStatementInfo @lhs.backType $
                              RelvarInfo (@name, Composite, @atts.nodeType)

{-
================================================================================

= create domain
-}

SEM Statement
    | CreateDomain
        lhs.statementInfo = makeStatementInfo @lhs.backType $
                              CreateDomainInfo @name @typ.nodeType

{-
================================================================================

= create function

ignore body for now, just get the signature
-}

ATTR ParamDef [ | | paramName : String ]
ATTR ParamDefList [ | | params : {[(String,Type)]} ]

SEM ParamDef
    | ParamDef ParamDefTp
        lhs.nodeType = @typ.nodeType
    | ParamDef
        lhs.paramName = @name
    | ParamDefTp
        lhs.paramName = ""

SEM ParamDefList
    | Nil lhs.params = []
    | Cons lhs.params = ((@hd.paramName, @hd.nodeType) : @tl.params)

SEM Statement
    | CreateFunction
        lhs.statementInfo = makeStatementInfo @lhs.backType $
                              CreateFunctionInfo (@name,map snd @params.params,@rettype.nodeType)

{-
================================================================================

= static tests

Try to use a list of message data types to hold all sorts of
information which works its way out to the top level where the client
code gets it. Want to have the lists concatenated together
automatically from subnodes to parent node, and then to be able to add
extra messages to this list at each node also.

Problem 1: can't have two sem statements for the same node type which
both add messages, and then have the messages combined to provide the
final message list attribute value for that node. You want this so
that e.g. different sorts of checks can appear in different sections.
Workaround is, instead of having each check in its own section, to
combine them all into one SEM.
Problem 2: no shorthand to combine what the default rule for messages
would be and then add a bit extra - so if you want all the children
messages, plus possibly an extra message or two, have to write out the
default rule in full explicitly. Can get round this by writing out
loads of code.

Both the workarounds to these seem a bit tedious and error prone, and
will make the code much less readable. Maybe need a preprocessor to
produce the ag file?

Alternatively, just attach the messages to each node (so this appears
in the data types and isn't an attribute), then have a tree walker
collect them all. Since an annotation field in each node is going to
be added anyway, so each node can be labelled with a type, will
probably do this at some point.

================================================================================

= inloop testing

inloop - use to check continue, exit, and other commands that can only
appear inside loops (for, while, loop)

the only nodes that really need this attribute are the ones which can
contain statements

The inloop test is the only thing which uses the messages atm. It
shouldn't, at some point inloop testing will become part of the type
checking.

This is just some example code, will probably do something a lot more
heavy weight like symbolic interpretation - want to do all sorts of
loop, return, nullability, etc. analysis.
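
for comparison, a rough sketch of the same continue check written as a
plain tree walk instead of with attributes, assuming a cut-down
statement type: Stmt, checkContinue and checkAll are made up for
illustration, the real ast has many more node types

```haskell
-- Hypothetical, cut-down statement type; only the parts that matter
-- for loop nesting are kept.
data Stmt = Continue
          | While [Stmt]
          | If [[Stmt]]
          | CreateFunction [Stmt]
          | Other
            deriving (Eq, Show)

-- Walk a statement, tracking whether we are inside a loop, and
-- collect a message for every continue found outside one.
checkContinue :: Bool -> Stmt -> [String]
checkContinue inLoop st =
  case st of
    Continue
      | inLoop -> []
      | otherwise -> ["continue not in loop"]
    -- entering a loop body switches the flag on
    While sts -> concatMap (checkContinue True) sts
    -- an if just passes the current flag down to each branch
    If branches -> concatMap (concatMap (checkContinue inLoop)) branches
    -- a function body starts again outside any loop
    CreateFunction body -> concatMap (checkContinue False) body
    Other -> []

-- Check a whole statement list, starting outside any loop.
checkAll :: [Stmt] -> [String]
checkAll = concatMap (checkContinue False)
```

the Bool argument plays the role of the inherited inLoop attribute, and
the returned list plays the role of the synthesized messages attribute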
-}

ATTR AllNodes Root ExpressionRoot [ | | messages USE {++} {[]} : {[Message]} ]

ATTR AllNodes [ inLoop: Bool | | ]

SEM Root
    | Root statements.inLoop = False

SEM ExpressionRoot
    | ExpressionRoot expr.inLoop = False

-- set the inloop stuff which nests, it's reset inside a create
-- function statement, in case you have a create function inside a
-- loop, seems unlikely you'd do this though

SEM Statement
    | ForSelectStatement ForIntegerStatement WhileStatement
        sts.inLoop = True
    | CreateFunction
        body.inLoop = False

-- now we can check when we hit a continue statement if it is in the
-- right context

SEM Statement
    | ContinueStatement
        lhs.messages = if not @lhs.inLoop
                         then [Error @lhs.sourcePos ContinueNotInLoop]
                         else []

{-
================================================================================

= notes and todo

containment guide for select expressions:
combineselect 2 selects
insert ?select
createtableas 1 select
createview 1 select
return query 1 select
forselect 1 select
select->subselect select
expression->exists select
            scalarsubquery select
            inselect select

containment guide for statements:
forselect [statement]
forinteger [statement]
while [statement]
casestatement [[statement]]
if [[statement]]
createfunction->fnbody [Statement]

TODO

some non type-check checks:
check plpgsql only in plpgsql function
orderby in top level select only
copy followed immediately by copydata iff stdin, copydata only follows
  copy from stdin
count args to raise, etc., check same number as placeholders in string
no natural with onexpr in joins
typename -> setof (& fix parsing), what else like this?
expressions: positionalarg in function, window function only in select
  list top level

review all ast checks, and see if we can also catch them during
parsing (e.g. typeName parses setof, but this should only be allowed
for a function return, and we can make this a parse error when parsing
from source code rather than checking a generated ast.
This needs judgement to say whether a parse error is better than a
check error, I think for setof it is, but e.g. a continue not in a
loop (which could be caught during parsing) works better as a check
error, looking at the error message the user will get. This might be
wrong, haven't thought too carefully about it yet).

TODO: canonicalize ast process, as part of type checking produces a
canonicalized ast in which:
all implicit casts appear explicitly in the ast (maybe distinguished
  from explicit casts?)
all names are fully qualified
all types use canonical names
literal values and selectors are in one form (use row style?)
nodes are tagged with types
what else?

Canonical form only defined for type consistent asts.

This canonical form should pretty print and parse back to the same
form, and type check correctly.

-}
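
{-
a rough sketch of what the 'all types use canonical names' step in the
todo above might look like, assuming type names are plain strings.
canonicalName and the aliases table are made up for illustration, this
is not the canonicalizeTypeName used earlier in this file, and the
table only covers a handful of the postgresql aliases

```haskell
import Data.Char (toLower)
import Data.Maybe (fromMaybe)

-- Hypothetical alias table; the names on the right are the ones
-- postgresql uses internally for these types.
aliases :: [(String, String)]
aliases = [("int", "int4")
          ,("integer", "int4")
          ,("bigint", "int8")
          ,("smallint", "int2")
          ,("boolean", "bool")
          ,("character varying", "varchar")]

-- Map a type name to one canonical spelling, leaving names that are
-- not in the table alone (apart from lower-casing them).
canonicalName :: String -> String
canonicalName n = fromMaybe n' (lookup n' aliases)
  where
    n' = map toLower n
```

with this table, "INTEGER" and "int" both come out as "int4", and a
name with no alias entry such as "text" comes back unchanged
-}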