= Annotation planning

Add an annotation field to most nodes in the ast, something like:

  data Annotation = NonAnnotation
                  | SourcePosAnnotation SourcePos
                  | CheckedAnnotation SourcePos Type [Messages]

Alter the parser to add source positions to these nodes - assume that one
source position per node will be enough for now (not necessarily a good
assumption with weird sql syntax) - and see if this gives us enough for good
error messages, etc.

Question: what to do when a node has no source position? E.g. the 'all' in
'select all' or 'select distinct' may correspond to a token, or may be
synthesized as the default when neither 'all' nor 'distinct' is present.
Should such a node have the source position of where the token would have
appeared, should it inherit one from its parent, or should there be a
separate constructor to represent a fake node with no source position?

The type checking will then work like this: instead of producing some
attribute values, it will produce a transformed ast with the type and message
fields filled in. Then supply some utility functions to e.g. extract all the
messages, extract all the type errors, extract the top-level types, etc. Use
some sort of tree walker to implement these utilities.

Types and type errors will work like this: instead of / in addition to the
types being passed around in attributes, they'll be saved in the transformed
tree. Type errors won't percolate up to the top level, but will sit with the
node that is in error. Any parent node which needs this type to calculate its
own type will use a separate error to say the type is unknown. If it can
calculate its type without depending on a type-erroring child node, then it
does that. So e.g. when typing a set of statements with create functions and
views which use those functions: if the statements inside the functions have
type errors, we can still find the types of the views, assuming that the
function params and return type check properly and are correct.
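To make this concrete, here is a minimal sketch of how the annotation, a
cut-down annotated node type, and one of the tree-walking utilities could fit
together. The Type, Message, and Expression definitions are invented for
illustration (the real ast is much larger), and extractMessages is an assumed
name, not an existing function:

  -- a minimal sketch of the annotation idea, not the real ast
  import Text.Parsec.Pos (SourcePos, newPos)

  data Type = ScalarType String
            | UnknownType  -- parent can't compute its type: a child errored
            deriving (Eq, Show)

  data Message = TypeError String
               | Note String
               deriving (Eq, Show)

  data Annotation = NonAnnotation
                  | SourcePosAnnotation SourcePos
                  | CheckedAnnotation SourcePos Type [Message]
                  deriving (Show)

  -- a cut-down expression type with an annotation on every node
  data Expression = IntegerLit Annotation Integer
                  | FunCall Annotation String [Expression]
                  deriving (Show)

  annotation :: Expression -> Annotation
  annotation (IntegerLit a _) = a
  annotation (FunCall a _ _)  = a

  children :: Expression -> [Expression]
  children (IntegerLit _ _) = []
  children (FunCall _ _ es) = es

  -- one of the proposed utility walkers: collect every message in the
  -- tree, so errors sit with their nodes but can still be listed in one go
  extractMessages :: Expression -> [Message]
  extractMessages e =
      msgs (annotation e) ++ concatMap extractMessages (children e)
    where
      msgs (CheckedAnnotation _ _ ms) = ms
      msgs _ = []

  -- example of the error strategy: the child's real error stays on the
  -- child, and the parent only records that its own type is unknown
  example :: Expression
  example = FunCall (CheckedAnnotation pos UnknownType
                     [TypeError "type unknown"])
                    "abs"
                    [FunCall (CheckedAnnotation pos UnknownType
                              [TypeError "no function found: foo()"])
                             "foo" []]
    where pos = newPos "example.sql" 1 1

Here extractMessages example yields both the parent's 'type unknown' error
and the child's real error, without the child's error having percolated to
the top of the tree.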
= Current TODO list

TODOs in rough order that they are intended to be done.

stage 1: do some stuff patchily to try to get a bunch of common sql partially
type-checking (focus is on getting the result types of most select
statements). TODOs for this stage:
* integrate create types into scope, plus do drops with scope
* chain scopes when showinfo-ing multiple files
* pretty printer for statementinfo

stage 2: make this useful by working on the showinfodb function: instead of
interspersing the statementinfo with the pretty printed statements, insert or
overwrite the statementinfo comments in the original source, so we preserve
formatting and other comments. Use the type pretty printer to make the
statementinfo comments more readable. Want to also output the types of parts
of statements, e.g. the select in a for statement or insert, etc. Other
things that could be useful include adding the resolved function prototype
for each function used which has overloads, so you can see which overload is
being called, and writing out the canonicalized ast, so e.g. we have all the
casts made explicit, etc. With this in place, the code actually has some use
for real code, in that we can use it to easily view the types of views, etc.
inline in the real sql source code.

stage 3: review and choose from this list:
* do null inference
* some selective fixups here and there to the typing (e.g. type checking
  constraints in create tables)
* selectively add some missing syntax, to cover the most glaring hole
* schema qualification
* type check statements inside create function
* something else from the TODO for milestone 0.1 below
* something else

================================================================================
rough milestones for release 0.1:

in addition to all the stage 3 items above:
* add support for nearly all syntax for parsing and type checking, instead of
  doing piecemeal bits: go through the pg manual part II and support almost
  everything, add comprehensive simple tests, and go through the sql
  reference section also. This is the time to document more precisely what
  isn't supported, so there is a clear reference for this.
* do ? placeholders, and do typesafe haskell wrapper generation using this
* figure out what to do about tricky operator precedence parsing, etc.
* ability to type check all of chaos sql
* example for generating sql code from haskell using the ast
* get the database loader and typesafe access generators good enough to use
  in chaos
* example usage of each of these
* look at the error message formatting, particularly try to fix the parser
  errors so they make more sense
* add annotation field to most ast nodes, store type and source positioning
  in this field, fix the parser to add lots of accurate positioning
  information when parsing
* make sure the lint process works on text dumps of databases. Try checking
  the sample databases: http://pgfoundry.org/projects/dbsamples/

================================================================================
some syntax todo, not organised:
------------
add support for the following sql syntax (+ type checking):
* alter table, common variations
* create index
* create rule
* create trigger
* + drops for all creates, + maybe alters?
* ctes
* loop, exit, labels

easy ones: transactions, savepoints, listen, prepare, execute + using

some more:
* create or replace
* alter table
* transactions: begin, checkpoint, commit, end, rollback
* cursors: declare, open, fetch, move, close, where current of
* copy - parse properly
* create database
* create index
* create rule
* create trigger + plpgsql support
* grant, revoke
* listen, notify, unlisten
* prepare, execute
* savepoint, release savepoint, rollback to savepoint
* set, reset
* set constraints
* set role
* set transaction
* correlated subquery attrs
* plpgsql blocks which aren't at the top level of a function
* % types
* strict on intos
* not null for var defs
* exception
* execute using
* get diagnostics
* return query execute
* raise missing bits
* out params
* elsif
* loop exit labels
* reverse, by in for
* for in execute

expressions:
* process string escapes, support dollar quoting and other quoting more
  robustly in the pretty printer
* full user operator support (?)
* fix the expression parser properly to handle things like between - see the
  grammar in the pg source for info on how to do this
* [:] array slices
* aggregates: all and distinct
* multi-dimensional arrays: selectors and subscripting
* missing keyword operators
* datetime extract, time zone
* subquery operators: any, some, all

In general, parsing operators is wrong: the lexer needs to be able to lex
sequences of symbols into single/multiple operators correctly - what happens
at the moment is a kludge. Also, general operator parsing will change how
operators are represented in the ast.
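As a sketch of what "correctly" could look like here (illustrative only: the
operator table is a tiny made-up subset, lexOps/splitOps are invented names,
and real postgres operator lexing has extra rules, e.g. about trailing '+'
and '-', that are ignored here):

  -- toy lexer pass: take a maximal run of operator characters, then split
  -- it into known operators by longest match
  import Data.List (isPrefixOf, sortBy)
  import Data.Ord (comparing, Down (..))

  opChars :: String
  opChars = "+-*/<>=~!@#%^&|`?"

  knownOps :: [String]
  knownOps = ["<=", ">=", "<>", "!=", "||",
              "+", "-", "*", "/", "<", ">", "=", "%"]

  -- lex one maximal run of operator characters, then split it
  lexOps :: String -> Maybe ([String], String)
  lexOps s =
    case span (`elem` opChars) s of
      ([], _)     -> Nothing
      (run, rest) -> (\ops -> (ops, rest)) <$> splitOps run

  -- greedy longest-match split; a real version may need backtracking,
  -- since committing to the longest prefix can fail where a shorter
  -- prefix would have let the rest of the run split successfully
  splitOps :: String -> Maybe [String]
  splitOps [] = Just []
  splitOps run =
    case [op | op <- sortBy (comparing (Down . length)) knownOps
             , op `isPrefixOf` run] of
      (op:_) -> (op :) <$> splitOps (drop (length op) run)
      []     -> Nothing  -- unknown symbol sequence

E.g. lexOps ">=||x" gives Just ([">=", "||"], "x"), rather than mis-grouping
the symbol run the way an ad hoc per-operator parser tends to.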
================================================================================
some other random ideas:

null treatment: the basic motivation is to keep nulls carefully walled off
and controlled, and to be able to catch them when they sneak back into
expressions, etc. For each value, etc., we determine statically whether it
might be null. This can be done for the return types of functions, fields in
a select expression, etc. (will do mappings, e.g. if a function's inputs are
all non-null, then its output is non-null, etc.). Once this is working ok,
the second stage is to implement the anti-null warnings/errors. Allow nulls
in tables, in outer joins, in coalesce, and to be produced by selects (maybe
add to or remove from this allowed list, maybe make it configurable on a
per-project basis). Never allow nulls to be an argument to a function call
(including ops, keyword ops, etc.). So every time you have a field being used
in an expression and it cannot be statically verified to be non-null, you
have to insert a coalesce or fix it in some other way. So nulls can still be
used to represent optional values, n/a, etc., and be output to clients doing
selects, but there is no need to grapple with:
* 3vl (or whatever it is that sql uses instead),
* what the result of a function call is if some or all of the arguments are
  null,
* what the result of a sum aggregate is if some of the values are null,
* etc.,
because none of these things are allowed.
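A minimal sketch of that static nullability mapping, over an invented mini
expression type (Expr, inferNullability and checkNoNullArgs are illustrative
names and rules, not real code from this project):

  -- illustrative nullability inference for a toy expression type
  data Nullability = NotNull | MaybeNull deriving (Eq, Show)

  data Expr = Lit Nullability       -- a null literal would be Lit MaybeNull
            | Col Nullability       -- nullability from the table definition
            | FunApp String [Expr]  -- function/op/keyword-op application
            | Coalesce [Expr]

  inferNullability :: Expr -> Nullability
  inferNullability (Lit n) = n
  inferNullability (Col n) = n
  -- the mapping from the text: if a function's inputs are all non-null,
  -- then its output is non-null
  inferNullability (FunApp _ args)
    | all ((== NotNull) . inferNullability) args = NotNull
    | otherwise = MaybeNull
  -- coalesce is one of the allowed escape hatches: it is non-null as soon
  -- as any argument is statically non-null
  inferNullability (Coalesce args)
    | any ((== NotNull) . inferNullability) args = NotNull
    | otherwise = MaybeNull

  -- the proposed check: flag any possibly-null argument to a function call
  checkNoNullArgs :: Expr -> [String]
  checkNoNullArgs (FunApp name args) =
    [ "possibly null argument to " ++ name
    | any ((== MaybeNull) . inferNullability) args ]
    ++ concatMap checkNoNullArgs args
  checkNoNullArgs (Coalesce args) = concatMap checkNoNullArgs args
  checkNoNullArgs _ = []

So checkNoNullArgs (FunApp "+" [Col MaybeNull, Lit NotNull]) complains, while
wrapping the column in a Coalesce with a non-null default makes it pass.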
parser, converter and pretty printer for explain output: want to view how a
query is executed in human-readable pseudocode. Add lint checks, type checks,
etc. to this, which can suggest ways to rewrite the query to get better
performance. Another idea is to make the dependencies on the values in the
tables more explicit, so you can see how much the data can change before
another plan is chosen, or spot a bad assumption about the kind of data the
query will be run on.

write a replacement psql shell, which can expose parse trees, type checks and
lint checks, and doesn't use a one-line-at-a-time style interface (i.e. works
more like writing and executing lisp in emacs, not like bash).

chain scope lookups instead of unioning them, since unioning is too slow - or
maybe use maps/sets, but we need to quickly scan whole lists, e.g. for
function lookup, which can't really use any sort of key-based lookup unless
the key the function lookup needs is the same as the key the map/set uses.

incorporate the pg regression test sql into the parsing and type checking
tests.

write a show for parsec errors which formats the lex tokens and expected
lists properly (was broken when we moved to the separate lexer).

add haddock docs to the public api.

write some example programs with plenty of comments - will this mainly be
used as a library or as a utility, though?

redo the cabal file to add compile-time options (exes, pg support, tests), or
split into separate packages? Sort out module/folder use.

work on error reporting, add tests for malformed sql.

add token location info to ast nodes, modify for type checking, etc. support.

want to report multiple parse errors. Perhaps this can be bodged because of
the property that ';' can only appear inside a string or comment, or
otherwise at the end of a statement: add some code to jump to the next
end-of-statement-looking ';' and continue to parse to the end of the file, in
an attempt to catch at least some further syntax errors (see the sketch at
the end of these notes).

improve tests: identify each bit of syntax and make sure there is a test for
it; add some bigger tests: lots of sql statements, big functions; look for
possible corner cases and add tests.

get the property checker working again - one problem is that the pretty
printer will reject some asts (which the parser cannot produce), and the
parser will probably reject some invalid sql that the pretty printer will
happily produce from some asts.

ability to write new lint checks, and choose which lint checks to use on a
per-project basis.

plpgsql on 'roids: write libraries in haskell, and then write syntax
extensions for plpgsql, using the extension mechanism to access these libs
from the extended plpgsql. E.g. a ui lib written in haskell, accessed by
syntax extensions in plpgsql: then we can write the database and the ui all
in the same source code in the same language, with first-class support for
properly typed relation-valued expressions, avoiding multiple languages and
the mapping/'impedance mismatch' between database types and the types in the
language the ui is written in.
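A rough sketch of the ';'-based recovery idea from the multiple-parse-errors
item above (illustrative only: it scans raw text with a toy handling of
quotes and comments, ignores dollar quoting, and parseStatement is a
stand-in for the real statement parser - a real version would more likely
work on the lexer's token stream):

  -- skip to just past the next ';' that is outside strings and comments
  skipToNextStatement :: String -> String
  skipToNextStatement [] = []
  skipToNextStatement (';':rest) = rest
  skipToNextStatement ('\'':rest) = skipToNextStatement (skipString rest)
  skipToNextStatement ('-':'-':rest) =
    skipToNextStatement (dropWhile (/= '\n') rest)
  skipToNextStatement ('/':'*':rest) =
    skipToNextStatement (skipBlockComment rest)
  skipToNextStatement (_:rest) = skipToNextStatement rest

  skipString :: String -> String
  skipString ('\'':'\'':rest) = skipString rest  -- doubled-quote escape
  skipString ('\'':rest) = rest
  skipString (_:rest) = skipString rest
  skipString [] = []

  skipBlockComment :: String -> String
  skipBlockComment ('*':'/':rest) = rest
  skipBlockComment (_:rest) = skipBlockComment rest
  skipBlockComment [] = []

  -- collect as many errors as possible instead of stopping at the first:
  -- on failure, jump past the next statement-ending ';' and keep going
  parseWithRecovery :: (String -> Either String (stmt, String))
                    -> String -> ([String], [stmt])
  parseWithRecovery parseStatement = go
    where
      go src
        | all (`elem` " \t\n") src = ([], [])
        | otherwise =
            case parseStatement src of
              Right (stmt, rest) ->
                let (es, ss) = go rest in (es, stmt : ss)
              Left err ->
                let (es, ss) = go (skipToNextStatement src)
                in (err : es, ss)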