= Current TODO list

TODOs in rough order that they are intended to be done

provide type check errors with source positions
sort out source position collection in parser
investigate syb for annotation instances
use f :: Annotation -> Doc for pretty printing annotations
work on statement info
get scope updating whilst checking
start work on type checking inside functions - start with params,
  return types, non plpgsql statements, stuff with selects (e.g. for).
work on parsing and type checking pg_dump output, then do a util to
  dump and type check a live database (this will lead to being able to
  run the lint process on a live database rather than source)
sort out api + do haddock
provide installation instructions for non haskell programmers
chain scopes when typechecking multiple files from util, provide api
  to do this from code

parse and/or type check todo list:
  "identifier"
  6.5e-5
  type 'string' style type cast
  [:] slice
  missing keyword ops
  default template1 operators should all parse
  composite field selection
  agg(all expr) agg(distinct expr), agg(*)
  window frame clauses, named windows
  parse inside string literals when cast, for common types
  multidimensional arrays
  implicit casting row values to composites
  default values
  serial
  make sure can type check everything that parses
  constraint names
  provide list of keys in info for create/alter table: include unique
    not null and serials
  type check fks, and other constraints
  alter table:
    add/remove column
    constraint
    default value
    column type
    rename column
    rename table
  what other alters/creates
    views, functions, operators, types, domains, triggers, rules
  selects:
    implicit joins
    group by, having + group by with unaggregated and aggregated fields
    distinct, on
    order by - do properly
    limit, offset
    with queries
  up to datatypes ch 8 in pg manual

stage 2: make this useful by working on the showinfodb function:
instead of interspersing the statementinfo with the pretty printed
statements, insert or overwrite the statementinfo comments
in the original source so we preserve formatting and other comments.

use the type pretty printer to make the statementinfo comments more
readable

want to also output types of parts of statements, e.g. the select in a
for statement or insert, etc. Other things that could be useful
include adding the resolved function prototype for each function used
which has overloads, so you can see which overload is being called,
and writing out the canonicalized ast, so that e.g. all the casts are
made explicit.

With this in place, the code actually has some use for real code, in
that we can use it to easily view the types of views, etc. inline in
the real sql source code

stage 3: review and choose from this list:
* do null inference
* some selective fixups here and there to the typing (e.g. type
  checking constraints in create tables)
* selectively add some missing syntax, to cover the most glaring holes
* schema qualification
* type check statements inside create function
* something else from the todo for milestone 0.1 below
* something else

================================================================================

rough milestones for release 0.1:

in addition to all the stage 3 items above,

add support for nearly all syntax for parsing and type checking,
instead of doing piecemeal bits: go through the pg manual part II,
support almost everything, add comprehensive simple tests, and go
through the sql reference section also. This is the time to document
more precisely what isn't supported so there is a clear reference for
this

do ? placeholders, and do typesafe haskell wrapper generation using
this

figure out what to do about tricky operator precedence parsing, etc.
ability to type check all of chaos sql

example for generating sql code from haskell using the ast

get database loader and typesafe access generators good enough to use
in chaos

example usage of each of these

look at the error message formatting, particularly try to fix the
parser errors so they make more sense

add annotation field to most ast nodes, store type and source
positioning in this field, fix parser to add lots of accurate
positioning information when parsing.

make sure the lint process works on text dumps of databases.

try checking the sample databases:
http://pgfoundry.org/projects/dbsamples/

================================================================================

some syntax todo, not organised:
------------
add support for following sql syntax (+ type checking)
  alter table, common variations
  create index
  create rule
  create trigger
  + drops for all creates + maybe alters?
  ctes
  loop, exit, labels
  easy ones: transactions, savepoints, listen
  prepare, execute + using

some more:
  create or replace
  alter table
  transactions: begin, checkpoint, commit, end, rollback
  cursors: declare, open, fetch, move, close, where current of
  copy - parse properly
  create database
  create index
  create rule
  create trigger + plpgsql support
  grant,revoke
  listen, notify, unlisten
  prepare, execute
  savepoint, release savepoint, rollback to savepoint
  set, reset
  set constraints
  set role
  set transaction
  correlated subquery attrs
  plpgsql blocks which aren't at the top level of a function
  % types
  strict on intos
  not null for var defs
  exception
  execute using
  get diagnostics
  return query execute
  raise missing bits
  out params
  elsif
  loop
  exit
  labels
  reverse, by in for
  for in execute

expressions:
  process string escapes, support dollar quoting and other quoting
    more robustly in the pretty printer
  full user operator support (?)
  fix expression parser properly to handle things like between - see
    grammar in pg source for info on how to do this
  [:] array slices
  aggregate: all and distinct
  multi dimensional arrays: selectors and subscripting
  missing keyword operators
  datetime extract
  time zone
  subquery operators: any, some, all
  in general, parsing operators is wrong: the lexer needs to be able
    to lex sequences of symbols into single/multiple operators
    correctly, and what happens at the moment is a kludge. Also,
    general operator parsing will change how operators are
    represented in the ast

================================================================================

some other random ideas:

null treatment

Basic motivation is to keep nulls carefully walled off, controlled,
and be able to catch them when they sneak back into expressions, etc.
For each value, etc. we determine statically if it might be null. This
can be done for return types of functions, fields in a select
expression, etc. (will do mappings, e.g. if a function's inputs are
all non null, then the output is non null, etc.). Once this is working
ok, the second stage is to implement the anti null warnings/errors.
Allow nulls in tables, in outer joins, in coalesce, and to be produced
by selects (maybe add or remove from this allowed list, maybe make it
configurable on a per project basis). Never allow nulls to be an
argument to a function call (including ops, keyword ops, etc.). So
every time you have a field being used in an expression and it cannot
be statically verified to be non null, you have to insert a coalesce
or fix it in some other way.

So nulls can still be used to represent optional values, n/a, etc. and
output to clients doing selects, but there is no need to grapple with:
* 3vl (or whatever it is that sql uses instead),
* what the result of a function call is if some or all of the
  arguments are null,
* what the result of a sum aggregate is if some of the values are
  null,
* etc.,
because none of these things are allowed.
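
A rough sketch of the two null treatment stages described above, in
Haskell. Everything here is hypothetical and invented for illustration
(the Expr type, Nullability, nullability, nullArgWarnings); none of it
is the existing ast or api. It assumes the simplest mapping mentioned
above: a function call is non null exactly when all of its arguments
are non null, and coalesce is non null when any argument is.

```haskell
-- Hypothetical, heavily simplified expression type (not the real ast).
data Expr
  = Lit String              -- a non null literal
  | NullLit                 -- the null literal
  | Col String              -- a column reference
  | FunCall String [Expr]   -- function, operator or keyword operator call
  | Coalesce [Expr]
  deriving (Show, Eq)

data Nullability = NonNull | MaybeNull
  deriving (Show, Eq)

-- Nullability of each column in scope, e.g. taken from not null
-- constraints. Unknown columns are assumed to be possibly null.
type Env = [(String, Nullability)]

-- Stage one: decide statically whether an expression might be null.
nullability :: Env -> Expr -> Nullability
nullability _ (Lit _) = NonNull
nullability _ NullLit = MaybeNull
nullability env (Col c) = maybe MaybeNull id (lookup c env)
nullability env (FunCall _ args)
  | all ((== NonNull) . nullability env) args = NonNull
  | otherwise = MaybeNull
nullability env (Coalesce args)
  | any ((== NonNull) . nullability env) args = NonNull
  | otherwise = MaybeNull

-- Stage two: one warning per function call argument that might be
-- null. Arguments of coalesce itself are allowed to be null.
nullArgWarnings :: Env -> Expr -> [String]
nullArgWarnings env e = case e of
  FunCall f args ->
    [ "possibly null argument to " ++ f ++ ": " ++ show a
    | a <- args, nullability env a == MaybeNull ]
    ++ concatMap (nullArgWarnings env) args
  Coalesce args -> concatMap (nullArgWarnings env) args
  _ -> []
```

With an environment where column a is NonNull and b is MaybeNull, the
expression a + b produces one warning, while a + coalesce(b, 0)
produces none, which is the "insert a coalesce" fix described above.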

parser, converter and pretty printer for explain output, want to view
how a query is executed in human readable pseudocode. Add lint type
checks, etc. to this, which can suggest ways to rewrite the query to
get better performance. Another idea is to make the dependencies on
the values in the tables more explicit, so you can see how much the
data can change before another plan is chosen, or you can see a bad
assumption about the kind of data the query will be run on.

write a replacement psql shell, which can expose parse trees, type
checks, lint checks, and doesn't use a one line at a time style
interface (i.e. works more like writing and executing lisp in emacs,
not like bash).

chain scope lookups instead of unioning them since unioning is too
slow - or maybe use maps/sets, but need to quickly scan whole lists
e.g. for function lookup, which can't really use any sort of key based
lookup, where the key the function lookup uses is the same as the key
the map/set uses.

incorporate pg regression test sql into parsing and type checking
tests

write a show for parsec errors which formats the lex tokens and
expected lists properly (was broken when moved to the separate lexer)

add haddock docs to public api

write some example programs with plenty of comments - will this mainly
be used as a library or as a utility though?

redo cabal file to add compile time options: exes, pg support, tests
or split into separate packages?

sort out modules/folder use

work on error reporting, add tests for malformed sql

add token location info to ast nodes, modify for type checking, etc
support.
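
A minimal sketch of the "chain scope lookups instead of unioning them"
idea noted above. The names FunProto, Scope, ScopeChain, pushScope and
lookupFns are hypothetical and are not the existing scope api. The
assumption is that a scope chain is just a list of scopes, innermost
first, so adding a scope is a cons and function lookup scans each
scope's list in turn without ever building a merged scope.

```haskell
-- Hypothetical function prototype: name, argument types, return type.
data FunProto = FunProto
  { fpName :: String
  , fpArgs :: [String]
  , fpRet  :: String
  } deriving (Show, Eq)

-- Hypothetical scope holding only function prototypes.
newtype Scope = Scope { scopeFunctions :: [FunProto] }

-- Innermost scope first.
type ScopeChain = [Scope]

-- Chaining a new scope on is constant time; nothing is copied or merged.
pushScope :: Scope -> ScopeChain -> ScopeChain
pushScope = (:)

-- All overloads with the given name, innermost scope first. Overload
-- resolution needs every candidate, so each list is scanned in full.
lookupFns :: ScopeChain -> String -> [FunProto]
lookupFns chain name =
  [ p | Scope ps <- chain, p <- ps, fpName p == name ]
```

This keeps the whole list scan that function lookup needs, and only
avoids the cost of unioning; whether that is enough of a speedup would
need measuring.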

want to report multiple parse errors, perhaps can bodge this because
of the property that ';' can only appear inside a string or comment,
or otherwise at the end of a statement, so add some code to jump to
the next ';' that looks like the end of a statement and continue to
parse to end of file in an attempt to catch at least some further
syntax errors

improve tests:
  identify each bit of syntax and make sure there is a test for it
  add some bigger tests: lots of sql statements, big functions
  look for possible corner cases and add tests
  get property checker working again - one problem is that the pretty
    printer will reject some asts (which the parser cannot produce),
    and the parser will probably reject some invalid sql that the
    pretty printer will happily produce from some asts.

ability to write new lint checks, and choose which lint checks to use
on a per project basis.

plpgsql on 'roids: write libraries in haskell, and then write syntax
extensions for plpgsql using the extension mechanism to access these
libs from extended plpgsql, e.g. a ui lib written in haskell, accessed
by syntax extensions in plpgsql. Then can write the database and ui
all in the same source code in the same language, with first class
support for properly typed relation valued expressions, avoiding
multiple languages and mapping/'impedance mismatch' between database
types and types in the language you write the ui in.
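
A rough sketch of the "jump to the next ';'" bodge for reporting
multiple parse errors, noted earlier in this list. splitStatements and
parseAllStatements are hypothetical names, and the parser argument
stands in for whatever the real statement parser is. Assumptions: only
single quoted strings, -- line comments and non nested /* */ block
comments are recognised; dollar quoting is not handled, so a function
body written with dollar quotes would be split wrongly.

```haskell
import Data.Either (partitionEithers)

-- Split sql source into chunks, each ending at a ';' that is not
-- inside a string or comment. A trailing chunk with no ';' is kept
-- only if it contains something other than whitespace.
splitStatements :: String -> [String]
splitStatements = go ""
  where
    -- acc holds the current chunk in reverse order
    go acc [] = [reverse acc | any (`notElem` " \t\r\n") acc]
    go acc (';' : rest) = reverse (';' : acc) : go "" rest
    go acc ('\'' : rest) = inString ('\'' : acc) rest
    go acc ('-' : '-' : rest) = inLineComment ('-' : '-' : acc) rest
    go acc ('/' : '*' : rest) = inBlockComment ('*' : '/' : acc) rest
    go acc (c : rest) = go (c : acc) rest

    -- '' inside a string is an escaped quote, not the end of the string
    inString acc ('\'' : '\'' : rest) = inString ('\'' : '\'' : acc) rest
    inString acc ('\'' : rest) = go ('\'' : acc) rest
    inString acc (c : rest) = inString (c : acc) rest
    inString acc [] = go acc []

    inLineComment acc ('\n' : rest) = go ('\n' : acc) rest
    inLineComment acc (c : rest) = inLineComment (c : acc) rest
    inLineComment acc [] = go acc []

    inBlockComment acc ('*' : '/' : rest) = go ('/' : '*' : acc) rest
    inBlockComment acc (c : rest) = inBlockComment (c : acc) rest
    inBlockComment acc [] = go acc []

-- Run a single statement parser over every chunk and collect all the
-- errors instead of stopping at the first one.
parseAllStatements :: (String -> Either e a) -> String -> ([e], [a])
parseAllStatements parseOne =
  partitionEithers . map parseOne . splitStatements
```

Error positions reported this way would be relative to each chunk, so
the start offset of each chunk would also have to be tracked to give
positions in the original file.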