new features:
pretty printer for annotations for annotatesource and checksource
tidy up syb stuff
review use of annotations/attributes in type checker
run through error handling, fix spaghetti code and bizarre stuff
add either to remaining utils, convert error to either so we get internal errors in the ast instead of killing the program
check code ag <-> lhs
review the typechecking modules -> can get rid of astutils? probably not, but can rename it to errorhandling or similar
review ast/typechecking split: move ast to separate folder?, make sure we don't have typechecking stuff in ast, and ast doesn't depend on type checking stuff (problem: annotations?)
lint stuff: ambiguous identifiers, null usage, duplicate definitions

error handling fixup
not just error handling, but most of the code in the type checking files.
1. don't use error, use InternalError as part of an Either; these end up in the ast tree
2. add eithers to a bunch of env functions - make sure we check all the preconditions, e.g. check a view exists when trying to look up its attributes. Also add a bunch of precondition checks to the updateenvironment function, e.g. checking for duplicate types
3. make sure errors from updateenvironment aren't ignored, but end up in the ast tree
4. review the error handling utilities - they probably need a rethink, and definitely need better names. The kinds of things needed are: a way to propagate typecheckfailed, conversions from lists of eithers to single eithers (concatenate the errors; if there are no errors, take the last right), etc.
5. support variadic args to get rid of the current hacks that work around this not being supported
6. go through all the actual type checking code and work out a more consistent way of writing it, and make the algorithms clearer. Some are really obscure at the moment, mainly due to programmer incompetence and also to many changes in the way various things are handled (particularly the error handling and typecheckfailed propagation), and just need rewriting pretty much from scratch.

= Additional TODO list towards alpha release

run through chaos sql and fix all type checking problems, make sure roundtripping through annotate source doesn't mangle the code
start work on type checking inside functions - start with params, return types, non plpgsql statements, stuff with selects (e.g. for)
work on parsing and type checking pg_dump output, then do a util to dump and type check a live database (this will lead to being able to run the lint process on a live database rather than source)
work out api + do haddock
provide installation instructions for non haskell programmers

parse and/or type check todo list:
"identifier"
6.5e-5
type 'string' style type cast
[:] slice
missing keyword ops
default template1 operators should all parse
composite field selection
agg(all expr), agg(distinct expr), agg(*)
window frame clauses, named windows
parse inside string literals when cast, for common types
multidimensional arrays
implicit casting row values to composites
default values
serial
make sure can type check everything that parses
constraint names
provide list of keys in info for create/alter table: include unique not null and serials
type check fks, and other constraints
alter table:
  add/remove column
  constraint
  default value
  column type
  rename column
  rename table
what other alters/creates: views, functions, operators, types, domains, triggers, rules
selects:
  implicit joins
  group by, having + group by with unaggregated and aggregated fields
  distinct, on
  order by - do properly
  limit, offset
  with queries
type modifiers
data type names with spaces in them
timestamps
schemas
alternative text for true and false
enums
geometric types
weird syntax
composites: selector variants, rowctors, component get/update
do all keyword and template1 operators
any/some/all subqueries and arrays
check over rowwise comparisons
indexes: create/alter/drop
go over what more type checking could be done

stage 2: instead of interspersing the annotation output with the pretty printed statements, insert or overwrite the annotation comments in the original source, so we preserve formatting and other comments. Also want to output types of parts of statements, e.g. the select in a for statement or insert, etc. Other things that could be useful include adding the resolved function prototype for each overloaded function used, so you can see which overload is being called, and writing out the canonicalized ast, so e.g. all the casts are made explicit, etc. -> use the annotation system for this.
With this in place, the code actually has some use for real code, in that we can use it to easily view the types of views, etc. inline in the real sql source code.

stage 3: review and choose from this list:
* do null inference
* some selective fixups here and there to the typing (e.g. type checking constraints in create tables)
* selectively add some missing syntax, to cover the most glaring holes
* schema qualification
* type check statements inside create function
* something else from the todo for milestone 0.1 below
* something else

================================================================================
rough milestones for first alpha

in addition to all the stage 3 items above, add support for nearly all syntax for parsing and type checking, instead of doing piecemeal bits: go through the pg manual part II and support almost everything, adding comprehensive simple tests, and go through the sql reference section also. This is the time to document more precisely what isn't supported, so there is a clear reference for this.
do ? placeholders, and do typesafe haskell wrapper generation using this
figure out what to do about tricky operator precedence parsing, etc.
ability to type check all of chaos sql
example for generating sql code from haskell using the ast
get database loader and typesafe access generators good enough to use in chaos
example usage of each of these
look at the error message formatting, particularly try to fix the parser errors so they make more sense
add annotation field to most ast nodes, store type and source positioning in this field, fix parser to add lots of accurate positioning information when parsing.
make sure the lint process works on text dumps of databases.
try checking the sample databases: http://pgfoundry.org/projects/dbsamples/

================================================================================
some syntax todo, not organised:
------------
add support for following sql syntax (+ type checking)
alter table, common variations
create index
create rule
create trigger
+ drops for all creates
+ maybe alters?
ctes
loop, exit, labels
easy ones: transactions, savepoints, listen
prepare, execute + using
some more:
create or replace
alter table
transactions: begin, checkpoint, commit, end, rollback
cursors: declare, open, fetch, move, close, where current of
copy - parse properly
create database
create index
create rule
create trigger + plpgsql support
grant, revoke
listen, notify, unlisten
prepare, execute
savepoint, release savepoint, rollback to savepoint
set, reset
set constraints
set role
set transaction
correlated subquery attrs
plpgsql
  blocks which aren't at the top level of a function
  % types
  strict on intos
  not null for var defs
  exception
  execute using
  get diagnostics
  return query execute
  raise missing bits
  out params
  elsif
  loop
  exit
  labels
  reverse, by in for
  for in execute
expressions:
  process string escapes, support dollar quoting and other quoting more robustly in the pretty printer
  full user operator support (?)
  fix expression parser properly to handle things like between - see the grammar in the pg source for info on how to do this
  [:] array slices
  aggregate: all and distinct
  multi dimensional arrays: selectors and subscripting
  missing keyword operators
  datetime extract
  time zone
  subquery operators: any, some, all
in general, parsing operators is wrong: the lexer needs to be able to lex sequences of symbols into single/multiple operators correctly, and what happens at the moment is a kludge. Also, general operator parsing will change how operators are represented in the ast.

================================================================================
some other random ideas:

null treatment
Basic motivation is to keep nulls carefully walled off and controlled, and to be able to catch them when they sneak back into expressions, etc.
For each value, etc. we determine statically if it might be null. This can be done for return types of functions, fields in a select expression, etc. (we will do mappings, e.g. if a function's inputs are all non null, then the output is non null, etc.).
Once this is working ok, the second stage is to implement the anti null warnings/errors. Allow nulls in tables, outer joins and coalesce, and allow them to be produced by selects (maybe add or remove from this allowed list, maybe make it configurable on a per project basis). Never allow nulls to be an argument to a function call (including ops, keyword ops, etc.). So every time you have a field being used in an expression and it cannot be statically verified to be non null, you have to insert a coalesce or fix it in some other way.
So nulls can still be used to represent optional values, n/a, etc., and be output to clients doing selects, but there is no need to grapple with:
* 3vl (or whatever it is that sql uses instead),
* what the result of a function call is if some or all of the arguments are null,
* what the result of a sum aggregate is if some of the values are null,
* etc.,
because none of these things are allowed.
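A minimal sketch of the two null treatment stages, assuming a toy expression type rather than the real ast: stage one infers nullability bottom up using the mapping described above (all inputs non null gives a non null output), and stage two is the lint that rejects possibly null function arguments while letting nulls flow into coalesce. All names here (`Expr`, `Nullability`, `nullability`, `nullWarnings`) are hypothetical.

```haskell
-- Hypothetical sketch: Expr, Nullability, nullability and nullWarnings are
-- illustrative names, not the project's ast or type checker.
module Main (main) where

-- Whether an expression can be statically shown to be non null.
data Nullability = NotNull | MaybeNull
  deriving (Eq, Show)

-- A toy expression type standing in for the real ast.
data Expr
  = Lit Int                -- a non null literal
  | NullLit                -- the literal null
  | Column String Bool     -- column name, and whether it is declared not null
  | FunCall String [Expr]  -- function or operator call
  | Coalesce [Expr]
  deriving Show

-- Stage one: infer nullability bottom up.
nullability :: Expr -> Nullability
nullability (Lit _) = NotNull
nullability NullLit = MaybeNull
nullability (Column _ notNull) = if notNull then NotNull else MaybeNull
-- Assumed mapping: a call whose inputs are all non null has a non null
-- result. This is a simplification; it does not hold for every function.
nullability (FunCall _ args)
  | all ((== NotNull) . nullability) args = NotNull
  | otherwise = MaybeNull
-- coalesce is non null as soon as any argument is non null.
nullability (Coalesce args)
  | any ((== NotNull) . nullability) args = NotNull
  | otherwise = MaybeNull

-- Stage two: the anti null lint. Nulls may flow into coalesce, but every
-- argument to a function call must be provably non null.
nullWarnings :: Expr -> [String]
nullWarnings (FunCall f args) =
  [ "possibly null argument to " ++ f ++ ": " ++ show a
  | a <- args, nullability a == MaybeNull ]
    ++ concatMap nullWarnings args
nullWarnings (Coalesce args) = concatMap nullWarnings args
nullWarnings _ = []

main :: IO ()
main = do
  let bad = FunCall "+" [Column "a" False, Lit 1]                       -- a + 1, a nullable
      good = FunCall "+" [Coalesce [Column "a" False, Lit 0], Lit 1]  -- coalesce(a, 0) + 1
  mapM_ putStrLn (nullWarnings bad)
  print (nullability good, nullWarnings good)
```

Wrapping the nullable column in `coalesce` is exactly the fix the lint asks for: it turns the argument into one that can be statically shown to be non null, so the warning disappears.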
parser, converter and pretty printer for explain output: want to view how a query is executed in human readable pseudocode. Add lint type checks, etc. to this, which can suggest ways to rewrite the query to get better performance. Another idea is to make the dependencies on the values in the tables more explicit, so you can see how much the data can change before another plan is chosen, or spot a bad assumption about the kind of data the query will be run on.

write a replacement psql shell, which can expose parse trees, type checks and lint checks, and doesn't use a one line at a time style interface (i.e. works more like writing and executing lisp in emacs, not like bash).

chain scope lookups instead of unioning them, since unioning is too slow - or maybe use maps/sets, but we need to quickly scan whole lists, e.g. for function lookup, which can't really use any sort of key based lookup where the key the function lookup uses is the same as the key the map/set uses.

incorporate pg regression test sql into parsing and type checking tests

write a show for parsec errors which formats the lex tokens and expected lists properly (this broke when moving to the separate lexer)

add haddock docs to public api

write some example programs with plenty of comments - will this mainly be used as a library or as a utility though?

redo cabal file to add compile time options: exes, pg support, tests, or split into separate packages?

sort out modules/folder use

work on error reporting, add tests for malformed sql

add token location info to ast nodes, and modify the type checking, etc. to support it.
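The "chain scope lookups instead of unioning them" idea could look roughly like this, assuming each scope stores only its own bindings and a link to its parent. Identifier lookup walks outwards so inner bindings shadow outer ones, while function lookup collects all same-named candidates from the whole chain, since overload resolution has to consider implicit casts and so cannot be a plain key lookup. The types and function names are hypothetical; the map comes from the standard `containers` package.

```haskell
-- Hypothetical sketch: Scope, lookupId, lookupFunctions and pushScope are
-- illustrative names, not the project's environment code.
module Main (main) where

import qualified Data.Map.Strict as M

type Type = String  -- placeholder for a real type representation

-- name, argument types, return type
data FunProto = FunProto String [Type] Type
  deriving (Eq, Show)

-- Each scope holds only its own bindings plus a link to the enclosing scope,
-- so entering a scope never copies or unions the outer bindings.
data Scope = Scope
  { identifiers :: M.Map String Type
  , functions   :: [FunProto]
  , parent      :: Maybe Scope
  }

-- Start an empty scope nested inside an existing one.
pushScope :: Scope -> Scope
pushScope outer = Scope M.empty [] (Just outer)

-- Identifier lookup walks outwards, so inner bindings shadow outer ones.
lookupId :: String -> Scope -> Maybe Type
lookupId n s = case M.lookup n (identifiers s) of
  Just t  -> Just t
  Nothing -> parent s >>= lookupId n

-- Overload resolution matches argument types via implicit casts, so it
-- cannot be a plain key lookup on the argument types: collect every
-- same-named candidate from the whole chain and let the resolver choose.
lookupFunctions :: String -> Scope -> [FunProto]
lookupFunctions n s =
  [f | f@(FunProto fn _ _) <- functions s, fn == n]
    ++ maybe [] (lookupFunctions n) (parent s)

main :: IO ()
main = do
  let global = Scope (M.fromList [("x", "int4")])
                     [FunProto "abs" ["int4"] "int4", FunProto "abs" ["float8"] "float8"]
                     Nothing
      inner = (pushScope global) { identifiers = M.fromList [("x", "text")] }
  print (lookupId "x" inner, lookupId "x" global)
  print (lookupFunctions "abs" inner)
```

The trade-off is that a miss in a deep chain costs one map lookup per level, but pushing a scope is constant time, which is the operation that unioning made slow.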
want to report multiple parse errors. Perhaps we can bodge this using the property that ';' can only appear inside a string or comment, or otherwise at the end of a statement: add some code to jump to the next end-of-statement ';' and continue parsing to the end of the file, in an attempt to catch at least some further syntax errors.

improve tests:
  identify each bit of syntax and make sure there is a test for it
  add some bigger tests: lots of sql statements, big functions
  look for possible corner cases and add tests

get property checker working again - one problem is that the pretty printer will reject some asts (which the parser cannot produce), and the parser will probably reject some invalid sql that the pretty printer will happily produce from some asts.

ability to write new lint checks, and choose which lint checks to use on a per project basis.

plpgsql on 'roids: write libraries in haskell, and then write syntax extensions for plpgsql using the extension mechanism to access these libs from extended plpgsql, e.g. a ui lib written in haskell, accessed by syntax extensions in plpgsql. Then you can write the database and ui all in the same source code in the same language, with first class support for properly typed relation valued expressions, avoiding multiple languages and the mapping/'impedance mismatch' between database types and the types in the language you write the ui in.
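For the multiple parse errors idea above, a first step could be a splitter that cuts the source at each ';' lying outside strings and comments, so each chunk can be parsed on its own and one syntax error does not hide the rest. This is a minimal sketch under stated assumptions: `splitStatements` is a hypothetical name, only the empty `$$` dollar quote tag is recognised, block comments are treated as non nested (postgres actually allows nesting), and `E'...'` escape strings are not handled specially.

```haskell
-- Hypothetical sketch: splitStatements is an illustrative name, not part of
-- the project's lexer or parser.
module Main (main) where

import Data.List (isPrefixOf)

-- Split source text into chunks ending at each ';' that is outside a quoted
-- string, quoted identifier, $$ dollar quoted string or comment.
-- Simplifications (assumptions): only the empty dollar quote tag $$ is
-- recognised, block comments are treated as non nested, and E'...' escape
-- strings are not handled specially.
splitStatements :: String -> [String]
splitStatements = go ""
  where
    -- acc is the current chunk, accumulated in reverse
    go acc [] = [reverse acc | not (all (`elem` " \t\r\n") acc)]
    go acc (';' : rest) = reverse (';' : acc) : go "" rest
    go acc ('\'' : rest) = skipTo "'" ('\'' : acc) rest
    go acc ('"' : rest) = skipTo "\"" ('"' : acc) rest
    go acc ('$' : '$' : rest) = skipTo "$$" (reverse "$$" ++ acc) rest
    go acc ('-' : '-' : rest) = skipTo "\n" (reverse "--" ++ acc) rest
    go acc ('/' : '*' : rest) = skipTo "*/" (reverse "/*" ++ acc) rest
    go acc (c : rest) = go (c : acc) rest

    -- Copy characters verbatim (semicolons included) until the closing
    -- delimiter, then carry on splitting. A doubled '' inside a string
    -- works out as a close immediately followed by a reopen.
    skipTo end acc s
      | end `isPrefixOf` s = go (reverse end ++ acc) (drop (length end) s)
      | otherwise = case s of
          []       -> [reverse acc] -- unterminated quote or comment
          (c : cs) -> skipTo end (c : acc) cs

main :: IO ()
main = mapM_ print (splitStatements src)
  where
    src = "select 'a;b' from t;\n"
       ++ "create function f() returns int as $$ begin return 1; end; $$ language plpgsql;\n"
       ++ "select 1 /* ; */ -- trailing ;\n;"
```

Because chunks are copied verbatim, concatenating them gives back the original text (apart from whitespace after the final ';'), so a parse error position inside a chunk can be mapped back to the source by adding the lengths of the preceding chunks.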