This catalog code: 1. is used by the typechecker internally to hssqlppp 2. does most of the validation on ddl updates and provides nice error message when an update is not valid. It also tracks ownership, dependencies and models cascade and restrict on drop The CatalogUpdate's model alter and drop, as well as create. The CatalogUpdate's are supposed to represent updates after they have been desugared, so they don't correspond 1-1 with dml statements. The catalog information is only to model enough for these features, and e.g. doesn't have enough information for a complete catalog in a sql dbms, for instance it doesn't track the current value for a sequence. The catalog code api follows these principles: unquoted identifiers must be folded to the correct case for the dialect select * from t -> use 't' in postgres, and 'T' in sql server for instance names of built in types must use the canonical names in the current dialect for those types integer -> int4 in postgres, int in ansi escapes should already be applied for identifiers and strings e.g. select * from "my "" table" this code should get: some_table_function "my \" table" (using haskell escaping) names must always be schema qualified This code does no validation on the identifier syntax. There are separate helper functions to do the case transform, canonicalize type names and to find the correct schema for a name given the schema search path. = supported objects schemas scalar types/ externally implemented types/built in types domain types composite types enums type catagories (used in implicit cast/matching resolution) prefix,postfix,binary operators special syntax operations (e.g. between, extract, position) casts functions aggregates windows tables constraints (used for row, table, domain constraints and assertions) defaults (column defaults) constraints and defaults are considered separate objects to tables so that they can be dropped as part of a cascade without dropping the table with a more straightforward implementation views sequences character sets collations roles (for ownership tracking only, permissions todo) = TODO understand routines, etc. in ansi + variadic, keyword args, function metadata/annotation-like stuff polymorphic functions and types and these in the main dialects understand aggregate and windows options summarize missing objects from sql server, oracle and db2 do the example syntax for all supported operations detail all the dependency stuff understand some of the questions like this: identifier, schema qualified identifier or string for e.g. character set do/can roles be in a schema describe the helper apis (apply escapes, fold case, find schema, etc.) make a list of all the query apis wanted, do the typeid understand composite types, anonymous, tuples, arrays, multisets understand what schema various built in objects should be in in different dialects make a detailed spec and start working on it write a catalog value consistency check function multiple updates? - can avoid hardcoding the public schema, etc. with this = not currently supported objects == ansi routines/procedures (where they differ from functions, I think the main thing missing is the function bodies and analysis based on these) transliteration (conversion between character sets?) trigger could use the dependency and some basic checking quite easily user defined ordering (is this to order composite types?) transform (conversion between sql representations and host language representations of values) grant (this is on the todo) missing from postgres: conversion database event trigger extensions foreign data wrapper foreign table index language materialized view operator stuff (for indexes?) policy (row level permissions) rule server tablespace text search stuff transform trigger user mapping sql server full list: aggregate application role assembly asymmetric key availabiltiy group broker priority certificate column encryption key column master key definition columnstore index contract credential cryptographic provider database (azure sql database) database (sql server) database audit specification database encryption key default endpoint event notification event session external data source external file format external table federation fulltext catalog fulltext index fulltext stoplist function index login master key message type partition function partition scheme procedure queue remote service binding resource pool role route rule schema search property list security policy selective xml index sequence server audit server audit specification server role service spatial index statistics symmetric key synonym table table (azure sql database) trigger type user view workload group xml index xml schema collection db2 full list: alias audit policy bufferpool database partition group event monitor event monitor (activities) event monitor (change history) event monitor (locking) event monitor (package cache) statement event monitor (statistics) event monitor (threshold violations) event monitor (unit of work) event monitor dbactivities function function (external scalar) function (external table) function (ole db external table) function (sourced or template) function (sql scalar, table, or row) function mapping global temporary table histogram template index index extension mask method module nickname permission permission csr_row_access on customer permission teller_row_access on customer procedure procedure (external) procedure (sourced) procedure (sql) role schema security label security label component security policy sequence sequence org_seq server service class stogroup synonym table tablespace threshold transform trigger trusted context type type (array) type (cursor) type (distinct) type (row) type (structurxxxxxxxxxxxxxxxxxxxxxxed) type mapping usage list user mapping variable view work action set work class set workload wrapper oracle full list: audit policy (unified auditing) cluster context controlfile database database link dimension directory diskgroup edition flashback archive function index indextype java library materialized view materialized view log materialized zonemap operator outline package package body pfile pluggable database procedure profile restore point role rollback segment schema sequence spfile synonym table tablespace trigger type type body user view what would be nice to add: triggers are pretty common indexes are pretty common synonyms: do a review of these in different dbmss what can we support synonyms for: not special syntax ops? not casts what about operator synonyms for functions and vice versa? defaults, constraints: doesn't make sense (even though constraints do have a name) everything else seems like fair game, including synonyms themselves partition stuff?? global temporary tables? distinct type: this is like a domain, but cannot be implicitly cast to the base type. Does this catalog need to know the difference between this and an external type? = implementation progress notes New catalog lower level goals: updates to the catalog are high level not low level error checking supports schemas and search paths properly object dependencies with cascade and restrict this catalog will be clear if an object is based on another object in the catalog (possibly with a bit of tweaking) (for instance, an alias to a type) or if the object is 'externally implemented' and its relationship with any possible base type is opaque How should the catalog support syntax? This is needed in: domain constraints table constraints view definitions also: function/routine bodies option 1: save them as strings option 2: parameterize on the scalarexpr and queryexpr? option 3: hardcode the scalarexp and queryexpr types the possible benefits to parameterized versions are: 1. don't have to reparse 2. this module doesn't depend on the syntax module 3. this module cannot do anything with the syntax internals later: add fuzzy matching to all relevant error messages TODO docs: how to use this catalog code: if you have no schemas if you don't have roles or permissions if you don't want to use the bootstrap items (public schema, ascii charset, binary collation) > {-# LANGUAGE DeriveDataTypeable #-} > module Database.HsSqlPpp.Internals.Catalog.CatalogNew > (CatalogUpdate(..) > ,Catalog(..) > ,Cascade(..) > ,CatError(..) > ,updateCatalog > ,emptyCatalog > ) where > import qualified Data.Map as M > import qualified Data.Set as S > import Data.Data > --data TypeId = ... a TypeId is one of: astraction of the syntax for type names (probably just cut and paste the syntax) one of these type names with nullablility information a special type (some are internal): polymorphic function parameters unknown type used to type check literals (and a few other things) in some dialects void, used to represent the return type of functions which don't return a value setof, used to represent the return type of table returning functions ... not sure about this tref type used in typechecking joins and a few other places (kind of an anonymous composite type) composite types: with names and order without names either single row or 0 or 1 rows, or 0-many rows what about tuple types? what about arrays and multisets? question: should the catalog manage the following: if a scale/precision and what kind is allowable/mandatory for a type? if a type can be used in various operations (e.g. does == exist for it, ordering?, can it be used in an array or multiset?) > -- | Represents an error from an invalid catalog update or query > data CatError = RoleAlreadyExists String > | RoleNotRecognised String > deriving (Eq,Show) > -- | A schema qualified name. These should also have the > -- canonicalization, case folding and escaping fixed > data SchemadName = SN String String > deriving (Eq,Show) > -- | The model of a catalog use in hsqlppp for validating ddl > -- updates and for typechecking sql. > data Catalog = Catalog > {cRoles :: S.Set String > ,cSchemas :: M.Map String CatSchema > ,cCharacterSets :: M.Map SchemadName CatCharacterSet > ,cCollations :: M.Map SchemadName CatCollation > ,cSequences :: M.Map SchemadName CatSequence > ,cTypeCategories :: S.Set String -- this is internal, used in the > -- implicit cast/function resolution > ,cExternalTypes :: S.Set SchemadName -- externally implemented/built in types > ,cCasts :: S.Set (SchemadName,SchemadName,CastContext) > ,cFunctions :: M.Map SchemadName [CatFunSig] > ,cPrefixOps :: M.Map SchemadName [CatFunSig] > ,cPostfixOps :: M.Map SchemadName [CatFunSig] > ,cBinaryOps :: M.Map SchemadName [CatFunSig] > ,cSpecialOps :: M.Map SchemadName [CatFunSig] > ,cWindows :: Int > ,cAggregates :: Int > ,cTables :: M.Map SchemadName [(String,SchemadName)] > ,cConstraints :: Int > ,cDefaults :: Int > ,cViews :: M.Map SchemadName [(String,SchemadName)] > ,cDomainTypes :: M.Map SchemadName SchemadName > ,cCompositeTypes :: M.Map SchemadName [(String,SchemadName)] > ,cEnums :: Int > -- dependencies? > } > deriving (Eq,Show) > data CastContext = ImplicitCastContext > | AssignmentCastContext > | ExplicitCastContext > deriving (Eq,Show,Ord,Typeable,Data) > -- | owner, default char set, default collation (if it > -- is nothing, it is the global collation for the charset > type CatSchema = (RoleName, String, Maybe String) > -- | Character set: name, owner, default collation > type CatCharacterSet = (RoleName, String) > -- | Collation: owner, character set > type CatCollation = (RoleName,String) > -- | sequence info: data type > -- this is used e.g. if we need to use an implicit > -- cast around a call to 'next value for'. > type CatSequence = SchemadName > type CatFunSig = (SchemadName,[SchemadName],SchemadName{-,Options-}) > --type FunOptions = > -- | A catalog value containing nothing > emptyCatalog :: Catalog > emptyCatalog = Catalog {cRoles = S.empty > ,cSchemas = M.empty > ,cCharacterSets = M.empty > ,cCollations = M.empty > ,cSequences = M.empty > ,cTypeCategories = S.empty > ,cExternalTypes = S.empty > ,cCasts = S.empty > ,cFunctions = M.empty > ,cPrefixOps = M.empty > ,cPostfixOps = M.empty > ,cBinaryOps = M.empty > ,cSpecialOps = M.empty > ,cWindows = 0 > ,cAggregates = 0 > ,cTables = M.empty > ,cConstraints = 0 > ,cDefaults = 0 > ,cViews = M.empty > ,cDomainTypes = M.empty > ,cCompositeTypes = M.empty > ,cEnums = 0 > } > updateCatalog :: CatalogUpdate -> Catalog -> Either CatError Catalog > updateCatalog (CreateRole r) c@(Catalog {cRoles = rs}) > | r `S.member` rs = Left $ RoleAlreadyExists r > | otherwise = Right $ c {cRoles = S.insert r rs} > updateCatalog (DropRole r _) c@(Catalog {cRoles = rs}) > | not (r `S.member` rs) = Left $ RoleNotRecognised r > | otherwise = Right $ c {cRoles = S.delete r rs} todo: cascade the rename > updateCatalog (RenameRole r rn) c@(Catalog {cRoles = rs}) > | not (r `S.member` rs) = Left $ RoleNotRecognised r > | rn `S.member` rs = Left $ RoleAlreadyExists rn > | otherwise = Right $ c {cRoles = S.insert rn $ S.delete r rs} > updateCatalog _ _ = error "update catalog not completed yet" > type RoleName = String used for dependencies: > --data ObjectType = ... now the dependencies can be tracked in one table > data CatalogUpdate = create schema _name_ [authorization _role_] [DEFAULT CHARACTER SET 'character set' [collate c]] character sets are strings and not identifiers atm roles are identifiers but don't exist in a schema or database (since this catalog only covers a single database at a time, they look like they exist in a database) not going to support path for schema right now (I think this is supposed to be for the bodies of routines in that schema, maybe it could also be for all lookups for schema elements (such as views) question: should there be a database default search path? one that can contain variables (e.g. a schema with the same name as the current user?) This would be the default for sessions. You cannot enter a schema, so the schema default search path works differently. question: I think in some dialects, character set names are identifiers or schema-d identifiers (and collations also), and in others they are strings. How should this be handled? todo: add object dependencies to the updates (from dependent to depended-upon in the dependent). Try to make these automatic as much as possible. list schemas create schema: schema already exists character set invalid collation not valid auth not valid later: more schema-based queries question: how does a minimal catalog start out? 1. has a schema called public 2. has a global default character set called ascii 3. has a collation called binary for that character set 4. has a role called public which owns the above In the tests, show an example of creating a catalog from this default without the public schema or role still existing and without the character set and collation > CreateSchema String (Maybe (SchemadName,SchemadName)) RoleName schema name, (default char set, default collation for that character set in that schema), owner > | RenameSchema String String > | SetSchemaOwner String RoleName > | SetSchemaDefaultCharacterSet (Maybe (SchemadName,SchemadName)) what happens when you change this default: existing columns are not changed (there could be some utility to change them all for consistency. This would mean storing a uses default flag on each relevent place also). > | DropSchema String Cascade = Roles roles are used for ownership tracking only at the moment create role my_role; drop role my_role; alter role my_role rename to my_new_role; dependencies: todo anomalies: role already exist role doesn't exist queries: list roles > | CreateRole RoleName > | DropRole RoleName Cascade > | RenameRole RoleName RoleName -- from, to = Collations and character sets This code doesn't track character set and collation aliases collations appear in separate namespaces according to their character set, so you can have different collations for different character sets with the same name this catalog code will create a collation called 'binary' for each new character set. If you don't want this collation, you can delete it after creating another collation for the new character set, or rename the collation. This is one way to avoid the catch-22 which can also work for sql users. create character set my_character_set create character set my_character_set as get another_character_set create character set my_character_set as 'some external source' create character set my_character_set as 'some external source' collate 'some other external source' create collation my_collation for my_character_set from some external source [set as default?] alter character set set default collation 'collation_name' drop character set drop collation alter character set my_set set schema my_schema alter collation my_coll set schema my_schema rename for both set owner for both question: does a character set have to have a collation? dependencies: todo queries: list character sets (returns owner also) list collations (returns the owner and character sets also) lookup collation/ is collation valid for character set get character set default collation anomalies: role no exist already exist doesn't exist maybe nice message when schema is ambiguous, when collation is in another schema? as well as fuzzy name matching > | CreateCharacterSet SchemadName RoleName > | CreateCollation SchemadName SchemadName RoleName --second is the character set this collation is for > > | DropCollation SchemadName SchemadName Cascade -- the first is the character set this collation is for > -- the front end can make this optional in the concrete > -- syntax iff > -- there is only > -- one collation > -- with this > -- name for > -- instance > | DropCharacterSet SchemadName Cascade > | RenameCharacterSet SchemadName SchemadName -- covers schema changes and name changes > | RenameCollation SchemadName SchemadName SchemadName -- char set name, collation name, new collation name > | SetCharacterSetOwner SchemadName RoleName > | SetCollationOwner SchemadName SchemadName RoleName -- char set name, collation name > | SetCharacterSetDefaultCollation SchemadName SchemadName = basic types TODO: understand basic syntax and semantics of types in ansi sql -- external type represents a type whose implementation is outside the -- system completely, either it is built in or implemented in C for -- instance -- external types can also be modelled using the other types such as -- domain, compose, enum if this matches their behaviour > {- | CreateExternalType SchemadName > | CreateDomainType SchemadName SchemadName -- name, base name > | CreateCompositeType > | CreateEnum + a type category for each of the above + a bool to say whether it is the preferred type for that category todo: how can these be changed? is there a reason to change them what about the ability to wipe the prefered type for a category only, then other operations can be sugar on this what about altering a type to set it as prefered or not preferred or to change its type category > | CreateTypeCategory > | AlterTypeCategory? > | DropTypeCategory > | DropType > | AlterType - rename > | AlterCompositeType - add, reorder or remove columns > | AlterEnum - add, reorder or remove enum values = scalar operations by convention, it supports keyword operations as prefix,postfix or binary if they match these, and if the operator is more than one keyword separated by space then the name in the catalog is the keywords separated by exactly one space each in lower case. Operators which don't fit this set of options (such as mixfix or ternary operators go in the SpecialOperators and are handled as special cases) prefix, postfix, binary operators > | CreatePrefixOp String String String String -- op schema, op name, > -- input type schema, input type name, output type schema,name > | CreatePostfixOp String String String String -- op schema, op name, > -- input type schema, input type name, output type schema,name > | CreateBinaryOp > | DropPrefixOp, PostfixOp, BinaryOp > | CreateSpecialOp -- should the list of names be a enum? > -- if it is freeform, then the catalog doesn't need to know as much > -- about the syntax, which is a good thing > | CreateCast > | DropCast > | AlterCast? - doesn't seem like there is a real point > | CreateFunction > | AlterFunction - rename, change some metadata? > -} > deriving (Eq,Show) aggregates windows tables views sequences character sets collations roles > data Cascade = Cascade | Restrict > deriving (Eq,Show) this e.g. is the return value when you want to get all the information on a type. some of this information is contained in the typeid syntax, and some is stored in the catalog or environment > --data TypeInfo = ... is scalar precision, scale, nullability char set, collation isdomain, domain base, domain constraint is composite, fields in the composite = dependencies, cascade, restrict roles schema character set collation sequence type category external types cast functions, operators, windows, aggregates tables constraints defaults views domain types composite types enums question: what about ownership of dependencies, and does changing owner ever cascade? (Maybe these are separate helper fns?) rough set of cascade flavours: 1. remove connection, leave depender existing 2. update connection to point to new value (for alter) 3. drop the depender object (recursively to its dependents) for alter, can use 1 or 2 for drop can use 2 or 3 policy it: make each version options on the updates, then a dialect or some other code can decide which behavior to use in order to track dependencies: create separate concept of constraints so we can handle table and domain constraints the same (plus assertions?) maybe have a concept of a sql body or something to model constraints, as well as view definitions and function bodies referencing things via sql statements syntax + column defaults things to add to this catalog: constraints defaults character set conversions? - if we want to type check string operations a bit better == schemas alter and drop schema can restrict or cascade to schema objects: all other catalog supported objects appear in a schema apart from roles rename schema just cascades automatically, there is no restrict option there can be a generic has dependencies or list dependencies function to implement a restrict like this set default character set will affect the character set of new: table columns, domains, views (in some obscure cases), composite types no existing objects will be changed to match the new characterset/collation they must be altered separately set schema owner will not cascade to objects in the schema, or will?? not sure, maybe just objects which are owned by a role with the same name as the schema? add this as an option? drop schema will not work if any objects owned by the schema on restrict it will drop them also if on cascade == role rename role cascades automatically, no options drop role will restrict if there are any objects owned cascade drop role will drop all owned items there can be a util to show all owned items and to reown them to another role easily later will have role membership in other roles to do shared ownership and group permissions, this will work slightly differently for drop: a dropped role will simply be removed from any roles it is in without affecting them: only option a dropped role will other roles in it: those roles will just be removed from this role. Restrict can be used to not allow dropping the role permissions for a being dropped role will always be dropped also == character set dependencies: table column domain schema - default char set database - default char set collations - belong to a single character set types: each character type can have a whitelist or blacklist of allowed character sets views will depend only indirectly composite types rename: always cascades alter default collation: doesn't affect the collation on any existing dependencies drop restrict: if character set is used anywhere above, restrict will stop the drop (apart from types blacklist or whitelist, this char set will be removed from these lists and the drop will succeed (if it has no other dependencies) drop cascade: will delete: tables(!) domains schema will always block?? database will definitely always block: you must set another default char set to drop the current default one collations types: same as restrict, types are not dropped composite types are dropped (could be very destructive!) == collation dependencies: character set default collation table and domain collation on char type, composite type also collation on default char set for database and schemas drop restrict: any use of collation will stop the drop drop cascade: will drop the character set (!) will drop tables, domains, and composites (!) will drop schema (!) - maybe these behaviours should be same as character sets? cannot drop database, so must restrict this == sequence dependencies: tables functions views domains? - could reference the sequence in the constraint? drop restrict: if the sequence is reference in any of the above, the drop is stopped drop cascade: views, functions dropped option table can be dropped (!) or the defaults/constraints dropped same with domain == type category dependencies: types a type can have the default category which means no category drop restrict: if used in any types, cannot drop drop cascade: option 1: types categories changed to default option 2: types are dropped == external types dependencies: domains functions tables views composite types restrict: if used anywhere, drop prevented cascade: all dependents dropped == cast dependencies: can be used in constraints or views restrict: drop stopped cascade, drops constraint and views also == functions, operators, windows, aggregates each function can have a set of dependencies supplied by the calling code this represents e.g. the tables, functions, etc. referenced in the body of the function. This is to help with cascade/restrict behaviour. what can a function depend on? what can't a function directly depend on: itself type category that's it. Potentially anything else. Unusual (direct) dependencies would be: schema, roles, constraints, defaults. == tables == constraints == defaults == views == domain types == composite types == enums