Copyright 2009 Jake Wheat

This file contains a bunch of small low level utilities to help with
type checking.

random implementation note:
If you see one of these: TypeList [] - and don't get it - is used to
represent a variety of different things, like node type checked ok
when the node doesn't produce a type but can produce a type error,
etc.. This is just a hack that will be changed soon.

> module Database.HsSqlPpp.AstUtils where

> import Data.Maybe
> import Data.List
> import Control.Monad.Error

> import Database.HsSqlPpp.TypeType
> import Database.HsSqlPpp.Scope
> import Database.HsSqlPpp.DefaultScope

================================================================================

= getOperatorType

used by the pretty printer, not sure this is a very good design

for now, assume that all the overloaded operators that have the
same name are all either binary, prefix or postfix, otherwise the
getoperatortype would need the types of the arguments to determine
the operator type, and the parser would have to be a lot cleverer

this is why binary @ operator isn't currently supported

> data OperatorType = BinaryOp | PrefixOp | PostfixOp
>                   deriving (Eq,Show)

> getOperatorType :: String -> OperatorType
> getOperatorType s = case () of
>                       _ | any (\(x,_,_) -> x == s) (scopeBinaryOperators defaultScope) ->
>                             BinaryOp
>                         | any (\(x,_,_) -> x == s ||
>                                            (x=="-" && s=="u-"))
>                               (scopePrefixOperators defaultScope) ->
>                             PrefixOp
>                         | any (\(x,_,_) -> x == s) (scopePostfixOperators defaultScope) ->
>                             PostfixOp
>                         | s `elem` ["!and", "!or","!like"] -> BinaryOp
>                         | s `elem` ["!not"] -> PrefixOp
>                         | s `elem` ["!isNull", "!isNotNull"] -> PostfixOp
>                         | otherwise ->
>                             error $ "don't know flavour of operator " ++ s

================================================================================

= type checking utils

The way most of the error handling works is all type errors live as
types and get returned as types, instead of e.g. using exceptions or
Either. All the type checking routines for each node should be in the
following form, or equivalent:

1. Take the node types of the sub nodes (e.g. the args for a function
invocation).

2. Check each type for a type error, pass any errors on as the type for
this node - so type errors filter up to the top level statements in
this way.

3. Check each type for unknown - this represents some type checking
which hasn't been coded. All the type calculations give up completely
when they come across an unknown type and just propagate unknown.

4a. If the node has no nodeType rule, it uses the default one which
just propagates the types of its subnodes. This sometimes results in
them having an uncannily accurate type, and sometimes a really weird
type, rather than returning unknown.

4b. If the node has a nodeType rule, calculate the actual type we want
for this node using the subnode types which we've already checked for
unknowns and errors. Any new type errors become this node's type, or
otherwise it is set to the calculated type.

Using laziness, in code this usually looks something like

thisNodeType =  checkErrors [@subnodea.nodeType
                            ,@subnodeb.nodetype] calcThisNodeType
where
  calcThisNodeType = some code referring to @subnodea.nodeType
                       and @subnodeb.nodetype

The checkErrors function, and its auxiliary unkErr, does the error and
unknown propagation.

Once the annotations are done properly, type errors won't need to rise
up the tree like this.

== checkErrors

runs through the types in the first list looking for type errors or
unknowns, if it finds any, return them, else return the second
argument. See unkErr below for exactly how it finds errors and
unknowns. It will only return errors from the first type containing
errors, which might need looking at when the focus is on good error
messages.

> checkErrors :: [Type] -> Type -> Type
> checkErrors (t:ts) r = case unkErr t of
>                        Just e -> e
>                        Nothing -> checkErrors ts r
> checkErrors [] r = r

takes a type and returns any type errors, or if no errors, unknowns,
returns nothing if it doesn't find any type errors or unknowns. Looks
at the immediate type, or inside the first level if passed a type
list or unnamedcompositetype.

> unkErr :: Type -> Maybe Type
> unkErr t =
>     case t of
>       a@(TypeError _ _) -> Just a
>       UnknownType -> Just UnknownType
>       TypeList l -> doTypeList l
>       UnnamedCompositeType c -> doTypeList (map snd c)
>       _ -> Nothing
>     where
>       -- run through the type list, if there are any errors, collect
>       -- them all into a list
>       -- otherwise, if there are any unknowns, then the type is
>       -- unknown
>       -- otherwise, return nothing
>       doTypeList ts =
>           let unks = filter (\u -> case u of
>                                      UnknownType -> True
>                                      _ -> False) ts
>               errs = filter (\u -> case u of
>                                      TypeError _ _ -> True
>                                      _ -> False) ts
>           in case () of
>                _ | length errs > 0 ->
>                      Just $ case () of
>                                     _ | length errs == 1 -> head errs
>                                       | otherwise -> TypeList errs
>                  | length unks > 0 -> Just UnknownType
>                  | otherwise -> Nothing

================================================================================

= basic types

random notes on pg types:

== domains:
the point of domains is you can't put constraints on types, but you
can wrap a type up in a domain and put a constraint on it there.

== literals/selectors

source strings are parsed as unknown type: they can be implicitly cast
to almost any type in the right context.

rows ctors can also be implicitly cast to any composite type matching
the elements (now sure how exactly are they matched? - number of
elements, type compatibility of elements, just by context?).

string literals are not checked for valid syntax currently, but this
will probably change so we can type check string literals statically.
Postgres defers all checking to runtime, because it has to cope with
custom data types. This code will allow adding a grammar checker for
each type so you can optionally check the string lits statically.

== notes on type checking types

=== basic type checking
Currently can lookup type names against a default template1 list of
types, or against the current list in a database (which is read before
processing and sql code).

todo: collect type names from current source file to check against
A lot of the infrastructure to do this is already in place. We also
need to do this for all other definitions, etc.

=== Type aliases

Some types in postgresql have multiple names. I think this is
hardcoded in the pg parser.

For the canonical name in this code, we use the name given in the
postgresql pg_type catalog relvar.

TODO: Change the ast canonical names: where there is a choice, prefer
the sql standard name, where there are multiple sql standard names,
choose the most concise or common one, so the ast will use different
canonical names to postgresql.

planned ast canonical names:
numbers:
int2, int4/integer, int8 -> smallint, int, bigint
numeric, decimal -> numeric
float(1) to float(24), real -> float(24)
float, float(25) to float(53), double precision -> float
serial, serial4 -> int
bigserial, serial8 -> bigint
character varying(n), varchar(n)-> varchar(n)
character(n), char(n) -> char(n)

TODO:

what about PrecTypeName? - need to fix the ast and parser (found out
these are called type modifiers in pg)

also, what can setof be applied to - don't know if it can apply to an
array or setof type

array types have to match an exact array type in the catalog, so we
can't create an arbitrary array of any type. Not sure if this is
handled quite correctly in this code.

=== canonical type name support

Introduce some aliases to protect client code if/when the ast
canonical names are changed:

> typeSmallInt,typeBigInt,typeInt,typeNumeric,typeFloat4,
>   typeFloat8,typeVarChar,typeChar,typeBool :: Type
> typeSmallInt = ScalarType "int2"
> typeBigInt = ScalarType "int8"
> typeInt = ScalarType "int4"
> typeNumeric = ScalarType "numeric"
> typeFloat4 = ScalarType "float4"
> typeFloat8 = ScalarType "float8"
> typeVarChar = ScalarType "varchar"
> typeChar = ScalarType "char"
> typeBool = ScalarType "bool"

this converts the name of a type to its canonical name

> canonicalizeTypeName :: String -> String
> canonicalizeTypeName s =
>   case () of
>                   _ | s `elem` smallIntNames -> "int2"
>                     | s `elem` intNames -> "int4"
>                     | s `elem` bigIntNames -> "int8"
>                     | s `elem` numericNames -> "numeric"
>                     | s `elem` float4Names -> "float4"
>                     | s `elem` float8Names -> "float8"
>                     | s `elem` varcharNames -> "varchar"
>                     | s `elem` charNames -> "char"
>                     | s `elem` boolNames -> "bool"
>                     | otherwise -> s
>   where
>       smallIntNames = ["int2", "smallint"]
>       intNames = ["int4", "integer", "int"]
>       bigIntNames = ["int8", "bigint"]
>       numericNames = ["numeric", "decimal"]
>       float4Names = ["real", "float4"]
>       float8Names = ["double precision", "float"]
>       varcharNames = ["character varying", "varchar"]
>       charNames = ["character", "char"]
>       boolNames = ["boolean", "bool"]

> checkTypeExists :: Scope -> MySourcePos -> Type -> Type
> checkTypeExists scope sp t =
>     if t `elem` scopeTypes scope
>       then TypeList [] -- this works with the checkErrors function
>       else TypeError sp (UnknownTypeError t)

> lookupTypeByName :: Scope -> MySourcePos -> String -> Type
> lookupTypeByName scope sp name =
>     case lookup name (scopeTypeNames scope) of
>       Just t -> t
>       Nothing -> TypeError sp (UnknownTypeName name)


================================================================================

Internal errors

TODO: work out monad transformers and try to use these. Want to throw
an internal error when a programming error is detected (instead of
e.g. letting the haskell runtime throw a pattern match failure), then
catch it in the top level type check routines in ast.ag, convert it to
a regular either style error, all without dropping into IO.

This isn't used at the moment.

> data TInternalError = TInternalError String
>                      deriving (Eq, Ord, Show)

> instance Error TInternalError where
>    noMsg  = TInternalError "oh noes!"
>    strMsg = TInternalError

================================================================================

types for the keyword operators, for use in findCallMatch. Not sure
where these should live but probably not here.

> keywordOperatorTypes :: [(String,[Type],Type)]
> keywordOperatorTypes = [
>   ("!and", [typeBool, typeBool], typeBool)
>  ,("!or", [typeBool, typeBool], typeBool)
>  ,("!like", [ScalarType "text", ScalarType "text"], typeBool)
>  ,("!not", [typeBool], typeBool)
>  ,("!isNull", [Pseudo AnyElement], typeBool)
>  ,("!isNotNull", [Pseudo AnyElement], typeBool)
>  ,("!arrayCtor", [Pseudo AnyElement], Pseudo AnyArray) -- not quite right,
>                                         -- needs variadic support
>                                         -- currently works via a special case
>                                         -- in the type checking code
>  ,("!between", [Pseudo AnyElement
>               ,Pseudo AnyElement
>               ,Pseudo AnyElement], Pseudo AnyElement)
>  ,("!substring", [ScalarType "text",typeInt,typeInt], ScalarType "text")
>  ,("!arraySub", [Pseudo AnyArray,typeInt], Pseudo AnyElement)
>  ]

special functions, stuck in here at random also

> specialFunctionTypes :: [(String,[Type],Type)]
> specialFunctionTypes = [
>   ("coalesce", [Pseudo AnyElement],
>     Pseudo AnyElement) -- needs variadic support to be correct,
>                        -- uses special case in type checking
>                        -- currently
>  ,("nullif", [Pseudo AnyElement, Pseudo AnyElement], Pseudo AnyElement)
>  ,("greatest", [Pseudo AnyElement], Pseudo AnyElement) --variadic, blah
>  ,("least", [Pseudo AnyElement], Pseudo AnyElement) --also
>  ]


================================================================================

utilities for working with Types

> isArrayType :: Type -> Bool
> isArrayType (ArrayType _) = True
> isArrayType _ = False

> unwrapTypeList :: Type -> [Type]
> unwrapTypeList (TypeList ts) = ts
> unwrapTypeList x = error $ "internal error: can't get types from list " ++ show x

> unwrapArray :: Type -> Type
> unwrapArray (ArrayType t) = t
> unwrapArray x = error $ "internal error: can't get types from non array " ++ show x

> unwrapSetOfComposite :: Type -> Type
> unwrapSetOfComposite (SetOfType a@(UnnamedCompositeType _)) = a
> unwrapSetOfComposite x = error $ "internal error: tried to unwrapSetOfComposite on " ++ show x

> unwrapSetOf :: Type -> Type
> unwrapSetOf (SetOfType a) = a
> unwrapSetOf x = error $ "internal error: tried to unwrapSetOf on " ++ show x

> unwrapComposite :: Type -> [(String,Type)]
> unwrapComposite (UnnamedCompositeType a) = a
> unwrapComposite x = error $ "internal error: cannot unwrapComposite on " ++ show x

> consComposite :: (String,Type) -> Type -> Type
> consComposite l (UnnamedCompositeType a) =
>     UnnamedCompositeType (l:a)
> consComposite a b = error $ "internal error: called consComposite on " ++ show (a,b)

> unwrapRowCtor :: Type -> [Type]
> unwrapRowCtor (RowCtor a) = a
> unwrapRowCtor x = error $ "internal error: cannot unwrapRowCtor on " ++ show x

================================================================================

message stuff, used by the continue in loop checking, will be
repurposed once the type checking is complete and lint-style checking
is introduced.

> data Message = Error MySourcePos MessageStuff
>              | Warning MySourcePos MessageStuff
>              | Notice MySourcePos MessageStuff
>                deriving (Eq)
>
> data MessageStuff = ContinueNotInLoop
>                   | CustomMessage String
>                     deriving (Eq,Show)
>
> instance Show Message where
>    show = showMessage
>
> showMessage :: Message -> String
> showMessage m = case m of
>                   Error sp s -> showit "Error" sp s
>                   Warning sp s -> showit "Warning" sp s
>                   Notice sp s -> showit "Notice" sp s
>                 where
>                   showit lev (fn,l,c) s = lev ++ "\n" ++ fn ++ ":"
>                                           ++ show l ++ ":" ++ show c ++ ":\n"
>                                           ++ show s ++ "\n"
>