Copyright | (c) Nikos Karagiannidis 2018 |
---|---|
License | BSD3 |
Maintainer | nkarag@gmail.com |
Stability | stable |
Portability | POSIX |
Safe Haskell | None |
Language | Haskell2010 |
This is the core module that implements the relational Table concept with the RTable data type.
It defines all necessary data types, such as RTable and RTuple, as well as all the basic relational algebra operations on RTables (selection, i.e., filter; projection; inner/outer join; aggregation; grouping; etc.).
When to use this module
Use this module whenever you have "tabular data" (e.g., some CSV files, or any type of data that can be made an instance of the RTabular type class and thus defines the toRTable and fromRTable functions) and want to analyze them in memory with the well-known relational algebra operations (selection, projection, join, group by, aggregation, etc.) that lie behind SQL.
This data analysis takes place within your Haskell code, without the need to import the data into a database (database-less data processing), and the result can be turned back into the original format (e.g., CSV) with a simple call to the fromRTable function.
RTable.Core gives you an interface for all common relational algebra operations, which are expressed as functions over the basic RTable data type. Since each relational algebra operation is a function that returns a new RTable (immutability), one can compose these operations and thus express an arbitrarily complex query. Immutability also holds for DML operations (e.g., updateRTab). This means that any update on an RTable operates like a CREATE AS SELECT statement in SQL, creating a new RTable and not modifying an existing one.
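The immutability of updates can be illustrated with a minimal sketch. It assumes the RInt and RText constructors of RDataType and the signatures listed in the Synopsis (rtableFromList, rtupleFromList, updateRTab, printRTable); the table contents and column names are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import RTable.Core

-- A small two-row table (illustrative data only).
original :: RTable
original = rtableFromList
  [ rtupleFromList [("id", RInt 1), ("status", RText "NEW")]
  , rtupleFromList [("id", RInt 2), ("status", RText "NEW")]
  ]

-- Roughly: UPDATE ... SET status = 'DONE' WHERE id = 1,
-- except that the result is a brand new RTable.
updated :: RTable
updated = updateRTab [("status", RText "DONE")]
                     (\t -> t <!> "id" == RInt 1)
                     original

main :: IO ()
main = do
  printRTable original  -- the input table is not modified by updateRTab
  printRTable updated   -- the new table carries the change
```

Because `original` is an ordinary immutable value, it can still be used in other queries after `updated` has been computed.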
Note that the recommended way to perform data analysis via relational algebra operations is to use the type-level Embedded Domain Specific Language (EDSL) Julius, defined in module Etl.Julius, which exports the RTable.Core module. This provides a standard way of expressing queries and is simpler for expressing more complex queries (with many relational algebra operations). Moreover, it supports intermediate results (i.e., subqueries). Finally, if you need to implement some ETL/ELT data flows that use the relational operations defined in RTable.Core to analyze data, but also combine them with various Column Mappings (RColMapping) in order to achieve various data transformations, then Julius is the appropriate tool for this job.
See the Julius Tutorial.
Overview
An RTable is logically a container of RTuples (similar to the concept of a Relation being a set of Tuples) and is the core data type in this module. The RTuple is a map of (Column-Name, Column-Value) pairs. A Column-Name is modeled with the ColumnName data type, while the Column-Value is modeled with the RDataType, which is a wrapper over the most common data types that one would expect to find in a column of a Table (e.g., integers, rational numbers, strings, dates etc.).
We said that the RTable is a container of RTuples, and thus the RTable is a Monad! So one can write monadic code to implement RTable operations. For example:

    -- | Return a new RTable after modifying each RTuple of the input RTable.
    myRTableOperation :: RTable -> RTable
    myRTableOperation rtab = do
        rtup <- rtab
        let new_rtup = doStuff rtup
        return new_rtup
      where
        doStuff :: RTuple -> RTuple
        doStuff = ...  -- to be defined

Many different types of data can be turned into an RTable. For example, CSV data can easily be turned into an RTable via the toRTable function. Many other types of data could be represented as "tabular data" via the RTable data type, as long as they adhere to the interface posed by the RTabular type class. In other words, any data type that we want to convert into an RTable and vice versa must become an instance of the RTabular type class and thus define the basic toRTable and fromRTable functions.
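As a rough sketch of what such an instance might look like, the following makes a hypothetical People type (not part of the library) an instance of RTabular. It relies only on the two class methods and on functions listed in the Synopsis. The column names are assumptions, and the metadata argument is simply ignored here, which a real instance would probably not do.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Maybe (fromMaybe)
import Data.Text (Text)
import RTable.Core

-- Hypothetical source type: a list of (name, age) pairs.
newtype People = People [(Text, Integer)]

instance RTabular People where
  -- Build one RTuple per pair; the RTableMData argument is ignored in this sketch.
  toRTable _ (People rows) =
    rtableFromList
      [ rtupleFromList [("NAME", RText n), ("AGE", RInt a)] | (n, a) <- rows ]

  -- Read the two columns back out of every RTuple.
  fromRTable _ rtab =
    People
      [ (fromMaybe "" (toText (t <!> "NAME")), ageOf (t <!> "AGE"))
      | t <- rtableToList rtab
      ]
    where
      ageOf (RInt a) = a
      ageOf _        = 0  -- fallback for non-integer values (an arbitrary choice)

main :: IO ()
main = printRTable (toRTable peopleMData (People [("Ada", 36), ("Alan", 41)]))
  where
    peopleMData =
      createRTableMData ("people", [("NAME", Varchar), ("AGE", Integer)]) ["NAME"] []
```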
An Example
In this example we read a CSV file with the readCSV function from the RTable.Data.CSV module. Then, with the toRTable function, implemented in the RTabular instance of the CSV data type, we convert the CSV file into an RTable. The data of the CSV file consist of metadata from an imaginary Oracle database. Each row represents an entry for a table stored in this database, with information (i.e., columns) pertaining to the owner of the table, the tablespace name, the status of the table and various statistics, such as the number of rows and number of blocks.
In this example, we apply three "transformations" to the input data and print the result after each one with the printfRTable function. The transformations are:
- a limit operation, where we return the first N RTuples,
- an RFilter operation that returns only the tables whose name starts with a 'B', followed by a projection operation (RPrj),
- an inner join (RInJoin), where we pair the RTuples from the previous results based on a join predicate (RJoinPredicate): the tables that have been analyzed on the same day.
Finally, we store the results of the 2nd operation into a new CSV file, with the fromRTable function implemented for the RTabular instance of the CSV data type.

    {-# LANGUAGE OverloadedStrings #-}  -- string literals are used as Text (column names, formats)

    import RTable.Core
    import RTable.Data.CSV (CSV, readCSV, writeCSV, toRTable)
    import Data.Text as T (take, pack)

    -- This is the input source table metadata
    src_DBTab_MData :: RTableMData
    src_DBTab_MData =
        createRTableMData
            ( "sourceTab"  -- table name
            , [ ("OWNER", Varchar)            -- Owner of the table
              , ("TABLE_NAME", Varchar)       -- Name of the table
              , ("TABLESPACE_NAME", Varchar)  -- Tablespace name
              , ("STATUS", Varchar)           -- Status of the table object (VALID/INVALID)
              , ("NUM_ROWS", Integer)         -- Number of rows in the table
              , ("BLOCKS", Integer)           -- Number of Blocks allocated for this table
              , ("LAST_ANALYZED", Timestamp "MMDDYYYY HH24:MI:SS")  -- Timestamp of the last time the table was analyzed (i.e., gathered statistics)
              ]
            )
            ["OWNER", "TABLE_NAME"]  -- primary key
            []                       -- (alternative) unique keys

    -- Result RTable metadata
    result_tab_MData :: RTableMData
    result_tab_MData =
        createRTableMData
            ( "resultTab"  -- table name
            , [ ("OWNER", Varchar)       -- Owner of the table
              , ("TABLE_NAME", Varchar)  -- Name of the table
              , ("LAST_ANALYZED", Timestamp "MMDDYYYY HH24:MI:SS")  -- Timestamp of the last time the table was analyzed (i.e., gathered statistics)
              ]
            )
            ["OWNER", "TABLE_NAME"]  -- primary key
            []                       -- (alternative) unique keys

    main :: IO ()
    main = do
        -- read source csv file
        srcCSV <- readCSV "./app/test-data.csv"

        putStrLn "\nHow many rows you want to print from the source table? :\n"
        n <- readLn :: IO Int

        -- RTable A
        printfRTable
            ( -- define the order by which the columns will appear on screen. Use the default column formatting.
              genRTupleFormat ["OWNER", "TABLE_NAME", "TABLESPACE_NAME", "STATUS", "NUM_ROWS", "BLOCKS", "LAST_ANALYZED"] genDefaultColFormatMap
            ) $
                limit n $ toRTable src_DBTab_MData srcCSV

        putStrLn "\nThese are the tables that start with a \"B\":\n"

        -- RTable B
        printfRTable (genRTupleFormat ["OWNER", "TABLE_NAME", "LAST_ANALYZED"] genDefaultColFormatMap) $
            tabs_start_with_B $ toRTable src_DBTab_MData srcCSV

        putStrLn "\nThese are the tables that were analyzed the same day:\n"

        -- RTable C = A InnerJoin B
        printfRTable (genRTupleFormat ["OWNER", "TABLE_NAME", "LAST_ANALYZED", "OWNER_1", "TABLE_NAME_1", "LAST_ANALYZED_1"] genDefaultColFormatMap) $
            ropB myJoin
                 (limit n $ toRTable src_DBTab_MData srcCSV)
                 (tabs_start_with_B $ toRTable src_DBTab_MData srcCSV)

        -- save result of 2nd operation to CSV file
        writeCSV "./app/result-data.csv" $
            fromRTable result_tab_MData $
                tabs_start_with_B $
                    toRTable src_DBTab_MData srcCSV
      where
        -- Return RTuples with a table_name starting with a 'B'
        tabs_start_with_B :: RTable -> RTable
        tabs_start_with_B rtab = (ropU myProjection) . (ropU myFilter) $ rtab
          where
            -- Create a Filter Operation to return only RTuples with table_name starting with a 'B'
            myFilter = RFilter (\t ->
                            let tbname = case toText (t <!> "TABLE_NAME") of
                                            Just txt -> txt
                                            Nothing  -> pack ""
                            in (T.take 1 tbname) == (pack "B")
                        )
            -- Create a Projection Operation that projects only three columns
            myProjection = RPrj ["OWNER", "TABLE_NAME", "LAST_ANALYZED"]

        -- Create an Inner Join for tables analyzed in the same day
        myJoin :: ROperation
        myJoin = RInJoin (\t1 t2 ->
                    let RTime {rtime = RTimestampVal {year = y1, month = m1, day = d1, hours24 = hh1, minutes = mm1, seconds = ss1}} = t1<!>"LAST_ANALYZED"
                        RTime {rtime = RTimestampVal {year = y2, month = m2, day = d2, hours24 = hh2, minutes = mm2, seconds = ss2}} = t2<!>"LAST_ANALYZED"
                    in y1 == y2 && m1 == m2 && d1 == d2
                 )

And here is the output:

    :l .srcRTable/example.hs
    :set -XOverloadedStrings
    main

    How many rows you want to print from the source table? :

    10
    ---------------------------------------------------------------------------------------------------------------------------------
    OWNER          TABLE_NAME                       TABLESPACE_NAME    STATUS    NUM_ROWS    BLOCKS    LAST_ANALYZED
    ~~~~~          ~~~~~~~~~~                       ~~~~~~~~~~~~~~~    ~~~~~~    ~~~~~~~~    ~~~~~~    ~~~~~~~~~~~~~
    APEX_030200    SYS_IOT_OVER_71833               SYSAUX             VALID     0           0         06082012 16:22:36
    APEX_030200    WWV_COLUMN_EXCEPTIONS            SYSAUX             VALID     3           3         06082012 16:22:33
    APEX_030200    WWV_FLOWS                        SYSAUX             VALID     10          3         06082012 22:01:21
    APEX_030200    WWV_FLOWS_RESERVED               SYSAUX             VALID     0           0         06082012 16:22:33
    APEX_030200    WWV_FLOW_ACTIVITY_LOG1$          SYSAUX             VALID     1           29        07202012 19:07:57
    APEX_030200    WWV_FLOW_ACTIVITY_LOG2$          SYSAUX             VALID     14          29        07202012 19:07:57
    APEX_030200    WWV_FLOW_ACTIVITY_LOG_NUMBER$    SYSAUX             VALID     1           3         07202012 19:08:00
    APEX_030200    WWV_FLOW_ALTERNATE_CONFIG        SYSAUX             VALID     0           0         06082012 16:22:33
    APEX_030200    WWV_FLOW_ALT_CONFIG_DETAIL       SYSAUX             VALID     0           0         06082012 16:22:33
    APEX_030200    WWV_FLOW_ALT_CONFIG_PICK         SYSAUX             VALID     37          3         06082012 16:22:33

    10 rows returned
    ---------------------------------------------------------------------------------------------------------------------------------

    These are the tables that start with a B:

    -------------------------------------------------------------
    OWNER     TABLE_NAME               LAST_ANALYZED
    ~~~~~     ~~~~~~~~~~               ~~~~~~~~~~~~~
    DBSNMP    BSLN_BASELINES           04152018 16:14:51
    DBSNMP    BSLN_METRIC_DEFAULTS     06082012 16:06:41
    DBSNMP    BSLN_STATISTICS          04152018 17:41:33
    DBSNMP    BSLN_THRESHOLD_PARAMS    06082012 16:06:41
    SYS       BOOTSTRAP$               04142014 13:53:43

    5 rows returned
    -------------------------------------------------------------

    These are the tables that were analyzed the same day:

    -------------------------------------------------------------------------------------------------------------------------------------
    OWNER          TABLE_NAME                    LAST_ANALYZED        OWNER_1    TABLE_NAME_1             LAST_ANALYZED_1
    ~~~~~          ~~~~~~~~~~                    ~~~~~~~~~~~~~        ~~~~~~~    ~~~~~~~~~~~~             ~~~~~~~~~~~~~~~
    APEX_030200    SYS_IOT_OVER_71833            06082012 16:22:36    DBSNMP     BSLN_THRESHOLD_PARAMS    06082012 16:06:41
    APEX_030200    SYS_IOT_OVER_71833            06082012 16:22:36    DBSNMP     BSLN_METRIC_DEFAULTS     06082012 16:06:41
    APEX_030200    WWV_COLUMN_EXCEPTIONS         06082012 16:22:33    DBSNMP     BSLN_THRESHOLD_PARAMS    06082012 16:06:41
    APEX_030200    WWV_COLUMN_EXCEPTIONS         06082012 16:22:33    DBSNMP     BSLN_METRIC_DEFAULTS     06082012 16:06:41
    APEX_030200    WWV_FLOWS                     06082012 22:01:21    DBSNMP     BSLN_THRESHOLD_PARAMS    06082012 16:06:41
    APEX_030200    WWV_FLOWS                     06082012 22:01:21    DBSNMP     BSLN_METRIC_DEFAULTS     06082012 16:06:41
    APEX_030200    WWV_FLOWS_RESERVED            06082012 16:22:33    DBSNMP     BSLN_THRESHOLD_PARAMS    06082012 16:06:41
    APEX_030200    WWV_FLOWS_RESERVED            06082012 16:22:33    DBSNMP     BSLN_METRIC_DEFAULTS     06082012 16:06:41
    APEX_030200    WWV_FLOW_ALTERNATE_CONFIG     06082012 16:22:33    DBSNMP     BSLN_THRESHOLD_PARAMS    06082012 16:06:41
    APEX_030200    WWV_FLOW_ALTERNATE_CONFIG     06082012 16:22:33    DBSNMP     BSLN_METRIC_DEFAULTS     06082012 16:06:41
    APEX_030200    WWV_FLOW_ALT_CONFIG_DETAIL    06082012 16:22:33    DBSNMP     BSLN_THRESHOLD_PARAMS    06082012 16:06:41
    APEX_030200    WWV_FLOW_ALT_CONFIG_DETAIL    06082012 16:22:33    DBSNMP     BSLN_METRIC_DEFAULTS     06082012 16:06:41
    APEX_030200    WWV_FLOW_ALT_CONFIG_PICK      06082012 16:22:33    DBSNMP     BSLN_THRESHOLD_PARAMS    06082012 16:06:41
    APEX_030200    WWV_FLOW_ALT_CONFIG_PICK      06082012 16:22:33    DBSNMP     BSLN_METRIC_DEFAULTS     06082012 16:06:41

    14 rows returned
    -------------------------------------------------------------------------------------------------------------------------------------

Check the output CSV file:

    $ head ./app/result-data.csv
    OWNER,TABLE_NAME,LAST_ANALYZED
    DBSNMP,BSLN_BASELINES,04152018 16:14:51
    DBSNMP,BSLN_METRIC_DEFAULTS,06082012 16:06:41
    DBSNMP,BSLN_STATISTICS,04152018 17:41:33
    DBSNMP,BSLN_THRESHOLD_PARAMS,06082012 16:06:41
    SYS,BOOTSTRAP$,04142014 13:53:43

Synopsis
- type RTable = Vector RTuple
- type RTuple = HashMap ColumnName RDataType
- data RDataType
- data RTimestamp = RTimestampVal {}
- data RTableMData = RTableMData { rtname :: RTableName, rtuplemdata :: RTupleMData, pkColumns :: [ColumnName], uniqueKeys :: [[ColumnName]] }
- type RTupleMData = (HashMap ColumnOrder ColumnName, HashMap ColumnName ColumnInfo)
- data ColumnInfo = ColumnInfo { name :: ColumnName, dtype :: ColumnDType }
- type Name = Text
- type ColumnName = Name
- type RTableName = Name
- data ColumnDType
- class RTabular a where
- data ROperation
- = ROperationEmpty
- | RUnion
- | RInter
- | RDiff
- | RPrj { colPrjList :: [ColumnName] }
- | RFilter { fpred :: RPredicate }
- | RInJoin { }
- | RLeftJoin { }
- | RRightJoin { }
- | RAggregate { aggList :: [RAggOperation] }
- | RGroupBy { gpred :: RGroupPredicate, aggList :: [RAggOperation], colGrByList :: [ColumnName] }
- | RCombinedOp { }
- | RBinOp { }
- | ROrderBy { colOrdList :: [(ColumnName, OrderingSpec)] }
- type UnaryRTableOperation = RTable -> RTable
- type BinaryRTableOperation = RTable -> RTable -> RTable
- data RAggOperation = RAggOperation { sourceCol :: ColumnName, targetCol :: ColumnName, aggFunc :: RTable -> RTuple }
- type AggFunction = ColumnName -> RTable -> RDataType
- raggGenericAgg :: AggFunction -> ColumnName -> ColumnName -> RAggOperation
- raggSum :: ColumnName -> ColumnName -> RAggOperation
- raggCount :: ColumnName -> ColumnName -> RAggOperation
- raggAvg :: ColumnName -> ColumnName -> RAggOperation
- raggMax :: ColumnName -> ColumnName -> RAggOperation
- raggMin :: ColumnName -> ColumnName -> RAggOperation
- type RPredicate = RTuple -> Bool
- type RGroupPredicate = RTuple -> RTuple -> Bool
- type RJoinPredicate = RTuple -> RTuple -> Bool
- runUnaryROperation :: ROperation -> RTable -> RTable
- ropU :: ROperation -> RTable -> RTable
- runUnaryROperationRes :: ROperation -> RTable -> RTabResult
- ropUres :: ROperation -> RTable -> RTabResult
- runBinaryROperation :: ROperation -> RTable -> RTable -> RTable
- ropB :: ROperation -> RTable -> RTable -> RTable
- runBinaryROperationRes :: ROperation -> RTable -> RTable -> RTabResult
- ropBres :: ROperation -> RTable -> RTable -> RTabResult
- type RTuplesRet = Sum Int
- type RTabResult = Writer RTuplesRet RTable
- rtabResult :: (RTable, RTuplesRet) -> RTabResult
- runRTabResult :: RTabResult -> (RTable, RTuplesRet)
- execRTabResult :: RTabResult -> RTuplesRet
- rtuplesRet :: Int -> RTuplesRet
- getRTuplesRet :: RTuplesRet -> Int
- (.) :: (b -> c) -> (a -> b) -> a -> c
- (<=<) :: Monad m => (b -> m c) -> (a -> m b) -> a -> m c
- runRfilter :: RPredicate -> RTable -> RTable
- f :: RPredicate -> RTable -> RTable
- runInnerJoinO :: RJoinPredicate -> RTable -> RTable -> RTable
- iJ :: RJoinPredicate -> RTable -> RTable -> RTable
- runLeftJoin :: RJoinPredicate -> RTable -> RTable -> RTable
- lJ :: RJoinPredicate -> RTable -> RTable -> RTable
- runRightJoin :: RJoinPredicate -> RTable -> RTable -> RTable
- rJ :: RJoinPredicate -> RTable -> RTable -> RTable
- runFullOuterJoin :: RJoinPredicate -> RTable -> RTable -> RTable
- foJ :: RJoinPredicate -> RTable -> RTable -> RTable
- joinRTuples :: RTuple -> RTuple -> RTuple
- runUnion :: RTable -> RTable -> RTable
- u :: RTable -> RTable -> RTable
- runIntersect :: RTable -> RTable -> RTable
- i :: RTable -> RTable -> RTable
- runDiff :: RTable -> RTable -> RTable
- d :: RTable -> RTable -> RTable
- runProjection :: [ColumnName] -> RTable -> RTable
- runProjectionMissedHits :: [ColumnName] -> RTable -> RTable
- p :: [ColumnName] -> RTable -> RTable
- runAggregation :: [RAggOperation] -> RTable -> RTable
- rAgg :: [RAggOperation] -> RTable -> RTable
- runGroupBy :: RGroupPredicate -> [RAggOperation] -> [ColumnName] -> RTable -> RTable
- rG :: RGroupPredicate -> [RAggOperation] -> [ColumnName] -> RTable -> RTable
- groupNoAggList :: RGroupPredicate -> [ColumnName] -> RTable -> [RTable]
- groupNoAgg :: RGroupPredicate -> [ColumnName] -> RTable -> RTable
- runOrderBy :: [(ColumnName, OrderingSpec)] -> RTable -> RTable
- rO :: [(ColumnName, OrderingSpec)] -> RTable -> RTable
- runCombinedROp :: (RTable -> RTable) -> RTable -> RTable
- rComb :: (RTable -> RTable) -> RTable -> RTable
- data IgnoreDefault
- decodeRTable :: ColumnName -> RDataType -> RDataType -> RDataType -> IgnoreDefault -> RTable -> RTable
- decodeColValue :: ColumnName -> RDataType -> RDataType -> RDataType -> IgnoreDefault -> RTuple -> RDataType
- toRTimestamp :: String -> String -> RTimestamp
- createRTimestamp :: String -> String -> RTimestamp
- rTimestampToRText :: String -> RTimestamp -> RDataType
- stdTimestampFormat :: [Char]
- stdDateFormat :: [Char]
- instrRText :: RDataType -> RDataType -> Maybe Int
- instr :: Eq a => [a] -> [a] -> Maybe Int
- instrText :: Text -> Text -> Maybe Int
- rdtappend :: RDataType -> RDataType -> RDataType
- stripRText :: RDataType -> RDataType
- removeCharAroundRText :: Char -> RDataType -> RDataType
- isText :: RDataType -> Bool
- nvlRTable :: ColumnName -> RDataType -> RTable -> RTable
- nvlRTuple :: ColumnName -> RDataType -> RTuple -> RTuple
- isNullRTuple :: RTuple -> Bool
- isNull :: RDataType -> Bool
- isNotNull :: RDataType -> Bool
- nvl :: RDataType -> RDataType -> RDataType
- nvlColValue :: ColumnName -> RDataType -> RTuple -> RDataType
- isRTabEmpty :: RTable -> Bool
- headRTup :: RTable -> RTuple
- limit :: Int -> RTable -> RTable
- isRTupEmpty :: RTuple -> Bool
- getRTupColValue :: ColumnName -> RTuple -> RDataType
- rtupLookup :: ColumnName -> RTuple -> Maybe RDataType
- rtupLookupDefault :: RDataType -> ColumnName -> RTuple -> RDataType
- (<!>) :: RTuple -> ColumnName -> RDataType
- (<!!>) :: RTuple -> ColumnName -> Maybe RDataType
- rtableToList :: RTable -> [RTuple]
- concatRTab :: [RTable] -> RTable
- rtupleToList :: RTuple -> [(ColumnName, RDataType)]
- toListRDataType :: RTupleMData -> RTuple -> [RDataType]
- toText :: RDataType -> Maybe Text
- fromText :: Text -> RDataType
- rtabMap :: (RTuple -> RTuple) -> RTable -> RTable
- rtabFoldr' :: (RTuple -> RTable -> RTable) -> RTable -> RTable -> RTable
- rtabFoldl' :: (RTable -> RTuple -> RTable) -> RTable -> RTable -> RTable
- rdatatypeFoldr' :: (RTuple -> RDataType -> RDataType) -> RDataType -> RTable -> RDataType
- rdatatypeFoldl' :: (RDataType -> RTuple -> RDataType) -> RDataType -> RTable -> RDataType
- insertAppendRTab :: RTuple -> RTable -> RTable
- insertPrependRTab :: RTuple -> RTable -> RTable
- updateRTab :: [(ColumnName, RDataType)] -> RPredicate -> RTable -> RTable
- upsertRTuple :: ColumnName -> RDataType -> RTuple -> RTuple
- emptyRTable :: RTable
- createSingletonRTable :: RTuple -> RTable
- rtableFromList :: [RTuple] -> RTable
- addColumn :: ColumnName -> RDataType -> RTable -> RTable
- removeColumn :: ColumnName -> RTable -> RTable
- emptyRTuple :: RTuple
- createNullRTuple :: [ColumnName] -> RTuple
- createRtuple :: [(ColumnName, RDataType)] -> RTuple
- rtupleFromList :: [(ColumnName, RDataType)] -> RTuple
- createRDataType :: Typeable a => a -> RDataType
- createRTableMData :: (RTableName, [(ColumnName, ColumnDType)]) -> [ColumnName] -> [[ColumnName]] -> RTableMData
- getColumnNamesfromRTab :: RTable -> [ColumnName]
- getColumnNamesfromRTuple :: RTuple -> [ColumnName]
- listOfColInfoRDataType :: [ColumnInfo] -> RTuple -> [(ColumnInfo, RDataType)]
- toListColumnName :: RTupleMData -> [ColumnName]
- toListColumnInfo :: RTupleMData -> [ColumnInfo]
- data ColumnDoesNotExist = ColumnDoesNotExist ColumnName
- data UnsupportedTimeStampFormat = UnsupportedTimeStampFormat String
- data EmptyInputStringsInToRTimestamp = EmptyInputStringsInToRTimestamp String String
- printRTable :: RTable -> IO ()
- eitherPrintRTable :: Exception e => (RTable -> IO ()) -> RTable -> IO (Either e ())
- printfRTable :: RTupleFormat -> RTable -> IO ()
- eitherPrintfRTable :: Exception e => (RTupleFormat -> RTable -> IO ()) -> RTupleFormat -> RTable -> IO (Either e ())
- data RTupleFormat = RTupleFormat {}
- type ColFormatMap = HashMap ColumnName FormatSpecifier
- data FormatSpecifier
- data OrderingSpec
- genRTupleFormat :: [ColumnName] -> ColFormatMap -> RTupleFormat
- genRTupleFormatDefault :: RTupleFormat
- genColFormatMap :: [(ColumnName, FormatSpecifier)] -> ColFormatMap
- genDefaultColFormatMap :: ColFormatMap
The Relational Table Concept
RTable Data Types
type RTuple = HashMap ColumnName RDataType
Definition of the Relational Tuple.
An RTuple is implemented as a HashMap of (ColumnName, RDataType) pairs. This ensures fast access to a column value by column name.
Note that this implies that an RTuple CANNOT have more than one column with the same name (i.e., hashmap key) and, more importantly, that it DOES NOT have a fixed order of columns, as is usual in RDBMS implementations.
This gives us the freedom to perform column change operations very fast.
The only place where we need a fixed column order is when we load an RTable from a fixed-column structure such as a CSV file.
For this reason, we have embedded the notion of a fixed column order in the RTuple metadata. See RTupleMData.
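A small sketch of access by column name, assuming the RInt and RText constructors and the lookup functions listed in the Synopsis (the column names and values are illustrative only). Since a HashMap carries no column order, the two tuples below are expected to compare equal even though their columns are listed in a different order.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import RTable.Core

-- The same logical tuple, written with two different column orders.
tupA, tupB :: RTuple
tupA = rtupleFromList [("name", RText "Ada"), ("age", RInt 36)]
tupB = rtupleFromList [("age", RInt 36), ("name", RText "Ada")]

main :: IO ()
main = do
  print (tupA <!> "age")                         -- direct access by column name
  print (tupB <!!> "missing")                    -- Maybe-returning lookup
  print (rtupLookupDefault Null "missing" tupA)  -- lookup with a default value
  print (tupA == tupB)                           -- column order plays no role
```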
data RDataType
Definition of the Relational Data Type. This is the data type of the values stored in each RTable.
This is a strict data type, meaning that whenever we evaluate a value of type RDataType, all the fields it contains must also be evaluated.
Instances
Eq RDataType | We need to define equality for RDataType explicitly because of SQL NULL logic (i.e., anything compared to NULL returns false). |
Fractional RDataType | In order to be able to use (/) with RDataType |
Num RDataType | |
Ord RDataType | Defined in RTable.Core |
Read RDataType | |
Show RDataType | |
Generic RDataType | |
FromField RDataType | Necessary instance in order to convert a CSV file column value to an RDataType. Defined in RTable.Data.CSV. parseField :: Field -> Parser RDataType |
ToField RDataType | In order to encode an input RTable into a CSV bytestring we need to make RTuple an instance of the ToNamedRecord typeclass and implement the toNamedRecord function, where: toNamedRecord :: a -> NamedRecord; type NamedRecord = HashMap ByteString ByteString; namedRecord :: [(ByteString, ByteString)] -> NamedRecord constructs a named record from a list of name-value ByteString pairs. Use .= to construct such a pair from a name and a value: (.=) :: ToField a => ByteString -> a -> (ByteString, ByteString). In our case, we don't need to do this, because an RTuple is just a synonym for HM.HashMap ColumnName RDataType and the data type HashMap a b is already an instance of ToNamedRecord. We also need to make RDataType an instance of ToField (CV.ToField RDataType) by implementing toField, so as to be able to convert an RDataType into a ByteString, where: toField :: a -> Field; type Field = ByteString. Defined in RTable.Data.CSV |
NFData RDataType | In order to be able to force full evaluation up to Normal Form (NF) https://www.fpcomplete.com/blog/2017/09/all-about-strictness. Defined in RTable.Core |
type Rep RDataType | |
Defined in RTable.Core type Rep RDataType = D1 (MetaData "RDataType" "RTable.Core" "DBFunctor-0.1.0.0-4dDhI1jCY88DHDs5IjLSX3" False) ((C1 (MetaCons "RInt" PrefixI True) (S1 (MetaSel (Just "rint") NoSourceUnpackedness SourceStrict DecidedStrict) (Rec0 Integer)) :+: (C1 (MetaCons "RText" PrefixI True) (S1 (MetaSel (Just "rtext") NoSourceUnpackedness SourceStrict DecidedStrict) (Rec0 Text)) :+: C1 (MetaCons "RDate" PrefixI True) (S1 (MetaSel (Just "rdate") NoSourceUnpackedness SourceStrict DecidedStrict) (Rec0 Text) :*: S1 (MetaSel (Just "dtformat") NoSourceUnpackedness SourceStrict DecidedStrict) (Rec0 Text)))) :+: (C1 (MetaCons "RTime" PrefixI True) (S1 (MetaSel (Just "rtime") NoSourceUnpackedness SourceStrict DecidedStrict) (Rec0 RTimestamp)) :+: (C1 (MetaCons "RDouble" PrefixI True) (S1 (MetaSel (Just "rdouble") NoSourceUnpackedness SourceStrict DecidedStrict) (Rec0 Double)) :+: C1 (MetaCons "Null" PrefixI False) (U1 :: * -> *)))) |
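The NULL-comparison rule noted for the Eq instance can be tried out with a short sketch. It uses only the Null and RInt constructors (visible in the Rep instance above) and the isNull, isNotNull and nvl functions from the Synopsis. The argument order of nvl (value first, default second) is an assumption based on its SQL namesake.

```haskell
module Main where

import RTable.Core

main :: IO ()
main = do
  -- According to the Eq instance description, anything compared to Null
  -- should yield False, including Null itself.
  print (RInt 1 == Null)
  print (Null == Null)
  -- NULL checks should therefore go through the dedicated predicates.
  print (isNull Null, isNotNull (RInt 1))
  -- NVL-style defaulting (assumed order: value, then replacement for Null).
  print (nvl Null (RInt 0))
```

The design choice mirrors SQL: a filter predicate such as `t <!> "col" == Null` is not a reliable NULL test, whereas `isNull (t <!> "col")` is meant for exactly that.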
data RTimestamp
Basic data type to represent time.
This is a strict data type, meaning that whenever we evaluate a value of type RTimestamp, all the fields it contains must also be evaluated.
Instances
RTable Metadata Data Types
data RTableMData
Metadata for an RTable
RTableMData | |
|
Instances
Eq RTableMData | Defined in RTable.Core. (==) :: RTableMData -> RTableMData -> Bool, (/=) :: RTableMData -> RTableMData -> Bool |
Show RTableMData | Defined in RTable.Core. showsPrec :: Int -> RTableMData -> ShowS, show :: RTableMData -> String, showList :: [RTableMData] -> ShowS |
type RTupleMData = (HashMap ColumnOrder ColumnName, HashMap ColumnName ColumnInfo)
Basic Metadata of an RTuple.
The RTuple metadata are accessed through a HashMap ColumnName ColumnInfo structure. I.e., for each column of the RTuple, we access the ColumnInfo structure to get column-level metadata. This access is achieved by ColumnName.
However, in order to provide the "impression" of a fixed column order per tuple (see the RTuple definition), we provide another HashMap, the HashMap ColumnOrder ColumnName. So in the following example, if we want to access the ColumnInfo of the RTupleMData tupmdata by column order (assuming that we have N columns), we have to do the following:

    (snd tupmdata)!((fst tupmdata)!0)
    (snd tupmdata)!((fst tupmdata)!1)
    ...
    (snd tupmdata)!((fst tupmdata)!(N-1))

In the same manner, in order to access the columns of an RTuple (e.g., tup) by column order, we do the following:

    tup!((fst tupmdata)!0)
    tup!((fst tupmdata)!1)
    ...
    tup!((fst tupmdata)!(N-1))

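The lookups by column order can be sketched as a small program. It assumes that (!) is the lookup operator of Data.HashMap.Strict (from unordered-containers), that ColumnOrder accepts a numeric literal starting at 0 as the expressions above suggest, and that createRTableMData numbers the columns in the order in which they are listed. The table and column names are invented for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.HashMap.Strict ((!))
import RTable.Core

-- Illustrative metadata for a two-column table.
sampleMData :: RTableMData
sampleMData =
  createRTableMData
    ("sample", [("OWNER", Varchar), ("NUM_ROWS", Integer)])
    ["OWNER"]  -- primary key
    []         -- no alternative unique keys

main :: IO ()
main = do
  let tupmdata  = rtuplemdata sampleMData
      firstName = fst tupmdata ! 0      -- column name at position 0
  print firstName
  print (snd tupmdata ! firstName)      -- ColumnInfo of that column
  print (toListColumnName tupmdata)     -- all column names in the metadata
```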
data ColumnInfo
Basic metadata for a column of an RTuple
ColumnInfo | |
|
Instances
Eq ColumnInfo | Defined in RTable.Core. (==) :: ColumnInfo -> ColumnInfo -> Bool, (/=) :: ColumnInfo -> ColumnInfo -> Bool |
Show ColumnInfo | Defined in RTable.Core. showsPrec :: Int -> ColumnInfo -> ShowS, show :: ColumnInfo -> String, showList :: [ColumnInfo] -> ShowS |
type ColumnName = Name
Definition of the Column Name
type RTableName = Name
Definition of the Table Name
data ColumnDType
This is used only for metadata purposes (see ColumnInfo). The actual data type of a value is an RDataType.
The Text component of the Date and Timestamp data constructors is the date format, e.g., "DD/MM/YYYY", "DD/MM/YYYY HH24:MI:SS"
Instances
Eq ColumnDType | Defined in RTable.Core. (==) :: ColumnDType -> ColumnDType -> Bool, (/=) :: ColumnDType -> ColumnDType -> Bool |
Show ColumnDType | Defined in RTable.Core. showsPrec :: Int -> ColumnDType -> ShowS, show :: ColumnDType -> String, showList :: [ColumnDType] -> ShowS |
Type Classes for "Tabular Data"
class RTabular a where
Basic class to represent a data type that can be turned into an RTable.
It implements the concept of "tabular data"
toRTable :: RTableMData -> a -> RTable
fromRTable :: RTableMData -> RTable -> a
Instances
RTabular CSV | CSV data are "Tabular" data and thus implement the RTabular interface. Defined in RTable.Data.CSV. toRTable :: RTableMData -> CSV -> RTable, fromRTable :: RTableMData -> RTable -> CSV |
Relational Algebra Operations
Operations Data Types
data ROperation
Definition of Relational Algebra operations. These are the valid operations between RTables.
ROperationEmpty | |
RUnion | Union |
RInter | Intersection |
RDiff | Difference |
RPrj | Projection |
RFilter | Filter operation (based on an RPredicate) |
RInJoin | Inner Join (any type of join predicate is allowed, i.e., any function with a signature of the form RTuple -> RTuple -> Bool) |
RLeftJoin | Left Outer Join |
RRightJoin | Right Outer Join |
RAggregate | Performs aggregation operations on specific columns and returns a singleton RTable |
RGroupBy | A Group By operation |
RCombinedOp | A combination of unary ROperations, e.g., (ij jpred rtab) . (p plist) . (f pred) |
RBinOp | A generic binary operation |
ROrderBy | Order the RTuples of an RTable |
type UnaryRTableOperation = RTable -> RTable
A generic unary operation on an RTable
type BinaryRTableOperation = RTable -> RTable -> RTable
A generic binary operation on RTables
data RAggOperation
This data type represents all possible aggregate operations over an RTable. Examples are Sum, Count, Average, Min and Max, but it can be any other "aggregation". The essential property of an aggregate operation is that it acts on an RTable (or on a group of RTuples, in the case of the RGroupBy operation) and produces a single RTuple.
An aggregate operation is applied on a specific column (source column) and the aggregated result is stored in the target column. It is important to understand that the produced aggregated RTuple is different from the input RTuples. It is a totally new RTuple that consists of the aggregated column(s) (and the grouping columns in the case of an RGroupBy).
RAggOperation | |
|
Available Aggregate Operations
type AggFunction = ColumnName -> RTable -> RDataType
Aggregation Function type.
An aggregation function receives as input a source column (i.e., a ColumnName) of a source RTable and returns an aggregated value, which is the result of the aggregation on the values of the source column.
raggGenericAgg
:: AggFunction | custom aggregation function |
-> ColumnName | source column |
-> ColumnName | target column |
-> RAggOperation |
Returns an RAggOperation with a custom aggregation function provided as input
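As a sketch of a custom aggregation, the following defines a hypothetical productAgg (not part of the library) that multiplies all values of a column, and plugs it into raggGenericAgg. It assumes that the Num instance of RDataType supports (*) on RInt values and uses rdatatypeFoldl', rAgg and the other functions as listed in the Synopsis.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import RTable.Core

-- Hypothetical custom aggregation: the product of all values in a column.
productAgg :: AggFunction
productAgg col = rdatatypeFoldl' (\acc t -> acc * (t <!> col)) (RInt 1)

-- Four RTuples with a single column "col1" holding 1..4.
tab1 :: RTable
tab1 = rtableFromList [ rtupleFromList [("col1", RInt n)] | n <- [1 .. 4] ]

main :: IO ()
main =
  -- Aggregate "col1" into a new column "col1_product" and print the
  -- resulting singleton RTable.
  printRTable (rAgg [raggGenericAgg productAgg "col1" "col1_product"] tab1)
```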
raggSum
:: ColumnName | source column |
-> ColumnName | target column |
-> RAggOperation |
The Sum aggregate operation
raggCount
:: ColumnName | source column |
-> ColumnName | target column |
-> RAggOperation |
The Count aggregate operation
raggAvg
:: ColumnName | source column |
-> ColumnName | target column |
-> RAggOperation |
The Average aggregate operation
raggMax
:: ColumnName | source column |
-> ColumnName | target column |
-> RAggOperation |
The Max aggregate operation
raggMin
:: ColumnName | source column |
-> ColumnName | target column |
-> RAggOperation |
The Min aggregate operation
Predicates
type RPredicate = RTuple -> Bool
type RGroupPredicate = RTuple -> RTuple -> Bool
The Group By Predicate
It defines the condition for two RTuples to be included in the same group.
type RJoinPredicate = RTuple -> RTuple -> Bool
The Join Predicate. It defines when two RTuples should be paired.
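A minimal sketch of both predicate kinds in use, with rG (group by) and iJ (inner join) called according to their Synopsis signatures. The tables, column names and the predicate names sameDept and worksIn are invented for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import RTable.Core

employees, departments :: RTable
employees = rtableFromList
  [ rtupleFromList [("emp", RText "Ada"),   ("dept_id", RInt 1), ("salary", RInt 100)]
  , rtupleFromList [("emp", RText "Alan"),  ("dept_id", RInt 1), ("salary", RInt 120)]
  , rtupleFromList [("emp", RText "Grace"), ("dept_id", RInt 2), ("salary", RInt 130)]
  ]
departments = rtableFromList
  [ rtupleFromList [("id", RInt 1), ("dept", RText "Research")]
  , rtupleFromList [("id", RInt 2), ("dept", RText "Engineering")]
  ]

-- Two RTuples belong to the same group when they share a department id.
sameDept :: RGroupPredicate
sameDept t1 t2 = t1 <!> "dept_id" == t2 <!> "dept_id"

-- An employee RTuple is paired with the department RTuple it refers to.
worksIn :: RJoinPredicate
worksIn emp dept = emp <!> "dept_id" == dept <!> "id"

main :: IO ()
main = do
  -- Roughly: SELECT dept_id, SUM(salary) AS total_salary ... GROUP BY dept_id
  printRTable (rG sameDept [raggSum "salary" "total_salary"] ["dept_id"] employees)
  -- Roughly: employees INNER JOIN departments ON dept_id = id
  printRTable (iJ worksIn employees departments)
```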
Operation Execution
runUnaryROperation
:: ROperation | input ROperation |
-> RTable | input RTable |
-> RTable | output RTable |
Execute a Unary ROperation
ropU :: ROperation -> RTable -> RTable
ropU operator executes a unary ROperation. A short name for the runUnaryROperation function
runUnaryROperationRes
:: ROperation | input ROperation |
-> RTable | input RTable |
-> RTabResult | output: Result of operation |
Execute a Unary ROperation and return an RTabResult
ropUres :: ROperation -> RTable -> RTabResult
ropUres operator executes a unary ROperation. A short name for the runUnaryROperationRes function
runBinaryROperation
:: ROperation | input ROperation |
-> RTable | input RTable1 |
-> RTable | input RTable2 |
-> RTable | output RTable |
Execute a Binary ROperation
ropB :: ROperation -> RTable -> RTable -> RTable
ropB operator executes a binary ROperation. A short name for the runBinaryROperation function
runBinaryROperationRes
:: ROperation | input ROperation |
-> RTable | input RTable1 |
-> RTable | input RTable2 |
-> RTabResult | output: Result of operation |
Execute a Binary ROperation and return an RTabResult
ropBres :: ROperation -> RTable -> RTable -> RTabResult
ropBres operator executes a binary ROperation. A short name for the runBinaryROperationRes function
Operation Result
type RTuplesRet = Sum Int Source #
Number of RTuples returned by an RTable operation
type RTabResult = Writer RTuplesRet RTable Source #
RTabResult is the result of an RTable operation and is a Writer Monad, that includes the new RTable, as well as the number of RTuples returned by the operation.
:: (RTable, RTuplesRet) | input pair |
-> RTabResult | output Writer Monad |
Creates an RTabResult (i.e., a Writer Monad) from a result RTable and the number of RTuples that it returned
runRTabResult :: RTabResult -> (RTable, RTuplesRet) Source #
Returns the info "stored" in the RTabResult Writer Monad
execRTabResult :: RTabResult -> RTuplesRet Source #
Returns the "log message" in the RTabResult Writer Monad, which is the number of returned RTuples
rtuplesRet :: Int -> RTuplesRet Source #
Creates an RTuplesRet type
getRTuplesRet :: RTuplesRet -> Int Source #
Return the number embedded in the RTuplesRet data type
Operation Composition
An Example of Operation Composition
>>>
-- define a simple RTable with four RTuples of a single column "col1"
>>>
let tab1 = rtableFromList [rtupleFromList [("col1", RInt 1)], rtupleFromList [("col1", RInt 2)], rtupleFromList [("col1", RInt 3)], rtupleFromList [("col1", RInt 4)] ]
>>>
printRTable tab1
col1
~~~~
1
2
3
4
4 rows returned
---------
>>>
-- define a filter operation col1 > 2
>>>
let rop1 = RFilter (\t-> t<!>"col1" > 2)
>>>
-- define another filter operation col1 > 3
>>>
let rop2 = RFilter (\t-> t<!>"col1" > 3)
>>>
-- Composition of RTable operations via (.) (rop1 returns 2 RTuples and rop2 returns 1 RTuple)
>>>
printRTable $ (ropU rop2) . (ropU rop1) $ tab1
col1
~~~~
4
1 row returned
---------
>>>
-- Composition of RTabResult operations via (<=<) (Note: that the result includes the sum of the returned RTuples in each operation, i.e., 2+1 = 3)
>>>
execRTabResult $ (ropUres rop2) <=< (ropUres rop1) $ tab1
Sum {getSum = 3}
>>>
printRTable $ fst.runRTabResult $ (ropUres rop2) <=< (ropUres rop1) $ tab1
col1
~~~~
4
1 row returned
---------
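The way the row counts add up under Kleisli composition can be reproduced without the library. The following is a minimal sketch using only base: `Result` and `filterRes` are hypothetical stand-ins for RTabResult and ropUres, and the pair type `(Sum Int, a)` is used in place of the Writer Monad (it behaves the same way for this purpose).

```haskell
import Control.Monad ((<=<))
import Data.Monoid (Sum (..))

-- Hypothetical stand-in for RTabResult: a pair of (returned-row count, rows).
-- A pair is a Monad whenever its first component is a Monoid.
type Result a = (Sum Int, a)

-- Filter a list and record how many elements the filter returned.
filterRes :: (a -> Bool) -> [a] -> Result [a]
filterRes p xs = (Sum (length ys), ys)
  where
    ys = filter p xs

main :: IO ()
main = do
  let (count, rows) = (filterRes (> 3) <=< filterRes (> 2)) [1, 2, 3, 4 :: Int]
  print (getSum count) -- 3, i.e., 2 rows from the first filter plus 1 from the second
  print rows           -- [4]
```

The counts are combined by the Monoid instance of `Sum Int`, which is why the composed result reports 3 rather than the size of the final table.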
RTable Functions
Relational Algebra Functions
runRfilter :: RPredicate -> RTable -> RTable Source #
Executes an RFilter operation
f :: RPredicate -> RTable -> RTable Source #
Filter (i.e. selection operator). A short name for the runRFilter
function
runInnerJoinO :: RJoinPredicate -> RTable -> RTable -> RTable Source #
Implements an Inner Join operation between two RTables (any type of join predicate is allowed). This Inner Join implementation follows Oracle DB's convention for common column names. When we have two tuples t1 and t2 with a common column name (let's say "Common"), then the resulting tuple after a join will contain "Common" and "Common_1", i.e., a "_1" suffix is appended. By convention, the tuple from the left table retains the original column name, so "Common_1" is the column from the right table. If "Common_1" already exists, then "Common_2" is used.
iJ :: RJoinPredicate -> RTable -> RTable -> RTable Source #
RTable Inner Join Operator. A short name for the runInnerJoinO
function
runLeftJoin :: RJoinPredicate -> RTable -> RTable -> RTable Source #
Implements a Left Outer Join operation between two RTables (any type of join predicate is allowed), i.e., the rows of the left RTable will be preserved. Note on duplicate keys: since the underlying structure for an RTuple is a Data.HashMap.Strict, only one value per key is allowed. So, in the context of joining two RTuples, the value of the left RTuple on the common key will be preferred.
A Left Join:
tabLeft LEFT JOIN tabRight ON joinPred
where tabLeft is the preserving table, can be defined as the Union between the following two RTables:
- The result of the inner join: tabLeft INNER JOIN tabRight ON joinPred
- The rows from the preserving table (tabLeft) that DO NOT satisfy the join condition, enhanced with the columns of tabRight returning Null values.
The common columns will appear from both tables, but only the left table's columns will retain their original name.
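The definition of a left join as a union of two parts can be sketched with plain lists. This is only an illustrative model, not the library's implementation: `Row`, `innerJoin` and `leftJoin` are hypothetical names, `Nothing` plays the role of Null, and the column names are chosen to be distinct so that no renaming is needed.

```haskell
-- Hypothetical model: a row is a list of (column, value) pairs; Nothing stands for Null.
type Row = [(String, Maybe Int)]

-- Pair every left row with every right row that satisfies the join predicate.
innerJoin :: (Row -> Row -> Bool) -> [Row] -> [Row] -> [Row]
innerJoin p ls rs = [l ++ r | l <- ls, r <- rs, p l r]

-- Left join = inner join, followed by the unmatched left rows
-- padded with Null values for the columns of the right table.
leftJoin :: (Row -> Row -> Bool) -> [String] -> [Row] -> [Row] -> [Row]
leftJoin p rightCols ls rs = innerJoin p ls rs ++ unmatched
  where
    unmatched = [l ++ [(c, Nothing) | c <- rightCols] | l <- ls, not (any (p l) rs)]

main :: IO ()
main = do
  let tabLeft = [[("id", Just 1)], [("id", Just 2)]]
      tabRight = [[("rid", Just 1), ("val", Just 10)]]
      joinPred l r = lookup "id" l == lookup "rid" r
  mapM_ print (leftJoin joinPred ["rid", "val"] tabLeft tabRight)
```

A right join is the mirror image: the unmatched rows come from the right table and the left columns are padded with Null.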
lJ :: RJoinPredicate -> RTable -> RTable -> RTable Source #
RTable Left Outer Join Operator. A short name for the runLeftJoin
function
runRightJoin :: RJoinPredicate -> RTable -> RTable -> RTable Source #
Implements a Right Outer Join operation between two RTables (any type of join predicate is allowed),
i.e., the rows of the right RTable will be preserved.
A Right Join:
tabLeft RIGHT JOIN tabRight ON joinPred
where tabRight is the preserving table, can be defined as the Union between the following two RTables:
- The result of the inner join: tabLeft INNER JOIN tabRight ON joinPred
- The rows from the preserving table (tabRight) that DO NOT satisfy the join condition, enhanced with the columns of tabLeft returning Null values.
The common columns will appear from both tables, but only the right table's columns will retain their original name.
rJ :: RJoinPredicate -> RTable -> RTable -> RTable Source #
RTable Right Outer Join Operator. A short name for the runRightJoin
function
runFullOuterJoin :: RJoinPredicate -> RTable -> RTable -> RTable Source #
Implements a Full Outer Join operation between two RTables (any type of join predicate is allowed). A full outer join is the union of the left and right outer joins. The common columns will appear from both tables, but only the left table's columns will retain their original name (just by convention).
foJ :: RJoinPredicate -> RTable -> RTable -> RTable Source #
RTable Full Outer Join Operator. A short name for the runFullOuterJoin function
joinRTuples :: RTuple -> RTuple -> RTuple Source #
Joins two RTuples into one. In this join we follow Oracle DB's convention when joining two tuples with some common column names. When we have two tuples t1 and t2 with a common column name (let's say Common), then the resulting tuple after a join will contain Common and Common_1, i.e., a "_1" suffix is appended. By convention, the tuple from the left table retains the original column name, so Common_1 is the column from the right table. If Common_1 already exists, then Common_2 is used.
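The suffixing convention can be illustrated with association lists. This is a sketch under assumptions, not the library's code: `freshName` and `joinTuples` are hypothetical helpers, and tuples are modelled as ordered lists of (column, value) pairs instead of a HashMap.

```haskell
-- Pick the first free name among name, name_1, name_2, ...
freshName :: [String] -> String -> String
freshName taken name = head [c | c <- candidates, c `notElem` taken]
  where
    candidates = name : [name ++ "_" ++ show n | n <- [1 :: Int ..]]

-- Append the right tuple's columns to the left tuple, renaming on name clashes.
-- The left tuple always keeps its original column names.
joinTuples :: [(String, v)] -> [(String, v)] -> [(String, v)]
joinTuples = foldl add
  where
    add acc (k, v) = acc ++ [(freshName (map fst acc) k, v)]

main :: IO ()
main = do
  print (joinTuples [("Common", 1 :: Int), ("A", 2)] [("Common", 3), ("B", 4)])
  -- [("Common",1),("A",2),("Common_1",3),("B",4)]
  print (joinTuples [("Common", 1 :: Int), ("Common_1", 2)] [("Common", 3)])
  -- [("Common",1),("Common_1",2),("Common_2",3)]
```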
u :: RTable -> RTable -> RTable Source #
RTable Union Operator. A short name for the runUnion
function
i :: RTable -> RTable -> RTable Source #
RTable Intersection Operator. A short name for the runIntersect
function
runDiff :: RTable -> RTable -> RTable Source #
Implements the set Difference of two RTables as the diff of two lists (see List).
d :: RTable -> RTable -> RTable Source #
RTable Difference Operator. A short name for the runDiff
function
runProjection Source #
:: [ColumnName] | list of column names to be included in the final result RTable |
-> RTable | |
-> RTable |
Implements RTable projection operation. If a column name does not exist, then an empty RTable is returned.
runProjectionMissedHits Source #
:: [ColumnName] | list of column names to be included in the final result RTable |
-> RTable | |
-> RTable |
Implements RTable projection operation. If a column name does not exist, then the returned RTable includes this column with a Null value. This projection implementation allows missed hits.
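The difference between the two projection variants can be sketched as follows. This is an illustrative model only: `projectStrict` and `projectLenient` are hypothetical names, rows are association lists, and `Nothing` stands for Null.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical model: a row is a list of (column, value) pairs; Nothing stands for Null.
type Row = [(String, Maybe Int)]

-- Strict variant: if any requested column is missing, the result is an empty table.
projectStrict :: [String] -> [Row] -> [Row]
projectStrict cols rows = fromMaybe [] (mapM projectRow rows)
  where
    projectRow row = mapM (\c -> fmap ((,) c) (lookup c row)) cols

-- Lenient variant (missed hits allowed): a missing column is returned with a Null value.
projectLenient :: [String] -> [Row] -> [Row]
projectLenient cols rows = [[(c, fromMaybe Nothing (lookup c row)) | c <- cols] | row <- rows]

main :: IO ()
main = do
  let tab = [[("a", Just 1), ("b", Just 2)]]
  print (projectStrict ["a"] tab)       -- [[("a",Just 1)]]
  print (projectStrict ["a", "z"] tab)  -- []
  print (projectLenient ["a", "z"] tab) -- [[("a",Just 1),("z",Nothing)]]
```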
p :: [ColumnName] -> RTable -> RTable Source #
RTable Projection operator. A short name for the runProjection
function
runAggregation Source #
:: [RAggOperation] | Input Aggregate Operations |
-> RTable | Input RTable |
-> RTable | Output singleton RTable |
Implements the aggregation operation on an RTable. It aggregates the specific columns in each AggOperation and returns a singleton RTable, i.e., an RTable with a single RTuple that includes only the agg columns and their aggregated value.
rAgg :: [RAggOperation] -> RTable -> RTable Source #
Aggregation Operator. A short name for the runAggregation
function
runGroupBy Source #
:: RGroupPredicate | Grouping predicate, in order to form the groups of RTuples (it defines when two RTuples should be included in the same group) |
-> [RAggOperation] | Aggregations to be applied on specific columns |
-> [ColumnName] | List of grouping column names (GROUP BY clause in SQL) We assume that all RTuples in the same group have the same value in these columns |
-> RTable | input RTable |
-> RTable | output RTable |
Implements the GROUP BY operation over an RTable.
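The role of the grouping predicate can be sketched with plain lists. This is a simplified model, not the library's implementation: `groupByPred` and `sumPerGroup` are hypothetical names, rows are (department, salary) pairs, and the predicate is assumed to be an equivalence relation.

```haskell
-- Partition a list into groups using a "same group" predicate.
-- Unlike Data.List.groupBy, the members of a group need not be adjacent.
groupByPred :: (a -> a -> Bool) -> [a] -> [[a]]
groupByPred _ [] = []
groupByPred same (x : xs) =
  (x : filter (same x) xs) : groupByPred same (filter (not . same x) xs)

-- Hypothetical row type: (department, salary).
type Row = (String, Int)

-- Group rows by department and sum the salary within each group.
sumPerGroup :: [Row] -> [(String, Int)]
sumPerGroup rows =
  [(dept, sum (map snd g)) | g@((dept, _) : _) <- groupByPred sameDept rows]
  where
    sameDept a b = fst a == fst b

main :: IO ()
main = print (sumPerGroup [("a", 1), ("b", 2), ("a", 3)]) -- [("a",4),("b",2)]
```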
rG :: RGroupPredicate -> [RAggOperation] -> [ColumnName] -> RTable -> RTable Source #
Group By Operator. A short name for the runGroupBy
function
:: RGroupPredicate | Grouping predicate, in order to form the groups of |
-> [ColumnName] | List of grouping column names (GROUP BY clause in SQL) We assume that all RTuples in the same group have the same value in these columns |
-> RTable | input |
-> [RTable] | output list of |
:: RGroupPredicate | Grouping predicate, in order to form the groups of |
-> [ColumnName] | List of grouping column names (GROUP BY clause in SQL)
We assume that all |
-> RTable | input |
-> RTable | output |
runOrderBy Source #
:: [(ColumnName, OrderingSpec)] | Input ordering specification |
-> RTable | Input RTable |
-> RTable | Output RTable |
Implements the ORDER BY operation. The first column in the input list has the highest priority in the sorting order.
We treat Null as the maximum value (anything compared to Null is smaller). This way Nulls are sent to the end (i.e., "Nulls Last" in SQL parlance). This is for Asc ordering. For Desc ordering, we have the opposite: Nulls go first, and so anything compared to Null is greater.
SQL example:
with q as (select case when level < 4 then level else NULL end c1 -- , level c2
           from dual connect by level < 7 )
select * from q order by c1
C1
----
1
2
3
Null
Null
Null
with q as (select case when level < 4 then level else NULL end c1 -- , level c2
           from dual connect by level < 7 )
select * from q order by c1 desc
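The Null ordering rule described above can be sketched on a single column. This is a model only, not the library's code: `compareNullMax`, `sortAsc` and `sortDesc` are hypothetical names, and `Nothing` stands for Null.

```haskell
import Data.List (sortBy)

-- Hypothetical model: Nothing plays the role of Null and compares as the maximum value.
compareNullMax :: Ord a => Maybe a -> Maybe a -> Ordering
compareNullMax Nothing Nothing = EQ
compareNullMax Nothing _ = GT
compareNullMax _ Nothing = LT
compareNullMax (Just x) (Just y) = compare x y

-- Ascending order: Nulls end up last.
sortAsc :: Ord a => [Maybe a] -> [Maybe a]
sortAsc = sortBy compareNullMax

-- Descending order: the comparison is flipped, so Nulls come first.
sortDesc :: Ord a => [Maybe a] -> [Maybe a]
sortDesc = sortBy (flip compareNullMax)

main :: IO ()
main = do
  let c1 = [Just 3, Nothing, Just 1, Nothing, Just 2 :: Maybe Int]
  print (sortAsc c1)  -- [Just 1,Just 2,Just 3,Nothing,Nothing]
  print (sortDesc c1) -- [Nothing,Nothing,Just 3,Just 2,Just 1]
```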
rO :: [(ColumnName, OrderingSpec)] -> RTable -> RTable Source #
Order By Operator. A short name for the runOrderBy
function
runCombinedROp Source #
:: (RTable -> RTable) | input combined RTable operation |
-> RTable | input RTable that the input function will be applied to |
-> RTable | output RTable |
runCombinedROp: A higher-order function that accepts as input a combination of unary ROperations, e.g., (p plist).(f pred), expressed in the form of a function (RTable -> RTable), and applies this function to the input RTable. In this sense we can also include a binary operation (e.g., join), if we partially apply the join to one RTable, e.g., (iJ jpred rtab) . (p plist) . (f pred)
rComb :: (RTable -> RTable) -> RTable -> RTable Source #
A short name for the runCombinedROp
function
Decoding
data IgnoreDefault Source #
Instances
Eq IgnoreDefault Source # | |
Defined in RTable.Core (==) :: IgnoreDefault -> IgnoreDefault -> Bool # (/=) :: IgnoreDefault -> IgnoreDefault -> Bool # | |
Show IgnoreDefault Source # | |
Defined in RTable.Core showsPrec :: Int -> IgnoreDefault -> ShowS # show :: IgnoreDefault -> String # showList :: [IgnoreDefault] -> ShowS # |
:: ColumnName | ColumnName key |
-> RDataType | Search value |
-> RDataType | Return value |
-> RDataType | Default value |
-> IgnoreDefault | Ignore default indicator |
-> RTable | input RTable |
-> RTable |
It receives an RTable, a search value and a default value. It returns a new RTable which is identical to the source one, except that for each RTuple, in the specified column:
- if the search value was found, then the specified Return Value is returned;
- else the default value is returned (if the ignore indicator is not set); otherwise (if the ignore indicator is set), the existing value of the column is returned.
If you pass an empty RTable, then it returns an empty RTable.
Throws a ColumnDoesNotExist exception, if the column does not exist.
:: ColumnName | ColumnName key |
-> RDataType | Search value |
-> RDataType | Return value |
-> RDataType | Default value |
-> IgnoreDefault | Ignore default indicator |
-> RTuple | input RTuple |
-> RDataType |
It receives an RTuple and looks up the value at a specific column name.
Then it compares this value with the specified search value. If it is equal to the search value,
then it returns the specified Return Value. If not, then it returns the specified default Value, if the ignore indicator is not set;
otherwise (if the ignore indicator is set) it returns the existing value.
If you pass an empty RTuple, then it returns Null.
Throws a ColumnDoesNotExist exception, if this map contains no mapping for the key.
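The decode rule for a single value can be sketched as a three-way choice. This is a model under assumptions: `IgnoreFlag` and its constructor names are hypothetical stand-ins for IgnoreDefault (whose constructors are not shown here), and `decodeValue` works on any type with equality rather than on RDataType.

```haskell
-- Hypothetical stand-in for IgnoreDefault; the constructor names are assumptions.
data IgnoreFlag = IgnoreDefaultValue | UseDefaultValue
  deriving (Eq, Show)

-- Argument order mirrors the documented one: search, return, default, indicator, current value.
decodeValue :: Eq a => a -> a -> a -> IgnoreFlag -> a -> a
decodeValue search ret def flag current
  | current == search = ret              -- match: substitute the return value
  | flag == IgnoreDefaultValue = current -- no match, default ignored: keep the existing value
  | otherwise = def                      -- no match: fall back to the default

main :: IO ()
main = do
  print (decodeValue 1 100 0 UseDefaultValue (1 :: Int))    -- 100
  print (decodeValue 1 100 0 UseDefaultValue (2 :: Int))    -- 0
  print (decodeValue 1 100 0 IgnoreDefaultValue (2 :: Int)) -- 2
```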
Date/Time
:: String | Format string e.g., "DD/MM/YYYY HH:MI:SS" |
-> String | Timestamp string |
-> RTimestamp |
Returns an RTimestamp from an input String and a format String.
Valid format patterns are:
- For year: YYYY, e.g., "0001", "2018"
- For month: MM, e.g., "01", "1", "12"
- For day: DD, e.g., "01", "1", "31"
- For hours: HH, HH24, e.g., "00", "23" (i.e., hours must be specified in 24-hour format)
- For minutes: MI, e.g., "01", "1", "59"
- For seconds: SS, e.g., "01", "1", "59"
Example of a typical format string is: "DD/MM/YYYY HH:MI:SS"
If no valid format pattern is found, then an UnsupportedTimeStampFormat exception is thrown.
:: String | Format string e.g., "DD/MM/YYYY HH24:MI:SS" |
-> String | Timestamp string |
-> RTimestamp |
Creates an RTimestamp data type from an input timestamp format string and a timestamp value represented as a String.
Valid format patterns are:
- For year: YYYY, e.g., "0001", "2018"
- For month: MM, e.g., "01", "1", "12"
- For day: DD, e.g., "01", "1", "31"
- For hours: HH, HH24, e.g., "00", "23" (i.e., hours must be specified in 24-hour format)
- For minutes: MI, e.g., "01", "1", "59"
- For seconds: SS, e.g., "01", "1", "59"
Example of a typical format string is: "DD/MM/YYYY HH:MI:SS"
If no valid format pattern is found, then an UnsupportedTimeStampFormat exception is thrown.
rTimeStampToText Source #
:: String | Output format e.g., "DD/MM/YYYY HH24:MI:SS" |
-> RTimestamp | Input RTimestamp |
-> RDataType | Output RText |
rTimeStampToText: converts an RTimestamp value to RText. Valid input formats are:
- 1. "DD/MM/YYYY HH24:MI:SS"
- 2. "YYYYMMDD-HH24.MI.SS"
- 3. "YYYYMMDD"
- 4. "YYYYMM"
- 5. "YYYY"
stdTimestampFormat :: [Char] Source #
Standard timestamp format. For example: "DDMMYYYY HH24:MI:SS"
stdDateFormat :: [Char] Source #
Standard date format
Character/Text
stripRText : O(n) Remove leading and trailing white space from a string. If the input RDataType is not an RText, then Null is returned
removeCharAroundRText :: Char -> RDataType -> RDataType Source #
Helper function to remove a character around (from both beginning and end) of an (RText t) value
NULL-Related
:: ColumnName | ColumnName key |
-> RDataType | Default value |
-> RTable | input RTable |
-> RTable |
It receives an RTable and a default value. It returns a new RTable which is identical to the source one, except that every Null value in the specified column of every RTuple has been replaced by the default value.
If you pass an empty RTable, then it returns an empty RTable.
Throws a ColumnDoesNotExist exception, if the column does not exist.
:: ColumnName | ColumnName key |
-> RDataType | Default value in the case of Null column values |
-> RTuple | input RTuple |
-> RTuple | output RTuple |
It receives an RTuple and a default value. It returns a new RTuple which is identical to the source one, but every Null value in the specified column has been replaced by a default value
isNullRTuple :: RTuple -> Bool Source #
Returns True if the input RTuple is a Null RTuple, otherwise it returns False
isNull :: RDataType -> Bool Source #
Use this function to compare an RDataType with the Null value, because, due to Null logic, x == Null or x /= Null will always return False. It returns True if the input value is Null
isNotNull :: RDataType -> Bool Source #
Use this function to compare an RDataType with the Null value, because, due to Null logic, x == Null or x /= Null will always return False. It returns True if the input value is Not Null
:: RDataType | input value |
-> RDataType | default value returned if input value is Null |
-> RDataType | output value |
Returns the 1st parameter if this is not Null, otherwise it returns the 2nd.
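Why a dedicated Null test is needed, and how a default-substituting function builds on it, can be sketched with a toy value type. This is a model only: `Val`, `IntVal` and `NullVal` are hypothetical, and the `isNull` and `nvl` functions defined here are local re-implementations for illustration, not the library's.

```haskell
-- Hypothetical value type with SQL-like Null comparison semantics.
data Val = IntVal Int | NullVal
  deriving (Show)

instance Eq Val where
  IntVal a == IntVal b = a == b
  _ == _ = False -- any comparison involving Null is False
  -- (/=) is defined explicitly: the default, not (x == y), would return True for Null.
  IntVal a /= IntVal b = a /= b
  _ /= _ = False

-- Pattern matching, not (==), is the only reliable way to detect Null.
isNull :: Val -> Bool
isNull NullVal = True
isNull _ = False

-- Return the first argument unless it is Null, in which case return the second.
nvl :: Val -> Val -> Val
nvl v def = if isNull v then def else v

main :: IO ()
main = do
  print (NullVal == NullVal)        -- False
  print (NullVal /= NullVal)        -- False
  print (isNull NullVal)            -- True
  print (nvl NullVal (IntVal 0))    -- IntVal 0
  print (nvl (IntVal 7) (IntVal 0)) -- IntVal 7
```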
:: ColumnName | ColumnName key |
-> RDataType | value returned if original value is Null |
-> RTuple | input RTuple |
-> RDataType | output value |
Returns the value of a specific column (specified by name) if this is not Null.
If this value is Null, then it returns the 2nd parameter.
If you pass an empty RTuple, then it returns Null.
Throws a ColumnDoesNotExist
exception, if this map contains no mapping for the key.
Access RTable
isRTabEmpty :: RTable -> Bool Source #
Test whether an RTable is empty
isRTupEmpty :: RTuple -> Bool Source #
Test whether an RTuple is empty
getRTupColValue Source #
:: ColumnName | ColumnName key |
-> RTuple | Input RTuple |
-> RDataType | Output value |
getRTupColValue: returns the value of an RTuple column based on the ColumnName key. If the column name is not found, then it returns Null. Note that this might be confusing, since there might be an existing column name with a Null value.
:: ColumnName | ColumnName key |
-> RTuple | Input RTuple |
-> Maybe RDataType | Output value |
Returns the value of an RTuple column based on the ColumnName key. If the column name is not found, then it returns Nothing
:: RDataType | Default value to return in the case the column name does not exist in the RTuple |
-> ColumnName | ColumnName key |
-> RTuple | Input RTuple |
-> RDataType | Output value |
Returns the value of an RTuple column based on the ColumnName key. If the column name is not found, then it returns a default value
:: RTuple | Input RTuple |
-> ColumnName | ColumnName key |
-> RDataType | Output value |
Operator for getting a column value from an RTuple
Throws a ColumnDoesNotExist
exception, if this map contains no mapping for the key.
:: RTuple | Input RTuple |
-> ColumnName | ColumnName key |
-> Maybe RDataType | Output value |
Safe operator for getting a column value from an RTuple. If the column name is not found, then it returns Nothing
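The three access styles described in this section (Maybe result, default value, exception) can be contrasted in a small model. Everything here is hypothetical and independent of the library: `Row`, `MissingColumn`, `getMaybe`, `getOrDefault` and `getOrThrow` are illustrative names only.

```haskell
import Control.Exception (Exception, evaluate, throw, try)
import Data.Maybe (fromMaybe)

-- Hypothetical model of a tuple: a list of (column, value) pairs.
type Row = [(String, Int)]

-- Hypothetical exception, playing the role of ColumnDoesNotExist.
newtype MissingColumn = MissingColumn String
  deriving (Show)

instance Exception MissingColumn

-- Safe access: a missing column is reported as Nothing.
getMaybe :: Row -> String -> Maybe Int
getMaybe row col = lookup col row

-- Access with a caller-supplied default for missing columns.
getOrDefault :: Int -> String -> Row -> Int
getOrDefault def col row = fromMaybe def (lookup col row)

-- Unsafe access: a missing column raises an exception when the value is forced.
getOrThrow :: Row -> String -> Int
getOrThrow row col = fromMaybe (throw (MissingColumn col)) (lookup col row)

main :: IO ()
main = do
  let row = [("col1", 1)]
  print (getMaybe row "col2")       -- Nothing
  print (getOrDefault 0 "col2" row) -- 0
  r <- try (evaluate (getOrThrow row "col2")) :: IO (Either MissingColumn Int)
  putStrLn (either (const "column not found") show r)
```

The Maybe-returning style keeps the failure visible in the type, while the throwing style needs `try` (or a similar handler) in IO to recover.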
Conversions
concatRTab :: [RTable] -> RTable Source #
rtupleToList :: RTuple -> [(ColumnName, RDataType)] Source #
Turns an RTuple to a List
toListRDataType :: RTupleMData -> RTuple -> [RDataType] Source #
toListRDataType: returns a list of RDataType values of an RTuple, in the fixed column order of the RTuple
toText :: RDataType -> Maybe Text Source #
Return the Text out of an RDataType If a non-text RDataType is given then Nothing is returned.
Container Functions
Modify RTable (DML)
:: [(ColumnName, RDataType)] | List of column names to be updated with the corresponding new values |
-> RPredicate | An RTuple -> Bool function that specifies the RTuples to be updated |
-> RTable | Input RTable |
-> RTable | Output RTable |
Update an RTable. Input includes a list of (ColumnName, new Value) pairs. Also a filter predicate is specified in order to restrict the update only to those rtuples that fulfill the predicate
:: ColumnName | key where the upsert will take place |
-> RDataType | new value |
-> RTuple | input RTuple |
-> RTuple | output RTuple |
Upsert (update/insert) an RTuple at a specific column, specified by name, with a value. If the cname key is not found, then the (columnName, value) pair is inserted. If it exists, then the value is updated with the input value.
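The update and upsert semantics can be sketched on association lists. This is an illustrative model, not the library's implementation: `Row`, `upsert` and `updateWhere` are hypothetical names. Note that both functions return new values and leave their inputs untouched.

```haskell
-- Hypothetical model of a tuple: a list of (column, value) pairs.
type Row = [(String, Int)]

-- Update the column if it exists, otherwise insert the (column, value) pair.
upsert :: String -> Int -> Row -> Row
upsert col val row
  | any ((== col) . fst) row = [(c, if c == col then val else v) | (c, v) <- row]
  | otherwise = row ++ [(col, val)]

-- Apply a list of (column, new value) assignments to the rows that satisfy the predicate.
updateWhere :: [(String, Int)] -> (Row -> Bool) -> [Row] -> [Row]
updateWhere assignments p = map update
  where
    update row
      | p row = foldr (uncurry upsert) row assignments
      | otherwise = row

main :: IO ()
main = do
  let tab = [[("id", 1), ("qty", 5)], [("id", 2), ("qty", 7)]]
  print (upsert "qty" 0 [("id", 1)])                             -- [("id",1),("qty",0)]
  print (updateWhere [("qty", 0)] (\r -> lookup "id" r == Just 2) tab)
  -- [[("id",1),("qty",5)],[("id",2),("qty",0)]]
```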
Create/Alter RTable (DDL)
emptyRTable :: RTable Source #
emptyRTable: Create an empty RTable
createSingletonRTable :: RTuple -> RTable Source #
Creates an RTable with a single RTuple
rtableFromList :: [RTuple] -> RTable Source #
Creates an RTable from a list of RTuples
addColumn Source #
:: ColumnName | name of the column to be added |
-> RDataType | Default value of the new column. All RTuples will initially have this value in this column |
-> RTable | Input RTable |
-> RTable | Output RTable |
addColumn: adds a column to an RTable
removeColumn Source #
:: ColumnName | Column to be removed |
-> RTable | input RTable |
-> RTable | output RTable |
removeColumn : removes a column from an RTable. The column is specified by ColumnName. If this ColumnName does not exist in the RTuple of the input RTable, then nothing happens and the RTuple remains intact.
emptyRTuple :: RTuple Source #
Creates an empty RTuple (i.e., one with no column,value mappings)
createNullRTuple :: [ColumnName] -> RTuple Source #
createRTuple Source #
:: [(ColumnName, RDataType)] | input list of (columnname,value) pairs |
-> RTuple |
createRTuple: Create an RTuple from a list of column names and values
rtupleFromList :: [(ColumnName, RDataType)] -> RTuple Source #
Create an RTuple from a list
createRDataType: Get a value of type a and return the corresponding RDataType. The input value data type must be an instance of the Typeable typeclass from Data.Typeable
Metadata Functions
createRTableMData Source #
:: (RTableName, [(ColumnName, ColumnDType)]) | |
-> [ColumnName] | Primary Key. [] if no PK exists |
-> [[ColumnName]] | list of unique keys. [] if no unique keys exists |
-> RTableMData |
createRTableMData : creates RTableMData from input given in the form of a list. We assume that the column order of the input list defines the fixed column order of the RTuple.
getColumnNamesfromRTab :: RTable -> [ColumnName] Source #
Get the Column Names of an RTable
getColumnNamesfromRTuple :: RTuple -> [ColumnName] Source #
Returns the Column Names of an RTuple
listOfColInfoRDataType :: [ColumnInfo] -> RTuple -> [(ColumnInfo, RDataType)] Source #
Creates a list of the form [(ColumnInfo, RDataType)] from a list of ColumnInfo and an RTuple. The returned list respects the order of the [ColumnInfo]. Note that the following code does NOT guarantee that HM.toList will return the same column order as [ColumnInfo]:
Prelude.zip listOfColInfo (Prelude.map (snd) $ HM.toList rtup)
toListColumnName :: RTupleMData -> [ColumnName] Source #
toListColumnName: returns a list of RTuple column names, in the fixed column order of the RTuple.
toListColumnInfo :: RTupleMData -> [ColumnInfo] Source #
toListColumnInfo: returns a list of RTuple columnInfo, in the fixed column order of the RTuple
Exceptions
data ColumnDoesNotExist Source #
This exception is thrown whenever we try to access a specific column (i.e., ColumnName
) of an RTuple
and the column does not exist.
Instances
Eq ColumnDoesNotExist Source # | |
Defined in RTable.Core (==) :: ColumnDoesNotExist -> ColumnDoesNotExist -> Bool # (/=) :: ColumnDoesNotExist -> ColumnDoesNotExist -> Bool # | |
Show ColumnDoesNotExist Source # | |
Defined in RTable.Core showsPrec :: Int -> ColumnDoesNotExist -> ShowS # show :: ColumnDoesNotExist -> String # showList :: [ColumnDoesNotExist] -> ShowS # | |
Exception ColumnDoesNotExist Source # | |
Defined in RTable.Core |
data UnsupportedTimeStampFormat Source #
This exception is thrown whenever we provide a Timestamp format with not even one valid format pattern
Instances
Eq UnsupportedTimeStampFormat Source # | |
Defined in RTable.Core | |
Show UnsupportedTimeStampFormat Source # | |
Defined in RTable.Core showsPrec :: Int -> UnsupportedTimeStampFormat -> ShowS # show :: UnsupportedTimeStampFormat -> String # showList :: [UnsupportedTimeStampFormat] -> ShowS # | |
Exception UnsupportedTimeStampFormat Source # | |
data EmptyInputStringsInToRTimestamp Source #
One (or both) of the input Strings to function toRTimestamp are empty
data RTimestampFormatLengthMismatch = RTimestampFormatLengthMismatch String String deriving(Eq,Show)
instance Exception RTimestampFormatLengthMismatch
Length mismatch between the format String and the input String
RTable IO Operations
RTable Printing and Formatting
An Example of RTable printing
>>>
-- define a simple RTable from a list
>>>
:set -XOverloadedStrings
>>>
:{
let tab1 = rtableFromList [ rtupleFromList [("ColInteger", RInt 1), ("ColDouble", RDouble 2.3), ("ColText", RText "We dig dig dig dig dig dig dig")]
                          , rtupleFromList [("ColInteger", RInt 2), ("ColDouble", RDouble 5.36879), ("ColText", RText "From early morn to night")]
                          , rtupleFromList [("ColInteger", RInt 3), ("ColDouble", RDouble 999.9999), ("ColText", RText "In a mine the whole day through")]
                          , rtupleFromList [("ColInteger", RInt 4), ("ColDouble", RDouble 0.9999), ("ColText", RText "Is what we like to do")]
                          ]
:}
>>>
-- print without format specification
>>>
printRTable tab1
-----------------------------------------------------------------
ColInteger     ColText                             ColDouble
~~~~~~~~~~     ~~~~~~~                             ~~~~~~~~~
1              We dig dig dig dig dig dig dig      2.30
2              From early morn to night            5.37
3              In a mine the whole day through     1000.00
4              Is what we like to do               1.00
4 rows returned
-----------------------------------------------------------------
>>>
-- print with format specification (define column printing order and value formatting per column)
>>>
printfRTable (genRTupleFormat ["ColInteger","ColDouble","ColText"] $ genColFormatMap [("ColInteger", Format "%d"),("ColDouble", Format "%1.1e"),("ColText", Format "%50s\n")]) tab1
-----------------------------------------------------------------
ColInteger     ColDouble     ColText
~~~~~~~~~~     ~~~~~~~~~     ~~~~~~~
1              2.3e0         We dig dig dig dig dig dig dig
2              5.4e0         From early morn to night
3              1.0e3         In a mine the whole day through
4              1.0e0         Is what we like to do
4 rows returned
-----------------------------------------------------------------
printRTable :: RTable -> IO () Source #
printRTable : Print the input RTable on screen
eitherPrintRTable :: Exception e => (RTable -> IO ()) -> RTable -> IO (Either e ()) Source #
Safe printRTable
alternative that returns an Either
, so as to give the ability to handle exceptions
gracefully, during the evaluation of the input RTable. Example:
do
    p <- (eitherPrintRTable printRTable myRTab) :: IO (Either SomeException ())
    case p of
        Left exc -> putStrLn $ "There was an error in the Julius evaluation: " ++ (show exc)
        Right _  -> return ()
printfRTable :: RTupleFormat -> RTable -> IO () Source #
Prints an RTable with an RTuple format specification.
It can be used instead of printRTable when one of the following two is required:
- a) When we want to specify the order in which the columns will be printed on screen
- b) When we want to specify the formatting of the values by using a printf-like FormatSpecifier
eitherPrintfRTable :: Exception e => (RTupleFormat -> RTable -> IO ()) -> RTupleFormat -> RTable -> IO (Either e ()) Source #
Safe printfRTable
alternative that returns an Either
, so as to give the ability to handle exceptions
gracefully, during the evaluation of the input RTable. Example:
do
    p <- (eitherPrintfRTable printfRTable myFormat myRTab) :: IO (Either SomeException ())
    case p of
        Left exc -> putStrLn $ "There was an error in the Julius evaluation: " ++ (show exc)
        Right _  -> return ()
data RTupleFormat Source #
Basic data type for defining the desired formatting of an RTuple
when printing an RTable (see printfRTable
).
RTupleFormat | |
|
Instances
Eq RTupleFormat Source # | |
Defined in RTable.Core (==) :: RTupleFormat -> RTupleFormat -> Bool # (/=) :: RTupleFormat -> RTupleFormat -> Bool # | |
Show RTupleFormat Source # | |
Defined in RTable.Core showsPrec :: Int -> RTupleFormat -> ShowS # show :: RTupleFormat -> String # showList :: [RTupleFormat] -> ShowS # |
type ColFormatMap = HashMap ColumnName FormatSpecifier Source #
A map of ColumnName to Format Specification
data FormatSpecifier Source #
Format specifier of printf
style
Instances
Eq FormatSpecifier Source # | |
Defined in RTable.Core (==) :: FormatSpecifier -> FormatSpecifier -> Bool # (/=) :: FormatSpecifier -> FormatSpecifier -> Bool # | |
Show FormatSpecifier Source # | |
Defined in RTable.Core showsPrec :: Int -> FormatSpecifier -> ShowS # show :: FormatSpecifier -> String # showList :: [FormatSpecifier] -> ShowS # |
data OrderingSpec Source #
A sum type to help the specification of a column ordering (Ascending, or Descending)
Instances
Eq OrderingSpec Source # | |
Defined in RTable.Core (==) :: OrderingSpec -> OrderingSpec -> Bool # (/=) :: OrderingSpec -> OrderingSpec -> Bool # | |
Show OrderingSpec Source # | |
Defined in RTable.Core showsPrec :: Int -> OrderingSpec -> ShowS # show :: OrderingSpec -> String # showList :: [OrderingSpec] -> ShowS # |
:: [ColumnName] | Column Select list |
-> ColFormatMap | Column Format Map |
-> RTupleFormat | Output |
Generate an RTupleFormat data type instance
genColFormatMap :: [(ColumnName, FormatSpecifier)] -> ColFormatMap Source #
Generates a Column Format Specification
genDefaultColFormatMap :: ColFormatMap Source #
Generates a default Column Format Specification