DBFunctor-0.1.1.0: DBFunctor - Functional Data Management => ETL/ELT Data Processing in Haskell

Copyright(c) Nikos Karagiannidis 2018
LicenseBSD3
Maintainernkarag@gmail.com
Stabilitystable
PortabilityPOSIX
Safe HaskellNone
LanguageHaskell2010

RTable.Core

Contents

Description

This is the core module that implements the relational Table concept with the RTable data type. It defines all necessary data types like RTable and RTuple as well as all the basic relational algebra operations (selection -i.e., filter- , projection, inner/outer join, aggregation, grouping etc.) on RTables.

When to use this module

This module should be used whenever one has "tabular data" (e.g., some CSV files, or any type of data that can be an instance of the RTabular type class and thus define the toRTable and fromRTable functions) and wants to analyze them in-memory with the well-known relational algebra operations (selection, projection, join, groupby, aggregations etc) that lie behind SQL. This data analysis takes place within your haskell code, without the need to import the data into a database (database-less data processing) and the result can be turned into the original format (e.g., CSV) with a simple call to the fromRTable function.

RTable.Core gives you an interface for all common relational algebra operations, which are expressed as functions over the basic RTable data type. Of course, since each relational algebra operation is a function that returns a new RTable (immutability), one can compose these operations and thus express an arbitrary complex query. Immutability also holds for DML operations also (e.g., updateRTab). This means that any update on an RTable operates like a CREATE AS SELECT statement in SQL, creating a new RTable and not modifying an existing one.

Note that the recommended method in order to perform data analysis via relational algebra operations is to use the type-level Embedded Domain Specific Language (EDSL) Julius, defined in module Etl.Julius, which exports the RTable.Core module. This provides a standard way of expressing queries and is simpler for expressing more complex queries (with many relational algebra operations). Moreover it supports intermediate results (i.e., subqueries). Finally, if you need to implement some ETL/ELT data flows, that will use the relational operations defined in RTable.Core to analyze data but also to combine them with various Column Mappings (RColMapping), in order to achieve various data transformations, then Julius is the appropriate tool for this job.

See this Julius Tutorial

Overview

An RTable is logically a container of RTuples (similar to the concept of a Relation being a set of Tuples) and is the core data type in this module. The RTuple is a map of (Column-Name, Column-Value) pairs. A Column-Name is modeled with the ColumnName data type, while the Column-Value is modelled with the RDataType, which is a wrapper over the most common data types that one would expect to find in a column of a Table (e.g., integers, rational numbers, strings, dates etc.).

We said that the RTable is a container of RTuples and thus the RTable is a Monad! So one can write monadic code to implement RTable operations. For example:

   -- | Return an new RTable after modifying each RTuple of the input RTable.
   myRTableOperation :: RTable -> RTable
   myRTableOperation rtab = do
           rtup <- rtab
           let new_rtup = doStuff rtup
           return new_rtup
       where
           doStuff :: RTuple -> RTuple
           doStuff = ...  -- to be defined

Many different types of data can be turned into an RTable. For example, CSV data can be easily turn into an RTable via the toRTable function. Many other types of data could be represented as "tabular data" via the RTable data type, as long as they adhere to the interface posed by the RTabular type class. In other words, any data type that we want to convert into an RTable and vice-versa, must become an instance of the RTabular type class and thus define the basic toRTable and fromRTable functions.

An Example

In this example we read a CSV file with the use of the readCSV function from the RTable.Data.CSV module. Then, with the use of the toRTable function, implemented in the RTabular instance of the CSV data type, we convert the CSV file into an RTable. The data of the CSV file consist of metadata from an imaginary Oracle database and each row represents an entry for a table stored in this database, with information (i.e., columns) pertaining to the owner of the table, the tablespace name, the status of the table and various statistics, such as the number of rows and number of blocks.

In this example, we apply three "transformations" to the input data and we print the result after each one, with the use of the printfRTable function. The transfomrations are:

  1. a limit operation, where we return the first N number of RTuples,
  2. an RFilter operation that returns only the tables that start with a 'B', followed by a projection operation (RPrj)
  3. an inner-join (RInJoin), where we pair the RTuples from the previous results based on a join predicate (RJoinPredicate): the tables that have been analyzed the same day

Finally, we store the results of the 2nd operation into a new CSV file, with the use of the fromRTable function implemented for the RTabular instance of the CSV data type.


import  RTable.Core
import  RTable.Data.CSV     (CSV, readCSV, toRTable)
import  Data.Text as T          (take, pack)

-- This is the input source table metadata
src_DBTab_MData :: RTableMData
src_DBTab_MData = 
    createRTableMData   (   "sourceTab"  -- table name
                            ,[  ("OWNER", Varchar)                                      -- Owner of the table
                                ,("TABLE_NAME", Varchar)                                -- Name of the table
                                ,("TABLESPACE_NAME", Varchar)                           -- Tablespace name
                                ,("STATUS",Varchar)                                     -- Status of the table object (VALID/IVALID)
                                ,("NUM_ROWS", Integer)                                  -- Number of rows in the table
                                ,("BLOCKS", Integer)                                    -- Number of Blocks allocated for this table
                                ,("LAST_ANALYZED", Timestamp "MMDDYYYY HH24:MI:SS")   -- Timestamp of the last time the table was analyzed (i.e., gathered statistics) 
                            ]
                        )
                        ["OWNER", "TABLE_NAME"] -- primary key
                        [] -- (alternative) unique keys


-- Result RTable metadata
result_tab_MData :: RTableMData
result_tab_MData = 
    createRTableMData   (   "resultTab"  -- table name
                            ,[  ("OWNER", Varchar)                                        -- Owner of the table
                                ,("TABLE_NAME", Varchar)                                  -- Name of the table
                                ,("LAST_ANALYZED", Timestamp "MMDDYYYY HH24:MI:SS")   -- Timestamp of the last time the table was analyzed (i.e., gathered statistics) 
                            ]
                        )
                        ["OWNER", "TABLE_NAME"] -- primary key
                        [] -- (alternative) unique keys


main :: IO()
main = do
     -- read source csv file
    srcCSV <- readCSV "./app/test-data.csv"

    putStrLn "\nHow many rows you want to print from the source table? :\n"
    n <- readLn :: IO Int    
    
    -- RTable A
    printfRTable (  -- define the order by which the columns will appear on screen. Use the default column formatting.
                    genRTupleFormat ["OWNER", "TABLE_NAME", "TABLESPACE_NAME", "STATUS", "NUM_ROWS", "BLOCKS", "LAST_ANALYZED"] genDefaultColFormatMap) $ 
                        limit n $ toRTable src_DBTab_MData srcCSV 

    putStrLn "\nThese are the tables that start with a "B":\n"    
    
    -- RTable B
    printfRTable ( genRTupleFormat ["OWNER", "TABLE_NAME","LAST_ANALYZED"] genDefaultColFormatMap) $ 
        tabs_start_with_B $ toRTable src_DBTab_MData srcCSV 
    
    putStrLn "\nThese are the tables that were analyzed the same day:\n"    
    
    -- RTable C = A InnerJoin B
    printfRTable ( genRTupleFormat ["OWNER", "TABLE_NAME", "LAST_ANALYZED", "OWNER_1", "TABLE_NAME_1", "LAST_ANALYZED_1"] genDefaultColFormatMap) $ 
        ropB  myJoin
                    (limit n $ toRTable src_DBTab_MData srcCSV) 
                    (tabs_start_with_B $ toRTable src_DBTab_MData srcCSV)

    -- save result of 2nd operation to CSV file
    writeCSV ".appresult-data.csv" $ 
                    fromRTable result_tab_MData $ 
                        tabs_start_with_B $ 
                            toRTable src_DBTab_MData srcCSV 

    where
        -- Return RTuples with a table_name starting with a B
        tabs_start_with_B :: RTable -> RTable
        tabs_start_with_B rtab = (ropU myProjection) . (ropU myFilter) $ rtab
            where
                -- Create a Filter Operation to return only RTuples with table_name starting with a B
                myFilter = RFilter (    t ->   let 
                                                    tbname = case toText (t <!> "TABLE_NAME") of
                                                                Just t -> t
                                                                Nothing -> pack ""
                                                in (T.take 1 tbname) == (pack "B")
                                    )
                -- Create a Projection Operation that projects only two columns
                myProjection = RPrj ["OWNER", "TABLE_NAME", "LAST_ANALYZED"]

        -- Create an Inner Join for tables analyzed in the same day
        myJoin :: ROperation
        myJoin = RInJoin (  t1 t2 -> 
                                let
                                    RTime {rtime = RTimestampVal {year = y1, month = m1, day = d1, hours24 = hh1, minutes = mm1, seconds = ss1}} = t1<!>"LAST_ANALYZED"
                                    RTime {rtime = RTimestampVal {year = y2, month = m2, day = d2, hours24 = hh2, minutes = mm2, seconds = ss2}} = t2<!>"LAST_ANALYZED"
                                in y1 == y2 && m1 == m2 && d1 == d2
                        )

And here is the output:

:l .srcRTable/example.hs
:set -XOverloadedStrings
main
How many rows you want to print from the source table? :

10
---------------------------------------------------------------------------------------------------------------------------------
OWNER           TABLE_NAME                        TABLESPACE_NAME     STATUS     NUM_ROWS     BLOCKS     LAST_ANALYZED
~~~~~           ~~~~~~~~~~                        ~~~~~~~~~~~~~~~     ~~~~~~     ~~~~~~~~     ~~~~~~     ~~~~~~~~~~~~~
APEX_030200     SYS_IOT_OVER_71833                SYSAUX              VALID      0            0          06082012 16:22:36
APEX_030200     WWV_COLUMN_EXCEPTIONS             SYSAUX              VALID      3            3          06082012 16:22:33
APEX_030200     WWV_FLOWS                         SYSAUX              VALID      10           3          06082012 22:01:21
APEX_030200     WWV_FLOWS_RESERVED                SYSAUX              VALID      0            0          06082012 16:22:33
APEX_030200     WWV_FLOW_ACTIVITY_LOG1$           SYSAUX              VALID      1            29         07202012 19:07:57
APEX_030200     WWV_FLOW_ACTIVITY_LOG2$           SYSAUX              VALID      14           29         07202012 19:07:57
APEX_030200     WWV_FLOW_ACTIVITY_LOG_NUMBER$     SYSAUX              VALID      1            3          07202012 19:08:00
APEX_030200     WWV_FLOW_ALTERNATE_CONFIG         SYSAUX              VALID      0            0          06082012 16:22:33
APEX_030200     WWV_FLOW_ALT_CONFIG_DETAIL        SYSAUX              VALID      0            0          06082012 16:22:33
APEX_030200     WWV_FLOW_ALT_CONFIG_PICK          SYSAUX              VALID      37           3          06082012 16:22:33


10 rows returned
---------------------------------------------------------------------------------------------------------------------------------

These are the tables that start with a B:

-------------------------------------------------------------
OWNER      TABLE_NAME                LAST_ANALYZED
~~~~~      ~~~~~~~~~~                ~~~~~~~~~~~~~
DBSNMP     BSLN_BASELINES            04152018 16:14:51
DBSNMP     BSLN_METRIC_DEFAULTS      06082012 16:06:41
DBSNMP     BSLN_STATISTICS           04152018 17:41:33
DBSNMP     BSLN_THRESHOLD_PARAMS     06082012 16:06:41
SYS        BOOTSTRAP$                04142014 13:53:43


5 rows returned
-------------------------------------------------------------

These are the tables that were analyzed the same day:

-------------------------------------------------------------------------------------------------------------------------------------
OWNER           TABLE_NAME                     LAST_ANALYZED           OWNER_1     TABLE_NAME_1              LAST_ANALYZED_1
~~~~~           ~~~~~~~~~~                     ~~~~~~~~~~~~~           ~~~~~~~     ~~~~~~~~~~~~              ~~~~~~~~~~~~~~~
APEX_030200     SYS_IOT_OVER_71833             06082012 16:22:36     DBSNMP      BSLN_THRESHOLD_PARAMS     06082012 16:06:41
APEX_030200     SYS_IOT_OVER_71833             06082012 16:22:36     DBSNMP      BSLN_METRIC_DEFAULTS      06082012 16:06:41
APEX_030200     WWV_COLUMN_EXCEPTIONS          06082012 16:22:33     DBSNMP      BSLN_THRESHOLD_PARAMS     06082012 16:06:41
APEX_030200     WWV_COLUMN_EXCEPTIONS          06082012 16:22:33     DBSNMP      BSLN_METRIC_DEFAULTS      06082012 16:06:41
APEX_030200     WWV_FLOWS                      06082012 22:01:21     DBSNMP      BSLN_THRESHOLD_PARAMS     06082012 16:06:41
APEX_030200     WWV_FLOWS                      06082012 22:01:21     DBSNMP      BSLN_METRIC_DEFAULTS      06082012 16:06:41
APEX_030200     WWV_FLOWS_RESERVED             06082012 16:22:33     DBSNMP      BSLN_THRESHOLD_PARAMS     06082012 16:06:41
APEX_030200     WWV_FLOWS_RESERVED             06082012 16:22:33     DBSNMP      BSLN_METRIC_DEFAULTS      06082012 16:06:41
APEX_030200     WWV_FLOW_ALTERNATE_CONFIG      06082012 16:22:33     DBSNMP      BSLN_THRESHOLD_PARAMS     06082012 16:06:41
APEX_030200     WWV_FLOW_ALTERNATE_CONFIG      06082012 16:22:33     DBSNMP      BSLN_METRIC_DEFAULTS      06082012 16:06:41
APEX_030200     WWV_FLOW_ALT_CONFIG_DETAIL     06082012 16:22:33     DBSNMP      BSLN_THRESHOLD_PARAMS     06082012 16:06:41
APEX_030200     WWV_FLOW_ALT_CONFIG_DETAIL     06082012 16:22:33     DBSNMP      BSLN_METRIC_DEFAULTS      06082012 16:06:41
APEX_030200     WWV_FLOW_ALT_CONFIG_PICK       06082012 16:22:33     DBSNMP      BSLN_THRESHOLD_PARAMS     06082012 16:06:41
APEX_030200     WWV_FLOW_ALT_CONFIG_PICK       06082012 16:22:33     DBSNMP      BSLN_METRIC_DEFAULTS      06082012 16:06:41


14 rows returned
-------------------------------------------------------------------------------------------------------------------------------------

Check the output CSV file

$ head ./app/result-data.csv
OWNER,TABLE_NAME,LAST_ANALYZED
DBSNMP,BSLN_BASELINES,04152018 16:14:51
DBSNMP,BSLN_METRIC_DEFAULTS,06082012 16:06:41
DBSNMP,BSLN_STATISTICS,04152018 17:41:33
DBSNMP,BSLN_THRESHOLD_PARAMS,06082012 16:06:41
SYS,BOOTSTRAP$,04142014 13:53:43
Synopsis

The Relational Table Concept

RTable Data Types

type RTable = Vector RTuple Source #

Definition of the Relational Table entity An RTable is a "container" of RTuples.

type RTuple = HashMap ColumnName RDataType Source #

Definition of the Relational Tuple. An RTuple is implemented as a HashMap of (ColumnName, RDataType) pairs. This ensures fast access of the column value by column name. Note that this implies that the RTuple CANNOT have more than one columns with the same name (i.e. hashmap key) and more importantly that it DOES NOT have a fixed order of columns, as it is usual in RDBMS implementations. This gives us the freedom to perform column change operations very fast. The only place were we need fixed column order is when we try to load an RTable from a fixed-column structure such as a CSV file. For this reason, we have embedded the notion of a fixed column-order in the RTuple metadata. See RTupleMData.

data RDataType Source #

Definition of the Relational Data Type. This is the data type of the values stored in each RTable. This is a strict data type, meaning whenever we evaluate a value of type RDataType, there must be also evaluated all the fields it contains.

Constructors

RInt 

Fields

RText 

Fields

RDate 

Fields

RTime 

Fields

RDouble 

Fields

Null 
Instances
Eq RDataType Source #

We need to explicitly specify equation of RDataType due to SQL NULL logic (i.e., anything compared to NULL returns false): Null == _ = False, _ == Null = False, Null /= _ = False, _ /= Null = False. IMPORTANT NOTE: Of course this means that anywhere in your code where you have something like this: x == Null or x /= Null, will always return False and thus it is futile to do this comparison. You have to use the is isNull function instead.

Instance details

Defined in RTable.Core

Fractional RDataType Source #

In order to be able to use (/) with RDataType

Instance details

Defined in RTable.Core

Num RDataType Source # 
Instance details

Defined in RTable.Core

Ord RDataType Source # 
Instance details

Defined in RTable.Core

Read RDataType Source # 
Instance details

Defined in RTable.Core

Show RDataType Source # 
Instance details

Defined in RTable.Core

Generic RDataType Source # 
Instance details

Defined in RTable.Core

Associated Types

type Rep RDataType :: Type -> Type #

NFData RDataType Source #

In order to be able to force full evaluation up to Normal Form (NF) https://www.fpcomplete.com/blog/2017/09/all-about-strictness

Instance details

Defined in RTable.Core

Methods

rnf :: RDataType -> () #

type Rep RDataType Source # 
Instance details

Defined in RTable.Core

data RTimestamp Source #

Basic data type to represent time. This is a strict data type, meaning whenever we evaluate a value of type RTimestamp, there must be also evaluated all the fields it contains.

Constructors

RTimestampVal 

Fields

Instances
Eq RTimestamp Source # 
Instance details

Defined in RTable.Core

Ord RTimestamp Source # 
Instance details

Defined in RTable.Core

Read RTimestamp Source # 
Instance details

Defined in RTable.Core

Show RTimestamp Source # 
Instance details

Defined in RTable.Core

Generic RTimestamp Source # 
Instance details

Defined in RTable.Core

Associated Types

type Rep RTimestamp :: Type -> Type #

NFData RTimestamp Source #

In order to be able to force full evaluation up to Normal Form (NF)

Instance details

Defined in RTable.Core

Methods

rnf :: RTimestamp -> () #

type Rep RTimestamp Source # 
Instance details

Defined in RTable.Core

RTable Metadata Data Types

data RTableMData Source #

Metadata for an RTable

Constructors

RTableMData 

Fields

Instances
Eq RTableMData Source # 
Instance details

Defined in RTable.Core

Show RTableMData Source # 
Instance details

Defined in RTable.Core

type RTupleMData = (HashMap ColumnOrder ColumnName, HashMap ColumnName ColumnInfo) Source #

Basic Metadata of an RTuple. The RTuple metadata are accessed through a HashMap ColumnName ColumnInfo structure. I.e., for each column of the RTuple, we access the ColumnInfo structure to get Column-level metadata. This access is achieved by ColumnName. However, in order to provide the "impression" of a fixed column order per tuple (see RTuple definition), we provide another HashMap, the HashMap ColumnOrder ColumnName. So in the follwoing example, if we want to access the RTupleMData tupmdata ColumnInfo by column order, (assuming that we have N columns) we have to do the following:

     (snd tupmdata)!((fst tupmdata)!0)
     (snd tupmdata)!((fst tupmdata)!1)
     ...
     (snd tupmdata)!((fst tupmdata)!(N-1))

In the same manner in order to access the column of an RTuple (e.g., tup) by column order, we do the following:

     tup!((fst tupmdata)!0)
     tup!((fst tupmdata)!1)
     ...
     tup!((fst tupmdata)!(N-1))

data ColumnInfo Source #

Basic metadata for a column of an RTuple

Constructors

ColumnInfo 
Instances
Eq ColumnInfo Source # 
Instance details

Defined in RTable.Core

Show ColumnInfo Source # 
Instance details

Defined in RTable.Core

type Name = Text Source #

Definition of the Name type

type ColumnName = Name Source #

Definition of the Column Name

type RTableName = Name Source #

Definition of the Table Name

data ColumnDType Source #

This is used only for metadata purposes (see ColumnInfo). The actual data type of a value is an RDataType The Text component of Date and Timestamp data constructors is the date format e.g., "DD/MM/YYYY", "DD/MM/YYYY HH24:MI:SS"

Instances
Eq ColumnDType Source # 
Instance details

Defined in RTable.Core

Show ColumnDType Source # 
Instance details

Defined in RTable.Core

Type Classes for "Tabular Data"

class RTabular a where Source #

Basic class to represent a data type that can be turned into an RTable. It implements the concept of "tabular data"

Instances
RTabular CSV Source #

CSV data are "Tabular" data thus implement the RTabular interface

Instance details

Defined in RTable.Data.CSV

Relational Algebra Operations

Operations Data Types

data ROperation Source #

Definition of Relational Algebra operations. These are the valid operations between RTables

Constructors

ROperationEmpty 
RUnion

Union

RInter

Intersection

RDiff

Difference

RPrj

Projection

Fields

RFilter

Filter operation (an RPredicate can be any function of the signature RTuple -> Bool so it is much more powerful than a typical SQL filter expression, which is a boolean expression of comparison operators)

Fields

RInJoin

Inner Join (any type of join predicate allowed. Any function with a signature of the form: RTuple -> RTuple -> Bool is a valid join predicate. I.e., a function which returns True when two RTuples must be paired)

RLeftJoin

Left Outer Join

RRightJoin

Right Outer Join

RSemiJoin

Semi-Join

RAntiJoin

Anti-Join

RAggregate

Performs aggregation operations on specific columns and returns a singleton RTable

Fields

RGroupBy

A Group By operation The SQL equivalent is: SELECT colGrByList, aggList FROM... GROUP BY colGrByList Note that compared to SQL, we can have a more generic grouping predicate (i.e., when two RTuples should belong in the same group) than just the equality of values on the common columns between two RTuples. Also note, that in the case of an aggregation without grouping (equivalent to a single-group group by), then the grouping predicate should be: _ _ -> True

Fields

RCombinedOp

A combination of unary ROperations e.g., (p plist).(f pred) (i.e., RPrj . RFilter) , in the form of an RTable -> RTable function. In this sense we can also include a binary operation (e.g. join), if we partially apply the join to one RTable, e.g.,

(ij jpred rtab) . (p plist) . (f pred)
RBinOp

A generic binary ROperation.

ROrderBy

Order the RTuples of the RTable acocrding to the specified list of Columns. First column in the input list has the highest priority in the sorting order.

type UnaryRTableOperation = RTable -> RTable Source #

A generic unary operation on a RTable

type BinaryRTableOperation = RTable -> RTable -> RTable Source #

A generic binary operation on RTable

data RAggOperation Source #

This data type represents all possible aggregate operations over an RTable. Examples are : Sum, Count, Average, Min, Max but it can be any other "aggregation". The essential property of an aggregate operation is that it acts on an RTable (or on a group of RTuples - in the case of the RGroupBy operation) and produces a single RTuple.

An aggregate operation is applied on a specific column (source column) and the aggregated result will be stored in the target column. It is important to understand that the produced aggregated RTuple is different from the input RTuples. It is a totally new RTuple, that will consist of the aggregated column(s) (and the grouping columns in the case of an RGroupBy).

Constructors

RAggOperation 

Fields

Available Aggregate Operations

type AggFunction = ColumnName -> RTable -> RDataType Source #

Aggregation Function type. An aggregation function receives as input a source column (i.e., a ColumnName) of a source RTable and returns an aggregated value, which is the result of the aggregation on the values of the source column.

raggGenericAgg Source #

Arguments

:: AggFunction

custom aggregation function

-> ColumnName

source column

-> ColumnName

target column

-> RAggOperation 

Returns an RAggOperation with a custom aggregation function provided as input

raggSum Source #

Arguments

:: ColumnName

source column

-> ColumnName

target column

-> RAggOperation 

The Sum aggregate operation

raggCount Source #

Arguments

:: ColumnName

source column

-> ColumnName

target column

-> RAggOperation 

The Count aggregate operation Count aggregation (no distinct)

raggCountDist Source #

Arguments

:: ColumnName

source column

-> ColumnName

target column

-> RAggOperation 

The CountDist aggregate operation Count distinct aggregation (i.e., count(distinct col) in SQL). Returns the distinct number of values for this column.

raggCountStar Source #

Arguments

:: ColumnName

target column to save the result aggregated value

-> RAggOperation 

The CountStar aggregate operation Returns the number of RTuples in the RTable (i.e., count(*) in SQL)

raggAvg Source #

Arguments

:: ColumnName

source column

-> ColumnName

target column

-> RAggOperation 

The Average aggregate operation

raggMax Source #

Arguments

:: ColumnName

source column

-> ColumnName

target column

-> RAggOperation 

The Max aggregate operation

raggMin Source #

Arguments

:: ColumnName

source column

-> ColumnName

target column

-> RAggOperation 

The Min aggregate operation

raggStrAgg Source #

Arguments

:: ColumnName

source column

-> ColumnName

target column

-> Delimiter

delimiter string

-> RAggOperation 

The StrAgg aggregate operation This is known as "string_agg"" in Postgresql and "listagg" in Oracle. It aggregates the values of a text RDataType column with a specified delimiter

Predicates

type RPredicate = RTuple -> Bool Source #

A Predicate. It defines an arbitrary condition over the columns of an RTuple. It is used primarily in the filter RFilter operation and used in the filter function f.

type RGroupPredicate = RTuple -> RTuple -> Bool Source #

The Group By Predicate It defines the condition for two RTuples to be included in the same group.

type RJoinPredicate = RTuple -> RTuple -> Bool Source #

The Join Predicate. It defines when two RTuples should be paired.

data RUpsertPredicate Source #

The Upsert Predicate. It defines when two RTuples should be paired in a merge operation. The matching predicate must be applied on a specific set of matching columns. The source RTable in the Upsert operation must return a unique set of RTuples, if grouped by this set of matching columns. Otherwise an exception (UniquenessViolationInUpsert) is thrown.

Constructors

RUpsertPredicate 

Operation Execution

runUnaryROperation Source #

Arguments

:: ROperation

input ROperation

-> RTable

input RTable

-> RTable

output RTable

Execute a Unary ROperation

ropU :: ROperation -> RTable -> RTable Source #

ropU operator executes a unary ROperation. A short name for the runUnaryROperation function

runUnaryROperationRes Source #

Arguments

:: ROperation

input ROperation

-> RTable

input RTable

-> RTabResult

output: Result of operation

Execute a Unary ROperation and return an RTabResult

ropUres :: ROperation -> RTable -> RTabResult Source #

ropUres operator executes a unary ROperation. A short name for the runUnaryROperationRes function

runBinaryROperation Source #

Arguments

:: ROperation

input ROperation

-> RTable

input RTable1

-> RTable

input RTable2

-> RTable

output RTabl

Execute a Binary ROperation

ropB :: ROperation -> RTable -> RTable -> RTable Source #

ropB operator executes a binary ROperation. A short name for the runBinaryROperation function

runBinaryROperationRes Source #

Arguments

:: ROperation

input ROperation

-> RTable

input RTable1

-> RTable

input RTable2

-> RTabResult

output: Result of operation

Execute a Binary ROperation and return an RTabResult

ropBres :: ROperation -> RTable -> RTable -> RTabResult Source #

ropBres operator executes a binary ROperation. A short name for the runBinaryROperationRes function

Operation Result

type RTuplesRet = Sum Int Source #

Number of RTuples returned by an RTable operation

type RTabResult = Writer RTuplesRet RTable Source #

RTabResult is the result of an RTable operation and is a Writer Monad, that includes the new RTable, as well as the number of RTuples returned by the operation.

rtabResult Source #

Arguments

:: (RTable, RTuplesRet)

input pair

-> RTabResult

output Writer Monad

Creates an RTabResult (i.e., a Writer Monad) from a result RTable and the number of RTuples that it returned

runRTabResult :: RTabResult -> (RTable, RTuplesRet) Source #

Returns the info "stored" in the RTabResult Writer Monad

execRTabResult :: RTabResult -> RTuplesRet Source #

Returns the "log message" in the RTabResult Writer Monad, which is the number of returned RTuples

rtuplesRet :: Int -> RTuplesRet Source #

Creates an RTuplesRet type

getRTuplesRet :: RTuplesRet -> Int Source #

Return the number embedded in the RTuplesRet data type

Operation Composition

An Example of Operation Composition

>>> -- define a simple RTable with four RTuples of a single column "col1"
>>> let tab1 = rtableFromList [rtupleFromList [("col1", RInt 1)], rtupleFromList [("col1", RInt 2)], rtupleFromList [("col1", RInt 3)], rtupleFromList [("col1", RInt 4)] ]
>>> printRTable tab1
   col1
   ~~~~
   1
   2
   3
   4


   4 rows returned
   ---------
   
>>> -- define a filter operation col1 > 2
>>> let rop1 = RFilter (\t-> t<!>"col1" > 2)
>>> -- define another filter operation col1 > 3
>>> let rop2 = RFilter (\t-> t<!>"col1" > 3)
>>> -- Composition of RTable operations via (.) (rop1 returns 2 RTuples and rop2 returns 1 RTuple)
>>> printRTable $ (ropU rop2) . (ropU rop1) $ tab1
   col1
   ~~~~
   4


   1 row returned
   ---------
   
>>> -- Composition of RTabResult operations via (<=<) (Note: that the result includes the sum of the returned RTuples in each operation, i.e., 2+1 = 3)
>>> execRTabResult $ (ropUres rop2) <=< (ropUres rop1) $ tab1
Sum {getSum = 3}
>>> printRTable $ fst.runRTabResult $ (ropUres rop2) <=< (ropUres rop1) $ tab1
   col1
   ~~~~
   4


   1 row returned
   ---------
   

(.) :: (b -> c) -> (a -> b) -> a -> c infixr 9 #

Function composition.

(<=<) :: Monad m => (b -> m c) -> (a -> m b) -> a -> m c infixr 1 #

Right-to-left composition of Kleisli arrows. (>=>), with the arguments flipped.

Note how this operator resembles function composition (.):

(.)   ::            (b ->   c) -> (a ->   b) -> a ->   c
(<=<) :: Monad m => (b -> m c) -> (a -> m b) -> a -> m c

RTable Functions

Relational Algebra Functions

runRfilter :: RPredicate -> RTable -> RTable Source #

Executes an RFilter operation

f :: RPredicate -> RTable -> RTable Source #

Filter (i.e. selection operator). A short name for the runRFilter function

runInnerJoinO :: RJoinPredicate -> RTable -> RTable -> RTable Source #

Implements an Inner Join operation between two RTables (any type of join predicate is allowed) This Inner Join implementation follows Oracle DB's convention for common column names. When we have two tuples t1 and t2 with a common column name (lets say "Common"), then the resulting tuple after a join will be "Common", "Common_1", so a "_1" suffix is appended. The tuple from the left table by convention retains the original column name. So "Column_1" is the column from the right table. If "Column_1" already exists, then "Column_2" is used.

iJ :: RJoinPredicate -> RTable -> RTable -> RTable Source #

RTable Inner Join Operator. A short name for the runInnerJoinO function

runLeftJoin :: RJoinPredicate -> RTable -> RTable -> RTable Source #

Implements a Left Outer Join operation between two RTables (any type of join predicate is allowed), i.e., the rows of the left RTable will be preserved. Note that when dublicate keys encountered that is, since the underlying structure for an RTuple is a Data.HashMap.Strict, only one value per key is allowed. So in the context of joining two RTuples the value of the left RTuple on the common key will be prefered.

Implements a Left Outer Join operation between two RTables (any type of join predicate is allowed), i.e., the rows of the left RTable will be preserved. A Left Join : tabLeft LEFT JOIN tabRight ON joinPred where tabLeft is the preserving table can be defined as: the Union between the following two RTables:

  • The result of the inner join: tabLeft INNER JOIN tabRight ON joinPred
  • The rows from the preserving table (tabLeft) that DONT satisfy the join condition, enhanced with the columns of tabRight returning Null values.

The common columns will appear from both tables but only the left table column's will retain their original name.

lJ :: RJoinPredicate -> RTable -> RTable -> RTable Source #

RTable Left Outer Join Operator. A short name for the runLeftJoin function

runRightJoin :: RJoinPredicate -> RTable -> RTable -> RTable Source #

Implements a Right Outer Join operation between two RTables (any type of join predicate is allowed), i.e., the rows of the right RTable will be preserved. A Right Join : tabLeft RIGHT JOIN tabRight ON joinPred where tabRight is the preserving table can be defined as: the Union between the following two RTables:

  • The result of the inner join: tabLeft INNER JOIN tabRight ON joinPred
  • The rows from the preserving table (tabRight) that DONT satisfy the join condition, enhanced with the columns of tabLeft returning Null values.

The common columns will appear from both tables but only the right table column's will retain their original name.

rJ :: RJoinPredicate -> RTable -> RTable -> RTable Source #

RTable Right Outer Join Operator. A short name for the runRightJoin function

runFullOuterJoin :: RJoinPredicate -> RTable -> RTable -> RTable Source #

Implements a Full Outer Join operation between two RTables (any type of join predicate is allowed) A full outer join is the union of the left and right outer joins respectively. The common columns will appear from both tables but only the left table column's will retain their original name (just by convention).

foJ :: RJoinPredicate -> RTable -> RTable -> RTable Source #

Implements a Right Outer Join operation between two RTables (any type of join predicate is allowed) i.e., the rows of the right RTable will be preserved. Note that when dublicate keys encountered that is, since the underlying structure for an RTuple is a Data.HashMap.Strict, only one value per key is allowed. So in the context of joining two RTuples the value of the right RTuple on the common key will be prefered.

RTable Full Outer Join Operator. A short name for the runFullOuterJoin function

sJ :: RJoinPredicate -> RTable -> RTable -> RTable Source #

RTable semi-join operator. A short name for the runSemiJoin function

runSemiJoin :: RJoinPredicate -> RTable -> RTable -> RTable Source #

Implements the semi-Join operation between two RTables (any type of join predicate is allowed) It returns the RTuples from the left RTable that match with the right RTable. Note that if an RTuple from the left RTable matches more than one RTuples from the right RTable the semi join operation will return only a single RTuple.

aJ :: RJoinPredicate -> RTable -> RTable -> RTable Source #

RTable anti-join operator. A short name for the runAntiJoin function

runAntiJoin :: RJoinPredicate -> RTable -> RTable -> RTable Source #

Implements the anti-Join operation between two RTables (any type of join predicate is allowed) It returns the RTuples from the left RTable that DONT match with the right RTable.

joinRTuples :: RTuple -> RTuple -> RTuple Source #

Joins two RTuples into one. In this join we follow Oracle DB's convention when joining two tuples with some common column names. When we have two tuples t1 and t2 with a common column name (lets say Common), then the resulitng tuple after a join will be Common, Common_1, so a "_1" suffix is appended. The tuple from the left table by convention retains the original column name. So Column_1 is the column from the right table. If Column_1 already exists, then Column_2 is used.

runUnion :: RTable -> RTable -> RTable Source #

Implements the union of two RTables as a union of two lists (see List). Duplicates, and elements of the first list, are removed from the the second list, but if the first list contains duplicates, so will the result

Implements the union of two RTables. Note that dublicate RTuple elimination takes places.

runUnionAll :: RTable -> RTable -> RTable Source #

Implements the union-all of two RTables. I.e., a union without dublicate RTuple elimination. Runs in O(m+n).

u :: RTable -> RTable -> RTable Source #

RTable Union Operator. A short name for the runUnion function

runIntersect :: RTable -> RTable -> RTable Source #

Implements the intersection of two RTables

i :: RTable -> RTable -> RTable Source #

RTable Intersection Operator. A short name for the runIntersect function

runDiff :: RTable -> RTable -> RTable Source #

Implements the set Difference of two RTables as the diff of two lists (see List).

d :: RTable -> RTable -> RTable Source #

RTable Difference Operator. A short name for the runDiff function

runProjection Source #

Arguments

:: [ColumnName]

list of column names to be included in the final result RTable

-> RTable 
-> RTable 

Implements RTable projection operation. If a column name does not exist, then an empty RTable is returned.

runProjectionMissedHits Source #

Arguments

:: [ColumnName]

list of column names to be included in the final result RTable

-> RTable 
-> RTable 

Implements RTable projection operation. If a column name does not exist, then the returned RTable includes this column with a Null value. This projection implementation allows missed hits.

p :: [ColumnName] -> RTable -> RTable Source #

RTable Projection operator. A short name for the runProjection function

runAggregation Source #

Arguments

:: [RAggOperation]

Input Aggregate Operations

-> RTable

Input RTable

-> RTable

Output singleton RTable

Implements the aggregation operation on an RTable It aggregates the specific columns in each AggOperation and returns a singleton RTable i.e., an RTable with a single RTuple that includes only the agg columns and their aggregated value.

rAgg :: [RAggOperation] -> RTable -> RTable Source #

Aggregation Operator. A short name for the runAggregation function

runGroupBy Source #

Arguments

:: RGroupPredicate

Grouping predicate, in order to form the groups of RTuples (it defines when two RTuples should be included in the same group)

-> [RAggOperation]

Aggregations to be applied on specific columns

-> [ColumnName]

List of grouping column names (GROUP BY clause in SQL) We assume that all RTuples in the same group have the same value in these columns

-> RTable

input RTable

-> RTable

output RTable

Implements the GROUP BY operation over an RTable.

rG :: RGroupPredicate -> [RAggOperation] -> [ColumnName] -> RTable -> RTable Source #

Group By Operator. A short name for the runGroupBy function

groupNoAggList Source #

Arguments

:: RGroupPredicate

Grouping predicate, in order to form the groups of RTuples (it defines when two RTuples should be included in the same group)

-> [ColumnName]

List of grouping column names (GROUP BY clause in SQL) We assume that all RTuples in the same group have the same value in these columns

-> RTable

input RTable

-> [RTable]

output list of RTables where each one corresponds to a group

Implement a grouping operation over an RTable. No aggregation takes place. It returns the individual groups as separate RTables in a list. In total the initial set of RTuples is retained. If an empty RTable is provided as input, then a ["empty RTable"] is returned.

groupNoAgg Source #

Arguments

:: RGroupPredicate

Grouping predicate, in order to form the groups of RTuples (it defines when two RTuples should be included in the same group)

-> [ColumnName]

List of grouping column names (GROUP BY clause in SQL) We assume that all RTuples in the same group have the same value in these columns

-> RTable

input RTable

-> RTable

output RTable

Implement a grouping operation over an RTable. No aggregation takes place. The output RTable has exactly the same RTuples, as the input, but these are grouped based on the input grouping predicate. If an empty RTable is provided as input, then an empty RTable is returned.

runOrderBy Source #

Arguments

:: [(ColumnName, OrderingSpec)]

Input ordering specification

-> RTable

Input RTable

-> RTable

Output RTable

Implements the ORDER BY operation. First column in the input list has the highest priority in the sorting order We treat Null as the maximum value (anything compared to Null is smaller). This way Nulls are send at the end (i.e., "Nulls Last" in SQL parlance). This is for Asc ordering. For Desc ordering, we have the opposite. Nulls go first and so anything compared to Null is greater. @ SQL example with q as (select case when level < 4 then level else NULL end c1 -- , level c2 from dual connect by level < 7 ) select * from q order by c1

C1 ---- 1 2 3 Null Null Null

with q as (select case when level < 4 then level else NULL end c1 -- , level c2 from dual connect by level < 7 ) select * from q order by c1 desc

rO :: [(ColumnName, OrderingSpec)] -> RTable -> RTable Source #

Order By Operator. A short name for the runOrderBy function

runCombinedROp Source #

Arguments

:: (RTable -> RTable)

input combined RTable operation

-> RTable

input RTable that the input function will be applied to

-> RTable

output RTable

runCombinedROp: A Higher Order function that accepts as input a combination of unary ROperations e.g., (p plist).(f pred) expressed in the form of a function (RTable -> Rtable) and applies this function to the input RTable. In this sense we can also include a binary operation (e.g. join), if we partially apply the join to one RTable e.g., (ij jpred rtab) . (p plist) . (f pred)

rComb :: (RTable -> RTable) -> RTable -> RTable Source #

A short name for the runCombinedROp function

Decoding

data IgnoreDefault Source #

Constructors

Ignore 
NotIgnore 
Instances
Eq IgnoreDefault Source # 
Instance details

Defined in RTable.Core

Show IgnoreDefault Source # 
Instance details

Defined in RTable.Core

decodeRTable Source #

Arguments

:: ColumnName

ColumnName key

-> RDataType

Search value

-> RDataType

Return value

-> RDataType

Default value

-> IgnoreDefault

Ignore default indicator

-> RTable

input RTable

-> RTable 

It receives an RTable, a search value and a default value. It returns a new RTable which is identical to the source one but for each RTuple, for the specified column: if the search value was found then the specified Return Value is returned else the default value is returned (if the ignore indicator is not set), otherwise (if the ignore indicator is set), it returns the existing value for the column for each RTuple. If you pass an empty RTable, then it returns an empty RTable Throws a ColumnDoesNotExist exception, if the column does not exist

decodeColValue Source #

Arguments

:: ColumnName

ColumnName key

-> RDataType

Search value

-> RDataType

Return value

-> RDataType

Default value

-> IgnoreDefault

Ignore default indicator

-> RTuple

input RTuple

-> RDataType 

It receives an RTuple and lookups the value at a specfic column name. Then it compares this value with the specified search value. If it is equal to the search value then it returns the specified Return Value. If not, then it returns the specified default Value, if the ignore indicator is not set, otherwise (if the ignore indicator is set) it returns the existing value. If you pass an empty RTuple, then it returns Null. Throws a ColumnDoesNotExist exception, if this map contains no mapping for the key.

Date/Time

toRTimestamp Source #

Arguments

:: String

Format string e.g., "DD/MM/YYYY HH:MI:SS"

-> String

Timestamp string

-> RTimestamp 

Returns an RTimestamp from an input String and a format String.

Valid format patterns are:

  • For year: YYYY, e.g., "0001", "2018"
  • For month: MM, e.g., "01", "1", "12"
  • For day: DD, e.g., "01", "1", "31"
  • For hours: HH, HH24 e.g., "00", "23" I.e., hours must be specified in 24 format
  • For minutes: MI, e.g., "01", "1", "59"
  • For seconds: SS, e.g., "01", "1", "59"

Example of a typical format string is: "DD/MM/YYYY HH:MI:SS

If no valid format pattern is found then an UnsupportedTimeStampFormat exception is thrown

createRTimestamp Source #

Arguments

:: String

Format string e.g., "DD/MM/YYYY HH24:MI:SS"

-> String

Timestamp string

-> RTimestamp 

Creates an RTimestamp data type from an input timestamp format string and a timestamp value represented as a String. Valid format patterns are:

  • For year: YYYY, e.g., "0001", "2018"
  • For month: MM, e.g., "01", "1", "12"
  • For day: DD, e.g., "01", "1", "31"
  • For hours: HH, HH24 e.g., "00", "23" I.e., hours must be specified in 24 format
  • For minutes: MI, e.g., "01", "1", "59"
  • For seconds: SS, e.g., "01", "1", "59"

Example of a typical format string is: "DD/MM/YYYY HH:MI:SS

If no valid format pattern is found then an UnsupportedTimeStampFormat exception is thrown

toUTCTime :: RTimestamp -> UTCTime Source #

Convert an RTimestamp value to a Universal Time value (UTCTime)

fromUTCTime :: UTCTime -> RTimestamp Source #

Convert a Universal Time value (UTCTime) to an RTimestamp value

rTimestampToRText Source #

Arguments

:: String

Output format e.g., "DD/MM/YYYY HH24:MI:SS"

-> RTimestamp

Input RTimestamp

-> RDataType

Output RText

rTimeStampToText: converts an RTimestamp value to RText Valid input formats are:

  • 1. "DD/MM/YYYY HH24:MI:SS"
  • 2. "YYYYMMDD-HH24.MI.SS"
  • 3. "YYYYMMDD"
  • 4. "YYYYMM"
  • 5. "YYYY"

stdTimestampFormat :: String Source #

Standard timestamp format. For example: "DDMMYYYY HH24:MI:SS"

stdDateFormat :: [Char] Source #

Standard date format

Character/Text

instrRText Source #

Arguments

:: RDataType

substring to search for

-> RDataType

string to be searched

-> Maybe Int

Position within input string of substr 1st character

Search for the first occurence of a substring within a RText string and return the 1st character position, or Nothing if the substring is not found, or if an non-text RDataType, is given as input.

instr Source #

Arguments

:: Eq a 
=> [a]

substring to search for

-> [a]

string to be searched

-> Maybe Int

Position within input string of substr 1st character

Search for the first occurence of a substring within a String and return the 1st character position, or Nothing if the substring is not found.

instrText Source #

Arguments

:: Text

substring to search for

-> Text

string to be searched

-> Maybe Int

Position within input string of substr 1st character

Search for the first occurence of a substring within a Text string and return the 1st character position, or Nothing if the substring is not found.

rdtappend :: RDataType -> RDataType -> RDataType Source #

Concatenates two Text RDataTypes, in all other cases of RDataType it returns Null.

stripRText Source #

Arguments

:: RDataType

input string

-> RDataType 

stripRText : O(n) Remove leading and trailing white space from a string. If the input RDataType is not an RText, then Null is returned

removeCharAroundRText :: Char -> RDataType -> RDataType Source #

Helper function to remove a character around (from both beginning and end) of an (RText t) value

isText :: RDataType -> Bool Source #

Returns True only if this is an RText

NULL-Related

nvlRTable Source #

Arguments

:: ColumnName

ColumnName key

-> RDataType

Default value

-> RTable

input RTable

-> RTable 

It receives an RTable and a default value. It returns a new RTable which is identical to the source one but for each RTuple, for the specified column every Null value in every RTuple has been replaced by a default value If you pass an empty RTable, then it returns an empty RTable Throws a ColumnDoesNotExist exception, if the column does not exist

nvlRTuple Source #

Arguments

:: ColumnName

ColumnName key

-> RDataType

Default value in the case of Null column values

-> RTuple

input RTuple

-> RTuple

output RTuple

It receives an RTuple and a default value. It returns a new RTuple which is identical to the source one but every Null value in the specified colummn has been replaced by a default value

isNullRTuple :: RTuple -> Bool Source #

Returns True if the input RTuple is a Null RTuple, otherwise it returns False Note that a Null RTuple has all its values equal with Null but it still has columns. This is different from an empty RTuple, which is an RTuple withi no columns and no values whatsoever. See isRTupEmpty.

isNull :: RDataType -> Bool Source #

Use this function to compare an RDataType with the Null value because due to Null logic x == Null or x /= Null, will always return False. It returns True if input value is Null

isNotNull :: RDataType -> Bool Source #

Use this function to compare an RDataType with the Null value because deu to Null logic x == Null or x /= Null, will always return False. It returns True if input value is Not Null

nvl Source #

Arguments

:: RDataType

input value

-> RDataType

default value returned if input value is Null

-> RDataType

output value

Returns the 1st parameter if this is not Null, otherwise it returns the 2nd.

nvlColValue Source #

Arguments

:: ColumnName

ColumnName key

-> RDataType

value returned if original value is Null

-> RTuple

input RTuple

-> RDataType

output value

Returns the value of a specific column (specified by name) if this is not Null. If this value is Null, then it returns the 2nd parameter. If you pass an empty RTuple, then it returns Null. Throws a ColumnDoesNotExist exception, if this map contains no mapping for the key.

Access RTable

isRTabEmpty :: RTable -> Bool Source #

Test whether an RTable is empty

headRTup :: RTable -> RTuple Source #

Get the first RTuple from an RTable

limit Source #

Arguments

:: Int

number of N RTuples to return

-> RTable

input RTable

-> RTable

output RTable

returns the N first RTuples of an RTable

isRTupEmpty :: RTuple -> Bool Source #

Test whether an RTuple is empty

getRTupColValue Source #

Arguments

:: ColumnName

ColumnName key

-> RTuple

Input RTuple

-> RDataType

Output value

getRTupColValue :: Returns the value of an RTuple column based on the ColumnName key if the column name is not found, then it returns Null. !!!Note that this might be confusing since there might be an existing column name with a Null value!!!

rtupLookup Source #

Arguments

:: ColumnName

ColumnName key

-> RTuple

Input RTuple

-> Maybe RDataType

Output value

Returns the value of an RTuple column based on the ColumnName key if the column name is not found, then it returns Nothing

rtupLookupDefault Source #

Arguments

:: RDataType

Default value to return in the case the column name does not exist in the RTuple

-> ColumnName

ColumnName key

-> RTuple

Input RTuple

-> RDataType

Output value

Returns the value of an RTuple column based on the ColumnName key if the column name is not found, then it returns a default value

(<!>) Source #

Arguments

:: RTuple

Input RTuple

-> ColumnName

ColumnName key

-> RDataType

Output value

Operator for getting a column value from an RTuple Throws a ColumnDoesNotExist exception, if this map contains no mapping for the key.

(<!!>) Source #

Arguments

:: RTuple

Input RTuple

-> ColumnName

ColumnName key

-> Maybe RDataType

Output value

Safe Operator for getting a column value from an RTuple if the column name is not found, then it returns Nothing

Conversions

rtableToList :: RTable -> [RTuple] Source #

Turns an RTable to a list of RTuples

concatRTab :: [RTable] -> RTable Source #

Concatenates a list of RTables to a single RTable. Essentially, it unions (see runUnion) all RTables of the list.

rtupleToList :: RTuple -> [(ColumnName, RDataType)] Source #

Turns an RTuple to a List

toListRDataType :: RTupleMData -> RTuple -> [RDataType] Source #

toListRDataType: returns a list of RDataType values of an RTuple, in the fixed column order of the RTuple

toText :: RDataType -> Maybe Text Source #

Return the Text out of an RDataType If a non-text RDataType is given then Nothing is returned.

Container Functions

rtabMap :: (RTuple -> RTuple) -> RTable -> RTable Source #

Map function over an RTable.

rtabFoldr' :: (RTuple -> RTable -> RTable) -> RTable -> RTable -> RTable Source #

This is a fold operation on a RTable that returns an RTable. It is similar with : foldr' :: (a -> b -> b) -> b -> Vector a -> b of Vector, which is an O(n) Right fold with a strict accumulator

rtabFoldl' :: (RTable -> RTuple -> RTable) -> RTable -> RTable -> RTable Source #

This is a fold operation on RTable that returns an RTable. It is similar with : foldl' :: (a -> b -> a) -> a -> Vector b -> a of Vector, which is an O(n) Left fold with a strict accumulator

rtupleMap :: (RDataType -> RDataType) -> RTuple -> RTuple Source #

O(n) Transform this RTuple by applying a function to every value

rtupleMapWithKey :: (ColumnName -> RDataType -> RDataType) -> RTuple -> RTuple Source #

O(n) Transform this RTuple by applying a function to every value

rdatatypeFoldr' :: (RTuple -> RDataType -> RDataType) -> RDataType -> RTable -> RDataType Source #

This is a fold operation on a RTable that returns an RDataType value. It is similar with : foldr' :: (a -> b -> b) -> b -> Vector a -> b of Vector, which is an O(n) Right fold with a strict accumulator

rdatatypeFoldl' :: (RDataType -> RTuple -> RDataType) -> RDataType -> RTable -> RDataType Source #

This is a fold operation on RTable that returns an RDataType value It is similar with : foldl' :: (a -> b -> a) -> a -> Vector b -> a of Vector, which is an O(n) Left fold with a strict accumulator

Modify RTable (DML)

insertAppendRTab :: RTuple -> RTable -> RTable Source #

O(n) append an RTuple to an RTable Please note that this is an immutable implementation of an RTable insert. This simply means that the insert operation returns a new RTable and does not affect the original RTable.

insertPrependRTab :: RTuple -> RTable -> RTable Source #

O(n) prepend an RTuple to an RTable Please note that this is an immutable implementation of an RTable insert. This simply means that the insert operation returns a new RTable and does not affect the original RTable.

insertRTabToRTab Source #

Arguments

:: RTable

Source RTable to be inserted

-> RTable

Target RTable

-> RTable

Final Result

Insert an RTable to an existing RTable. This is equivalent to an INSERT INTO SELECT caluse in SQL. We want to insert into an RTable the results of a "subquery", which in our case is materialized via the input RTable. Please note that this is an immutable implementation of an RTable insert. This simply means that the insert operation returns a new RTable and does not affect the original RTable. Also note that the source and target RTables should have the same structure. By "structure", we mean that the ColumnNames and the corresponding data types must match. Essentially what we record in the ColumnInfo must be the same for the two RTables. Otherwise a ConflictingRTableStructures exception will be thrown.

deleteRTab Source #

Arguments

:: RPredicate

Predicate specifying the Rtuples that must be deleted

-> RTable

RTable that the deletion will be applied

-> RTable

Result RTable

Delete RTuples from an RTable based on an RPredicate. Please note that this is an immutable implementation of an RTable update. This simply means that the delete operation returns a new RTable. So, the original RTable remains unchanged and no deletion in-place takes place whatsoever. Moreover, if we have multiple threads deleting an RTable, due to immutability, each thread "sees" its own copy of the RTable and thus there is no need for locking the deleted RTuples, as happens in a common RDBMS.

updateRTab Source #

Arguments

:: [(ColumnName, RDataType)]

List of column names to be updated with the corresponding new values

-> RPredicate

An RTuple -> Bool function that specifies the RTuples to be updated

-> RTable

Input RTable

-> RTable

Output RTable

Update an RTable. The input includes a list of (ColumnName, new Value) pairs. Also a filter predicate is specified, in order to restrict the update only to those RTuples that fulfill the predicate. Please note that this is an immutable implementation of an RTable update. This simply means that the update operation returns a new RTable that includes all the RTuples of the original RTable, both the ones that have been updated and the others that have not. So, the original RTable remains unchanged and no update in-place takes place whatsoever. Moreover, if we have multiple threads updating an RTable, due to immutability, each thread "sees" its own copy of the RTable and thus there is no need for locking the updated RTuples, as happens in a common RDBMS.

upsertRTab Source #

Arguments

:: RTable

Source RTable, i.e., the equivalent to an SQL USING subclause

-> RUpsertPredicate

The RTuple matching predicate for the merge operation

-> [ColumnName]

List of column names to be updated with the corresponding new values coming from the source RTuples that match with the target RTuples based on the RUpsertPredicate

-> RPredicate

A filter that specifies the target RTuples to be updated

-> RTable

The target RTable

-> RTable

Final Result

Upsert (Update+Insert, aka Merge) Operation. We provide a source RTable and a matching condition (RUpsertPredicate) to the RTuples of the target RTable. An RTuple from the target RTable might match to a single only RTuple in the source RTable, or not match at all. If it is matched to more than one RTuples then an exception (UniquenessViolationInUpsert) is thrown. When an RTuple from the target RTable is matched to a source RTuple, then the corresponding columns of the target RTuple are updated with the new values provided in the source RTuple. This takes place for the target RTuples that match but also that satisfy the input RPredicate. Thus we can restrict further with a filter the RTuples of the target RTable where the update will take place. Finally, the source RTuples that did not match to the target RTable, are inserted (appended) to the target RTable

Please note that this is an immutable implementation of an RTable upsert. This simply means that the upsert operation returns a new RTable and does not affect the original RTable. Moreover, if we have multiple threads updating an RTable, due to immutability, each thread "sees" its own copy of the RTable and thus there is no need for locking the updated RTuples, as happens in a common RDBMS.

Also note that the source and target RTables should have the same structure. By "structure", we mean that the ColumnNames and the corresponding data types must match. Essentially what we record in the ColumnInfo must be the same for the two RTables. Otherwise a ConflictingRTableStructures exception will be thrown.

 An Example:
 Source RTable: src = 
     Id  |   Msg         | Other
     ----|---------------|-------
     1   |   "hello2"    |"a"    
     2   |   "world2"    |"a"    
     3   |   "new"       |"a"    

 Target RTable: trg = 
     Id  |   Msg         | Other
     ----|---------------|-------
     1   |   "hello1"    |"b"    
     2   |   "world1"    |"b"    
     4   |   "old"       |"b"    
     5   |   "hello"     |"b"    

 >>> upsertRTab  src
                 RUpsertPredicate {matchCols = ["Id"], matchPred = \t1 t2 -> t1 <!> "Id" == t2 <!> "Id" }
                 ["Msg"]
                 (\t ->   let 
                             msg = case toText (t <!> "Msg") of
                                         Just t -> t
                                         Nothing -> pack ""
                         in (take 5 msg) == (pack "hello")
                 )  -- Msg like "hello%"
                 trg

 Result RTable: rslt = 
     Id  |   Msg         | Other
     ----|---------------|-------
     1   |   "hello2"    |"b"   (Note that only column "Msg" has been overwritten, as per the 3rd argument) 
     2   |   "world1"    |"b"    
     3   |   "new"       |"a"    
     4   |   "old"       |"b"    
     5   |   "hello"     |"b"    

updateRTuple Source #

Arguments

:: ColumnName

key where the update will take place

-> RDataType

new value

-> RTuple

input RTuple

-> RTuple

output RTuple

Update an RTuple at a specific column specified by name with a value. If the ColumnName exists, then the value is updated with the input value. If the ColumnName does not exist, then a ColumnDoesNotExist exception is thrown.

upsertRTuple Source #

Arguments

:: ColumnName

key where the upsert will take place

-> RDataType

new value

-> RTuple

input RTuple

-> RTuple

output RTuple

Upsert (update/insert) an RTuple at a specific column specified by name with a value If the cname key is not found then the (columnName, value) pair is inserted. If it exists then the value is updated with the input value.

Create/Alter RTable (DDL)

emptyRTable :: RTable Source #

emptyRTable: Create an empty RTable

createSingletonRTable :: RTuple -> RTable Source #

Creates an RTable with a single RTuple

rtableFromList :: [RTuple] -> RTable Source #

Creates an RTable from a list of RTuples

addColumn Source #

Arguments

:: ColumnName

name of the column to be added

-> RDataType

Default value of the new column. All RTuples will initially have this value in this column

-> RTable

Input RTable

-> RTable

Output RTable

addColumn: adds a column to an RTable

removeColumn Source #

Arguments

:: ColumnName

Column to be removed

-> RTable

input RTable

-> RTable

output RTable

removeColumn : removes a column from an RTable. The column is specified by ColumnName. If this ColumnName does not exist in the RTuple of the input RTable then nothing is happened, the RTuple remains intact.

emptyRTuple :: RTuple Source #

Creates an empty RTuple (i.e., one with no column,value mappings)

createNullRTuple :: [ColumnName] -> RTuple Source #

Creates a Null RTuple based on a list of input Column Names. A Null RTuple is an RTuple where all column names correspond to a Null value (Null is a data constructor of RDataType)

createRTuple Source #

Arguments

:: [(ColumnName, RDataType)]

input list of (columnname,value) pairs

-> RTuple 

createRTuple: Create an Rtuple from a list of column names and values

rtupleFromList :: [(ColumnName, RDataType)] -> RTuple Source #

Create an RTuple from a list

createRDataType Source #

Arguments

:: Typeable a 
=> a

input value

-> RDataType

output RDataType

createRDataType: Get a value of type a and return the corresponding RDataType. The input value data type must be an instance of the Typepable typeclass from Data.Typeable

Metadata Functions

createRTableMData Source #

Arguments

:: (RTableName, [(ColumnName, ColumnDType)]) 
-> [ColumnName]

Primary Key. [] if no PK exists

-> [[ColumnName]]

list of unique keys. [] if no unique keys exists

-> RTableMData 

createRTableMData : creates RTableMData from input given in the form of a list We assume that the column order of the input list defines the fixed column order of the RTuple.

getColumnNamesFromRTab :: RTable -> [ColumnName] Source #

Get the Column Names of an RTable

getColumnNamesFromRTuple :: RTuple -> [ColumnName] Source #

Returns the Column Names of an RTuple

getTheType :: RDataType -> ColumnDType Source #

Take a column value and return its type

listOfColInfoRDataType :: [ColumnInfo] -> RTuple -> [(ColumnInfo, RDataType)] Source #

Define equality for two ColumnInfo structures For two column two have "equal structure" they must have the same name and the same type. If one of the two (or both) have an UknownType, then they are still considered of equal structure.

Creates a list of the form [(ColumnInfo, RDataType)] from a list of ColumnInfo and an RTuple. The returned list respects the order of the [ColumnInfo]. It guarantees that RDataTypes will be in the same column order as [ColumnInfo], i.e., the correct RDataType for the correct column

toListColumnName :: RTupleMData -> [ColumnName] Source #

toListColumnName: returns a list of RTuple column names, in the fixed column order of the RTuple.

toListColumnInfo :: RTupleMData -> [ColumnInfo] Source #

toListColumnInfo: returns a list of RTuple columnInfo, in the fixed column order of the RTuple

rtabsSameStructure :: RTable -> RTable -> Bool Source #

Compares the structure of the input RTables and returns True if these are the same. By "structure", we mean that the ColumnNames and the corresponding data types must match. Essentially what we record in the ColumnInfo must be the same for the two RTables. Note that in the case of two columns having the same name but one of the two (or both) have a dtype equal to UknownType, then this function assumes that they are the same (i.e., equal ColumnInfos).

rtuplesSameStructure :: RTuple -> RTuple -> Bool Source #

Compares the structure of the input RTuples and returns True if these are the same. By "structure", we mean that the ColumnNames and the corresponding data types must match. Essentially what we record in the ColumnInfo must be the same for the two RTuples

getUniqueColumnNamesAfterJoin :: [ColumnName] -> [ColumnName] -> [ColumnName] Source #

Receives two lists of ColumnNames and returns the unique list of ColumnNames after concatenating the two and removing the names from the second one that are a prefix of the first one. This function is intended to dedublicate common columns after a join (see ij), where ColA for example, will also appear as ColA_1. This function DOES NOT dedublicate columns ColA and ColAsomeSuffix, only cases like this one ColName_Num (e.g., ColName_1, ColName_2, etc.) Here is an example:

>>> getUniqueColumnNames ["ColA","ColB"] ["ColC","ColA", "ColA_1", "ColA_2", "ColA_A", "ColA_hello", "ColAhello"]
>>> ["ColA","ColB","ColC","ColA_A","ColA_hello","ColAhello"]

Exceptions

data ColumnDoesNotExist Source #

This exception is thrown whenever we try to access a specific column (i.e., ColumnName) of an RTuple and the column does not exist.

data ConflictingRTableStructures Source #

This exception means that we have tried to do some operation between two RTables, which requires that the structure of the two is the same. e.g., an Insert Into TAB RTuples, or a UNION or toher set operations. By "structure", we mean that the ColumnNames and the corresponding data types must match. Essentially what we record in the ColumnInfo must be the same for the two RTables

Constructors

ConflictingRTableStructures String

Error message indicating the operation that failed.

data EmptyInputStringsInToRTimestamp Source #

Length mismatch between the format String and the input String data RTimestampFormatLengthMismatch = RTimestampFormatLengthMismatch String String deriving(Eq,Show) instance Exception RTimestampFormatLengthMismatch

One (or both) of the input Strings to function toRTimestamp are empty

data UniquenessViolationInUpsert Source #

This exception means that we have tried an Upsert operation where the source RTable does not have a unique set of Rtuples if grouped by the columns used in the matching condition. This simply means that we cannot determine which of the dublicate RTuples in the source RTable will overwrite the target RTable, when the matching condition is satisfied.

Constructors

UniquenessViolationInUpsert String

Error message

RTable IO Operations

RTable Printing and Formatting

An Example of RTable printing

>>> -- define a simple RTable from a list
>>> :set -XOverloadedStrings
>>> :{
let tab1 =  rtableFromList [   rtupleFromList [("ColInteger", RInt 1), ("ColDouble", RDouble 2.3), ("ColText", RText "We dig dig dig dig dig dig dig")]
                        ,rtupleFromList [("ColInteger", RInt 2), ("ColDouble", RDouble 5.36879), ("ColText", RText "From early morn to night")]
                        ,rtupleFromList [("ColInteger", RInt 3), ("ColDouble", RDouble 999.9999), ("ColText", RText "In a mine the whole day through")]
                        ,rtupleFromList [("ColInteger", RInt 4), ("ColDouble", RDouble 0.9999), ("ColText", RText "Is what we like to do")]
                      ]
:}                          
>>> -- print without format specification
>>> printRTable tab1
   -----------------------------------------------------------------
   ColInteger     ColText                             ColDouble
   ~~~~~~~~~~     ~~~~~~~                             ~~~~~~~~~
   1              We dig dig dig dig dig dig dig      2.30
   2              From early morn to night            5.37
   3              In a mine the whole day through     1000.00
   4              Is what we like to do               1.00

   4 rows returned
   -----------------------------------------------------------------
   
>>> -- print with format specification (define column printing order and value formatting per column)
>>> printfRTable (genRTupleFormat ["ColInteger","ColDouble","ColText"] $ genColFormatMap [("ColInteger", Format "%d"),("ColDouble", Format "%1.1e"),("ColText", Format "%50s\n")]) tab1
   -----------------------------------------------------------------
   ColInteger     ColDouble     ColText
   ~~~~~~~~~~     ~~~~~~~~~     ~~~~~~~
   1              2.3e0                             We dig dig dig dig dig dig dig

   2              5.4e0                                   From early morn to night

   3              1.0e3                            In a mine the whole day through

   4              1.0e0                                      Is what we like to do


   4 rows returned
   -----------------------------------------------------------------    
   

printRTable :: RTable -> IO () Source #

printRTable : Print the input RTable on screen

eitherPrintRTable :: Exception e => (RTable -> IO ()) -> RTable -> IO (Either e ()) Source #

Safe printRTable alternative that returns an Either, so as to give the ability to handle exceptions gracefully, during the evaluation of the input RTable. Example:

do 
 p <- (eitherPrintRTable  printRTable myRTab) :: IO (Either SomeException ())
 case p of
           Left exc -> putStrLn $ "There was an error in the Julius evaluation: " ++ (show exc)
           Right _  -> return ()

printfRTable :: RTupleFormat -> RTable -> IO () Source #

prints an RTable with an RTuple format specification. It can be used instead of printRTable when one of the following two is required:

  • a) When we want to specify the order that the columns will be printed on screen
  • b) When we want to specify the formatting of the values by using a printf-like FormatSpecifier

eitherPrintfRTable :: Exception e => (RTupleFormat -> RTable -> IO ()) -> RTupleFormat -> RTable -> IO (Either e ()) Source #

Safe printRfTable alternative that returns an Either, so as to give the ability to handle exceptions gracefully, during the evaluation of the input RTable. Example:

do 
 p <- (eitherPrintfRTable printfRTable myFormat myRTab) :: IO (Either SomeException ())
 case p of
           Left exc -> putStrLn $ "There was an error in the Julius evaluation: " ++ (show exc)
           Right _  -> return ()

data RTupleFormat Source #

Basic data type for defining the desired formatting of an RTuple when printing an RTable (see printfRTable).

Constructors

RTupleFormat 

Fields

Instances
Eq RTupleFormat Source # 
Instance details

Defined in RTable.Core

Show RTupleFormat Source # 
Instance details

Defined in RTable.Core

type ColFormatMap = HashMap ColumnName FormatSpecifier Source #

A map of ColumnName to Format Specification

data FormatSpecifier Source #

Format specifier of printf style

Constructors

DefaultFormat 
Format String 

data OrderingSpec Source #

A sum type to help the specification of a column ordering (Ascending, or Descending)

Constructors

Asc 
Desc 
Instances
Eq OrderingSpec Source # 
Instance details

Defined in RTable.Core

Show OrderingSpec Source # 
Instance details

Defined in RTable.Core

genRTupleFormat Source #

Arguments

:: [ColumnName]

Column Select list

-> ColFormatMap

Column Format Map

-> RTupleFormat

Output

Generate an RTupleFormat data type instance

genRTupleFormatDefault :: RTupleFormat Source #

Generate a default RTupleFormat data type instance. In this case the returned column order (Select list), will be unspecified and dependant only by the underlying structure of the RTuple (HashMap)

genColFormatMap :: [(ColumnName, FormatSpecifier)] -> ColFormatMap Source #

Generates a Column Format Specification

genDefaultColFormatMap :: ColFormatMap Source #

Generates a default Column Format Specification