Copyright  (c) Nikos Karagiannidis 2018 

License  BSD3 
Maintainer  nkarag@gmail.com 
Stability  stable 
Portability  POSIX 
Safe Haskell  None 
Language  Haskell2010 
This is an internal module (i.e., not to be imported directly) that implements the core ETL functionality exposed via the Julius EDSL for ETL/ELT, which is found in the Etl.Julius module.
Synopsis
 data RColMapping
   = ColMapEmpty
   | RMap1x1 { }
   | RMapNx1 {
       srcColGrp :: [ColumnName]
       removeSrcCol :: YesNo
       trgCol :: ColumnName
       transformNx1 :: [RDataType] -> RDataType
       srcRTupleFilter :: RPredicate
     }
   | RMap1xN {
       srcCol :: ColumnName
       removeSrcCol :: YesNo
       trgColGrp :: [ColumnName]
       transform1xN :: RDataType -> [RDataType]
       srcRTupleFilter :: RPredicate
     }
   | RMapNxM {
       srcColGrp :: [ColumnName]
       removeSrcCol :: YesNo
       trgColGrp :: [ColumnName]
       transformNxM :: [RDataType] -> [RDataType]
       srcRTupleFilter :: RPredicate
     }
 type ColXForm = [RDataType] -> [RDataType]
 createColMapping :: [ColumnName] -> [ColumnName] -> ColXForm -> YesNo -> RPredicate -> RColMapping
 data ETLOperation
   = ETLrOp {
       rop :: ROperation
     }
   | ETLcOp {
       cmap :: RColMapping
     }
 data ETLMapping
   = ETLMapEmpty
   | ETLMapLD {
       etlOp :: ETLOperation
       tabL :: ETLMapping
       tabR :: RTable
     }
   | ETLMapRD {
       etlOp :: ETLOperation
       tabLrd :: RTable
       tabRrd :: ETLMapping
     }
   | ETLMapBal { }
 data YesNo
 runCM :: RColMapping -> RTable -> RTable
 etlOpU :: ETLOperation -> RTable -> RTable
 etlOpB :: ETLOperation -> RTable -> RTable -> RTable
 etl :: ETLMapping -> RTable
 etlRes :: ETLMapping -> RTabResult
 rtabToETLMapping :: RTable -> ETLMapping
 createLeafETLMapLD :: ETLOperation -> RTable -> ETLMapping
 createLeafBinETLMapLD :: ETLOperation -> RTable -> RTable -> ETLMapping
 connectETLMapLD :: ETLOperation -> RTable -> ETLMapping -> ETLMapping
Basic Data Types
data RColMapping
This is the basic data type for defining the column-to-column mapping from a source RTable to a target RTable. Essentially, an RColMapping represents the column-level transformations of an RTuple that will yield a target RTuple.
A mapping is simply a tuple of the form (SourceColumn(s), TargetColumn(s), Transformation, RTupleFilter), where we define the source columns over which a transformation (i.e., a function) will be applied in order to yield the target columns. In addition, an RPredicate (i.e., a filter) may be applied to the source RTuple.
Remember that an RTuple is essentially a mapping between a key (the column name) and a value (the RDataType value). So the various RColMapping data constructors below simply describe the possible modifications of an RTuple originating from its own columns.
So, we can have the following mapping types:
a) single-source column to single-target column mapping (1 to 1).
The source column will be removed or not based on the removeSrcCol flag (duplicate column names are not allowed in an RTuple).
b) multiple-source columns to single-target column mapping (N to 1).
The N columns will be merged into the single target column based on the transformation.
The N columns will be removed from the RTuple or not based on the removeSrcCol flag (duplicate column names are not allowed in an RTuple).
c) single-source column to multiple-target columns mapping (1 to M).
The source column will be "expanded" to M target columns based on the transformation.
The source column will be removed or not based on the removeSrcCol flag (duplicate column names are not allowed in an RTuple).
d) multiple-source columns to multiple-target columns mapping (N to M).
The N source columns will be mapped to M target columns based on the transformation.
The N columns will be removed from the RTuple or not based on the removeSrcCol flag (duplicate column names are not allowed in an RTuple).
Some examples of mappings are the following:
    (Start_Date, No, StartDate, \t -> True)
        -- Copy the source value to the target and don't remove the source column, so the target RTuple
        -- will have both columns, Start_Date and StartDate, with exactly the same value.
    ([Amount, Discount], Yes, FinalAmount, (\[a, d] -> a * d))
        -- FinalAmount is a derived column based on a function applied to the two source columns.
        -- In the final RTuple we remove the two source columns.
An RColMapping can be applied with the runCM (runColMapping) operator.
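The mapping kinds above can be illustrated with a minimal, self-contained sketch. It does not use the library's own types: Value, Tuple, ColMap and applyColMap are hypothetical stand-ins for RDataType, RTuple, RColMapping and runCM, the filter predicate is left out, and the behaviour on a missing source column (returning Nothing) is an assumption of this sketch only.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical stand-in for RDataType: a value is a number or a text.
data Value = VNum Double | VText String
  deriving (Eq, Show)

-- Hypothetical stand-in for RTuple: a map from column name to value.
type ColName = String
type Tuple   = M.Map ColName Value

-- A simplified N-to-M column mapping (no filter predicate).
data ColMap = ColMap
  { srcCols   :: [ColName]            -- source columns, in the order the function expects
  , removeSrc :: Bool                 -- remove the source columns from the result?
  , trgCols   :: [ColName]            -- target columns, in the order the function yields
  , xform     :: [Value] -> [Value]   -- the column transformation
  }

-- Apply a mapping to one tuple; Nothing if a source column is missing.
applyColMap :: ColMap -> Tuple -> Maybe Tuple
applyColMap cm t = do
  srcVals <- mapM (`M.lookup` t) (srcCols cm)
  let derived = M.fromList (zip (trgCols cm) (xform cm srcVals))
      kept
        | removeSrc cm = foldr M.delete t (srcCols cm)
        | otherwise    = t
  -- M.union is left-biased, so derived columns win over kept ones.
  pure (M.union derived kept)

-- N to 1: FinalAmount = Amount * Discount, source columns removed.
finalAmount :: ColMap
finalAmount = ColMap ["Amount", "Discount"] True ["FinalAmount"] f
  where
    f [VNum a, VNum d] = [VNum (a * d)]
    f _                = []

-- 1 to 1: copy Start_Date to StartDate and keep the source column.
copyStartDate :: ColMap
copyStartDate = ColMap ["Start_Date"] False ["StartDate"] id
```

In this sketch a 1-to-1, N-to-1 or 1-to-N mapping is just the N-to-M case with singleton lists, which is one way to read the role of a single general-purpose constructor function.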
ColMapEmpty
RMap1x1  single-source column to single-target column mapping (1 to 1)
RMapNx1  multiple-source columns to single-target column mapping (N to 1)
RMap1xN  single-source column to multiple-target columns mapping (1 to N)
RMapNxM  multiple-source columns to multiple-target columns mapping (N to M)

type ColXForm = [RDataType] -> [RDataType]
A column transformation function data type.
It is used to define an arbitrary column-level transformation (i.e., from a list of N input column values we produce a list of M derived (output) column values).
A column value is represented by the RDataType.
createColMapping
  :: [ColumnName]   -- ^ List of source column names
  -> [ColumnName]   -- ^ List of target column names
  -> ColXForm       -- ^ Column transformation function
  -> YesNo          -- ^ Remove source column option
  -> RPredicate     -- ^ Filtering predicate
  -> RColMapping    -- ^ Output column mapping
Constructs an RColMapping. This is the recommended way to create a column mapping, rather than calling the data constructors directly.
data ETLOperation
An ETL operation applied to an RTable can be either an ROperation (a relational algebra operation such as join, filter, etc.) defined in the RTable.Core module, or an RColMapping applied to an RTable.
ETLrOp
ETLcOp

data ETLMapping
An ETLMapping is the equivalent of a mapping in an ETL tool. It consists of a series of ETLOperations that are applied, one by one, to some initial input RTable. If binary ETLOperations are included in the ETLMapping, then there will be more than one input RTable to which the ETLOperations are applied. When we apply (i.e., run) an ETLOperation of the ETLMapping we get a new RTable, which is then fed to the next ETLOperation, until all ETLOperations have been run. The purpose of executing an ETLMapping is to produce a single new RTable as the result of executing all of its ETLOperations.
In terms of database operations, an ETLMapping is the equivalent of a CREATE TABLE AS SELECT (CTAS) operation in an RDBMS. This means that anything that can be done in the SELECT part (i.e., column projection, row filtering, grouping and join operations, etc.) in order to produce a new table can be included in an ETLMapping.
An ETLMapping is executed with the etl (runETLmapping) operator.
Implementation: An ETLMapping is implemented as a binary tree where each node represents the ETLOperation to be executed, the left branch is another ETLMapping, and the right branch is an RTable (which might be empty in the case of a unary ETLOperation). Execution proceeds from bottom-left to top-right. This is similar in concept to a left-deep join tree. In a left-deep ETLOperation tree the "pipe" of ETLOperations always comes from the left branches. The leaf node is always an ETLMapping with an ETLMapEmpty in the left branch and an RTable in the right branch (the initial RTable input to the ETLMapping). In this way, the result of the execution of each ETLOperation (which is an RTable) is passed on to the next ETLOperation. Here is an example:
    A Left-Deep ETLOperation Tree

                          final RTable result
                                 /
                              etlOp3
                             /      \
                        etlOp2      rtab2
                       /      \
  A leaf-node --> etlOp1      emptyRTab
                 /      \
        ETLMapEmpty      rtab1
Note that the left branch is always an ETLMapping (i.e., a left-deep ETLOperation tree). So how do we implement the following case, where both branches of the leaf are RTables?

                  final RTable result
                         /
  A leaf-node --> etlOp1
                 /      \
             rtab1      rtab2
The answer is that we "model" the left RTable (rtab1 in our example) as an ETLMapping of the form:
    ETLMapLD { etlOp = ETLcOp{cmap = ColMapEmpty}, tabL = ETLMapEmpty, tabR = rtab1 }
So we embed rtab1 in an ETLMapping that is a leaf (i.e., its left branch, tabL, is ETLMapEmpty), with rtab1 in the right branch (tabR) and with the empty column mapping (ColMapEmpty) as the ETLOperation, which returns its input RTable when executed.
We can use the function rtabToETLMapping for this job. So it becomes:

            A leaf-node --> etlOp1
                           /      \
      rtabToETLMapping rtab1      rtab2
In this manner, a leaf node can also be implemented like this:

                                final RTable result
                                       /
                                    etlOp3
                                   /      \
                              etlOp2      rtab2
                             /      \
        A leaf-node --> etlOp1      emptyRTab
                       /      \
  rtabToETLMapping rtab1      emptyRTable
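The left-deep tree and its bottom-left to top-right execution can be sketched in a few lines. This is only an illustration under simplifying assumptions: Table, Op, Mapping, tableToMapping and run are hypothetical stand-ins (a table is a plain list of Ints), not the library's RTable, ETLOperation, ETLMapping, rtabToETLMapping or etl.

```haskell
-- Hypothetical stand-in: a "table" is just a list of Ints.
type Table = [Int]

-- An operation is either unary or binary over tables.
data Op
  = Unary  (Table -> Table)
  | Binary (Table -> Table -> Table)

-- A left-deep tree node: operation, previous mapping (left), table (right).
data Mapping
  = MapEmpty
  | MapLD Op Mapping Table

-- Wrap a plain table in a leaf whose operation returns its input unchanged.
tableToMapping :: Table -> Mapping
tableToMapping = MapLD (Unary id) MapEmpty

-- Evaluate from bottom-left to top-right.
run :: Mapping -> Table
run MapEmpty                      = []
run (MapLD (Unary f)  MapEmpty r) = f r          -- leaf: the input sits on the right branch
run (MapLD (Unary f)  l        _) = f (run l)    -- unary node: the right branch is ignored
run (MapLD (Binary g) l        r) = g (run l) r  -- binary node: left result combined with right table

-- etlOp1 (increment) on the leaf, then etlOp2 (keep evens), then etlOp3 (append rtab2).
threeSteps :: Mapping
threeSteps =
  MapLD (Binary (++))
    (MapLD (Unary (filter even))
      (MapLD (Unary (map (+ 1))) MapEmpty [1, 2, 3])
      [])
    [10, 20]

-- A binary leaf: the left table is first wrapped into a mapping.
binaryLeaf :: Mapping
binaryLeaf = MapLD (Binary (++)) (tableToMapping [1, 2]) [3]
```

Here `run threeSteps` evaluates to `[2,4,10,20]` and `run binaryLeaf` to `[1,2,3]`: each node consumes the table produced by its left subtree, which is the "pipe" described above.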
ETLMapEmpty  an empty node
ETLMapLD  a left-deep node
ETLMapRD  a right-deep node
ETLMapBal  a balanced node

Instances
Eq ETLMapping
  Defined in Etl.Internal.Core
  (==) :: ETLMapping -> ETLMapping -> Bool
  (/=) :: ETLMapping -> ETLMapping -> Bool
Execution of an ETL Mapping
runCM :: RColMapping -> RTable -> RTable
The runCM operator executes an RColMapping. If a target column has the same name as a source column and DontRemoveSrc (i.e., removeSrcCol == No) has been specified, then the (target column, target value) key-value pair overwrites the corresponding (source column, source value) key-value pair.
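The overwrite rule can be illustrated on a single tuple with a small sketch. It assumes a tuple modelled as a Data.Map from column name to Int; Tuple and mapKeepSrc are hypothetical names for this illustration, not part of the library.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical stand-in for an RTuple: column name to value.
type Tuple = M.Map String Int

-- Map a source column onto a target column WITHOUT removing the source.
-- If the source column is missing, the tuple is returned unchanged.
mapKeepSrc :: String -> String -> (Int -> Int) -> Tuple -> Tuple
mapKeepSrc src trg f t =
  case M.lookup src t of
    Nothing -> t
    -- M.insert replaces any value already stored under trg, so when
    -- trg == src the target pair overwrites the source pair.
    Just v  -> M.insert trg (f v) t
```

With a tuple holding `Amount = 10`, `mapKeepSrc "Amount" "Amount" (* 2)` leaves a single `Amount` column with value 20, whereas `mapKeepSrc "Amount" "Doubled" (* 2)` keeps `Amount = 10` and adds `Doubled = 20`.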
etl :: ETLMapping -> RTable
This operator executes an ETLMapping.
etlRes
  :: ETLMapping   -- ^ input ETLMapping
  -> RTabResult   -- ^ output RTabResult
This operator executes an ETLMapping and returns the RTabResult Writer Monad, which embeds, apart from the resulting RTable, the number of RTuples returned.
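The idea of returning a table together with the number of rows it contains can be sketched with a plain pair of a monoidal log and a value, which is the shape a Writer wraps. Everything here is an assumption made for illustration: Table, Result, withRowCount, resultTable and rowsReturned are hypothetical names, and the sketch does not reproduce how RTabResult is actually defined.

```haskell
import Data.Monoid (Sum (..))

-- Hypothetical stand-in for a table.
type Table = [Int]

-- A Writer-like result: a row count (the "log") paired with the table.
type Result = (Sum Int, Table)

-- Pair a table with the number of rows it holds.
withRowCount :: Table -> Result
withRowCount t = (Sum (length t), t)

-- Extract the table from a result.
resultTable :: Result -> Table
resultTable = snd

-- Extract the number of rows recorded in a result.
rowsReturned :: Result -> Int
rowsReturned = getSum . fst
```

For example, `rowsReturned (withRowCount [4, 5, 6])` is 3, while `resultTable` recovers the original list.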
Functions for "Building" an ETL Mapping
rtabToETLMapping :: RTable -> ETLMapping
Models an RTable as an ETLMapping which, when executed, will return the input RTable.
createLeafETLMapLD
  :: ETLOperation   -- ^ ETL operation of this ETL mapping
  -> RTable         -- ^ input RTable
  -> ETLMapping     -- ^ output ETLMapping
Creates a left-deep leaf ETL mapping, of the following form:

    A Left-Deep ETLOperation Tree

                          final RTable result
                                 /
                              etlOp3
                             /      \
                        etlOp2      rtab2
                       /      \
  A leaf-node --> etlOp1      emptyRTab
                 /      \
        ETLMapEmpty      rtab1
createLeafBinETLMapLD
  :: ETLOperation   -- ^ ETL operation of this ETL mapping
  -> RTable         -- ^ input RTable1
  -> RTable         -- ^ input RTable2
  -> ETLMapping     -- ^ output ETLMapping
Creates a binary-operation leaf node of the form:

            A leaf-node --> etlOp1
                           /      \
      rtabToETLMapping rtab1      rtab2
connectETLMapLD
  :: ETLOperation   -- ^ ETL operation of this ETL mapping
  -> RTable         -- ^ Right RTable (right branch); if this is a unary ETL mapping, this should be an emptyRTable
  -> ETLMapping     -- ^ Previous ETL mapping (left branch)
  -> ETLMapping     -- ^ New ETL mapping, with the new node added at the end
Connects an ETL mapping to a left-deep ETL mapping tree of the form:

    A Left-Deep ETLOperation Tree

                          final RTable result
                                 /
                              etlOp3
                             /      \
                        etlOp2      rtab2
                       /      \
  A leaf-node --> etlOp1      emptyRTab
                 /      \
        ETLMapEmpty      rtab1

Example:
    -- connect a unary ETL mapping (etlOp2)

            etlOp2
           /      \
      etlOp1      emptyRTab

    => connectETLMapLD etlOp2 emptyRTable prevMap

    -- connect a binary ETL mapping (etlOp3)

            etlOp3
           /      \
      etlOp2      rtab2

    => connectETLMapLD etlOp3 rtab2 prevMap
Note that the right branch (RTable) appears first in the list of input arguments of this function and the left branch (ETLMapping) appears second. This may look strange, and one could think that it is a mistake (i.e., that the left branch should appear first and the right branch second), since we read from left to right. However, this was a deliberate choice: the left branch (which is the connection point with the previous ETLMapping) is left as the last argument, so that we can partially apply the other arguments and get a new function whose only input parameter is the previous mapping. This is very helpful in function composition.
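The benefit of putting the previous mapping last can be seen in a small sketch of partial application and composition. As before, this is a simplified model and not the library's API: Table, Op, Mapping, connect and run are hypothetical stand-ins, with connect mirroring only the argument order of connectETLMapLD.

```haskell
-- Hypothetical stand-ins for RTable, ETLOperation and ETLMapping.
type Table = [Int]

data Op
  = Unary  (Table -> Table)
  | Binary (Table -> Table -> Table)

data Mapping
  = MapEmpty
  | MapLD Op Mapping Table

-- Same argument order as described above: operation, right table, previous mapping.
connect :: Op -> Table -> Mapping -> Mapping
connect op right prev = MapLD op prev right

-- Partially applying the first two arguments yields Mapping -> Mapping,
-- so steps chain with (.); the rightmost step is applied first.
pipeline :: Mapping -> Mapping
pipeline =
    connect (Binary (++)) [10, 20]       -- runs second: append a second table
  . connect (Unary (filter even)) []     -- runs first: keep even values

-- A minimal evaluator, included only so the sketch can be executed.
run :: Mapping -> Table
run MapEmpty                      = []
run (MapLD (Unary f)  MapEmpty r) = f r
run (MapLD (Unary f)  l        _) = f (run l)
run (MapLD (Binary g) l        r) = g (run l) r

-- A leaf that increments every value of its input table.
leaf :: Mapping
leaf = MapLD (Unary (map (+ 1))) MapEmpty [1, 2, 3]
```

Evaluating `run (pipeline leaf)` gives `[2,4,10,20]`. Had the previous mapping been the second argument, each step would need an explicit lambda or `flip` before it could be composed.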