
DBFunctor: Functional Data Management
ETL/ELT* Data Processing in Haskell
DBFunctor is a Haskell library for ETL/ELT[^1] data processing of tabular data. What does this mean?
In short, whenever you have a data analysis, data preparation, or data transformation task and want to do it in the type-safe Haskell code that you enjoy, love, and trust so much, now you can!
Main Features
- Julius: An Embedded Domain-Specific Language (EDSL) for ETL
Provides an intuitive type-level Embedded Domain-Specific Language (EDSL) called Julius for expressing complex data flows (i.e., ETL flows) as well as for performing SQL-like data analysis. For more info, check the Julius tutorial.
- Supports all known relational operations
Julius supports all known relational operations (selection, projection, inner/outer join, grouping, ordering, aggregation, set operations, etc.).
- Provides the ETL Mapping and other typical ETL constructs and operations
Julius implements typical ETL constructs such as the Column Mapping and the ETL Mapping.
- Applicable to all kinds of tabular data
It is applicable to all kinds of "tabular data" (see explanation below)
- In-memory, database-less data processing
Data transformations or queries can run in-memory, within your Haskell code, without the need for a database to process your data.
- Offloading to a database for heavy queries/data transformations
In addition, a query or data transformation can be offloaded to a database when the data don't fit in memory, or when heavy processing over large volumes of data is required. The result can be fetched into the client's memory (i.e., where your Haskell code runs) in the `RTable` data structure (see below), or stored in a database staging table.
- Workflow Operations
Julius provides common workflow operations. Workflows make it possible to combine the evaluation of several different Julius expressions (i.e., data pipelines) with arbitrary logic. Examples of such operations include:
- Ability to handle a failure of some operation in a Julius expression:
  - retry the failed operation (after corrective actions have taken place) and continue the evaluation of the Julius expression from this point onward
  - skip the failed operation and move on with the remaining operations in the pipeline
  - restart the Julius expression from the beginning
  - terminate the Julius expression and skip all pending operations
- Ability to start a Julius expression based on the success or failure result of another one
- Ability to fork several different Julius expressions that will run concurrently
- Conditional execution of Julius expressions and iteration functionality
- Workflow hierarchy (i.e., flows, subflows etc.)
- "Declarative ETL"
Enables declarative ETL implementation in the same sense that SQL is declarative for querying data (see more below).
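
The four failure-handling options listed under Workflow Operations (retry, skip, restart, terminate) can be illustrated with a small sketch in plain Haskell. This is not the Julius API: `OnFailure`, `Step`, `Outcome`, and `runPipeline` are hypothetical names invented for this illustration, and the steps are pure functions that receive an attempt number as a stand-in for "corrective actions have taken place".

```haskell
-- Hypothetical sketch of failure policies for a pipeline of steps.
-- None of these names come from DBFunctor or Julius.

-- | What to do when a step fails.
data OnFailure
  = Retry Int   -- ^ retry the step, up to this many attempts in total
  | Skip        -- ^ ignore the failure and continue with the unchanged value
  | Restart     -- ^ rerun the whole pipeline from the original input
  | Terminate   -- ^ stop and drop all pending steps

-- | A named step. The Int argument is the attempt number (1, 2, ...),
-- so that a retried step can behave differently on a later attempt.
data Step a = Step
  { stepName :: String
  , runStep  :: Int -> a -> Either String a
  , onFail   :: OnFailure
  }

data Outcome a
  = Completed a          -- ^ every step succeeded or was skipped
  | Terminated String a  -- ^ name of the failing step and the last good value
  deriving (Eq, Show)

-- | Run the steps in order. The first argument bounds how many times the
-- pipeline may be restarted, so that evaluation always ends.
runPipeline :: Int -> [Step a] -> a -> Outcome a
runPipeline restartsLeft steps input = go steps input
  where
    go [] x = Completed x
    go (s : rest) x = attempt 1
      where
        attempt n = case runStep s n x of
          Right y -> go rest y
          Left _err -> case onFail s of
            Retry maxTries | n < maxTries -> attempt (n + 1)
            Skip -> go rest x
            Restart | restartsLeft > 0 ->
              runPipeline (restartsLeft - 1) steps input
            -- Terminate, or retries/restarts exhausted
            _ -> Terminated (stepName s) x

-- Example steps (also hypothetical).
flaky :: Step Int
flaky = Step "flaky" (\n x -> if n < 2 then Left "transient" else Right (x + 1)) (Retry 3)

broken :: OnFailure -> Step Int
broken = Step "broken" (\_ _ -> Left "boom")

double :: Step Int
double = Step "double" (\_ x -> Right (x * 2)) Terminate
```

For instance, `runPipeline 0 [flaky, double] 1` succeeds on the second attempt of `flaky` and then doubles the result, while `runPipeline 0 [double, broken Terminate] 5` stops at `broken` and reports the last good value. Because the steps here are pure, a restart simply repeats the same failure until the restart budget runs out; in a real workflow the steps would be effectful and a restart could observe changed conditions.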
Typical examples of DBFunctor use-cases
- Build database-less Haskell apps. Build your Haskell data processing apps without importing your data into a database in order to query it or run data transformations. Analyze your CSV files in place with plain Haskell code (for Haskellers!).
- Data Preparation. Clean up data, calculate derived fields and variables, group and aggregate, etc., in order to feed a machine learning algorithm (for Data Scientists).
- Data Transformation. Transform data from Data Model A to Data Model B (a typical use case for Data Engineers who perform ETL/ELT[^1] tasks to feed Data Warehouses or Data Marts).
- Data Exploration. Perform ad hoc data analysis to explore a data set, for example to find business insights and solve a specific business problem, or to profile the data in order to evaluate the quality of the data coming from a data source (for Data Analysts).
- Business Intelligence. Build reports or dashboards in order to share business insights with others and drive the decision-making process (for BI power users).
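
To give a feel for what database-less, in-memory querying of tabular data looks like, here is a minimal sketch of selection, projection, inner join, and grouping with a count, written in plain Haskell with only `base`. It deliberately does not use DBFunctor: `Row`, `Table`, `select`, `project`, `innerJoin`, and `countBy` are hypothetical names for this illustration, and the library's own `RTable` structure and Julius syntax differ.

```haskell
import Data.List (group, sort)

-- Hypothetical, simplified representation: a row is a list of
-- (column name, value) pairs and a table is a list of rows.
type Row = [(String, String)]
type Table = [Row]

-- | Selection: keep only the rows that satisfy a predicate.
select :: (Row -> Bool) -> Table -> Table
select = filter

-- | Projection: keep only the named columns in every row.
project :: [String] -> Table -> Table
project cols = map (filter ((`elem` cols) . fst))

-- | Inner join on a column that both tables share.
-- The join column is kept once, from the left row.
innerJoin :: String -> Table -> Table -> Table
innerJoin col left right =
  [ l ++ filter ((/= col) . fst) r
  | l <- left
  , r <- right
  , Just k <- [lookup col l]
  , lookup col r == Just k
  ]

-- | Grouping with a count aggregate: number of rows per distinct value.
countBy :: String -> Table -> [(String, Int)]
countBy col table =
  [ (k, length ks)
  | ks@(k : _) <- group (sort [v | row <- table, Just v <- [lookup col row]])
  ]

-- Made-up example data.
employees :: Table
employees =
  [ [("name", "Ada"), ("dept", "10")]
  , [("name", "Bob"), ("dept", "20")]
  , [("name", "Cy"), ("dept", "10")]
  ]

departments :: Table
departments =
  [ [("dept", "10"), ("dname", "R&D")]
  , [("dept", "20"), ("dname", "Ops")]
  ]
```

With this data, `countBy "dept" employees` counts employees per department, and `project ["name", "dname"] (innerJoin "dept" employees departments)` pairs each employee with a department name. The untyped string rows keep the sketch short; a real library would track column types.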