DBFunctor: DBFunctor - Functional Data Management => ETL/ELT Data Processing in Haskell

[ bsd3, database, etl, library, program ]

Please see the README on Github at https://github.com/nkarag/haskell-DBFunctor


Versions 0.1.0.0
Change log ChangeLog.md
Dependencies base (>=4.7 && <5), bytestring, cassava, cereal, containers, DBFunctor, deepseq, either, MissingH, text, transformers, unordered-containers, vector
License BSD-3-Clause
Copyright 2018 Nikos Karagiannidis
Author Nikos Karagiannidis
Maintainer nkarag@gmail.com
Revised Revision 1 made by nkarag at Wed Oct 10 10:18:43 UTC 2018
Category ETL
Home page https://github.com/nkarag/haskell-DBFunctor#readme
Bug tracker https://github.com/nkarag/haskell-DBFunctor/issues
Source repo head: git clone https://github.com/nkarag/haskell-DBFunctor/
Uploaded by nkarag at Wed Oct 10 08:52:01 UTC 2018
Distributions NixOS:0.1.0.0, Stackage:0.1.0.0
Executables dbfunctor-example
Downloads 34 total (34 in the last 30 days)
Status Docs available
Last success reported on 2018-10-10


Note: This package has metadata revisions in the cabal description newer than included in the tarball. To unpack the package including the revisions, use 'cabal get'.


Readme for DBFunctor-0.1.0.0


DBFunctor: Functional Data Management

ETL/ELT* Data Processing in Haskell

DBFunctor is a Haskell library for ETL/ELT[^1] data processing of tabular data. What does this mean? It means that whenever you have a data analysis, data preparation, or data transformation task and you want to do it in type-safe Haskell code, the kind you enjoy, love, and trust so much, now you can!

Main Features

  1. Julius: An Embedded Domain Specific Language (EDSL) for ETL. Provides an intuitive type-level Embedded Domain Specific Language (EDSL) called Julius for expressing complex data flows (i.e., ETL flows) and for performing SQL-like data analysis. For more information, see the Julius tutorial.
  2. Supports all known relational operations. Julius supports all known relational operations (selection, projection, inner/outer join, grouping, ordering, aggregation, set operations, etc.).
  3. Provides the ETL Mapping and other typical ETL constructs and operations. Julius implements typical ETL constructs such as the Column Mapping and the ETL Mapping.
  4. Applicable to all kinds of tabular data. It is applicable to all kinds of "tabular data" (see explanation below).
  5. In-memory, database-less data processing. Data transformations or queries can run in-memory, within your Haskell code, without the need for a database to process your data.
  6. Offloading to a database for heavy queries/data transformations. In addition, a query or data transformation can be offloaded to a database when the data does not fit in memory, or when heavy processing over large volumes of data is required. The result can be fetched into the client's memory (i.e., where your Haskell code runs) in the RTable data structure (see below), or stored in a database staging table.
  7. Workflow Operations. Julius provides common workflow operations. Workflows provide the ability to combine the evaluation of several different Julius expressions (i.e., data pipelines) with arbitrary logic. Examples of such operations include:
    • Ability to handle a failure of some operation in a Julius expression:
      • retry the failed operation (after corrective actions have taken place) and continue the evaluation of the Julius expression from this point onward
      • skip the failed operation and move on with the remaining operations in the pipeline
      • restart the Julius expression from the beginning
      • terminate the Julius expression and skip all pending operations
    • Ability to start a Julius expression based on the success or failure result of another one
    • Ability to fork several different Julius expressions that will run concurrently
    • Conditional execution of Julius expressions and iteration functionality
    • Workflow hierarchy (i.e., flows, subflows, etc.)
  8. "Declarative ETL". Enables declarative ETL implementation in the same sense that SQL is declarative for querying data (see more below).
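
To make the relational operations and the Column Mapping idea more concrete, here is a minimal sketch of selection, projection, a column mapping, and a grouped sum over an in-memory table. It deliberately does not use the DBFunctor or Julius API: the names `Value`, `Row`, `Table`, `select`, `project`, `mapColumn`, `sumBy`, `orders`, and `report` are all hypothetical, and the table is a plain list of `Data.Map` rows rather than an RTable.

```haskell
import qualified Data.Map.Strict as M

-- Illustrative stand-ins only; these are NOT DBFunctor's RTable types.
data Value = VInt Int | VText String
  deriving (Eq, Ord, Show)

type Row   = M.Map String Value   -- column name -> value
type Table = [Row]

-- Selection: keep the rows that satisfy a predicate.
select :: (Row -> Bool) -> Table -> Table
select = filter

-- Projection: keep only the named columns of every row.
project :: [String] -> Table -> Table
project cols = map (M.filterWithKey (\k _ -> k `elem` cols))

-- Column mapping: derive a target column from a source column.
-- Rows that lack the source column are left unchanged.
mapColumn :: String -> String -> (Value -> Value) -> Table -> Table
mapColumn src tgt f = map step
  where
    step r = case M.lookup src r of
      Just v  -> M.insert tgt (f v) r
      Nothing -> r

-- Grouping with a sum aggregate over an integer column.
-- Rows with a missing key or a non-integer value are ignored.
sumBy :: String -> String -> Table -> M.Map Value Int
sumBy keyCol valCol t =
  M.fromListWith (+)
    [ (k, n)
    | r <- t
    , Just k        <- [M.lookup keyCol r]
    , Just (VInt n) <- [M.lookup valCol r]
    ]

-- Hypothetical sample data.
orders :: Table
orders =
  [ M.fromList [("customer", VText "ann"), ("amount", VInt 10)]
  , M.fromList [("customer", VText "bob"), ("amount", VInt 5)]
  , M.fromList [("customer", VText "ann"), ("amount", VInt 7)]
  ]

-- "Total amount per customer, counting only orders above 6."
report :: M.Map Value Int
report = sumBy "customer" "amount" (select isLarge orders)
  where
    isLarge r = case M.lookup "amount" r of
      Just (VInt n) -> n > 6
      _             -> False
```

Loading this in GHCi and evaluating `report` gives a single group for `ann` with a total of 17, because the order of 5 is filtered out before grouping. The point of the sketch is only that such queries can run entirely in memory; Julius expresses the same kind of pipeline with its own EDSL constructs.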

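The failure-handling options listed under Workflow Operations (retry after a corrective action, skip, terminate) can be sketched as a small pipeline runner. This is an illustration under stated assumptions, not how Julius implements workflows: `OnFailure`, `Step`, `runPipeline`, and the example steps are hypothetical names, steps are pure functions returning `Either`, a retry is attempted only once, and the "restart from the beginning" option is left out to keep the sketch short.

```haskell
-- What to do when a step fails (hypothetical type, not part of Julius).
data OnFailure a
  = Retry (a -> a)  -- apply a corrective action to the input, then retry once
  | Skip            -- ignore the failed step and continue with the same input
  | Terminate       -- stop and skip all pending steps

data Step a = Step
  { stepName  :: String
  , stepRun   :: a -> Either String a
  , onFailure :: OnFailure a
  }

-- Run the steps left to right.
-- Returns a log of what happened plus the final result.
runPipeline :: [Step a] -> a -> ([String], Either String a)
runPipeline [] x = ([], Right x)
runPipeline (s : rest) x =
  case stepRun s x of
    Right y  -> continueWith ("ok: " ++ stepName s) y
    Left err -> case onFailure s of
      Skip      -> continueWith ("skipped: " ++ stepName s) x
      Terminate -> (["terminated: " ++ stepName s], Left err)
      Retry fix -> case stepRun s (fix x) of
        Right y   -> continueWith ("retried: " ++ stepName s) y
        Left err' -> (["retry failed: " ++ stepName s], Left err')
  where
    -- Record the event and carry on with the remaining steps.
    continueWith msg v =
      let (logs, res) = runPipeline rest v
      in  (msg : logs, res)

-- Hypothetical example steps.
nonNegative :: Int -> Either String Int
nonNegative n
  | n < 0     = Left "negative input"
  | otherwise = Right n

safeDiv :: Int -> Int -> Either String Int
safeDiv d n
  | d == 0    = Left "division by zero"
  | otherwise = Right (n `div` d)

example :: ([String], Either String Int)
example =
  runPipeline
    [ Step "check sign"     nonNegative (Retry abs)
    , Step "divide by zero" (safeDiv 0) Skip
    , Step "halve"          (safeDiv 2) Terminate
    ]
    (-8)
```

In `example`, the first step fails on `-8`, is retried after the corrective action `abs`, and succeeds; the second step fails and is skipped; the third step halves 8. The result is `Right 4`, with one log entry per step.
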
Typical examples of DBFunctor use-cases

  • Build database-less Haskell apps. Build your data processing Haskell apps without the need to import your data into a database for querying functionality or for executing data transformations. Analyze your CSV files in-place with plain Haskell code (for Haskellers!).
  • Data Preparation. Clean up data, calculate derived fields and variables, group and aggregate, etc., in order to feed a machine learning algorithm (for Data Scientists).
  • Data Transformation. Transform data from Data Model A to Data Model B (a typical use case for Data Engineers who perform ETL/ELT[^1] tasks to feed Data Warehouses or Data Marts).
  • Data Exploration. Ad hoc data analysis tasks, in order to explore a data set for purposes such as finding business insights and solving a specific business problem, or doing data profiling to evaluate the quality of the data coming from a data source, etc. (for Data Analysts).
  • Business Intelligence. Build reports or dashboards in order to share business insights with others and drive the decision-making process (for BI power-users).
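
As a rough illustration of the Data Preparation use case on CSV input, the sketch below cleans up rows and adds a derived field using only `base`. It is not DBFunctor's CSV support: `splitOnComma`, `trim`, `parseRow`, `toFahrenheit`, `prepare`, and `sample` are hypothetical names, and the naive splitter ignores CSV quoting and escaping, which a real CSV parser would handle.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Split a line on commas. Naive on purpose: no quoting or escaping.
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOnComma rest

-- Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Clean-up: accept only rows with a non-empty city and a numeric temperature.
parseRow :: String -> Maybe (String, Double)
parseRow line = case map trim (splitOnComma line) of
  [city, temp] | not (null city) -> (,) city <$> readMaybe temp
  _                              -> Nothing

-- Derived field: temperature in Fahrenheit.
toFahrenheit :: Double -> Double
toFahrenheit c = c * 9 / 5 + 32

-- Drop the header line, discard malformed rows, add the derived column.
prepare :: String -> [(String, Double, Double)]
prepare csv =
  [ (city, c, toFahrenheit c)
  | (city, c) <- mapMaybe parseRow (drop 1 (lines csv))
  ]

-- Hypothetical sample input with one unusable row.
sample :: String
sample = unlines
  [ "city,temp_c"
  , "Athens, 30"
  , "Oslo,n/a"
  , "Berlin,20"
  ]
```

Evaluating `prepare sample` keeps the Athens and Berlin rows, each extended with its Fahrenheit value (86 and 68), and drops the Oslo row because its temperature is not numeric.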