HaskRel: HaskRel, Haskell as a DBMS with support for the relational algebra

[ database, gpl, library ] [ Propose Tags ]

HaskRel aims to define those elements of the relational theory of database management that Haskell can accommodate, thus enabling Haskell (or more precisely GHC) in its own right as a DBMS with first-class support for those parts of the relational model. It does not qualify as a proper RDBMS since it as-is only defines the relational algebra, relational variables and relational assignment. It does not define the relational calculus, views, constraints and transactions (beyond the fundamental requirement that the tuples of relations are to be unique), certain operators like relation valued aggregate operators, nor a few minor or even deprecated operators such as DIVIDE. The implemented parts are decently complete even if there are major implementation shortcomings that prevent this from being practically usable as an actual DBMS.

I refer to it as "first-class" since the types of the relational model are first-class types to Haskell, and the Haskell type system is able to induce the type resulting of relational expressions (for instance that a natural join of two relations results in a relation with a heading that is the setwise union of the headings of the original relations).

Examples

The examples in this documentation are based on "the old warhorse" that is the suppliers-parts database (see [1] for more). This gives a body of relational expressions with known results to base examples upon. See also examples/SuppliersPartsExamples.hs (not visible from this documentation) for Haskell versions of a selection of the Tutorial D expressions given as examples in chapters 6 and 7 of [1]. These can be run by starting examples/suppliersPartsDB.sh and then running snrt2ndExamples. While most Tutorial D expressions translate fairly verbatim to Haskell there are a few where one must be a bit more explicit. While most Tutorial D expressions translate fairly verbatim to Haskell there are a few where Haskell is stricter than Tutorial D and one must be a bit more explicit.

$ is always used after p/rPrint or pt/rPrintTyped in the examples to keep them uniform (and so it kinda looks like a prompt), even when not required. The short forms p and pt are used whenever there isn't a conflict with other identifiers, whereas for the SuppliersPartsExample, which has a relvar "p", rPrint is used instead of p for presentation of relational objects without type information.

Terminology

Since this builds on both Haskell and relational theory this documentation uses terms as they have been established in material related to either. Several terms of Haskell and HList have been redefined in terms of relational theory in this library, mostly to illustrate how terms and concepts have been mapped from the latter to the former. (I'm trying to keep this open to change later if it turns out to be an unhelpful crutch.)

The following table gives a quick overview of either terms or concepts as found in Haskell, the relational model (as presented in [1]), HaskRel and SQL, and how they are mapped from the second to the first:

┌───────────────────────────┬────────────────────┬────────────┬────────────────────────────────────────────────┐
│ haskell                   │ relModel           │ haskRel    │ sql                                            │
╞═══════════════════════════╪════════════════════╪════════════╪════════════════════════════════════════════════╡
│ Data.Tagged.Tagged        │ attribute          │ Attr       │ field, column                                  │
│ Data.HList.Record.Record  │ tuple              │ RTuple     │ row                                            │
│ ( Set (Record a) )        │ relation           │ Relation a │ table                                          │
│ FilePath (Set (Record a)) │ relvar             │ Relvar a   │ table                                          │
│ Data.HList.Record.Label   │ attribute name     │ Label      │ field name, column name                        │
│ Data.HList.Record.Labels  │ attribute name set │ Labels     │ list of field/column names                     │
│ function, operator        │ operator           │ function   │ operator, function, procedure, routine, method │
└───────────────────────────┴────────────────────┴────────────┴────────────────────────────────────────────────┘

Found in "example/Terminology.hs". Note that this is just an overview, study of [1] or [2], Haskell itself, HList and HaskRel is required to see how terms and concepts correlate.

The term "RTuple", or "r-tuple", is chosen to simultaneously distinguish the concept from Haskell tuples while relating it to tuples of the relational model. For the type of either "Record a" or "Set ( Record a )" in Haskell the term "heading" is used in relational theory, and "row type" or "composite type" in SQL. In relational theory the term "scalar" is used to refer to data types that are neither tuples nor relations, which corresponds to everything but "Record a" or "Set ( Record a )" in Haskell. Note also that HaskRel does use the term "table", but then only in the sense of "presentation of a relation value" (see above).

The HaskRel library

Not all modules of this library are relevant to gain an understanding of how it functions, the next part to go to at this point is Database.HaskRel.RDBMS, and the modules it reexports.

1
SQL and Relational Theory, 2nd ed. (2011), C.J. Date
2
The Third Manifesto, C. J. Date and Hugh Darwen, February 7th, 2013

[Skip to Readme]

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees

Candidates

Versions [RSS] 0.1.0.0, 0.1.0.1, 0.1.0.2
Dependencies base (>=4.8 && <4.9), containers (>=0.5 && <0.6), directory (>=1.2 && <1.3), ghc-prim (>=0.4 && <0.5), HList (>=0.4 && <0.5), tagged (>=0.8 && <0.9) [details]
License GPL-2.0-only
Copyright Thor Michael Støre 2015
Author Thor Michael Støre
Maintainer thormichael@gmail.com
Category Database
Source repo head: git clone https://github.com/thormick/HaskRel
Uploaded by thormick at 2015-11-16T23:52:25Z
Distributions
Reverse Dependencies 1 direct, 0 indirect [details]
Downloads 2717 total (9 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs available [build log]
Last success reported on 2015-11-17 [all 1 reports]

Readme for HaskRel-0.1.0.1

[back to package description]

HaskRel, Haskell as a DBMS with support for the relational algebra

(C) 2015, Thor Michael Støre

HaskRel aims to define those elements of the relational theory of database management that Haskell can accommodate, thus enabling Haskell (or more precisely GHC) in its own right as a DBMS with first-class support for those parts of the relational model. It does not qualify as a proper RDBMS since it as-is only defines the relational algebra, relational variables and relational assignment. It does not define the relational calculus, views, constraints and transactions (beyond the fundamental requirement that the tuples of relations are to be unique), certain operators like relation valued aggregate operators, nor a few minor or even deprecated operators such as DIVIDE. The implemented parts are decently complete even if there are major implementation shortcomings that prevent this from being practically usable as an actual DBMS.

I refer to it as first-class since the types of the relational model are first-class types to Haskell, and the Haskell type system is able to induce the type resulting of relational expressions (for instance that a natural join of two relations results in a relation with a heading that is the setwise union of the headings of the original relations).

As such it isn't as much just an implementation of the relational model built using Haskell (the way, say, PostgreSQL is an implementation of SQL using C), as a Haskell library that accommodates a subset of the relational model (unlike PostgreSQL, which doesn't turn C into an RDBMS), where GHCi is employed as a database terminal front-end.

Model

HaskRel is based on the relational model of database management, as defined by Chris Date and Hugh Darwen today. For an understanding of the foundation HaskRel builds on see for instance SQL and Relational Theory, 2nd ed., which has been the principal guide in this endeavor (I have plenty of SQL in my background so a book on SQL and the relational model has been a good introduction to the latter). See also The Third Manifesto, TTM, and the additional material on http://www.thethirdmanifesto.com/ (this might be better to start with for those mathematically inclined but not steeped in SQL). Regarding The Third Manifesto it is important to note that all parts of TTM that overlap with something preexisting in Haskell or which Haskell cannot support have been disregarded when implementing HaskRel (it's about determining where we stand, after all).

HaskRel is not based on SQL, there is no support for anything that it defines beyond what the relational model does, nor plans for that. Tutorial D has been the reference when implementing the functions of the relational algebra, although HaskRel does not satisfy the criteria of D. For a project of this kind it is sensible to follow developed theory as it stands rather than an industry standard that deviates from it. More specifically, HaskRel is intended as an exploration of the possibilities to directly employ a general purpose programming language and its associated REPL as a relational database management system, or even as a bit of an academic exercise, and a great deal of SQL is not relevant towards this end. (SQL defines for instance its own types, while HaskRel is explicitly intended to build upon Haskell's, and SQL's type system is weaker than Haskell's, in that it employs a wide range of coercions between types).

Features and aspects of SQL that go beyond or deviate from relational theory are by and large irrelevant for the purpose of HaskRel, and may even overlap or conflict with Haskell idioms that are more relevant to adhere to for a "make an RDBMS out of a general purpose programming language" project. (Parts of Tutorial D also overlap, such as the THE_ operator and selectors vs. show and read, but all in all a fraction of what SQL does.) Even if there were plans to go beyond this and support SQL it would be a good first step to first support the fundamentals of the relational model both as completely as it is reasonable to do, and to do so properly.

In addition to naming certain other aspects of Tutorial D has been adopted, such as argument order. Such aspects may be changed in the future if HaskRel is found to be solid enough to stand on its own, particularly if there are ways to express this that are more idiomatic vis-a-vis Haskell (Tutorial D is, after all, not a definition of the relational model, but a vehicle for understanding it).

Of the shortcomings of model that are mentioned above the relational calculus, database constraints, and transactions would be possible to add in a reasonably complete manner (I hope I am able to work on this). Views would be the most difficult, as far as I able to tell it is not possible to properly implement this fully within Haskell, although I don't know if some part of it can be. (This is something SQL products have had difficulties with, after all, and an area where the SQL standard is obtuse.) One further particular issue is that data definition is performed by writing, compiling and loading Haskell, which is very static compared to DDL in a proper DBMS.

Implementation

HaskRel is developed and tested with GHC 7.10.2. It builds upon HList (at least version 0.4.0.0). Separately from this library there is also an optional variant that builds on TIPs instead of records, which additionally relies on Lens (tested with version 4.12.3).

There are many missing implementation level features that database products rely on, enough so that HaskRel neither is nor will be usable as a production system, even if the shortcomings mentioned regarding the model was in place. Query optimization and indexes are the major ones, just for starters, which would require a monumental effort that there are no plans for. There is also the issue that even slightly complex queries have compile times running into several seconds. HaskRel also builds on Data.Set to manage relation bodies, although this is too limited to support a practically usable RDBMS, at least by itself.

In any case, HaskRel is very, very far from being a usable database management system, and there is as such no plan to bring it there, but I hope it's useful from an academic perspective.

Trying it out

A Haskell version of the ubiquitous suppliers-and-parts database is included as a sample database, with GHC installed this can be loaded by running suppliersPartsDB.sh in examples from the command line. This loads SuppliersPartsExample.hs, which contains Haskell renditions of Tutorial D expressions used as examples in SQL and Relational Theory, 2nd ed chapters 6 and 7. In the documentation see particularly Database.HaskRel.Relational.Expression, which "pulls together" both the algebra and assignment functions and generalizes them to operate as they should in a DBMS (with caveats), on both variables and values.

Alternatively, cabal repl will start a GHCi session with the required modules loaded, although one will have to either load a file with definitions, or define some elements to play around with. In the example below most everything is ad-hoc, predefining a few values would make certain expressions look less messy. Here are what relation constants look like:

*Database.HaskRel.RDBMS> let foo = relation' [( 10, "foo" ),( 20, "bar" ) ] :: Relation '[Attr "asdf" Int, Attr "qwer" String]
*Database.HaskRel.RDBMS> let bar = relation' [( 10, "one" ),( 30, "two" ) ] :: Relation '[Attr "asdf" Int, Attr "zxcv" String]
*Database.HaskRel.RDBMS> pt foo
┌─────────────┬────────────────┐
│ asdf :: Int │ qwer :: String │
╞═════════════╪════════════════╡
│ 10          │ foo            │
│ 20          │ bar            │
└─────────────┴────────────────┘
*Database.HaskRel.RDBMS> pt bar
┌─────────────┬────────────────┐
│ asdf :: Int │ zxcv :: String │
╞═════════════╪════════════════╡
│ 10          │ one            │
│ 30          │ two            │
└─────────────┴────────────────┘
*Database.HaskRel.RDBMS> pt$ foo `naturalJoin` bar
┌─────────────┬────────────────┬────────────────┐
│ asdf :: Int │ qwer :: String │ zxcv :: String │
╞═════════════╪════════════════╪════════════════╡
│ 10          │ foo            │ one            │
└─────────────┴────────────────┴────────────────┘

(Note that certain operating systems and browsers have difficulties displaying the horizontal lines of tables, either single or double. On OS X Firefox has worked better than Chrome.)

As advertised, Haskell infers the type of relational expressions (mind the type synonyms though):

*Database.HaskRel.RDBMS> :t foo
foo :: Relation '[Attr "asdf" Int, Attr "qwer" String]
*Database.HaskRel.RDBMS> :t bar
bar :: Relation '[Attr "asdf" Int, Attr "zxcv" String]
*Database.HaskRel.RDBMS> :t (foo `naturalJoin` bar)
(foo `nJoin` bar)
  :: containers-0.5.6.2:Data.Set.Base.Set
       (RTuple
          '[Tagged "asdf" Int, Tagged "qwer" String, Tagged "zxcv" [Char]])

Ad-hoc relvars can be defined thusly:

*Database.HaskRel.RDBMS> let baz = Relvar "baz.rv" :: Relvar '[Attr "asdf" Int, Attr "qwer" String]
*Database.HaskRel.RDBMS> assign baz (empty :: Relation '[Attr "asdf" Int, Attr "qwer" String] )
Value assigned to baz.rv

See Definition.hs of the SuppliersPartsDB for a way to do this properly.

assign against a newly defined relvar initializes it in the file system, and will create a file relative to the directory the interpreter is run from.

*Database.HaskRel.RDBMS> pt baz
┌─────────────┬────────────────┐
│ asdf :: Int │ qwer :: String │
╞═════════════╪════════════════╡
└─────────────┴────────────────┘
*Database.HaskRel.RDBMS> insert baz foo
Inserted 2 of 2 tuples into baz.rv
*Database.HaskRel.RDBMS> pt baz
┌─────────────┬────────────────┐
│ asdf :: Int │ qwer :: String │
╞═════════════╪════════════════╡
│ 10          │ foo            │
│ 20          │ bar            │
└─────────────┴────────────────┘
*Database.HaskRel.RDBMS> insert baz ( relation [rTuple ((Label::Label "asdf") .=. 30, (Label::Label "qwer") .=. "frotz")] )
Inserted 1 of 1 tuples into baz.rv
*Database.HaskRel.RDBMS> pt baz
┌─────────────┬────────────────┐
│ asdf :: Int │ qwer :: String │
╞═════════════╪════════════════╡
│ 10          │ foo            │
│ 20          │ bar            │
│ 30          │ frotz          │
└─────────────┴────────────────┘

Concise expression of updates require a set of language extensions (this in addition to DataKinds, which this module enables by default):

*Database.HaskRel.RDBMS> :set -XQuasiQuotes -XKindSignatures -XViewPatterns
*Database.HaskRel.RDBMS> update baz (\ [pun|asdf|] -> asdf > 15 ) (\ [pun|qwer|] -> case (qwer ++ "-new") of (qwer) -> [pun|qwer|])
Updated 2 of 3 tuples in baz.rv
*Database.HaskRel.RDBMS> pt baz
┌─────────────┬────────────────┐
│ asdf :: Int │ qwer :: String │
╞═════════════╪════════════════╡
│ 10          │ foo            │
│ 20          │ bar-new        │
│ 30          │ frotz-new      │
└─────────────┴────────────────┘

All functions of the relational algebra that HaskRel implement work against both relation constants and relvars, as do the print functions p and pt:

*Database.HaskRel.RDBMS> p$ baz `nJoin` bar
┌──────┬───────────┬──────┐
│ asdf │ qwer      │ zxcv │
╞══════╪═══════════╪══════╡
│ 10   │ foo       │ one  │
│ 30   │ frotz-new │ two  │
└──────┴───────────┴──────┘

Contributions

While contributions to HaskRel itself would be great, the limiting factors for HaskRel to develop further, let alone be usable, is in its foundations. The most interesting work that could be done with HaskRel is to go through it and find points where Haskell or its ecosystem could be improved to the benefit of both HaskRel and other projects, and focus development efforts on that. The two most important libraries that HaskRel builds on are HList and Data.Set; efforts towards improving them or arguments towards replacing them would be beneficial.

Making HaskRel usable for practical purposes is not on the horizon, but it may be possible to improve it further towards being a demonstrator of the relational model. A list of "things Haskell/GHC needs in order to qualify as a first class RDBMS" would also be a fun list to have, even if there isn't a chance of it happening.