Ticket #1605 (new proposed-project)

Opened 2 years ago

Last modified 15 months ago

A universal data store interface.

Reported by: gregweber
Owned by:
Priority: not yet rated
Keywords:
Cc:
Topic: misc
Difficulty: unknown
Mentor: not-accepted

Description (last modified by gregweber)

The lack of a high-level data store library is a major weakness of Haskell.

Data storage is absolutely critical for web development and for any program that needs to persist data or whose data does not fit in memory. Haskell web development has languished in part because one had to choose between lower-level SQL and Happstack's experimental data store, which uses only the process memory. Other widely used programming languages have multiple ORMs for proven databases. Haskell needs better database abstraction libraries.

The persistent library is a universal data store interface that already has backends for PostgreSQL, SQLite, MySQL and MongoDB, plus an experimental CouchDB backend. Most users of the Yesod web framework use it, and it is also used outside of web development. With some more improvements, Persistent could become the go-to data store interface for Haskell programmers.

We could create interfaces to more databases, but the majority of Haskell programs just need *a* database, and would be happy with a really good interface to any database. There is also a need to interface with existing SQL databases. So I would like to focus on making the SQL and MongoDB storage layers really good. A great interface should be easier to build for MongoDB.

We have moved Persistent away from being a universal query interface and toward being just a universal data store serialization interface. There are many critics of query interfaces, for good reasons: we will never be able to cover every use case.
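
At its core, a serialization interface converts Haskell values to and from a small set of backend-neutral values. The sketch below illustrates the idea under simplifying assumptions. `Value`, `Field`, `toRow` and `fromRow` are hypothetical names. Persistent's real types (`PersistValue`, `PersistField`) are richer and differ in detail.

```haskell
{-# LANGUAGE FlexibleInstances #-}
module Main where

-- Hypothetical, simplified backend-neutral value type
-- (not Persistent's PersistValue, which has many more constructors).
data Value = VText String | VInt Int
  deriving (Show, Eq)

-- Hypothetical class: convert one Haskell field to and from a Value.
class Field a where
  toValue   :: a -> Value
  fromValue :: Value -> Either String a

instance Field Int where
  toValue = VInt
  fromValue (VInt n) = Right n
  fromValue v        = Left ("expected Int, got " ++ show v)

instance Field String where
  toValue = VText
  fromValue (VText s) = Right s
  fromValue v         = Left ("expected String, got " ++ show v)

-- An example record. toRow/fromRow are the kind of code a schema DSL
-- would generate, written by hand here.
data Person = Person { personName :: String, personAge :: Int }
  deriving (Show, Eq)

toRow :: Person -> [Value]
toRow (Person n a) = [toValue n, toValue a]

fromRow :: [Value] -> Either String Person
fromRow [n, a] = Person <$> fromValue n <*> fromValue a
fromRow vs     = Left ("expected 2 columns, got " ++ show (length vs))

main :: IO ()
main = do
  let alice = Person "Alice" 30
  print (toRow alice)               -- [VText "Alice",VInt 30]
  print (fromRow (toRow alice))     -- Right (Person {personName = "Alice", personAge = 30})
  print (fromRow [VInt 1, VInt 2])  -- Left "expected String, got VInt 1"
```

Because the backend only ever sees `Value`s, the same record code can be reused by any storage layer that can store those values. This is the "serialization, not querying" split described above.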

I believe future work on Persistent should continue this recent direction of allowing raw queries. One can now finally write raw SQL queries and have their results serialized automatically and correctly. The next step is to make them extraordinarily type-safe: we should know at compile time that the queries are valid, meaning that they reference columns correctly and are valid database queries. There is already an experimental implementation of this for SQL called persistent-hssqlppp that checks the validity of SQL statements at compile time.
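
One part of the compile-time validity check is confirming that every referenced column exists. The sketch below illustrates that part as an ordinary run-time function over a known schema. In a real implementation, a Template Haskell splice could run such a check during compilation and report failures as compile errors. The `Schema` type, `checkColumns`, and the example tables are hypothetical. This is not how persistent-hssqlppp works internally.

```haskell
module Main where

-- Hypothetical schema: each table name paired with its column names.
type Schema = [(String, [String])]

schema :: Schema
schema =
  [ ("person", ["id", "name", "age"])
  , ("book",   ["id", "title", "owner_id"])
  ]

-- Check that the table exists and that every referenced column belongs to it.
-- A Template Haskell splice could call this at compile time and turn a Left
-- into a compile error instead of a run-time database error.
checkColumns :: Schema -> String -> [String] -> Either String ()
checkColumns sch table cols =
  case lookup table sch of
    Nothing -> Left ("unknown table: " ++ table)
    Just known ->
      case filter (`notElem` known) cols of
        []      -> Right ()
        missing -> Left ("unknown columns in " ++ table ++ ": " ++ unwords missing)

main :: IO ()
main = do
  print (checkColumns schema "person" ["name", "age"])  -- Right ()
  print (checkColumns schema "person" ["nmae"])         -- Left "unknown columns in person: nmae"
  print (checkColumns schema "people" ["name"])         -- Left "unknown table: people"
```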

Persistent's biggest limitation right now is the lack of a good technique for returning a projection of the data - we always give back a full record. This issue should be explored in the GSoC, but does not have to be solved.
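
One possible direction for projections is to represent each column as a typed value that knows its SQL name and its Haskell type. With that in place, a query can return only the requested columns, still with the correct types. The sketch below shows this idea in plain Haskell, using an in-memory list in place of a database. `Column`, `projectSql2` and `project2` are hypothetical names. In a real backend, the column names would form the SELECT list, and the tuple would be decoded from the result row rather than read from full records.

```haskell
module Main where

data Person = Person
  { personName  :: String
  , personAge   :: Int
  , personEmail :: String
  } deriving (Show, Eq)

-- Hypothetical typed column: `rec` is the entity it belongs to and `a` is
-- the Haskell type of its values.
data Column rec a = Column { colName :: String, colGet :: rec -> a }

nameCol :: Column Person String
nameCol = Column "name" personName

ageCol :: Column Person Int
ageCol = Column "age" personAge

-- The SELECT a backend could issue for a two-column projection.
projectSql2 :: String -> Column rec a -> Column rec b -> String
projectSql2 table c1 c2 =
  "SELECT " ++ colName c1 ++ ", " ++ colName c2 ++ " FROM " ++ table

-- In-memory stand-in for decoding the projected rows: the result type
-- (a, b) is determined by the columns requested, not by the full record.
project2 :: Column rec a -> Column rec b -> [rec] -> [(a, b)]
project2 c1 c2 rows = [ (colGet c1 r, colGet c2 r) | r <- rows ]

main :: IO ()
main = do
  let people = [ Person "Alice" 30 "alice@example.com"
               , Person "Bob" 41 "bob@example.com" ]
  putStrLn (projectSql2 "person" nameCol ageCol)  -- SELECT name, age FROM person
  print (project2 nameCol ageCol people)          -- [("Alice",30),("Bob",41)]
```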

Persistent already has a very good Quasi-Quoted DSL for creating a schema, but another task at hand is to write a convenient Template Haskell interface for declaring a schema. This should not be difficult because we already have all the tools in place.

There are also some possibilities for integrating with existing Haskell database libraries. One interesting option is integration with HaskellDB or DSH; HaskellDB does not automatically serialize to a Haskell record the way Persistent does.

Michael Snoyman and Greg Weber are willing to mentor, and there is a large community of users willing to help or give feedback.

Change History

Changed 16 months ago by gregweber

Thank you for your interest in this proposal. I rushed it off a year ago. Since then we have made a lot of improvements to Persistent, and the library forms a basic building block for most Yesod users and other Haskellers. Persistent offers a level of type safety and convenience not available elsewhere (except perhaps in libraries like acid-state that are limited to in-memory storage). That being said, there are still many improvements that could be made. With the effort of a GSoC volunteer we could probably make it the go-to data storage library for Haskellers, at least for those planning to use the subset of backends (likely SQL) with great support. This proposal is vague, and we would need to work with you to narrow things down a bit.

Changed 16 months ago by gregweber

  • description modified

Note that we have moved Persistent away from being a universal query interface and toward being just a universal data store serialization interface. There are many critics of query interfaces, for good reasons: we will never be able to cover every use case.

I believe future work on Persistent should continue this recent direction of allowing raw queries. One can now write raw SQL queries and have their results serialized automatically. The next step is to make them extraordinarily type-safe: we should know at compile time that the queries are valid, meaning that they reference columns correctly and are valid database queries. There is already an experimental implementation of this for SQL called persistent-hssqlppp that checks the validity of SQL statements at compile time.

Persistent's biggest limitation right now is the lack of a good technique for returning a projection of the data: we always give back a full record. Tackling this in the GSoC would be great. Another task is to write a convenient Template Haskell interface for declaring a schema, which should take only a few days' work because the tools are already in place through the Quasi-Quoted DSL.

Changed 16 months ago by gregweber

  • description modified

Changed 16 months ago by serras

I'm very interested in working on this project during the summer (within the SoC program). Is there any way we could discuss it? My email address is trupill AT gmail DOT com.

Changed 16 months ago by serras

I've thought a bit more about the SQL problem, and I find there are two different problems for which we need to find a balance:

  • First of all, it would be really good to be able to write SQL and have it parsed and checked for correctness. This could be achieved using Template Haskell.
  • On the other hand, I would like queries to be composable. For example, you could say "order by my_order" and let "my_order" be an order calculated previously (or, going further, say "order by (calculate_order 1 3)" where "calculate_order" returns an SQL order). I don't yet know how well this composability would mix with Template Haskell.
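
The composability in the second point can be illustrated with a small query type in which an ordering is an ordinary Haskell value. Such a value can be computed ahead of time, passed around, or returned from a function like `calculate_order`. The sketch below is built on that assumption. `Order`, `Query`, `orderBy`, `calculateOrder` and `render` are hypothetical names, and the rule inside `calculateOrder` is invented purely for illustration.

```haskell
module Main where

import Data.List (intercalate)

data Direction = Asc | Desc deriving (Show, Eq)

-- An ordering is a plain value: a column name and a direction.
data Order = Order String Direction deriving (Show, Eq)

data Query = Query
  { qTable  :: String
  , qOrders :: [Order]
  } deriving (Show, Eq)

from :: String -> Query
from t = Query t []

-- Appending lets callers add orderings in several independent steps.
orderBy :: [Order] -> Query -> Query
orderBy os q = q { qOrders = qOrders q ++ os }

-- Hypothetical function that computes an ordering (the rule is made up).
calculateOrder :: Int -> Int -> [Order]
calculateOrder a b
  | a < b     = [Order "age" Asc, Order "name" Asc]
  | otherwise = [Order "age" Desc]

render :: Query -> String
render (Query t os) = "SELECT * FROM " ++ t ++ orderClause
  where
    orderClause
      | null os   = ""
      | otherwise = " ORDER BY " ++ intercalate ", " (map renderOrder os)
    renderOrder (Order c Asc)  = c ++ " ASC"
    renderOrder (Order c Desc) = c ++ " DESC"

main :: IO ()
main = do
  let myOrder = [Order "name" Asc]
  putStrLn (render (orderBy myOrder (from "person")))
  -- SELECT * FROM person ORDER BY name ASC
  putStrLn (render (orderBy (calculateOrder 1 3) (from "person")))
  -- SELECT * FROM person ORDER BY age ASC, name ASC
```

Template Haskell would then only need to validate the pieces it can see, such as column names, while values like `myOrder` stay ordinary Haskell.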

For a few months now I've been using Squeryl (http://squeryl.org/) as an ORM for Scala. This project handles the construction of new queries very well using a DSL (you can see examples at http://squeryl.org/selects.html), and it is quite composable. The main problem with reusing what they've done is that they often rely on subclassing, which may be difficult to port to Haskell. In any case, I think this shows a good way to start.

There are some open questions I still have to think about:

  • Should this SQL interface be available to every backend? Until now, Persistent has tried to offer a consistent view of both SQL and non-SQL databases, but a way to write SQL queries may only be of interest for some of them. Not supporting non-SQL databases would remove the work of emulating SQL features and let us focus on making the best possible translation to SQL.
  • It could be an interesting experiment to use SQL-like monad comprehensions for writing the queries. This would create a monadic interface to database querying, which can be interesting on its own (and I think it would look somewhat like Datalog).
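
GHC's `MonadComprehensions` extension already offers SQL-like comprehension syntax for any monad. The sketch below only shows the shape such queries could take. It uses in-memory lists as stand-in "tables", and the `Person`/`Book` data is hypothetical. A database-backed version would need a monad that translates the comprehension into SQL, and designing that monad is the open question.

```haskell
{-# LANGUAGE MonadComprehensions #-}
module Main where

import Control.Monad (MonadPlus)

data Person = Person { personId :: Int, personName :: String } deriving Show
data Book = Book { bookTitle :: String, bookOwner :: Int } deriving Show

people :: [Person]
people = [Person 1 "Alice", Person 2 "Bob"]

books :: [Book]
books = [Book "SICP" 1, Book "TAPL" 1, Book "LYAH" 2]

-- Roughly: SELECT p.name, b.title FROM person p JOIN book b
--          ON b.owner = p.id WHERE p.name = ?
-- The boolean conditions desugar to `guard`, so any MonadPlus works.
booksOf :: MonadPlus m => m Person -> m Book -> String -> m (String, String)
booksOf ps bs name =
  [ (personName p, bookTitle b)
  | p <- ps
  , personName p == name        -- the WHERE condition
  , b <- bs
  , bookOwner b == personId p   -- the join condition
  ]

main :: IO ()
main = do
  -- Lists behave like tables: every matching row is returned.
  print (booksOf people books "Alice")
  -- [("Alice","SICP"),("Alice","TAPL")]
  -- The same query in the Maybe monad yields at most one result.
  print (booksOf (Just (Person 2 "Bob")) (Just (Book "LYAH" 2)) "Bob")
  -- Just ("Bob","LYAH")
```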

Just to tell you a bit about me: as you saw, I'm the current maintainer of scion-browser, and I work as actively as I can on EclipseFP (the Haskell plug-in for Eclipse). A few months ago I moved scion-browser from an in-memory database to a SQLite database, using Persistent. Apart from that, I have made some contributions to XMonad, and I'm proficient in other functional languages such as Scala and Clojure.

Changed 16 months ago by gregweber

We should definitely not try to map raw SQL to non-SQL databases. I believe that if we get this right, users will prefer a database-specific interface (with reuse across SQL backends) for their database. The current PersistQuery interface will be useful for writing things like admin interfaces that work across different data storage solutions. Please note that we will still reuse the solid PersistStore code.

We should probably attempt this project for MongoDB first: Mongo uses a JSON structure that is inherently composable. Full query support is much easier because there are no joins: I think this can easily be completed in a week, and that the code and experience will apply directly to SQL.
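
A JSON/BSON query structure composes easily because a filter is just data. Independent filters can be combined with MongoDB's `$and` operator without any string splicing, as the sketch below illustrates. The `Val`/`Doc` types and the helpers `eq`, `gt` and `andAll` are hypothetical. They are much simpler than the real MongoDB driver's BSON types, and the renderer only approximates JSON (it relies on Haskell's `show` for plain ASCII strings).

```haskell
module Main where

import Data.List (intercalate)

-- A tiny, hypothetical subset of BSON/JSON values.
data Val = VStr String | VInt Int | VDoc Doc | VArr [Val]
  deriving (Show, Eq)

-- A document is an ordered list of field/value pairs.
type Doc = [(String, Val)]

-- Field equality, e.g. {"name": "Alice"}.
eq :: String -> Val -> Doc
eq field v = [(field, v)]

-- A comparison operator nests a document, e.g. {"age": {"$gt": 17}}.
gt :: String -> Int -> Doc
gt field n = [(field, VDoc [("$gt", VInt n)])]

-- Any number of independently built filters combine under $and.
andAll :: [Doc] -> Doc
andAll ds = [("$and", VArr (map VDoc ds))]

renderVal :: Val -> String
renderVal (VStr s)  = show s
renderVal (VInt n)  = show n
renderVal (VDoc d)  = renderDoc d
renderVal (VArr vs) = "[" ++ intercalate ", " (map renderVal vs) ++ "]"

renderDoc :: Doc -> String
renderDoc d =
  "{" ++ intercalate ", " [ show k ++ ": " ++ renderVal v | (k, v) <- d ] ++ "}"

main :: IO ()
main = do
  let byName = eq "name" (VStr "Alice")
      adults = gt "age" 17
  putStrLn (renderDoc (andAll [byName, adults]))
  -- {"$and": [{"name": "Alice"}, {"age": {"$gt": 17}}]}
```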

I think it is possible to compose raw SQL. You just have to be able to insert your Haskell expressions into the string right where they are used (not via a positional placeholder) and to break the string up easily. Write your WHERE clause separately from the rest of the query and insert it, just as you would write your condition separately in Squeryl or in ActiveRecord for Rails and place it in a function call. Perhaps some hybrid in which WHERE is a function would work better. But I think we should start with type-safe raw SQL, which is a benefit on its own, and then see how we can work our way back to composable queries.
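
The sketch below shows one way to combine "Haskell values written where they are used" with "WHERE as a function". The idea is to carry each SQL fragment's text together with its parameter values, so that fragments can be built separately and concatenated. As an assumed design choice, each inlined value is still rendered as a `?` bind parameter internally, so values never get spliced into the SQL text. `Fragment`, `raw`, `param`, `where_`, `olderThan` and `named` are all hypothetical names.

```haskell
module Main where

import Data.List (intersperse)

data Value = VText String | VInt Int deriving (Show, Eq)

-- SQL text and its bind parameters travel together, so placeholders and
-- values stay in step however fragments are combined.
data Fragment = Fragment { fragSql :: String, fragParams :: [Value] }
  deriving (Show, Eq)

instance Semigroup Fragment where
  Fragment s1 p1 <> Fragment s2 p2 = Fragment (s1 ++ s2) (p1 ++ p2)

instance Monoid Fragment where
  mempty = Fragment "" []

raw :: String -> Fragment
raw s = Fragment s []

-- A Haskell value placed where it is used; rendered as a bind parameter.
param :: Value -> Fragment
param v = Fragment "?" [v]

-- WHERE as a function: joins independently written conditions with AND.
where_ :: [Fragment] -> Fragment
where_ []    = mempty
where_ conds = raw " WHERE " <> mconcat (intersperse (raw " AND ") conds)

-- Conditions written separately from the query that uses them.
olderThan :: Int -> Fragment
olderThan n = raw "age > " <> param (VInt n)

named :: String -> Fragment
named s = raw "name = " <> param (VText s)

main :: IO ()
main = do
  let q = raw "SELECT * FROM person" <> where_ [olderThan 30, named "Alice"]
  putStrLn (fragSql q)  -- SELECT * FROM person WHERE age > ? AND name = ?
  print (fragParams q)  -- [VInt 30,VText "Alice"]
```

A quasi-quoter for type-safe raw SQL could, in principle, expand interpolated expressions into `Fragment` values like these, but that is an assumption about a possible design, not existing Persistent functionality.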

Changed 15 months ago by gregweber

Someone is making interesting progress on type-safe raw SQL:  https://gist.github.com/2019618

Changed 15 months ago by gregweber

To be clear, porting Squeryl to Haskell is essentially HaskellDB (unless you provide a more detailed explanation). Having Persistent handle serialization for HaskellDB could make a lot of sense.

Changed 15 months ago by gregweber

  • description modified