The perdure package

[Tags: library, program]

Perdure™, a Cognimeta product, aims to provide a simple and robust persistence mechanism for acyclic immutable data with an easily comprehensible cost-model. It persists to raw block devices.

For some classes of applications, it can replace the use of traditional DMBS and keep the data modelled in a manner that is natural for purely functional languages. It is not quite orthogonal persistence for Haskell, which would automatically turn any non-persistent program into a persistent one, but it aims to minimize the scope of changes required to make an application persistent.

The persistence process is strict, it does not persist thunks or closures, but only the fully evaluated data, which must be acyclic. It also requires some changes to the data types and algorithms. References must be inserted at some strategic places within data structures. These cut the structure into separately loadable parts. These parts should hold at least a few hundred bytes to reduce the frequency of disk reads. The application has control over the representation so it can optimize it in terms of the anticipated access patterns. A SizeRef reference type is provided which separates automatically if its size threshold is reached, and inlines the data otherwise. As a convenience, a Map type is also provided which uses SizeRef on internal tree nodes. It can grow beyond the memory size and still remain efficient for lookups and traversals. Other persistent data structures with similar properties can be built on top of Perdure™.

For the user application, dereferencing is a pure computation. This is similar to lazy loading, but the reference data types do not hold ordinary references to their referent, only their location. This allows the referent to be unloaded transparently. Also since the reference holds the location and size of the data on the storage medium, dereferencing is very simple and efficient. No index need to be accessed, we only need to check a cache in case the data is already loaded.

Persisters for data types are created using safe combinators. Serialization is done at the bit level, allowing for very compact representations.

Undetected corruption is virtually impossible even in the presence of drive failures. Each separately loadable section has a 128-bit digest, and that digest is stored in the reference(s). This approach alleviates the need for low-level storage replication such as RAID, and takes care of replication at the persistence layer level. This seems appropriate given the increasing need for geographically distributed replication.

Persistence occurs in discrete transactions. These run in the IO monad. Transactions may be requested by multiple threads but they may block until they get their turn. Within a transaction, multiple threads or sparks can be used to examine the current state and compute the next state to be persisted.

Reference counting is used to automatically reclaim unused storage.

Since persisted data is immutable, it is trivial for applications to keep some or all historical states by using a list-like type as their root persisted type. Those past states can be used for analysis or possibly to recover data lost because of an application-level error. The library includes a History data type which automatically preserves a greater number of recent states and fewer older states.

We support 32bit, 64bit, little-endian and big-endian architectures. We allow platforms to write data and generate digests in their native format for maximum speed, but they should be able to read the data written by other platforms with the necessary conversions. Each reference stores the endianness and word size of the referent representation so databases can be moved between platforms without any conversion, and they could be read concurrently by different platforms.

A general mechanism Database.Perdure.Rev is provided to upgrade the persisted types. Its goal is similar to that of the safecopy package. Here however, the Rev module simply exports a type that is similar to Either but whose persister leaves room in the representation to accommodate future versions. We use it to create growing lists of alternatives such as User1.User :> User0.User :> NoRev. This type should always be your root persisted type. You should also use it on the nested parts of your data whose type is more subject to change: this will avoid having to propagate all type changes up to the root.

This library is young and there remains limitations that will need to be addressed in future releases:

Given those limitations, Perdure™ is not applicable for very large scale projects at the moment. But it can be ideal for smaller projects where there is no point in burdening the developer with a distinct data model. It can also be used as a temporary solution before integrating to external databases.

Cognimeta's Iota Charts web application https://iotacharts.com, is based on Perdure™ and has been has been live for close to a year. Its database is relatively small at ~80MB currently, but we have been very pleased with the results.

Perdure™ has been developed by Cognimeta Inc. over the past two years. We are releasing this as open source under the permissive Apache license 2.0. The persistence mechanism is relatively simple and concise and its open source nature can provide the inquisitive user with added confidence about the security of its data. Also Cognimeta has been using Haskell exclusively, and has benefited from many excellent open source libraries. We are happy to contribute back, and are hoping for constructive and critical feedback from this very bright community.


Properties

Versions0.1.0, 0.1.1, 0.1.2, 0.2.0, 0.2.1
Dependenciesarray (>=0.4.0.0), base (>=4.5.0.0 && <5), binary (>=0.5.1.0), bytestring (>=0.9.2.1), cognimeta-utils (==0.1.0), collections-api (>=1.0.0.0), comonad-transformers (>=2.1.1.1), containers (>=0.4.2.1), cryptohash (==0.7.2), data-binary-ieee754 (>=0.4.2.1), data-lens (>=2.9.0), data-lens-fd (>=2.0.2), data-lens-template (>=2.1.5), deepseq (>=1.3.0.0), deepseq-th (>=0.1.0.3), filepath (>=1.3.0.0), ghc-prim (>=0.2.0.0), lrucache (>=1.1.1), MonadRandom (==0.1.3), mtl (>=2.1.2), perdure (>=0.1.0), primitive (>=0.4.1), QuickCheck (>=2.4.0.1), stm (>=2.3), strict (>=0.3.2), tagged (>=0.4.2.1), template-haskell (>=2.7.0.0), time (>=1.4.0.1), transformers (>=0.3.0.0), unix (>=2.5.1.0)
LicenseOtherLicense
Copyright(c) 2010-2012 Cognimeta Inc.
AuthorPatrick Premont for Cognimeta Inc.
MaintainerPatrick Premont <ppremont@cognimeta.com>
CategoryDatabases
Home pagehttps://github.com/Cognimeta/perdure
Source repositoryhead: git clone git://github.com/Cognimeta/perdure.git
Executablesperdure
Upload dateThu Aug 30 12:06:42 UTC 2012
Uploaded byPatrickPremont
Downloads289 total (34 in last 30 days)

Modules

Downloads

Maintainers' corner

For package maintainers and hackage trustees