hPDB
====
Haskell PDB file format parser.

[![Build Status](https://api.travis-ci.org/BioHaskell/hPDB.svg?branch=master)](https://travis-ci.org/BioHaskell/hPDB)
[![Hackage](https://budueba.com/hackage/hPDB)](https://hackage.haskell.org/package/hPDB)
[![Hackage Dependencies](https://img.shields.io/hackage-deps/v/hPDB.svg?style=flat)](http://packdeps.haskellers.com/feed?needle=hPDB)

The Protein Data Bank (PDB) file format is one of the most popular formats for storing biomolecular structure data.

This is a very fast parser:

 - under 7s for 1HTQ, the largest entry in the PDB, which is over 70MB,
 - compared with 11s for RASMOL 2.7.5,
 - or 2m15s for BioPython with the Python 2.6 interpreter.

It aims to deliver not only an event-based interface, but also a high-level data structure for manipulating data, in the spirit of BioPython's PDB parser.

Details of official releases are on [Hackage](https://hackage.haskell.org/package/hPDB).

This package is also a part of [Stackage](http://www.stackage.org/package/hPDB) - a stable subset of Hackage.

Projects for the future:
------------------------

Please let me know if you would be willing to push the project further.

In particular, one may consider these features:

* Implement basic spatial operations: RMS superposition (with SVD) and
affine transforms on a substructure.
* Use `lens` to facilitate access to the data structures.
    - torsion angles within protein/RNA chain.
* Add an octree to the default data structure (with automatic updates).
* Migrate away from `text-format`, since it causes portability trouble and slows down printing.
* Write a combinator library for generic fast parsing.
* Check whether GHC 7.8 improved the efficiency of fixed-point arithmetic,
since PDB coordinates have a dynamic range of only about 2^20 steps (roughly
20 bits), with a smallest step of 0.001.
* Add class-based wrappers exposing the Structure-Model-Chain-Residue-Atom interface,
possibly wrapping Repa/Accelerate arrays for fast computation.
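
To make the fixed-point item above more concrete, here is a minimal sketch of what such a coordinate representation could look like. It is not part of hPDB: the names `Fixed3`, `parseFixed3` and `renderFixed3` are hypothetical, and it assumes the usual 8-character coordinate field with exactly three decimals. Each coordinate is stored as a whole number of thousandths, so it fits in an `Int32` and round-trips without floating-point rounding.

```haskell
module Main where

import Data.Char (isDigit, isSpace)
import Data.Int  (Int32)

-- | A coordinate stored as a whole number of thousandths (step 0.001).
-- Hypothetical type for illustration; hPDB does not define it.
newtype Fixed3 = Fixed3 Int32
  deriving (Eq, Ord, Show)

-- | Parse a coordinate field such as "  -12.345" without going through Double.
-- Assumes exactly three decimals and at most four integer digits;
-- returns Nothing for anything else.
parseFixed3 :: String -> Maybe Fixed3
parseFixed3 field =
  case dropWhile isSpace field of
    '-' : rest -> negateF <$> unsigned rest
    rest       -> unsigned rest
  where
    negateF (Fixed3 n) = Fixed3 (negate n)
    unsigned s =
      let (whole, afterWhole) = span isDigit s
      in case afterWhole of
           '.' : fracAndTail
             | (frac, trailing) <- span isDigit fracAndTail
             , not (null whole), length whole <= 4
             , length frac == 3, all isSpace trailing
             -> Just (Fixed3 (fromIntegral (read (whole ++ frac) :: Int)))
           _ -> Nothing

-- | Render back into a right-aligned field of width 8.
renderFixed3 :: Fixed3 -> String
renderFixed3 (Fixed3 n) = replicate (8 - length body) ' ' ++ body
  where
    (q, r)  = abs (fromIntegral n :: Int) `quotRem` 1000
    sign    = if n < 0 then "-" else ""
    fracStr = let s = show r in replicate (3 - length s) '0' ++ s
    body    = sign ++ show q ++ "." ++ fracStr

-- | Exact difference of two coordinates, still in thousandths.
diffFixed3 :: Fixed3 -> Fixed3 -> Fixed3
diffFixed3 (Fixed3 a) (Fixed3 b) = Fixed3 (a - b)

main :: IO ()
main = do
  print (parseFixed3 " -12.345")
  print (fmap renderFixed3 (parseFixed3 "  -0.500"))
  print (diffFixed3 <$> parseFixed3 "   1.000" <*> parseFixed3 "   0.999")
```

Whether this is actually faster than parsing into `Double` is exactly the open question in the list item; the sketch only shows that the representation is lossless for values with three decimals.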

Please ask me any questions on [Gitter](https://gitter.im/mgajda).