Gene-CluEDO: Hox gene clustering

Gene Cluster Evolution Determined Order

Calculate the most likely order of genes in a gene cluster.

Apart from being an interesting problem in computational biology, it also serves as an example problem for dynamic programming over unordered sets with interfaces.

Properties

Versions	0.0.0.1, 0.0.0.2
Change log	changelog.md
Dependencies	ADPfusion (>=0.5.2 && <0.5.3), ADPfusionSet (>=0.0.0 && <0.0.1), base (>=4.7 && <5.0), cmdargs (>=0.10), containers, filepath, FormalGrammars (>=0.3.1 && <0.3.2), Gene-CluEDO, log-domain (>=0.10), PrimitiveArray (>=0.8.0 && <0.8.1), PrimitiveArray-Pretty (>=0.0.0 && <0.0.1), ShortestPathProblems (>=0.0.0 && <0.0.1), text (>=1.0), vector (>=0.11)
License	GPL-3.0-only
Copyright	Christian Hoener zu Siederdissen, 2017
Author	Christian Hoener zu Siederdissen, 2017
Maintainer	choener@bioinf.uni-leipzig.de
Category	Bioinformatics
Home page	https://github.com/choener/Gene-CluEDO
Bug tracker	https://github.com/choener/Gene-CluEDO/issues
Source repo	head: git clone git://github.com/choener/Gene-CluEDO
Uploaded	by ChristianHoener at 2017-04-07T10:27:14Z

Modules

BioInf
- BioInf.GeneCluEDO
  - BioInf.GeneCluEDO.EdgeProb

Downloads

Gene-CluEDO-0.0.0.1.tar.gz (Cabal source package)
Package description (as included in the package)

Readme for Gene-CluEDO-0.0.0.1

generalized Algebraic Dynamic Programming Homepage

Gene-CluEDO: Gene Cluster Evolution Determined Order

Hoener zu Siederdissen, Christian and Prohaska, Sonja J. and Stadler, Peter F.
Dynamic Programming for Set Data Types
2014, Lecture Notes in Bioinformatics, 8826
preprint: http://www.bioinf.uni-leipzig.de/~choener/pdfs/hoe-pro-2014.pdf

Hoener zu Siederdissen, Christian and Prohaska, Sonja J. and Stadler, Peter F.
Algebraic Dynamic Programming over General Data Structures
2015, BMC Bioinformatics
oa: https://doi.org/10.1186/1471-2105-16-S19-S2

Prohaska, Sonja J. and Berkemer, Sarah and Externbrink, Fabian and Gatter, Thomas and Retzlaff, Nancy and The Students of the Graphs and Biological Networks Lab 2017 and Hoener zu Siederdissen, Christian and Stadler, Peter F.
Expansion of Gene Clusters and the Shortest Hamiltonian Path Problem
2017
preprint: http://www.bioinf.uni-leipzig.de/~choener/pdfs/pro-ber-2017.pdf

This program accepts a matrix of distances between nodes (see below for an example). It then calculates the shortest Hamiltonian path, i.e. the path of minimal total distance that begins at a start node, visits every other node exactly once, and stops at a last node.
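
The search for such a path can be sketched as a dynamic program over node sets. The following is a minimal Haskell illustration of the classic Held-Karp recurrence, not the ADPfusion grammar this package uses. The names `Dist` and `shortestHamiltonian` are hypothetical, the matrix is assumed to be square and non-empty, and the sketch returns only the length of the best path, minimised over all start and end nodes.

```haskell
import Data.Array (Array, listArray, (!))
import Data.Bits (bit, clearBit, popCount, testBit)

-- | Distance matrix as a list of rows (assumed square and non-empty).
type Dist = [[Double]]

-- | Length of the shortest Hamiltonian path over all start/end nodes.
shortestHamiltonian :: Dist -> Double
shortestHamiltonian d = minimum [table ! (full, k) | k <- [0 .. n - 1]]
  where
    n = length d
    -- Bit set containing all n nodes.
    full = bit n - 1 :: Int

    dist :: Int -> Int -> Double
    dist i j = d !! i !! j

    -- table ! (s, k): length of the shortest path that visits exactly
    -- the nodes in bit set s and ends in node k. The array is lazy, so
    -- each cell is computed on demand from cells of smaller sets.
    table :: Array (Int, Int) Double
    table = listArray ((0, 0), (full, n - 1))
              [cell s k | s <- [0 .. full], k <- [0 .. n - 1]]

    cell :: Int -> Int -> Double
    cell s k
      | not (testBit s k) = 1 / 0   -- k is not in s: no such path
      | popCount s == 1   = 0       -- the path consists of k alone
      | otherwise         =
          minimum [table ! (rest, j) + dist j k | j <- [0 .. n - 1], testBit rest j]
      where
        rest = clearBit s k
```

With the distances between A, B, C and D from the example matrix, `shortestHamiltonian [[0,2,3,5],[2,0,11,13],[3,11,0,19],[5,13,19,0]]` evaluates to 18.0, the length of both C-A-B-D and D-A-B-C. The table has 2^n * n cells, so this sketch is only practical for small clusters.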

We further calculate all neighbour probabilities via Inside/Outside. This means that for any two nodes we calculate the weight of the edge between these two nodes. The weight lies in the interval [0, 1], where 0 denotes that the nodes are almost surely not direct neighbours on a weighted-randomly drawn path, while 1 denotes that they almost surely are.

Finally, we calculate the probability that a node is one of the terminal nodes in the Hamiltonian path, i.e. either the first or the last node.
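
The neighbour and terminal probabilities described above can be checked on small inputs by brute force. The sketch below enumerates every Hamiltonian path instead of running Inside/Outside, and it assumes a Boltzmann weighting exp(-length / t) with a hypothetical temperature parameter `t`; the weighting used by the program itself is not stated in this README. The names `edgeProb` and `terminalProb` are illustrative only.

```haskell
import Data.List (permutations)

-- | Distance matrix as a list of rows (assumed square).
type Dist = [[Double]]

-- | Sum of the distances between consecutive nodes of a path.
pathLength :: Dist -> [Int] -> Double
pathLength d p = sum [d !! i !! j | (i, j) <- zip p (drop 1 p)]

-- | Every Hamiltonian path (as a node permutation) with its weight.
-- ASSUMPTION: weight = exp (-length / t); t is a hypothetical temperature.
weightedPaths :: Double -> Dist -> [([Int], Double)]
weightedPaths t d =
  [(p, exp (negate (pathLength d p) / t)) | p <- permutations [0 .. length d - 1]]

-- | Normalised weight of all paths that satisfy a predicate.
probOf :: ([Int] -> Bool) -> Double -> Dist -> Double
probOf ok t d = sum [w | (p, w) <- ps, ok p] / sum (map snd ps)
  where
    ps = weightedPaths t d

-- | Probability that nodes a and b are direct neighbours on the path.
edgeProb :: Double -> Dist -> Int -> Int -> Double
edgeProb t d a b = probOf adjacent t d
  where
    adjacent p = any (`elem` [(a, b), (b, a)]) (zip p (drop 1 p))

-- | Probability that node a is the first or the last node of the path.
terminalProb :: Double -> Dist -> Int -> Double
terminalProb t d a = probOf isEnd t d
  where
    isEnd p = take 1 p == [a] || take 1 (reverse p) == [a]
```

When all distances are equal, every path is equally likely; for three nodes both `edgeProb` and `terminalProb` then come out at approximately 2/3 for any node or pair of nodes. The enumeration grows factorially with the number of nodes, so it serves only as a reference for small examples.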

The Biological Problem We Solve

Wikipedia on Hox clusters.

Hox clusters are a set of genes that are linearly ordered. The genes are assumed to descend from a single originating gene, and repeated duplication has led to the cluster, whose duplication tree is unknown.

The long time scales involved make it hard to produce a tree that can be trusted. This program therefore produces summary information in the form of edge path probabilities.

Example matrix:

In this artificial distance matrix, we have prime numbers as distances between nodes. Store the matrix in a file, say mat.dat.

#   A   B   C   D   E
A   0   2   3   5   7
B   2   0  11  13  17
C   3  11   0  19  23
D   5  13  19   0  29
E   7  17  23  29   0
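
Reading such a file can be sketched in a few lines of Haskell. This only illustrates the layout shown above and is not the parser used by GeneCluEDO; the names `parseMatrix` and `isSymmetric` are hypothetical, and treating the line that starts with `#` as a header to skip is an assumption.

```haskell
-- | Parse a whitespace-separated, labelled distance matrix.
-- Returns the row labels and the rows of distances.
-- ASSUMPTION: lines starting with '#' are headers and are skipped;
-- every other non-empty line is a label followed by numbers.
-- Malformed numbers make 'read' fail at runtime.
parseMatrix :: String -> ([String], [[Double]])
parseMatrix txt = unzip
  [ (label, map read cells)
  | l <- lines txt
  , (label : cells) <- [words l]   -- empty lines do not match and are dropped
  , take 1 label /= "#"
  ]

-- | Check that a parsed matrix is square and symmetric.
isSymmetric :: [[Double]] -> Bool
isSymmetric m =
  all ((== n) . length) m
    && and [m !! i !! j == m !! j !! i | i <- [0 .. n - 1], j <- [0 .. n - 1]]
  where
    n = length m
```

For the file stored above, `parseMatrix <$> readFile "mat.dat"` would yield the labels A to E together with five rows of five distances each, and `isSymmetric` can be used as a quick sanity check before further processing.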

Now run the algorithm with ./GeneCluEDO -o output.run ./mat.dat. After the program has run, output.run contains a wealth of information about the input: the maximum likelihood path, the edge weights, the end probabilities, and the maximum expected accuracy path. Two additional files, here output.boundary.svg and output.edge.svg, are produced. The boundary plot shows the probability that a node (or gene) is the start or end node. The edge probability plot shows the probability of each edge (i,j) between nodes. This reveals the most likely neighbours, and therefore the most likely genetic relationships, over all possible gene orders.

Contact

Christian Hoener zu Siederdissen
Leipzig University, Leipzig, Germany
choener@bioinf.uni-leipzig.de
http://www.bioinf.uni-leipzig.de/~choener/