gsc-weighting-0.2: Generic implementation of Gerstein/Sonnhammer/Chothia weighting.




gsc :: Dendrogram a -> Dendrogram (a, Distance)

O(n^2). Calculates the Gerstein/Sonnhammer/Chothia weights for all elements of a dendrogram. Each weight is attached to its leaf, while the distances in the branches are kept unchanged.

Branch distances d should be non-increasing from the root towards the leaves and lie between 0 (at the leaves) and 1. The final weights are normalized to average 1, i.e. they sum to the number of sequences, the same total they would have if every weight were 1.
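
The precondition above can be checked before calling gsc. The following is only a sketch: the Dendrogram type is declared locally as an assumed stand-in for the real one, and wellFormed and height are hypothetical helper names, not part of this package.

```haskell
-- Local stand-in for the dendrogram type (assumed shape, not imported).
type Distance = Double

data Dendrogram a
  = Leaf a
  | Branch Distance (Dendrogram a) (Dendrogram a)

-- Height of a node: 0 for a leaf, the stored distance for a branch.
height :: Dendrogram a -> Distance
height (Leaf _)       = 0
height (Branch d _ _) = d

-- True when every branch distance lies in [0, 1] and no child
-- sits higher than its parent (hypothetical helper).
wellFormed :: Dendrogram a -> Bool
wellFormed (Leaf _) = True
wellFormed (Branch d l r) =
  d >= 0 && d <= 1
    && height l <= d && height r <= d
    && wellFormed l && wellFormed r
```

A tree whose inner branch has a larger distance than its parent, such as Branch 0.2 (Branch 0.5 (Leaf x) (Leaf y)) (Leaf z), is rejected by this check.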

For example, suppose we have

 dendro = Branch 0.8
            (Branch 0.5
              (Branch 0.2
                (Leaf A)
                (Leaf B))
              (Leaf C))
            (Leaf D)

This is the same as the GSC paper's example, except that they used similarities while we use distances (i.e. applying (1-) to the distances gives exactly their example). Then gsc dendro is

 gsc dendro == Branch 0.8
                 (Branch 0.5
                   (Branch 0.2
                     (Leaf (A,0.7608695652173914))
                     (Leaf (B,0.7608695652173914)))
                   (Leaf (C,1.0869565217391306)))
                 (Leaf (D,1.3913043478260871))

which is exactly what they calculated.
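
To make the numbers above easier to follow, here is a minimal sketch of the weighting scheme, not the package's actual implementation. It assumes a locally declared Dendrogram type, and the names gscSketch, raw, addEdge, mapWeights, total, leafCount and leaves are all hypothetical. Each edge has length (parent distance - child distance); its length is shared among the leaves below it in proportion to their current weights, and the result is rescaled to average 1. Splitting an edge evenly when all weights below it are zero is an assumption of this sketch.

```haskell
-- Local stand-in for the dendrogram type (assumed shape, not imported).
type Distance = Double

data Dendrogram a
  = Leaf a
  | Branch Distance (Dendrogram a) (Dendrogram a)

height :: Dendrogram a -> Distance
height (Leaf _)       = 0
height (Branch d _ _) = d

leaves :: Dendrogram a -> [a]
leaves (Leaf x)       = [x]
leaves (Branch _ l r) = leaves l ++ leaves r

leafCount :: Dendrogram a -> Int
leafCount = length . leaves

-- Sum of the weights currently attached to the leaves.
total :: Dendrogram (a, Distance) -> Distance
total = sum . map snd . leaves

mapWeights :: (Distance -> Distance)
           -> Dendrogram (a, Distance) -> Dendrogram (a, Distance)
mapWeights f (Leaf (x, w))  = Leaf (x, f w)
mapWeights f (Branch d l r) = Branch d (mapWeights f l) (mapWeights f r)

-- Share an edge of length e among the leaves of a subtree,
-- proportionally to their current weights.
addEdge :: Distance -> Dendrogram (a, Distance) -> Dendrogram (a, Distance)
addEdge e t
  | s > 0     = mapWeights (\w -> w + e * w / s) t
  | otherwise = mapWeights (+ e / n) t  -- assumption: all-zero weights split evenly
  where
    s = total t
    n = fromIntegral (leafCount t)

-- Unnormalized weights, computed bottom-up.
raw :: Dendrogram a -> Dendrogram (a, Distance)
raw (Leaf x)       = Leaf (x, 0)
raw (Branch d l r) = Branch d (extend l) (extend r)
  where extend child = addEdge (d - height child) (raw child)

-- Rescale so that the weights average 1.
gscSketch :: Dendrogram a -> Dendrogram (a, Distance)
gscSketch t
  | s > 0     = mapWeights (* (n / s)) r
  | otherwise = mapWeights (const 1) r
  where
    r = raw t
    s = total r
    n = fromIntegral (leafCount r)
```

On the example tree the unnormalized weights are 0.4375 for A and B, 0.625 for C and 0.8 for D, which sum to 2.3; multiplying by 4/2.3 gives approximately 0.7609, 0.7609, 1.0870 and 1.3913.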