The clustertools package

[Tags: gpl, program]

This is a bunch of stuff I needed at some point for manipulating sequence clusters. The included tools are described in the README below.

The Darcs repository is at:



Versions: 0.1, 0.1.1, 0.1.2, 0.1.5
Change log: None available
Dependencies: base (>3), bio (>=0.3.3), bytestring, containers, regex-compat, simpleargs (>=0.1)
Author: Ketil Malde
Maintainer: Ketil Malde
Executables: clusterlibs, ace2fasta, ace2contigs, add_single, clusc, filter
Uploaded: Sat Mar 8 03:09:06 UTC 2008 by GwernBranwen
Downloads: 776 total (33 in last 30 days)
Status: Docs not available. All reported builds failed as of 2015-10-07.



Readme for clustertools-0.1

This package contains the tools described below.

To build these, you will need a Haskell compiler (the most likely
candidate being GHC), as well as my bioinformatics library and the
SimpleArgs module installed (Downloadable from: <>).

filter - remove unwanted sequences from a clustering
         usage: filter seq.list < cluster.L > cluster2.L
         cluster2.L will only contain sequence labels found in seq.list
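
The filtering step can be sketched in a few lines of Haskell. This is only an illustration, not the actual source of filter: it assumes a clustering file with one cluster per line and whitespace-separated labels, and it assumes that clusters left empty after filtering are dropped. The names parseClusters, filterClusters and renderClusters are made up for the example.

```haskell
import qualified Data.Set as Set

-- Assumed format: one cluster per line, labels separated by whitespace.
type Cluster = [String]

parseClusters :: String -> [Cluster]
parseClusters = map words . lines

-- Keep only the labels found in the allowed set.
-- Assumption: clusters that end up empty are dropped from the output.
filterClusters :: Set.Set String -> [Cluster] -> [Cluster]
filterClusters keep = filter (not . null) . map (filter (`Set.member` keep))

renderClusters :: [Cluster] -> String
renderClusters = unlines . map unwords
```

With the labels a and c allowed, the input clusters "a b", "c" and "d e" would be reduced to the two clusters "a" and "c".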

hist   - produce a histogram of cluster sizes from a "label"-formatted
         clustering.
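
As a rough sketch of what a cluster size histogram amounts to, the snippet below counts how many clusters there are of each size. It is not taken from hist itself; the input is assumed to be already parsed into lists of labels, and the tab-separated output layout and the names sizeHistogram and renderHistogram are assumptions for the example.

```haskell
import qualified Data.Map.Strict as Map

-- Map from cluster size to the number of clusters of that size.
sizeHistogram :: [[String]] -> Map.Map Int Int
sizeHistogram clusters = Map.fromListWith (+) [(length c, 1) | c <- clusters]

-- One "size<TAB>count" line per distinct size, smallest size first.
-- The output layout is an assumption, not the format hist uses.
renderHistogram :: Map.Map Int Int -> String
renderHistogram h =
  unlines [show size ++ "\t" ++ show n | (size, n) <- Map.toAscList h]
```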

clusc  - compare clusterings, calculating numerous pair-based and
         entropy-based indices.
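
To give an idea of what a pair-based index looks like, here is a small sketch of the Jaccard index over co-clustered pairs. Which indices clusc actually computes is not listed in this README, so take this as one example of the family rather than a description of the tool. The names coPairs and jaccard are invented for the illustration, and returning 1 when neither clustering has any co-clustered pair is a convention chosen here.

```haskell
import Data.List (tails)
import qualified Data.Set as Set

type Clustering = [[String]]

-- All unordered pairs of labels that share a cluster.
coPairs :: Clustering -> Set.Set (String, String)
coPairs cs =
  Set.fromList [(min x y, max x y) | c <- cs, (x : rest) <- tails c, y <- rest]

-- Jaccard index: pairs co-clustered in both clusterings, divided by
-- pairs co-clustered in at least one of them.
jaccard :: Clustering -> Clustering -> Double
jaccard a b
  | Set.null u = 1 -- convention for this sketch: no pairs at all counts as agreement
  | otherwise = fromIntegral (Set.size i) / fromIntegral (Set.size u)
  where
    pa = coPairs a
    pb = coPairs b
    i = Set.intersection pa pb
    u = Set.union pa pb
```

Building the pair sets explicitly is quadratic in cluster size, which is fine for a sketch but would be wasteful on large clusterings.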

xcerpt - given a file containing a list of sequence labels (e.g. a
         "label" formatted clustering), extract matching sequences
         from a FASTA file.  Like "agrep -d '^>'" without the bugs.

         Usage: xcerpt list.txt fasta.seq
         creates "fasta.seq.match" and ""
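
The extraction can be pictured as splitting the FASTA file into records and partitioning them by label. The sketch below is not the xcerpt source: it assumes a record's label is the header text after ">" up to the first whitespace, and the names fastaRecords, recordLabel and splitByLabels are hypothetical.

```haskell
import Data.Char (isSpace)
import Data.List (partition)
import qualified Data.Set as Set

-- A record is a header line followed by its sequence lines.
type Record = [String]

-- Split FASTA text into records; anything before the first header is ignored.
fastaRecords :: String -> [Record]
fastaRecords = go . dropWhile (not . isHeader) . lines
  where
    isHeader l = take 1 l == ">"
    go [] = []
    go (h : rest) =
      let (body, more) = break isHeader rest
       in (h : body) : go more

-- Assumption: the label is the header text up to the first whitespace.
recordLabel :: Record -> String
recordLabel (h : _) = takeWhile (not . isSpace) (drop 1 h)
recordLabel [] = ""

-- Records whose label is wanted, and all the remaining records.
splitByLabels :: Set.Set String -> [Record] -> ([Record], [Record])
splitByLabels wanted = partition ((`Set.member` wanted) . recordLabel)
```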

add_single - add singletons to a clustering.
        Usage: add_single all.L clustering.L
        creates clustering.L_s listing all sequences in all.L but not in
        clustering.L, one per line.
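
The singleton computation is essentially a set difference. A minimal sketch, assuming both files have already been parsed into labels (the function name singletons is made up here, and this is not the add_single source):

```haskell
import qualified Data.Set as Set

-- Labels present in the full list but absent from every cluster,
-- returned in the order they appear in the full list.
singletons :: [String] -> [[String]] -> [String]
singletons allLabels clusters = filter (`Set.notMember` clustered) allLabels
  where
    clustered = Set.fromList (concat clusters)
```

Writing the result with unlines gives the one-label-per-line layout described above.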

ace2contigs - parse an ACE assembly file, and output the contigs in a
        FASTA file (named by tacking on .fasta to the ACE file name),
        and the corresponding quality information (.qual).

ace2fasta - parse an ACE assembly, and output each assembly in a separate
        FASTA-formatted file, with the necessary gaps inserted to align the
        sequences (suitable for import into e.g. Seaview).

clusterlibs - given a table of regular expressions and library names,
        along with a clustering (TGICL-format), output a table of clusters
        with the library name prepended to the sequences.