Readme for clustertools-0.1
To build these tools, you will need a Haskell compiler (most likely
GHC), with my bioinformatics library and the SimpleArgs module
installed (downloadable from <http://malde.org/~ketil/biohaskell/>).
This package contains the following tools:
filter - remove unwanted sequences from a clustering
Usage: filter seq.list < cluster.L > cluster2.L
cluster2.L will contain only the sequence labels found in seq.list
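The effect of filter can be illustrated with a small Python sketch. This is
not the tool's Haskell source. It assumes that a "label"-formatted clustering
holds one cluster per line with whitespace-separated labels, and that clusters
left empty are dropped; neither is stated above. The name filter_clustering is
hypothetical.

```python
def filter_clustering(keep, cluster_lines):
    """Keep only the labels listed in `keep`, cluster by cluster."""
    kept = set(keep)
    result = []
    for line in cluster_lines:
        # Assumption: one cluster per line, labels separated by whitespace.
        labels = [lab for lab in line.split() if lab in kept]
        if labels:  # Assumption: clusters that become empty are dropped.
            result.append(" ".join(labels))
    return result


seq_list = ["s1", "s3", "s4"]
clusters = ["s1 s2 s3", "s4 s5", "s6"]
for line in filter_clustering(seq_list, clusters):
    print(line)
```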
hist - produce a histogram of cluster sizes from a "label"-formatted
clustering.
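As a rough illustration of what a cluster-size histogram contains, here is a
Python sketch (not the hist source). It assumes one cluster per line with
whitespace-separated labels. The function name and the (size, count) output
are hypothetical and may differ from what hist prints.

```python
from collections import Counter


def cluster_size_histogram(cluster_lines):
    """Return sorted (cluster size, number of clusters of that size) pairs."""
    # Assumption: each non-blank line is one cluster.
    sizes = (len(line.split()) for line in cluster_lines if line.strip())
    return sorted(Counter(sizes).items())


clusters = ["a b c", "d e", "f g", "h"]
for size, count in cluster_size_histogram(clusters):
    print(size, count)
```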
clusc - compare clusterings, calculating numerous pair-based and
entropy-based indices.
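To show what a pair-based index measures, the Python sketch below computes the
Rand index, a standard example of such an index. Whether clusc reports this
particular index is an assumption, as is the input format (one cluster per
line, whitespace-separated labels). The function names are hypothetical.

```python
from itertools import combinations


def label_to_cluster(cluster_lines):
    """Map each label to the index of the line (cluster) it appears in."""
    return {lab: i for i, line in enumerate(cluster_lines) for lab in line.split()}


def rand_index(clustering_a, clustering_b):
    """Fraction of label pairs on which the two clusterings agree."""
    a = label_to_cluster(clustering_a)
    b = label_to_cluster(clustering_b)
    labels = sorted(set(a) & set(b))  # only labels present in both
    agree = total = 0
    for x, y in combinations(labels, 2):
        total += 1
        # Agreement: the pair is together in both or apart in both.
        if (a[x] == a[y]) == (b[x] == b[y]):
            agree += 1
    return agree / total if total else 1.0


print(rand_index(["a b", "c d"], ["a b c", "d"]))
```

Enumerating all pairs is quadratic in the number of labels, which is fine for
an illustration but slow for large clusterings.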
xcerpt - given a file containing a list of sequence labels (e.g. a
"label"-formatted clustering), extract matching sequences
from a FASTA file. Like "agrep -d '^>'" without the bugs.
Usage: xcerpt list.txt fasta.seq
creates "fasta.seq.match" and "fasta.seq.rest"
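The split into matching and remaining sequences can be sketched in Python as
follows. This is an illustration, not the xcerpt source. It assumes that a
sequence's label is the first word after ">" on its header line. The name
split_fasta is hypothetical.

```python
def split_fasta(wanted, fasta_lines):
    """Split FASTA lines into (matching records, remaining records)."""
    wanted = set(wanted)
    match, rest = [], []
    target = rest  # lines before the first header go to `rest`
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumption: the label is the first word of the header.
            fields = line[1:].split()
            label = fields[0] if fields else ""
            target = match if label in wanted else rest
        target.append(line)
    return match, rest


fasta = [">s1 some description", "ACGT", "GG", ">s2", "TTTT", ">s3", "CC"]
match, rest = split_fasta(["s1", "s3"], fasta)
print("\n".join(match))
```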
add_single - add singletons to a clustering.
Usage: add_single all.L clustering.L
creates clustering.L_s listing all sequences in all.L but not in
clustering.L, one per line.
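The singleton list is a set difference, which the following Python sketch
illustrates (again, not the tool's own code). It assumes that both files
contain whitespace-separated labels. The name missing_singletons is
hypothetical.

```python
def missing_singletons(all_lines, cluster_lines):
    """Labels present in `all_lines` but absent from the clustering."""
    clustered = {lab for line in cluster_lines for lab in line.split()}
    seen = set()
    singles = []
    for line in all_lines:
        for lab in line.split():
            # Keep first-seen order and skip duplicates.
            if lab not in clustered and lab not in seen:
                seen.add(lab)
                singles.append(lab)
    return singles


all_labels = ["s1", "s2", "s3", "s4"]
clustering = ["s1 s3"]
print("\n".join(missing_singletons(all_labels, clustering)))  # one per line
```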
ace2contigs - parse an ACE assembly file, and output the contigs in a
FASTA file (named by tacking on .fasta to the ACE file name),
and the corresponding quality information (.qual).
ace2fasta - parse an ACE assembly, and output each assembly in a separate
FASTA-formatted file, with the necessary gaps inserted to align the
sequences (suitable for import into e.g. Seaview).
clusterlibs - given a table of regular expressions and library names,
along with a clustering (TGICL-format), output a table of clusters
with the library name prepended to the sequences.
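The labelling step can be pictured with the Python sketch below. It covers
only the mapping from sequence label to library name, not the parsing of
TGICL files. The table contents, the first-match-wins rule, the "unknown"
fallback and the "library:label" output form are all assumptions made for
illustration, not taken from clusterlibs.

```python
import re


def library_for(label, table):
    """Return the library of the first pattern matching `label`, else None."""
    for pattern, library in table:
        if re.search(pattern, label):
            return library
    return None


def tag_labels(labels, table, unknown="unknown"):
    """Prepend the library name to each label (hypothetical output form)."""
    return [f"{library_for(lab, table) or unknown}:{lab}" for lab in labels]


# Hypothetical table of (regular expression, library name) pairs.
table = [(r"^LIV", "liver"), (r"^BRN", "brain")]
print(tag_labels(["LIV001", "BRN042", "XYZ9"], table))
```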