estreps: Repeats from ESTs

[ bioinformatics, program ]

rselect - select a random set of sequences from a FASTA file, optionally with random orientation (forward/reverse complement).
reps - extract exact k-word repeats that occur in sequences belonging to different clusters.

The Darcs repository is at: http://malde.org/~ketil/biohaskell/estreps.


Downloads

estreps-0.3.1.tar.gz (Cabal source package)

Maintainer's Corner

Package maintainers

GwernBranwen, KetilMalde


Versions	0.1, 0.3, 0.3.1
Dependencies	base (>3 && <4), bio (>=0.4), bytestring, containers, random
License	LicenseRef-GPL
Author	Ketil Malde
Maintainer	Ketil Malde <ketil@malde.org>
Category	Bioinformatics
Home page	http://blog.malde.org/
Uploaded	by KetilMalde at 2009-10-07T13:01:48Z
Reverse Dependencies	1 direct, 0 indirect
Executables	reps, rselect
Downloads	2705 total (9 in the last 30 days)
Rating	(no votes yet)
Status	Docs not available. All reported builds failed as of 2016-12-31 (6 reports).

Readme for estreps-0.3.1


SYNOPSIS

    rselect - select a random set of sequences from a FASTA file.
    reps    - extract exact k-word repeats that occur in
              sequences belonging to different clusters.


INSTALLATION

    You'll need GHC (or possibly another Haskell system) and the
    Haskell bioinformatics library (the bio package, version 0.4 or
    later).  The Makefile should build the executables and install
    them (by default in your home directory).

USAGE

      rselect [-r] n [m] input.seq

    Selects n random sequences from the file input.seq.  If the optional
    m is given, the selection is made only from the first m sequences in
    the file, which may be more efficient.  If -r is given, each selected
    sequence is randomly written either as is or as its reverse complement.
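    A minimal Python sketch of the selection described above, for
    illustration only (rselect itself is written in Haskell and its code
    is not reproduced here).  It assumes uniform sampling without
    replacement (random.sample), an independent coin flip per sequence
    for -r, and complementing over A, C, G, T and N only; read_fasta,
    reverse_complement and rselect are hypothetical names.

```python
import random
from typing import Iterable, List, Optional, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)

# Assumption: only A, C, G, T and N (either case) are complemented.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Record] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def rselect(records: List[Record], n: int, m: Optional[int] = None,
            reorient: bool = False,
            rng: Optional[random.Random] = None) -> List[Record]:
    """Pick n records uniformly at random, optionally among the first m only.

    With reorient=True, each chosen sequence is replaced by its reverse
    complement with probability 1/2 (mirroring the -r option).
    """
    if rng is None:
        rng = random.Random()
    pool = records if m is None else records[:m]
    chosen = rng.sample(pool, n)  # raises ValueError if n > len(pool)
    if reorient:
        chosen = [(h, reverse_complement(s)) if rng.random() < 0.5 else (h, s)
                  for h, s in chosen]
    return chosen
```

    For example, rselect(read_fasta(open("input.seq")), 10, m=1000,
    reorient=True) corresponds roughly to `rselect -r 10 1000 input.seq`.
    This sketch reads the whole file before sampling; a tool that honours
    m can stop reading after m records, which is presumably where the
    efficiency gain comes from.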

    The selected sequences are written to standard output, so you
    probably want to redirect them to a file.

      reps k clustered.seq

    Generates a list of repeated k-words (or k-grams) found in the
    sequences.  The sequences are expected to be in UniGene format, i.e. a
    FASTA file in which comment lines starting with '#' separate the clusters.

    A k-word is considered repeated if it is found in more than one of the
    clusters.
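    To make the repeat criterion concrete, here is a small Python sketch.
    It is not the reps implementation: it assumes that each '#' line
    starts a new cluster, that k-words are read from the sequences exactly
    as given (no case folding and no reverse complements; what reps does
    here is not stated above), and it returns the repeated words as a set
    rather than printing them in any particular format.  parse_unigene,
    kwords and repeated_kwords are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set


def parse_unigene(lines: Iterable[str]) -> List[List[str]]:
    """Group the sequences of a FASTA file into clusters.

    Assumption: each line starting with '#' starts a new cluster, so the
    sequences between consecutive '#' lines belong to the same cluster.
    """
    clusters: List[List[str]] = [[]]
    seq: Optional[List[str]] = None  # lines of the sequence being read
    for raw in lines:
        line = raw.strip()
        if line.startswith(("#", ">")):
            if seq is not None:  # finish the sequence being read
                clusters[-1].append("".join(seq))
            seq = [] if line.startswith(">") else None
            if line.startswith("#") and clusters[-1]:
                clusters.append([])  # '#' opens a new cluster
        elif seq is not None and line:
            seq.append(line)
    if seq is not None:
        clusters[-1].append("".join(seq))
    return [c for c in clusters if c]


def kwords(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def repeated_kwords(clusters: List[List[str]], k: int) -> Set[str]:
    """k-words that occur in sequences from more than one cluster."""
    seen_in: Dict[str, Set[int]] = defaultdict(set)  # k-word -> cluster indices
    for idx, cluster in enumerate(clusters):
        for seq in cluster:
            for word in kwords(seq, k):
                seen_in[word].add(idx)
    return {w for w, idxs in seen_in.items() if len(idxs) > 1}
```

    Recording, for each k-word, the set of clusters it occurs in turns the
    "more than one cluster" test into a simple size check; a k-word that
    recurs many times inside a single cluster is therefore not reported.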

      reps k clustering.lst sequences.seq

    As above, but read the clustering from a separate file (and ignore any
    '#' comments in the sequence file).  The clustering file should contain
    one line per cluster, each line listing the identifiers of the sequences
    in that cluster (an identifier is the first word after the initial '>'
    in the FASTA header).
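    A corresponding Python sketch for the two-file form.  Again this is
    illustrative rather than the reps code; it assumes that identifiers on
    a clustering line are separated by whitespace and that sequences whose
    identifier does not appear in the clustering are skipped.
    read_clustering, read_sequences and repeated_kwords are hypothetical
    names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set


def read_clustering(lines: Iterable[str]) -> Dict[str, int]:
    """Map each sequence identifier to the index of its cluster (line)."""
    cluster_of: Dict[str, int] = {}
    for idx, line in enumerate(lines):
        for ident in line.split():  # assumption: whitespace-separated ids
            cluster_of[ident] = idx
    return cluster_of


def read_sequences(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA into {identifier: sequence}; '#' lines are ignored."""
    seqs: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""  # first word after '>'
            seqs[current] = []
        elif current is not None:
            seqs[current].append(line)
    return {ident: "".join(parts) for ident, parts in seqs.items()}


def repeated_kwords(cluster_of: Dict[str, int], seqs: Dict[str, str],
                    k: int) -> Set[str]:
    """k-words that occur in sequences from more than one cluster."""
    seen_in: Dict[str, Set[int]] = defaultdict(set)  # k-word -> cluster indices
    for ident, seq in seqs.items():
        idx = cluster_of.get(ident)
        if idx is None:  # assumption: unclustered sequences are skipped
            continue
        for i in range(len(seq) - k + 1):
            seen_in[seq[i:i + k]].add(idx)
    return {w for w, idxs in seen_in.items() if len(idxs) > 1}
```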

BUGS

    Do let me know about them, at <ketil@malde.org>!

HOMEPAGE

   http://malde.org/~ketil/