dephd: Analyze quality of nucleotide sequences.

[ bioinformatics, program ]

dephd - A simple tool for base calling and quality appraisal.

Reads files in phd-format (phred output), either specified individually, or in a directory (use the --input-dirs option to read directories or --input-list to read from an index file). Can also read FASTA with an associated quality file.

Can trim according to Lucy or Phred parameters, can mask by quality, can plot graphs (via gnuplot) of sequence quality to a window, or to JPG/EPS files. Can categorize sequences according to overall quality. Also constructs files suitable for submission to dbEST. More information at http://blog.malde.org/index.php/2010/09/07/submitting-ests-upstream/.

Also provides fakequal, a utility to generate bogus quality values, which are sometimes needed by less flexible tools.
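
To illustrate what "bogus quality values" means here, the following is a minimal Python sketch that writes one constant quality score per base for each FASTA record. It is not fakequal's actual implementation: the function names, the constant value 15, and the line width are all assumptions made for the example.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fake_qual(fasta_text, value=15, width=20):
    """Return quality-file text giving every base the same score.

    `value` and `width` are illustrative defaults (assumptions),
    not settings taken from fakequal itself.
    """
    out = []
    for header, seq in read_fasta(fasta_text):
        out.append(">" + header)
        quals = [str(value)] * len(seq)
        # Wrap the scores so that each line holds at most `width` values.
        for i in range(0, len(quals), width):
            out.append(" ".join(quals[i:i + width]))
    return "".join(line + "\n" for line in out)
```

The result has one header line per sequence followed by as many scores as there are bases, which is the shape that tools expecting a FASTA plus quality file pair usually require.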

The Darcs repository is at http://malde.org/~ketil/biohaskell/dephd.


Downloads

dephd-0.1.6.tar.gz [browse] (Cabal source package)
Package description (as included in the package)

Maintainer's Corner

Package maintainers

GwernBranwen, KetilMalde


Versions [RSS]	0.0, 0.1, 0.1.1, 0.1.3, 0.1.4, 0.1.5, 0.1.6
Dependencies	base (>=3 && <5), bio (>0.4), bytestring, cmdargs (>=0.5), directory, process, regex-compat [details]
License	LicenseRef-GPL
Author	Ketil Malde
Maintainer	Ketil Malde <ketil@malde.org>
Category	Bioinformatics
Home page	http://malde.org/~ketil/biohaskell/dephd
Uploaded	by KetilMalde at 2011-03-21T19:07:02Z
Reverse Dependencies	1 direct, 0 indirect [details]
Executables	fakequal, dephd
Downloads	5591 total (18 in the last 30 days)
Status	Docs not available [build log] Successful builds reported [all 8 reports]

Readme for dephd-0.1.6


  Synopsis
  --------

dephd - A simple tool for base calling and quality appraisal

Reads files in phd-format (phred output), either specified individually,
or in a directory (use the --input-dirs option to read directories).

  Installation
  ------------

You need the GHC compiler or, if you know what you are doing, another
Haskell compiler or interpreter with Cabal.  You also need to install
the 'bio' library (darcs get http://malde.org/~ketil/bio).

With those things in place, you should be able to do

     runhaskell Setup configure
     runhaskell Setup build
     sudo runhaskell Setup install

Optionally, add "--prefix $HOME" (without the quotes) after configure
to install to your home directory - in which case you don't need the 'sudo'.


  Usage
  -----

A brief usage report is printed if you run 'dephd -h'.  Somewhat more detailed:

Input is specified either as a list of phd-files (typically generated
by Phred), a list of directories containing phd-files (using the
--input-dirs option), a file containing a list of names of phd-files
(--input-list), or a Fasta and associated quality file (-i foo.fasta
foo.qual).

Output is specified by -J, -X, -P, -R foo.ranks, -F foo.fasta, and/or
-Q foo.qual.  The first three generate a plot of sequence quality in
JPEG files, an X window, or Postscript files, respectively.  If you
use -X on multiple files, hit q to terminate one window and go to the
next.

Three other options (-R, -F, and -Q) output different aspects
of the sequence information to files (specify '-' for printing to
standard output instead - obviously this will be messy if you do it
for more than one option!).  -F and -Q generate the standard
Fasta and Quality files, while -R produces a file with one line per
sequence containing various quality measures, including a verdict
ranging from Excellent, through Good and Poor, to Junk.

Finally, -E can be used to generate a file suitable for submission to
dbEST.  Usually, you need to provide a library table (-l option), that is,
a text file with whitespace-separated columns describing each library.
The first line of the table contains column labels, and these should
correspond to fields in the dbEST library record format.  One column
should be labelled "Pattern" and contain a regular expression matching
sequence names from this library.
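
The library table layout described above can be sketched in Python as
follows.  This is only an illustration of the idea, not dephd's own
code: the column names other than "Pattern", the library names, and the
sequence names are made up for the example, and Python's regular
expression syntax may differ in details from what dephd accepts.

```python
import re

# A hypothetical library table: a header line, then one line per library.
# Only the "Pattern" column is required by the description above; the
# other columns and all values are illustrative assumptions.
EXAMPLE_TABLE = """\
NAME       ORGANISM      Pattern
liver_lib  Gadus_morhua  ^LIV
gill_lib   Gadus_morhua  ^GIL
"""


def parse_library_table(text):
    """Parse a whitespace-separated table into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("library table is empty")
    header = lines[0].split()
    if "Pattern" not in header:
        raise ValueError("library table needs a 'Pattern' column")
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            raise ValueError("wrong number of columns in line: " + ln)
        rows.append(dict(zip(header, fields)))
    return rows


def library_for(seq_name, rows):
    """Return the first library whose Pattern matches seq_name, or None."""
    for row in rows:
        # Assumption: the pattern may match anywhere in the name.
        if re.search(row["Pattern"], seq_name):
            return row
    return None
```

Because the columns are split on whitespace, a value in this sketch
cannot itself contain spaces, which is why the example writes
Gadus_morhua with an underscore.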

Filtering can be specified with the -t option, which interprets
trimming information from Phred or Lucy and chops off the offending
parts, or with the -q option, which masks poor quality parts of
sequences to lower case, and really poor quality parts to 'n'
characters.
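
The masking behaviour described for -q can be sketched in Python like
this.  The two thresholds (20 and 10) and the function name are
assumptions chosen for the example; they are not dephd's actual
defaults, and the real program may decide on "poor" and "really poor"
differently.

```python
def mask_by_quality(seq, quals, poor=20, junk=10):
    """Mask a sequence according to per-base quality scores.

    Bases scoring below `junk` become 'n'; bases scoring below `poor`
    are lower-cased; all other bases are kept unchanged.  Both
    thresholds are illustrative assumptions.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and qualities must have equal length")
    out = []
    for base, q in zip(seq, quals):
        if q < junk:
            out.append("n")
        elif q < poor:
            out.append(base.lower())
        else:
            out.append(base)
    return "".join(out)
```

For instance, under these assumed thresholds the call
mask_by_quality("ACGT", [40, 15, 5, 30]) gives "AcnT": the C is merely
poor and is lower-cased, while the G is replaced by 'n'.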

  Bugs
  ----

Not many, I hope.  The program should work in (approximately) constant
space, and be able to deal with large numbers of sequences.

For further questions, email me at <ketil@malde.org>.