xml2x: Convert BLAST output in XML format to CSV or HTML

[ bioinformatics, program ]

xml2x converts BLAST output in XML format either to a CSV table suitable for importing into e.g. Excel or OOCalc, or to HTML. It can optionally annotate the output with GO terms.


Versions: 0.2, 0.4, 0.4.1, 0.4.2
Dependencies: array, base (>3), bio (>=0.3.2), bytestring, containers, directory, xhtml
License: LicenseRef-GPL
Author: Ketil Malde
Maintainer: Ketil Malde <ketil@malde.org>
Category: Bioinformatics
Uploaded: by GwernBranwen at 2008-03-08T04:05:45Z
Reverse dependencies: 1 direct, 0 indirect
Executables: xml2x
Downloads: 3663 total (13 in the last 30 days)
Status: Docs not available. All reported builds failed as of 2017-01-03 (8 reports).

Readme for xml2x-0.2

SYNOPSIS

    xml2x - convert BLAST output in XML format either to a CSV table
    suitable for importing into e.g. Excel or OOCalc, or to HTML,
    optionally annotating the output with GO terms.

INSTALLATION

    Use the usual Cabal routine (configure, build, install).  It
    should also be possible to compile via the Makefile.

USAGE

    xml2x [options] xmlfile1 xmlfile2...

    Options include --annotations, which specifies the file that maps
    UniProt accessions to GO terms.  This file is usually called
    "gene_association.goa_uniprot" and is available from the GO
    consortium.  It is several GB in size, so you may want to trim it
    down by filtering out the automatic (IEA) annotations.
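
    One way to do the trimming is sketched below in Python.  It
    assumes the annotation file is in the tab-separated GAF layout,
    with the evidence code in the seventh column and comment lines
    starting with "!".  The function names and the output file name
    are made up for illustration, and whether xml2x cares about the
    comment lines has not been checked, so they are simply passed
    through.

```python
from typing import Iterable, Iterator


def drop_iea(lines: Iterable[str]) -> Iterator[str]:
    """Yield GAF lines, skipping records whose evidence code is IEA.

    Assumption: tab-separated GAF, evidence code in column 7
    (index 6); lines starting with '!' are comments and are kept.
    """
    for line in lines:
        if line.startswith("!"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6 and fields[6] == "IEA":
            continue  # automatic annotation: drop it
        yield line


def trim_file(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path without loading it all into memory."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(drop_iea(src))


# Hypothetical usage (the output name is an arbitrary choice):
# trim_file("gene_association.goa_uniprot",
#           "gene_association.goa_uniprot.noiea")
```

    The file is read line by line, so memory use stays small even
    for an input of several GB.  The trimmed file can then be passed
    to --annotations in place of the original.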

    Additionally, you can use --go-defs to specify a file describing
    the GO terms, which makes the output somewhat more meaningful.
    This file is usually called "GO.terms_and_ids" and is likewise
    available from the GO consortium.

    Output format is specified with -C (CSV) or -H (HTML); -C is the
    default.

    For CSV output, the following modes are supported:

      --all    - output all blast matches (HSPs), one per line
      --top    - output only the top hit for each input sequence
      --region - output top hit for regions that overlap <50%
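
    The README does not say how --region picks its hits, so the
    following Python sketch is only one plausible reading, not the
    program's actual algorithm.  It assumes hits are ranked by
    score, that overlap is measured on query coordinates relative to
    the shorter of the two hits, and that a hit is kept only if it
    overlaps every already-kept hit by less than 50%.  The Hit type
    and all names are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject: str   # name of the matched database sequence
    q_from: int    # start on the query, 1-based inclusive
    q_to: int      # end on the query, inclusive
    score: float   # higher is better


def overlap_fraction(a: Hit, b: Hit) -> float:
    """Shared query positions divided by the length of the shorter hit."""
    shared = min(a.q_to, b.q_to) - max(a.q_from, b.q_from) + 1
    if shared <= 0:
        return 0.0
    shorter = min(a.q_to - a.q_from, b.q_to - b.q_from) + 1
    return shared / shorter


def top_hits_by_region(hits: List[Hit], max_overlap: float = 0.5) -> List[Hit]:
    """Greedily keep the best-scoring hits whose regions overlap < max_overlap."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if all(overlap_fraction(hit, k) < max_overlap for k in kept):
            kept.append(hit)
    return kept
```

    Under these assumptions, a query with two distinct matching
    domains yields two output lines, whereas --top would report only
    the better of the two.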

    Use -v on an interactive terminal to keep track of progress.

BUGS

    HTML isn't implemented yet.

    XML parsing is slow, but ndm said he'd look into it.

    Must be compiled with -smp to avoid huge memory requirements.  On
    the plus side, with -smp we use a lot less RAM than AutoFact.