elynx-seq: Handle molecular sequences

[ bioinformatics, gpl, library ]

Examine, modify, and simulate molecular sequences in a reproducible way. Please see the README on GitHub at https://github.com/dschrempf/elynx.


Versions 0.0.1, 0.1.0, 0.2.1, 0.2.2, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.4.0, 0.4.1, 0.5.0, 0.5.0.1, 0.5.0.2, 0.5.1.0, 0.5.1.1
Change log ChangeLog.md
Dependencies aeson (>=1.5.6.0), attoparsec (>=0.13.2.5), base (>=4.7 && <5), bytestring (>=0.10.12.0), containers (>=0.6.2.1), matrices (>=0.5.0), mwc-random (>=0.15.0.1), parallel (>=3.2.2.0), primitive (>=0.7.1.0), vector (>=0.12.3.0), vector-th-unbox (>=0.2.1.9), word8 (>=0.1.3)
License GPL-3.0-or-later
Copyright Dominik Schrempf (2021)
Author Dominik Schrempf
Maintainer dominik.schrempf@gmail.com
Category Bioinformatics
Home page https://github.com/dschrempf/elynx#readme
Bug tracker https://github.com/dschrempf/elynx/issues
Source repo head: git clone https://github.com/dschrempf/elynx
Uploaded by dschrempf at 2021-06-14T12:51:07Z
Distributions LTSHaskell:0.5.1.1, NixOS:0.5.1.1
Downloads 2242 total (33 in the last 30 days)
Docs available
Last success reported on 2021-06-14


Readme for elynx-seq-0.5.1.1


The ELynx Suite

Version: 0.5.1.0. Reproducible evolution made easy.

A Haskell library and tool set for computational biology. The goal of ELynx is reproducible research. Evolutionary sequences and phylogenetic trees can be read, viewed, modified, and simulated. The command line with all arguments is logged consistently and automatically. Data integrity is verified using SHA256 sums, so past analyses can be validated without recomputing the results.
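
The checksum idea can be sketched with standard tools. The snippet below is not how ELynx stores or checks its own SHA256 sums (that format is not described in this README); it only assumes GNU coreutils `sha256sum` and a hypothetical file `example.fasta` in a temporary directory.

```shell
# Hypothetical working directory and file name, used only for illustration.
workdir=$(mktemp -d)
printf '>seq1\nACGT\n' > "$workdir/example.fasta"

# Record the checksum at the time the analysis is run ...
sha256sum "$workdir/example.fasta" > "$workdir/example.sha256"

# ... and verify it later without recomputing the analysis itself.
if sha256sum --check --quiet "$workdir/example.sha256"; then
  status="unchanged"
else
  status="modified"
fi
echo "$status"
```

If the file is edited between the two steps, the check fails and `status` is set to `modified`.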

The ELynx Suite consists of library packages and executables providing a range of subcommands.

The library packages are:

  • elynx-nexus: Nexus file support.
  • elynx-markov: Simulate multi sequence alignments along phylogenetic trees.
  • elynx-seq: Handle evolutionary sequences and multi sequence alignments.
  • elynx-tools: Tools for the provided executables.
  • elynx-tree: Handle phylogenetic trees.

The executables are:

  • slynx: Analyze, modify, and simulate evolutionary sequences.
  • tlynx: Analyze, modify, and simulate phylogenetic trees.
  • elynx: Validate and redo past analyses.

Documentation is available on Hackage (use direct links above).

ELynx is actively developed. We happily receive comments, ideas, feature requests, and pull requests!

Installation

ELynx is written in Haskell and can be installed with Stack.

  1. Install Stack with your package manager, or directly from the web page.

    curl -sSL https://get.haskellstack.org/ | sh
    
  2. Clone the ELynx repository.

    git clone https://github.com/dschrempf/elynx
    
  3. Navigate to the newly created elynx folder and build the binaries. This will take a while.

    stack build
    
  4. Run a binary from within the project directory. For example,

    stack exec tlynx -- --help
    
  5. If needed, install the binaries.

    stack install
    

    The binaries are installed into ~/.local/bin/, which has to be added to the PATH environment variable. Then they can be used directly.
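
A minimal sketch of the PATH change, assuming a POSIX-compatible shell. It only affects the current session; to make it permanent, the same `export` line would typically go into a shell start-up file such as ~/.profile.

```shell
# Prepend the install directory mentioned above to PATH for this session only.
export PATH="$HOME/.local/bin:$PATH"

# Check whether the installed binaries are now found.
for tool in slynx tlynx elynx; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: not found (run 'stack install' first)"
  fi
done
```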

SLynx

Handle evolutionary sequences.

stack exec slynx -- --help | head -n -16

ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.

Usage: slynx [-v|--verbosity VALUE] [-o|--output-file-basename NAME] 
             [-f|--force] [--no-elynx-file] COMMAND
  Analyze, and simulate multi sequence alignments.

Available options:
  -h,--help                Show this help text
  -V,--version             Show version
  -v,--verbosity VALUE     Be verbose; one of: Quiet Warning Info Debug
                           (default: Info)
  -o,--output-file-basename NAME
                           Specify base name of output file
  -f,--force               Ignore previous analysis and overwrite existing
                           output files.
  --no-elynx-file          Do not write data required to reproduce an analysis.

Available commands:
  concatenate              Concatenate sequences found in input files.
  examine                  Examine sequences. If data is a multi sequence alignment, additionally analyze columns.
  filter-columns           Filter columns of multi sequence alignments.
  filter-rows              Filter rows (or sequences) found in input files.
  simulate                 Simulate multi sequence alignments.
  sub-sample               Sub-sample columns from multi sequence alignments.
  translate                Translate from DNA to Protein or DNAX to ProteinX.


Available sequence file formats:
  - FASTA

Available alphabets:
  - DNA (nucleotides)
  - DNAX (nucleotides; including gaps)
  - DNAI (nucleotides; including gaps, and IUPAC codes)
  - Protein (amino acids)
  - ProteinX (amino acids; including gaps)
  - ProteinS (amino acids; including gaps, and translation stops)

Concatenate

Concatenate multi sequence alignments.

stack exec slynx -- concatenate --help

ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.

Usage: slynx concatenate (-a|--alphabet NAME) INPUT-FILE
  Concatenate sequences found in input files.

Available options:
  -h,--help                Show this help text
  -V,--version             Show version
  -a,--alphabet NAME       Specify alphabet type NAME
  INPUT-FILE               Read sequences from INPUT-FILE
  -h,--help                Show this help text

Examine

Examine sequences with slynx examine.

stack exec slynx -- examine --help

ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.

Usage: slynx examine (-a|--alphabet NAME) INPUT-FILE [--per-site]
  Examine sequences. If data is a multi sequence alignment, additionally analyze columns.

Available options:
  -h,--help                Show this help text
  -V,--version             Show version
  -a,--alphabet NAME       Specify alphabet type NAME
  INPUT-FILE               Read sequences from INPUT-FILE
  --per-site               Report per site summary statistics
  -h,--help                Show this help text
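
A possible invocation, built only from the options listed in the help text above. The input file `example.fasta` is a hypothetical two-sequence alignment, and the sketch assumes that `slynx` has been installed and is on the PATH; otherwise the command is skipped.

```shell
# Hypothetical input: two aligned DNA sequences in FASTA format.
workdir=$(mktemp -d)
cat > "$workdir/example.fasta" <<'EOF'
>seq1
ACGTACGT
>seq2
ACGTTCGT
EOF

if command -v slynx >/dev/null 2>&1; then
  # Run inside the temporary directory so that any files slynx writes stay there.
  ( cd "$workdir" && slynx examine -a DNA example.fasta --per-site )
else
  echo "slynx not found; skipping" >&2
fi
```

From within the project directory, `stack exec slynx -- examine ...` can be used instead of the installed binary.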

Filter

Filter sequences with filter-rows.

stack exec slynx -- filter-rows --help

ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.

Usage: slynx filter-rows (-a|--alphabet NAME) INPUT-FILE [--longer-than LENGTH] 
                         [--shorter-than LENGTH] [--standard-characters]
  Filter rows (or sequences) found in input files.

Available options:
  -h,--help                Show this help text
  -V,--version             Show version
  -a,--alphabet NAME       Specify alphabet type NAME
  INPUT-FILE               Read sequences from INPUT-FILE
  --longer-than LENGTH     Only keep sequences longer than LENGTH
  --shorter-than LENGTH    Only keep sequences shorter than LENGTH
  --standard-characters    Only keep sequences containing at least one standard
                           (i.e., non-IUPAC) character
  -h,--help                Show this help text

Filter columns of multi sequence alignments with filter-columns.

stack exec slynx -- filter-columns --help

ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.

Usage: slynx filter-columns (-a|--alphabet NAME) INPUT-FILE 
                            [--standard-chars DOUBLE]
  Filter columns of multi sequence alignments.

Available options:
  -h,--help                Show this help text
  -V,--version             Show version
  -a,--alphabet NAME       Specify alphabet type NAME
  INPUT-FILE               Read sequences from INPUT-FILE
  --standard-chars DOUBLE  Keep columns with a proportion standard (non-IUPAC)
                           characters larger than DOUBLE in [0,1]
  -h,--help                Show this help text
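
The two filters could be applied as follows. This is a sketch under assumptions: the alignment `aln.fasta` and the output base names `rows` and `cols` are hypothetical, and `slynx` is expected on the PATH. The DNAI alphabet is chosen because the example contains gaps and the IUPAC code N.

```shell
workdir=$(mktemp -d)
# Hypothetical alignment; N and - are non-standard characters.
cat > "$workdir/aln.fasta" <<'EOF'
>seq1
ACGTNN-T
>seq2
ACGTAC-T
>seq3
ACGTACGT
EOF

if command -v slynx >/dev/null 2>&1; then
  (
    cd "$workdir"
    # Keep only sequences longer than 5 characters.
    slynx -o rows filter-rows -a DNAI aln.fasta --longer-than 5
    # Keep only columns in which the proportion of standard characters exceeds 0.5.
    slynx -o cols filter-columns -a DNAI aln.fasta --standard-chars 0.5
  )
else
  echo "slynx not found; skipping" >&2
fi
```

The global option `-o` is placed before the command name, as shown in the usage line of `slynx --help`.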

Simulate

Simulate sequences with slynx simulate.

stack exec slynx -- simulate --help

ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.

Usage: slynx simulate (-t|--tree-file Name) [-s|--substitution-model MODEL] 
                      [-m|--mixture-model MODEL] [-e|--edm-file NAME] 
                      [-p|--siteprofile-files NAMES] 
                      [-w|--mixture-model-weights "[DOUBLE,DOUBLE,...]"] 
                      [-g|--gamma-rate-heterogeneity "(NCAT,SHAPE)"]
                      (-l|--length NUMBER) [-S|--seed [INT]]
  Simulate multi sequence alignments.

Available options:
  -h,--help                Show this help text
  -V,--version             Show version
  -t,--tree-file Name      Read tree from Newick file NAME
  -s,--substitution-model MODEL
                           Set the phylogenetic substitution model; available
                           models are shown below (mutually exclusive with -m
                           option)
  -m,--mixture-model MODEL Set the phylogenetic mixture model; available models
                           are shown below (mutually exclusive with -s option)
  -e,--edm-file NAME       Empirical distribution model file NAME in Phylobayes
                           format
  -p,--siteprofile-files NAMES
                           File names of site profiles in Phylobayes format
  -w,--mixture-model-weights "[DOUBLE,DOUBLE,...]"
                           Weights of mixture model components
  -g,--gamma-rate-heterogeneity "(NCAT,SHAPE)"
                           Number of gamma rate categories and shape parameter
  -l,--length NUMBER       Set alignment length to NUMBER
  -S,--seed [INT]          Seed for random number generator; list of 32 bit
                           integers with up to 256 elements (default: random)
  -h,--help                Show this help text

Substitution models:
-s "MODEL[PARAMETER,PARAMETER,...]{STATIONARY_DISTRIBUTION}"
   Supported DNA models: JC, F81, HKY, GTR4.
     For example,
       -s HKY[KAPPA]{DOUBLE,DOUBLE,DOUBLE,DOUBLE}
       -s GTR4[e_AC,e_AG,e_AT,e_CG,e_CT,e_GT]{DOUBLE,DOUBLE,DOUBLE,DOUBLE}
          where the 'e_XY' are the exchangeabilities from nucleotide X to Y.
   Supported Protein models: Poisson, Poisson-Custom, LG, LG-Custom, WAG, WAG-Custom, GTR20.
     MODEL-Custom means that only the exchangeabilities of MODEL are used,
     and a custom stationary distribution is provided.
     For example,
       -s LG
       -s LG-Custom{...}
       -s GTR20[e_AR,e_AN,...]{...}
          the 'e_XY' are the exchangeabilities from amino acid X to Y (alphabetical order).
   Notes: The F81 model for DNA is equivalent to the Poisson-Custom for proteins.
          The GTR4 model for DNA is equivalent to the GTR20 for proteins.

Mixture models:
-m "MIXTURE(SUBSTITUTION_MODEL_1,SUBSTITUTION_MODEL_2[PARAMETERS]{STATIONARY_DISTRIBUTION},...)"
   For example,
     -m "MIXTURE(JC,HKY[6.0]{0.3,0.2,0.2,0.3})"
Mixture weights have to be provided with the -w option.

Special mixture models:
-m CXX
   where XX is 10, 20, 30, 40, 50, or 60; CXX models, Quang et al., 2008.
-m "EDM(EXCHANGEABILITIES)"
   Arbitrary empirical distribution mixture (EDM) models.
   Stationary distributions have to be provided with the -e or -p option.
   For example,
     LG exchangeabilities with stationary distributions given in FILE.
     -m "EDM(LG-Custom)" -e FILE
     LG exchangeabilities with site profiles (Phylobayes) given in FILES.
     -m "EDM(LG-Custom)" -p FILES
For special mixture models, mixture weights are optional.
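
The model strings contain brackets, braces, and parentheses, which most shells interpret unless they are quoted. The sketch below reuses the HKY and MIXTURE examples from the help text; the tree file, the output base names, and the mixture weights `[0.5,0.5]` are hypothetical, and `slynx` is assumed to be on the PATH.

```shell
workdir=$(mktemp -d)
# Hypothetical three-taxon tree in Newick format.
echo '((a:0.1,b:0.1):0.1,c:0.2);' > "$workdir/example.tree"

# Single quotes protect [, ], {, }, ( and ) from globbing and brace expansion.
model='HKY[6.0]{0.3,0.2,0.2,0.3}'
mixture='MIXTURE(JC,HKY[6.0]{0.3,0.2,0.2,0.3})'
weights='[0.5,0.5]'

if command -v slynx >/dev/null 2>&1; then
  (
    cd "$workdir"
    # One substitution model, alignment length 100.
    slynx -o hky simulate -t example.tree -s "$model" -l 100
    # Mixture model; weights are required for MIXTURE(...) models.
    slynx -o mix simulate -t example.tree -m "$mixture" -w "$weights" -l 100
  )
else
  echo "slynx not found; skipping" >&2
fi
```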

Sub-sample

Sub-sample columns from multi sequence alignments.

stack exec slynx -- sub-sample --help

ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.

Usage: slynx sub-sample (-a|--alphabet NAME) INPUT-FILE
                        (-n|--number-of-sites INT)
                        (-m|--number-of-alignments INT) [-S|--seed [INT]]
  Sub-sample columns from multi sequence alignments.

Available options:
  -h,--help                Show this help text
  -V,--version             Show version
  -a,--alphabet NAME       Specify alphabet type NAME
  INPUT-FILE               Read sequences from INPUT-FILE
  -n,--number-of-sites INT Number of sites randomly drawn with replacement
  -m,--number-of-alignments INT
                           Number of multi sequence alignments to be created
  -S,--seed [INT]          Seed for random number generator; list of 32 bit
                           integers with up to 256 elements (default: random)
  -h,--help                Show this help text

Create a given number of multi sequence alignments, each of which contains a given number of random sites drawn from the original multi sequence alignment.
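
As an illustration of drawing sites with replacement, the snippet below first draws column indices with GNU `shuf` (a stand-in for the concept, not the sampler used by ELynx) and then shows a possible `slynx sub-sample` call. The file `example.fasta` and the base name `sub` are hypothetical, and `slynx` is assumed to be on the PATH.

```shell
workdir=$(mktemp -d)
cat > "$workdir/example.fasta" <<'EOF'
>seq1
ACGTACGT
>seq2
ACGTTCGT
EOF

# Concept only: draw 4 of the 8 column indices with replacement (-r),
# so the same column may be drawn more than once.
columns=$(shuf -r -n 4 -i 1-8)
echo "$columns"

if command -v slynx >/dev/null 2>&1; then
  # Create 3 alignments with 4 randomly drawn sites each.
  ( cd "$workdir" && slynx -o sub sub-sample -a DNA example.fasta -n 4 -m 3 )
else
  echo "slynx not found; skipping" >&2
fi
```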

Translate

Translate sequences.

stack exec slynx -- translate --help

ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.

Usage: slynx translate (-a|--alphabet NAME) INPUT-FILE (-r|--reading-frame INT)
                       (-u|--universal-code CODE)
  Translate from DNA to Protein or DNAX to ProteinX.

Available options:
  -h,--help                Show this help text
  -V,--version             Show version
  -a,--alphabet NAME       Specify alphabet type NAME
  INPUT-FILE               Read sequences from INPUT-FILE
  -r,--reading-frame INT   Reading frame [0|1|2].
  -u,--universal-code CODE universal code; one of: Standard,
                           VertebrateMitochondrial.
  -h,--help                Show this help text
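
A possible translation run, using only the options listed above. The file `cds.fasta` and the base name `protein` are hypothetical. The comment about reading frame 0 starting at the first nucleotide is an assumption that the help text does not state explicitly.

```shell
workdir=$(mktemp -d)
# Hypothetical coding sequences whose length is a multiple of three.
cat > "$workdir/cds.fasta" <<'EOF'
>seq1
ATGGCCAAA
>seq2
ATGGCTAAG
EOF

if command -v slynx >/dev/null 2>&1; then
  # Reading frame 0 (assumed to start at the first nucleotide), standard genetic code.
  ( cd "$workdir" && slynx -o protein translate -a DNA cds.fasta -r 0 -u Standard )
else
  echo "slynx not found; skipping" >&2
fi
```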

TLynx

Handle phylogenetic trees in Newick format.

stack exec tlynx -- --help | head -n -16

ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.

Usage: tlynx [-v|--verbosity VALUE] [-o|--output-file-basename NAME] 
             [-f|--force] [--no-elynx-file] COMMAND
  Compare, examine, and simulate phylogenetic trees.

Available options:
  -h,--help                Show this help text
  -V,--version             Show version
  -v,--verbosity VALUE     Be verbose; one of: Quiet Warning Info Debug
                           (default: Info)
  -o,--output-file-basename NAME
                           Specify base name of output file
  -f,--force               Ignore previous analysis and overwrite existing
                           output files.
  --no-elynx-file          Do not write data required to reproduce an analysis.

Available commands:
  compare                  Compare two phylogenetic trees (compute distances and branch-wise differences).
  connect                  Connect two phylogenetic trees in all ways (possibly honoring constraints).
  distance                 Compute distances between many phylogenetic trees.
  examine                  Compute summary statistics of phylogenetic trees.
  shuffle                  Shuffle a phylogenetic tree (keep coalescent times, but shuffle topology and leaves).
  simulate                 Simulate phylogenetic trees using a birth and death or coalescent process.


Available tree file formats:
  - Newick Standard: Branch support values are stored in square brackets after branch lengths.
  - Newick IqTree:   Branch support values are stored as node names after the closing bracket of forests.
  - Newick RevBayes: Key-value pairs is provided in square brackets after node names as well as branch lengths. XXX: Key value pairs are ignored at the moment.

Compare

Compute distances between phylogenetic trees.

stack exec tlynx -- compare --help

ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.

Usage: tlynx compare [-n|--normalize] [-b|--bipartitions] [-t|--intersect] 
                     [-f|--newick-format FORMAT] NAMES
  Compare two phylogenetic trees (compute distances and branch-wise differences).

Available options:
  -h,--help                Show this help text
  -V,--version             Show version
  -n,--normalize           Normalize trees before comparison
  -b,--bipartitions        Print and plot common and missing bipartitions
  -t,--intersect           Compare intersections; i.e., before comparison, drop
                           leaves that are not present in the other tree
  -f,--newick-format FORMAT
                           Newick tree format: Standard, IqTree, or RevBayes;
                           default: Standard; for detailed help, see 'tlynx
                           --help'
  NAMES                    Tree files
  -h,--help                Show this help text
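
A sketch of a comparison between two small trees. Both Newick files are hypothetical examples with the same four leaves but different topologies; `tlynx` is assumed to be on the PATH.

```shell
workdir=$(mktemp -d)
# Two hypothetical trees on the leaves a, b, c, d.
echo '((a:0.1,b:0.1):0.1,(c:0.1,d:0.1):0.1);' > "$workdir/first.tree"
echo '((a:0.1,c:0.1):0.1,(b:0.1,d:0.1):0.1);' > "$workdir/second.tree"

if command -v tlynx >/dev/null 2>&1; then
  # Normalize both trees (-n) before computing distances.
  ( cd "$workdir" && tlynx compare -n first.tree second.tree )
else
  echo "tlynx not found; skipping" >&2
fi
```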

Connect

Connect two phylogenetic trees in all ways (possibly honoring constraints).

stack exec tlynx -- connect --help

ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.

Usage: tlynx connect [-f|--newick-format FORMAT] [-c|--contraints CONSTRAINTS]
                     TREE-FILE-A TREE-FILE-B
  Connect two phylogenetic trees in all ways (possibly honoring constraints).

Available options:
  -h,--help                Show this help text
  -V,--version             Show version
  -f,--newick-format FORMAT
                           Newick tree format: Standard, IqTree, or RevBayes;
                           default: Standard; for detailed help, see 'tlynx
                           --help'
  -c,--contraints CONSTRAINTS
                           File containing one or more Newick trees to be used
                           as constraints
  TREE-FILE-A              File containing the first Newick tree
  TREE-FILE-B              File containing the second Newick tree
  -h,--help                Show this help text

Distance

Compute distances between many phylogenetic trees.

stack exec tlynx -- distance --help

ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.

Usage: tlynx distance (-d|--distance MEASURE) [-n|--normalize] [-t|--intersect] 
                      [-s|--summary-statistics] 
                      [-m|--master-tree-file MASTER-TREE-File] 
                      [-f|--newick-format FORMAT] [INPUT-FILES]
  Compute distances between many phylogenetic trees.

Available options:
  -h,--help                Show this help text
  -V,--version             Show version
  -d,--distance MEASURE    Type of distance to calculate (available distance
                           measures are listed below)
  -n,--normalize           Normalize trees before distance calculation; only
                           affect distances depending on branch lengths
  -t,--intersect           Compare intersections; i.e., before comparison, drop
                           leaves that are not present in the other tree
  -s,--summary-statistics  Report summary statistics only
  -m,--master-tree-file MASTER-TREE-File
                           Compare all trees to the tree in the master tree
                           file.
  -f,--newick-format FORMAT
                           Newick tree format: Standard, IqTree, or RevBayes;
                           default: Standard; for detailed help, see 'tlynx
                           --help'
  INPUT-FILES              Read tree(s) from INPUT-FILES; if more files are
                           given, one tree is expected per file
  -h,--help                Show this help text

Distance measures:
  symmetric                Symmetric distance (Robinson-Foulds distance).
  incompatible-split[VAL]  Incompatible split distance. Collapse branches with (normalized)
                           support less than 0.0<=VAL<=1.0 before distance calculation;
                           if, let's say, VAL>0.7, only well supported differences contribute
                           to the total distance.
  branch-score             Branch score distance.
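
The distance measures could be selected as sketched below. The tree files are hypothetical; their support values follow the Newick Standard layout described in `tlynx --help` (square brackets after branch lengths), which is an assumption about what the parser accepts. The measure `incompatible-split[0.7]` is quoted so that the shell does not treat the brackets as a glob pattern.

```shell
workdir=$(mktemp -d)
# Hypothetical trees, one per file, with branch support values in square brackets.
echo '((a:0.1,b:0.1):0.1[0.9],(c:0.1,d:0.1):0.1[0.6]);' > "$workdir/t1.tree"
echo '((a:0.1,c:0.1):0.1[0.8],(b:0.1,d:0.1):0.1[0.5]);' > "$workdir/t2.tree"
echo '((a:0.1,d:0.1):0.1[0.4],(b:0.1,c:0.1):0.1[0.7]);' > "$workdir/t3.tree"

measure='incompatible-split[0.7]'

if command -v tlynx >/dev/null 2>&1; then
  (
    cd "$workdir"
    # Symmetric (Robinson-Foulds) distances, summary statistics only.
    tlynx distance -d symmetric -s t1.tree t2.tree t3.tree
    # Collapse branches with support below 0.7, then compare all trees to t1.tree.
    tlynx distance -d "$measure" -m t1.tree t2.tree t3.tree
  )
else
  echo "tlynx not found; skipping" >&2
fi
```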

Examine

Compute summary statistics of phylogenetic trees.

stack exec tlynx -- examine --help

ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.

Usage: tlynx examine INPUT-FILE [-f|--newick-format FORMAT]
  Compute summary statistics of phylogenetic trees.

Available options:
  -h,--help                Show this help text
  -V,--version             Show version
  INPUT-FILE               Read trees from INPUT-FILE
  -f,--newick-format FORMAT
                           Newick tree format: Standard, IqTree, or RevBayes;
                           default: Standard; for detailed help, see 'tlynx
                           --help'
  -h,--help                Show this help text

Shuffle

Shuffle a phylogenetic tree (keep coalescent times, but shuffle topology and leaves).

stack exec tlynx -- shuffle --help

ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.

Usage: tlynx shuffle [-f|--newick-format FORMAT] [-n|--replicates N] TREE-FILE 
                     [-S|--seed [INT]]
  Shuffle a phylogenetic tree (keep coalescent times, but shuffle topology and leaves).

Available options:
  -h,--help                Show this help text
  -V,--version             Show version
  -f,--newick-format FORMAT
                           Newick tree format: Standard, IqTree, or RevBayes;
                           default: Standard; for detailed help, see 'tlynx
                           --help'
  -n,--replicates N        Number of trees to generate
  TREE-FILE                File containing a Newick tree
  -S,--seed [INT]          Seed for random number generator; list of 32 bit
                           integers with up to 256 elements (default: random)
  -h,--help                Show this help text

Simulate

Simulate phylogenetic trees using a birth and death or coalescent process.

stack exec tlynx -- simulate --help

ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.

Usage: tlynx simulate (-t|--nTrees INT) (-n|--nLeaves INT) PROCESS 
                      [-u|--sub-sample DOUBLE] [-s|--summary-statistics] 
                      [-S|--seed [INT]]
  Simulate phylogenetic trees using a birth and death or coalescent process.

Available options:
  -h,--help                Show this help text
  -V,--version             Show version
  -t,--nTrees INT          Number of trees
  -n,--nLeaves INT         Number of leaves per tree
  -u,--sub-sample DOUBLE   Perform sub-sampling; see below.
  -s,--summary-statistics  For each branch, print length and number of children
  -S,--seed [INT]          Seed for random number generator; list of 32 bit
                           integers with up to 256 elements (default: random)
  -h,--help                Show this help text

Available processes:
  birthdeath               Birth and death process
  coalescent               Coalescent process

See, for example, 'tlynx simulate birthdeath --help'.
Sub-sample with probability p:
  1. Simulate one big tree with n'=round(n/p), n'>=n, leaves;
  2. Randomly sample sub-trees with n leaves.
  (With p=1.0, the same tree is reported over and over again.)
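
The size of the big tree in step 1 can be worked out by hand. The snippet below computes n'=round(n/p) for hypothetical values n=10 and p=0.25 with `awk`, rounding half up (which may differ from the rounding used by ELynx for exact ties), and then points to the process-specific help, since the options of `birthdeath` and `coalescent` are not listed here.

```shell
n=10      # requested number of leaves per tree
p=0.25    # hypothetical sub-sampling probability

# n' = round(n / p): number of leaves of the big tree simulated first.
n_big=$(awk -v n="$n" -v p="$p" 'BEGIN { printf "%d", int(n / p + 0.5) }')
echo "$n_big"

if command -v tlynx >/dev/null 2>&1; then
  # Show the options of the birth and death process.
  tlynx simulate birthdeath --help
else
  echo "tlynx not found; skipping" >&2
fi
```

With these values, a tree with 40 leaves is simulated, and sub-trees with 10 leaves are sampled from it.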

ELynx

Validate and (optionally) redo past ELynx analyses.

stack exec elynx -- --help | head -n -16

ELynx Suite version 0.5.1.0.
Developed by Dominik Schrempf.
Compiled on June 12, 2021, at 14:54 pm, UTC.

Usage: elynx COMMAND
  Validate and redo past ELynx analyses

Available options:
  -h,--help                Show this help text
  -V,--version             Show version

Available commands:
  validate                 Validate an ELynx analysis
  redo                     Redo an ELynx analysis