elynx-seq-0.0.1: Handle molecular sequences

Copyright: (c) Dominik Schrempf 2017
License: GPLv3
Maintainer: dominik.schrempf@gmail.com
Stability: unstable
Portability: non-portable (not tested)
Safe Haskell: None
Language: Haskell2010

ELynx.Export.Sequence.CountsFile

Description

TODO: Import.

• The Counts Format

The input of PoMo is allele frequency data. When populations have many individuals, it is preferable to store, for each position, the number of times each base occurs. This decreases file size and speeds up parsing.

Counts files contain:

• One header line that identifies the file as a counts file and states the number of populations and the number of sites (separated by white space).
• A second header line with white-space-separated column headers: CHROM (chromosome), POS (position), and the sequence names.
• One line per site, giving the chromosome, the position, and, for each population, the comma-separated counts of A, C, G and T bases.

• Lines starting with # before the first header line are treated as comments.
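As an illustration of the header rules above, here is a minimal sketch that skips leading comment lines and reads the first header line. It is not the ELynx implementation (this module only exports, and import is still a TODO). The names Header, dropComments, parseHeader and readHeader are hypothetical, and plain String is used instead of ByteString to keep the example short. The sketch also assumes that a comment line has # as its very first character and that NSITES is given as a number.

```haskell
import Text.Read (readMaybe)

-- | Hypothetical record for the first header line; not an ELynx type.
data Header = Header
  { nPop   :: Int  -- ^ Number of populations (NPOP).
  , nSites :: Int  -- ^ Number of sites (NSITES).
  } deriving (Show, Eq)

-- | Drop the comment lines (first character '#') that precede the header.
dropComments :: [String] -> [String]
dropComments = dropWhile isComment
  where
    isComment l = take 1 l == "#"

-- | Parse a line such as "COUNTSFILE  NPOP 5   NSITES 2".
-- Fields may be separated by any amount of white space.
parseHeader :: String -> Maybe Header
parseHeader l = case words l of
  ["COUNTSFILE", "NPOP", p, "NSITES", s] ->
    Header <$> readMaybe p <*> readMaybe s
  _ -> Nothing

-- | Read the header from the full file contents.
readHeader :: String -> Maybe Header
readHeader contents = case dropComments (lines contents) of
  (h : _) -> parseHeader h
  []      -> Nothing
```

Returning Maybe keeps malformed headers (wrong keywords, non-numeric counts) distinguishable from valid ones without throwing exceptions.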

A toy example:

COUNTSFILE  NPOP 5   NSITES N
CHROM  POS  Sheep    BlackSheep  RedSheep  Wolf     RedWolf
1      1    0,0,1,0  0,0,1,0     0,0,1,0   0,0,5,0  0,0,0,1
1      2    0,0,0,1  0,0,0,1     0,0,0,1   0,0,0,5  0,0,0,1
.
.
.
9      8373 0,0,0,1  1,0,0,0     0,1,0,0   0,1,4,0  0,0,1,0
.
.
.
Y      9999 0,0,0,1  0,1,0,0     0,1,0,0   0,5,0,0  0,0,1,0
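
The layout of the toy example can be sketched in a few lines of Haskell. This is an illustrative sketch under stated assumptions, not the function exported by this module: Counts, Site, renderCounts, renderSite and renderCountsFile are hypothetical names, the library itself works with ByteString and its own State type, and fields are separated by single spaces rather than aligned in columns (the format only requires white space).

```haskell
import Data.List (intercalate)

-- | Counts of A, C, G and T at one site in one population.
-- Hypothetical stand-in for the library's state type.
type Counts = (Int, Int, Int, Int)

-- | One data line: chromosome name, position, one count tuple per population.
data Site = Site String Int [Counts]

-- | Render the counts of one population, e.g. "0,0,1,0".
renderCounts :: Counts -> String
renderCounts (a, c, g, t) = intercalate "," (map show [a, c, g, t])

-- | Render one data line, fields separated by single spaces.
renderSite :: Site -> String
renderSite (Site chrom pos cs) =
  unwords (chrom : show pos : map renderCounts cs)

-- | Render a whole counts file: two header lines, then one line per site.
renderCountsFile :: [String] -> [Site] -> String
renderCountsFile names sites =
  unlines (header : columns : map renderSite sites)
  where
    header =
      unwords
        ["COUNTSFILE", "NPOP", show (length names), "NSITES", show (length sites)]
    columns = unwords ("CHROM" : "POS" : names)
```

For example, renderCountsFile ["Sheep", "Wolf"] [Site "1" 1 [(0,0,1,0), (0,0,5,0)]] yields the two header lines followed by the line "1 1 0,0,1,0 0,0,5,0".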

Documentation

The chromosome name.

type Pos = Int

The position on the chromosome.

type DataOneSite = [State]

The set of boundary states for one site.

type PopulationNames = [ByteString]

The names of the populations.

Convert data to a counts file.