bio-0.5: A bioinformatics library

Bio.Location.Location

Description

Data type for a more general sequence location consisting of potentially disjoint ranges of positions on the sequence.

Throughout, sequence position refers to a `Pos` which includes a strand. An index into a sequence is referred to as an offset, and is generally of type `Offset`.

Sequence locations

newtype Loc

General (disjoint) sequence region consisting of a concatenated set of contiguous regions (see `ContigLoc`).

Constructors

 Loc [ContigLoc]

Instances

 Eq Loc
 Ord Loc
 Show Loc
 Stranded Loc

Locations and positions

bounds :: Loc -> (Offset, Offset)

The bounds of a sequence location. This is a pair consisting of the lowest and highest sequence offsets covered by the region. The bounds ignore the strand of the sequence location, and the first element of the pair will always be lower than the second. Even if the positions in the location do not run monotonically through the location, the overall lowest and highest sequence offsets are returned.
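The bounds computation described above can be sketched on a simplified model. The types `Strand` and `Block` and the function `boundsSketch` below are assumptions made for illustration; they are not the library's own definitions, and a location is modelled as a plain non-empty list of blocks.

```haskell
-- Simplified stand-ins (assumptions, not the bio library's definitions).
data Strand = Fwd | RevCompl deriving (Eq, Show)

-- One contiguous block: lowest covered offset, number of positions, strand.
data Block = Block { blockLow :: Int, blockLen :: Int, blockStrand :: Strand }
  deriving (Eq, Show)

-- Lowest and highest offsets covered by one block; the strand is ignored.
blockBounds :: Block -> (Int, Int)
blockBounds (Block low len _) = (low, low + len - 1)

-- Overall bounds of a non-empty list of blocks, regardless of block order.
boundsSketch :: [Block] -> (Int, Int)
boundsSketch blocks = (minimum (map fst bs), maximum (map snd bs))
  where
    bs = map blockBounds blocks
```

Loaded in GHCi, `boundsSketch [Block 20 5 RevCompl, Block 3 4 Fwd]` evaluates to `(3,24)`, even though the blocks are not listed in increasing offset order.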

Returns the length of the region.

Sequence position of the start of the location. This is the 5' end on the location strand, which will have a higher offset than `endPos` if the location is on the `RevCompl` strand.

Sequence position of the end of the location, as described in `startPos`.
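As an illustration of how the start and end positions depend on the strand, here is a minimal sketch on a simplified model. `Strand`, `Pos`, `Block`, `startPosSketch` and `endPosSketch` are hypothetical names introduced here, and the blocks are assumed to be listed 5' to 3' in the location's orientation.

```haskell
-- Simplified stand-ins (assumptions, not the bio library's definitions).
data Strand = Fwd | RevCompl deriving (Eq, Show)

-- A stranded position: offset plus strand.
data Pos = Pos Int Strand deriving (Eq, Show)

-- One contiguous block: lowest covered offset, number of positions, strand.
data Block = Block { blockLow :: Int, blockLen :: Int, blockStrand :: Strand }
  deriving (Eq, Show)

-- 5' end of a block on its own strand: the highest offset for RevCompl.
block5 :: Block -> Pos
block5 (Block low _ Fwd)        = Pos low Fwd
block5 (Block low len RevCompl) = Pos (low + len - 1) RevCompl

-- 3' end of a block on its own strand: the lowest offset for RevCompl.
block3 :: Block -> Pos
block3 (Block low len Fwd)    = Pos (low + len - 1) Fwd
block3 (Block low _ RevCompl) = Pos low RevCompl

-- Start of the location: 5' end of the first block (Nothing if empty).
startPosSketch :: [Block] -> Maybe Pos
startPosSketch []      = Nothing
startPosSketch (b : _) = Just (block5 b)

-- End of the location: 3' end of the last block (Nothing if empty).
endPosSketch :: [Block] -> Maybe Pos
endPosSketch [] = Nothing
endPosSketch bs = Just (block3 (last bs))
```

For the single reverse-strand block `[Block 10 5 RevCompl]`, the start is `Just (Pos 14 RevCompl)` and the end is `Just (Pos 10 RevCompl)`, so the start has the higher offset.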

Given a sequence position and a sequence location relative to the same sequence, compute a new position representing the original position relative to the subsequence defined by the location. If the sequence position lies outside of the sequence location, `Nothing` is returned; thus, the offset of the new position will always be in the range `[0, length loc - 1]`.

When the sequence positions in the location are not monotonic, there may be multiple possible `posInto` solutions. That is, if the same outer sequence position is covered by two different contiguous blocks of the location, then it would have two possible sequence positions relative to the location. In this case, the position 5'-most in the location orientation is returned.
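The mapping into a location, including the 5'-most tie-break, might look like the following sketch. It uses simplified, hypothetical types (`Strand`, `Pos`, `Block`) and a hypothetical function `posIntoSketch`; it assumes the blocks are listed 5' to 3' in the location's orientation and that the resulting strand is expressed relative to the covering block.

```haskell
-- Simplified stand-ins (assumptions, not the bio library's definitions).
data Strand = Fwd | RevCompl deriving (Eq, Show)

-- A stranded position: offset plus strand.
data Pos = Pos Int Strand deriving (Eq, Show)

-- One contiguous block: lowest covered offset, number of positions, strand.
data Block = Block { blockLow :: Int, blockLen :: Int, blockStrand :: Strand }
  deriving (Eq, Show)

-- Strand of a position as seen from a block's own orientation.
relStrand :: Strand -> Strand -> Strand
relStrand blockS posS = if blockS == posS then Fwd else RevCompl

-- Position relative to one block, or Nothing if the offset is outside it.
intoBlock :: Pos -> Block -> Maybe Pos
intoBlock (Pos off s) (Block low len bs)
  | off < low || off >= low + len = Nothing
  | otherwise                     = Just (Pos rel (relStrand bs s))
  where
    rel = case bs of
      Fwd      -> off - low
      RevCompl -> low + len - 1 - off

-- Walk the blocks 5' to 3'; the first block covering the position wins,
-- which yields the 5'-most solution when blocks overlap.
posIntoSketch :: Pos -> [Block] -> Maybe Pos
posIntoSketch pos = go 0
  where
    go _ [] = Nothing
    go before (b : rest) =
      case intoBlock pos b of
        Just (Pos rel s) -> Just (Pos (before + rel) s)
        Nothing          -> go (before + blockLen b) rest
```

With the overlapping blocks `[Block 10 5 Fwd, Block 12 6 Fwd]`, the outer position `Pos 13 Fwd` is covered by both blocks; the sketch returns `Just (Pos 3 Fwd)` from the first block rather than the alternative offset 6 from the second.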

Given a sequence location and a sequence position within that location, compute a new position representing the original position relative to the outer sequence. If the sequence position lies outside the location, `Nothing` is returned.

This function inverts `posInto` when the sequence position actually lies within the location. Due to the possibility of redundant location-relative positions for a given absolute position, `posInto` does not necessarily invert `posOutof`.
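The reverse mapping, and the redundancy that prevents a strict inverse, can be sketched on a simplified model. All names below (`Strand`, `Pos`, `Block`, `posOutofSketch`) are hypothetical; the blocks are assumed to be listed 5' to 3' in the location's orientation.

```haskell
-- Simplified stand-ins (assumptions, not the bio library's definitions).
data Strand = Fwd | RevCompl deriving (Eq, Show)

-- A stranded position: offset plus strand.
data Pos = Pos Int Strand deriving (Eq, Show)

-- One contiguous block: lowest covered offset, number of positions, strand.
data Block = Block { blockLow :: Int, blockLen :: Int, blockStrand :: Strand }
  deriving (Eq, Show)

flipStrand :: Strand -> Strand
flipStrand Fwd      = RevCompl
flipStrand RevCompl = Fwd

-- Map a location-relative position back to the outer sequence.
posOutofSketch :: Pos -> [Block] -> Maybe Pos
posOutofSketch (Pos rel s) blocks
  | rel < 0   = Nothing
  | otherwise = go rel blocks
  where
    -- Skip whole blocks until the remaining offset falls inside one.
    go _ [] = Nothing
    go r (Block low len bs : rest)
      | r < len   = Just (Pos (absOff r low len bs) (absStrand bs))
      | otherwise = go (r - len) rest
    -- On a RevCompl block, relative offset 0 is the highest outer offset.
    absOff r low _   Fwd      = low + r
    absOff r low len RevCompl = low + len - 1 - r
    -- A relative Fwd strand means "same strand as the block".
    absStrand bs = if s == Fwd then bs else flipStrand bs
```

With `[Block 10 5 Fwd, Block 12 6 Fwd]`, both `Pos 3 Fwd` and `Pos 6 Fwd` map out to `Just (Pos 13 Fwd)`. Two distinct location-relative positions share one outer position, which is why mapping back into the location cannot recover both.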

isWithin :: Pos -> Loc -> Bool

Returns `True` when a sequence position lies within a sequence location on the same sequence, and occupies the same strand.

overlaps :: Loc -> Loc -> Bool

Returns `True` when two sequence locations overlap at any position.
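The two predicates above can be sketched on a simplified model. The names `Strand`, `Pos`, `Block`, `isWithinSketch` and `overlapsSketch` are hypothetical. The overlap test here compares offsets only; whether the library also compares strands is not stated in this description, so that choice is an assumption.

```haskell
-- Simplified stand-ins (assumptions, not the bio library's definitions).
data Strand = Fwd | RevCompl deriving (Eq, Show)

-- A stranded position: offset plus strand.
data Pos = Pos Int Strand deriving (Eq, Show)

-- One contiguous block: lowest covered offset, number of positions, strand.
data Block = Block { blockLow :: Int, blockLen :: Int, blockStrand :: Strand }
  deriving (Eq, Show)

-- Highest offset covered by a block.
blockHigh :: Block -> Int
blockHigh b = blockLow b + blockLen b - 1

-- True when some block covers the offset and lies on the same strand.
isWithinSketch :: Pos -> [Block] -> Bool
isWithinSketch (Pos off s) = any covers
  where
    covers b = s == blockStrand b && off >= blockLow b && off <= blockHigh b

-- True when any pair of blocks shares at least one offset.
-- Assumption: strands are ignored in this sketch.
overlapsSketch :: [Block] -> [Block] -> Bool
overlapsSketch xs ys = or [blocksOverlap x y | x <- xs, y <- ys]
  where
    blocksOverlap x y = blockLow x <= blockHigh y && blockLow y <= blockHigh x
```

For example, `isWithinSketch (Pos 12 RevCompl) [Block 10 5 Fwd]` is `False` because the strands differ, while `overlapsSketch [Block 10 5 Fwd] [Block 14 3 Fwd]` is `True` because both cover offset 14.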

Extracting subsequences

seqData :: (Error e, MonadError e m) => SeqData -> Loc -> m SeqData

Extract the nucleotide `SeqData` for the sequence location. If any part of the location lies outside the bounds of the sequence, an error results.

As `seqData`, extract the nucleotide subsequence for the location. Any positions in the location lying outside the bounds of the sequence are returned as `N` rather than producing an error.
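The difference between the strict and the padded extraction can be sketched as follows. This is a simplified model: the sequence is a plain `String` rather than `SeqData`, `Either String` stands in for the `MonadError` constraint, and `strictSketch`, `paddedSketch`, `Strand` and `Block` are hypothetical names.

```haskell
-- Simplified stand-ins (assumptions, not the bio library's definitions).
data Strand = Fwd | RevCompl deriving (Eq, Show)

-- One contiguous block: lowest covered offset, number of positions, strand.
data Block = Block { blockLow :: Int, blockLen :: Int, blockStrand :: Strand }
  deriving (Eq, Show)

complement :: Char -> Char
complement 'A' = 'T'
complement 'T' = 'A'
complement 'C' = 'G'
complement 'G' = 'C'
complement c   = c

-- Nucleotide at an absolute offset, if the offset is inside the sequence.
baseAt :: String -> Int -> Maybe Char
baseAt sq i
  | i < 0 || i >= length sq = Nothing
  | otherwise               = Just (sq !! i)

-- Offsets of a block in 5' to 3' order on the block's own strand.
blockOffsets :: Block -> [Int]
blockOffsets (Block low len Fwd)      = [low .. low + len - 1]
blockOffsets (Block low len RevCompl) = [low + len - 1, low + len - 2 .. low]

-- Complement bases read from the reverse strand.
orient :: Strand -> Char -> Char
orient Fwd      = id
orient RevCompl = complement

-- Strict extraction: any offset outside the sequence is an error.
strictSketch :: String -> [Block] -> Either String String
strictSketch sq blocks = fmap concat (mapM blockSeq blocks)
  where
    blockSeq b = mapM (fetch b) (blockOffsets b)
    fetch b i = case baseAt sq i of
      Just c  -> Right (orient (blockStrand b) c)
      Nothing -> Left ("offset " ++ show i ++ " is outside the sequence")

-- Padded extraction: offsets outside the sequence become 'N'.
paddedSketch :: String -> [Block] -> String
paddedSketch sq = concatMap blockSeq
  where
    blockSeq b = map (fetch b) (blockOffsets b)
    fetch b i = maybe 'N' (orient (blockStrand b)) (baseAt sq i)
```

On the sequence `"ACGTACGT"`, the block `Block 6 4 Fwd` runs two positions past the end: `strictSketch` yields a `Left` error, while `paddedSketch` yields `"GTNN"`.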

Transforming locations

Arguments

 :: (Offset, Offset)    (5' extension, 3' extension)
 -> Loc
 -> Loc

Returns a sequence location produced by extending the original location on each end, based on a pair of (5' extension, 3' extension). These add contiguous positions to the 5' and 3' ends of the original location. The 5' extension is applied to the 5' end of the location on the location strand; if the location is on the `RevCompl` strand, the 5' end will have a higher offset than the 3' end and this offset will increase by the amount of the 5' extension. Similarly, the 3' extension is applied to the 3' end of the location.
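The strand-dependent effect of the two extensions can be sketched on a simplified model. `Strand`, `Block`, `extendBlock` and `extendSketch` are hypothetical names. For a multi-block location, the sketch assumes the 5' extension goes to the first block and the 3' extension to the last block, with blocks listed 5' to 3'.

```haskell
-- Simplified stand-ins (assumptions, not the bio library's definitions).
data Strand = Fwd | RevCompl deriving (Eq, Show)

-- One contiguous block: lowest covered offset, number of positions, strand.
data Block = Block { blockLow :: Int, blockLen :: Int, blockStrand :: Strand }
  deriving (Eq, Show)

-- Extend one block by (5' extension, 3' extension) on its own strand.
-- On Fwd the 5' end is the lowest offset; on RevCompl it is the highest,
-- so there the lowest offset moves down by the 3' extension instead.
extendBlock :: (Int, Int) -> Block -> Block
extendBlock (ext5, ext3) (Block low len Fwd) =
  Block (low - ext5) (len + ext5 + ext3) Fwd
extendBlock (ext5, ext3) (Block low len RevCompl) =
  Block (low - ext3) (len + ext5 + ext3) RevCompl

-- Extend a list of blocks: 5' extension on the first, 3' on the last.
extendSketch :: (Int, Int) -> [Block] -> [Block]
extendSketch _ [] = []
extendSketch exts [b] = [extendBlock exts b]
extendSketch (ext5, ext3) (b : rest) =
  extendBlock (ext5, 0) b : init rest ++ [extendBlock (0, ext3) (last rest)]
```

Extending `Block 10 5 RevCompl` (offsets 10 to 14) by `(2, 3)` gives `Block 7 10 RevCompl` (offsets 7 to 16): the 5' end moves up from 14 to 16 and the 3' end moves down from 10 to 7.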

Displaying locations

Display a human-friendly, zero-based representation of a sequence location.