The package name *loc* stands for “location” and is also an allusion to the acronym for “lines of code”.

Overview of the concepts:

![Example text illustrating Loc, Span, and Area](https://raw.githubusercontent.com/chris-martin/haskell-libraries/4be81df645d4a2e5073f45563930e202e41209c7/loc/example.png)

* `Loc` - a cursor position, starting at the origin `1:1`
* `Span` - a nonempty contiguous region between two locs
* `Area` - a set of zero or more spans with gaps between them

See also:

* [loc-test](https://hackage.haskell.org/package/loc-test) - Test-related utilities for this package.

## `Pos`

Since all of the numbers we're dealing with in this domain are positive, we define a "positive integer" type. This is a newtype for `Natural` that doesn't allow zero.

```haskell
newtype Pos = Pos Natural
  deriving (Eq, Ord)

instance Num Pos where
  fromInteger = Pos . checkForUnderflow . fromInteger
  Pos x + Pos y = Pos (x + y)
  Pos x - Pos y = Pos (checkForUnderflow (x - y))
  Pos x * Pos y = Pos (x * y)
  abs = id
  signum _ = Pos 1
  negate _ = throw Underflow

checkForUnderflow :: Natural -> Natural
checkForUnderflow n =
  if n == 0 then throw Underflow else n
```

`Pos` does not have an `Integral` instance, because that would require implementing `quotRem :: Pos -> Pos -> (Pos, Pos)`, which doesn't make much sense. Therefore we can't use `toInteger` on `Pos`. Instead we use our own `ToNat` class to convert positive numbers to natural numbers.

```haskell
class ToNat a where
  toNat :: a -> Natural

instance ToNat Pos where
  toNat (Pos n) = n
```

## `Line`, `Column`

We then add some newtypes to be more specific about whether we're talking about line or column numbers.

```haskell
newtype Line = Line Pos
  deriving (Eq, Ord, Num, Real, Enum, ToNat)

newtype Column = Column Pos
  deriving (Eq, Ord, Num, Real, Enum, ToNat)
```

## `Loc`

A `Loc` is a `Line` and a `Column`.

```haskell
data Loc = Loc
  { line   :: Line
  , column :: Column
  }
  deriving (Eq, Ord)
```

Note that this library has chosen to remain entirely agnostic of the text that the positions refer to. Therefore there is no "plus one" operation on `Loc`, because the next `Loc` after *4:17* could be either *4:18* or *5:1* - we can't tell without knowing the line lengths.

## `Span`

A `Span` is a start `Loc` and an end `Loc`.

```haskell
data Span = Span
  { start :: Loc
  , end   :: Loc
  }
  deriving (Eq, Ord)
```

A `Span` is not allowed to be empty; in other words, `start` and `end` must be different. There are two functions for constructing a `Span`. They both reorder their arguments as appropriate to make sure the start comes before the end (so that spans are never backwards). They take different approaches to ensuring that spans are never empty: the first can throw an exception, whereas the second is typed as `Maybe`.

```haskell
fromTo :: Loc -> Loc -> Span
fromTo a b =
  maybe (throw EmptySpan) id (fromToMay a b)

fromToMay :: Loc -> Loc -> Maybe Span
fromToMay a b =
  case compare a b of
    LT -> Just (Span a b)
    GT -> Just (Span b a)
    EQ -> Nothing
```

The choice to use an exclusive upper bound *\[start, end)* rather than two inclusive bounds *\[start, end\]* is forced by the decision to be text-agnostic. With inclusive ranges, you couldn't tell whether span *4:16-4:17* abuts span *5:1-5:2* without knowing whether the character at position *4:17* is a newline.

## `Area`

Conceptually, an area is a set of spans. To support efficient union and difference operations, `Area` is defined like this:

```haskell
data Terminus = Start | End
  deriving (Eq, Ord)

newtype Area = Area (Map Loc Terminus)
  deriving (Eq, Ord)
```

You can think of this as a sorted list of the spans' start and end positions, along with a tag indicating whether each is a start or an end.

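
To make the "sorted list of tagged positions" picture concrete, here is a minimal, self-contained sketch. It is not the library's implementation: `Loc` is simplified to a pair of `Natural`s instead of the `Line`/`Column` newtypes, and the helpers `fromDisjointSpans`, `areaSpans`, and `contains` are hypothetical names introduced only for this illustration. It assumes the input spans are already non-overlapping and non-abutting.

```haskell
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import           Numeric.Natural (Natural)

-- Simplified stand-ins for the library's types (assumption: plain
-- Naturals for line and column rather than the Line/Column newtypes).
data Loc = Loc Natural Natural
  deriving (Eq, Ord, Show)

data Span = Span Loc Loc
  deriving (Eq, Ord, Show)

data Terminus = Start | End
  deriving (Eq, Ord, Show)

newtype Area = Area (Map Loc Terminus)
  deriving (Eq, Ord, Show)

-- | Hypothetical helper: build an area from spans that are assumed to be
-- non-overlapping and non-abutting. Each span contributes two map entries.
fromDisjointSpans :: [Span] -> Area
fromDisjointSpans spans = Area (Map.fromList (concatMap termini spans))
  where
    termini (Span a b) = [(a, Start), (b, End)]

-- | Hypothetical helper: recover the spans in ascending order by pairing
-- each Start entry with the End entry that follows it.
areaSpans :: Area -> [Span]
areaSpans (Area m) = pair (Map.toAscList m)
  where
    pair ((a, Start) : (b, End) : rest) = Span a b : pair rest
    pair _                              = []

-- | Hypothetical helper: a loc is inside the area when the nearest
-- terminus at or before it is a Start. An End key at the loc itself
-- yields False, which matches the exclusive upper bound of a span.
contains :: Area -> Loc -> Bool
contains (Area m) loc =
  case Map.lookupLE loc m of
    Just (_, Start) -> True
    _               -> False
```

Because the map is keyed by `Loc`, a membership query is a single ordered lookup rather than a scan over every span, which hints at why this representation suits union and difference operations.
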
## `Show`

We define custom `Show` and `Read` instances to be able to write terse tests like:

```haskell
>>> addSpan (read "1:1-6:1") (read "[1:1-3:1,6:1-6:2,7:4-7:5]")
[1:1-6:2,7:4-7:5]
```

These are the `showsPrec` implementations for `Loc` and `Span`:

```haskell
locShowsPrec :: Int -> Loc -> ShowS
locShowsPrec _ (Loc l c) =
  shows l .
  showString ":" .
  shows c

spanShowsPrec :: Int -> Span -> ShowS
spanShowsPrec _ (Span a b) =
  locShowsPrec 10 a .
  showString "-" .
  locShowsPrec 10 b
```

## `Read`

The parser for `Pos` is based on the parser for `Natural`, applying `mfilter (/= 0)` to make the parser fail if the input represents a zero.

```haskell
posReadPrec :: ReadPrec Pos
posReadPrec =
  Pos <$> mfilter (/= 0) readPrec
```

As a reminder, the type of `mfilter` is:

```haskell
mfilter :: MonadPlus m => (a -> Bool) -> m a -> m a
```

The `Loc` parser uses a very typical `Applicative` pattern:

```haskell
-- | Parses a single specific character.
readPrecChar :: Char -> ReadPrec ()
readPrecChar = void . readP_to_Prec . const . ReadP.char

locReadPrec :: ReadPrec Loc
locReadPrec =
  Loc <$> readPrec <* readPrecChar ':' <*> readPrec
```

We used `mfilter` above to introduce failure into the `Pos` parser; for `Span` we use `empty`.

```haskell
empty :: Alternative f => f a
```

First we use `fromToMay` to produce a `Maybe Span`, and then in the case where the result is `Nothing` we use `empty` to make the parser fail.

```haskell
spanReadPrec :: ReadPrec Span
spanReadPrec =
  locReadPrec >>= \a ->
  readPrecChar '-' *>
  locReadPrec >>= \b ->
  maybe empty pure (fromToMay a b)
```

## Comparison to similar packages

### `srcloc`

[srcloc](https://hackage.haskell.org/package/srcloc) has a similar general purpose: defining types related to positions in text files.

Some differences:

* `srcloc`'s `Pos` type (comparable to our `Loc` type) has a `FilePath` parameter, whereas this library doesn't consider file paths at all.
* `srcloc` has nothing comparable to the `Area` type.


There are some undocumented aspects of `srcloc` we find confusing:

* What does "character offset" mean?
* Does `srcloc`'s `Loc` type use inclusive or exclusive bounds?