Safe Haskell | Safe-Inferred |
---|---|
Language | Haskell2010 |
Text.Newline.LineMap
Description
Create a map of the lines in the file to allow fast seeking later. Specifically, for each line, we output:
- the byte offset of the start of the line, measured from the start of the file;
- the length of the line in bytes (including the line terminator, if any);
- the type of line terminator that ended the line, if any;
- the raw (undecoded) bytes of the line.
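To make the four items concrete, here is a minimal sketch of a record that could hold them, together with the lines of the input "hi\nyo". This is not the module's actual definition, whose fields are not shown on this page. The names Terminator, Unix, Dos, Eof, lineOffset, lineLength, lineTerminator, lineContent, exampleLines and contiguous are all hypothetical. The sketch also assumes that the stored bytes exclude the terminator, while the length includes it.

```haskell
import qualified Data.ByteString.Char8 as BC

-- Hypothetical terminator type; constructor names are assumptions.
data Terminator = Unix | Dos | Eof deriving (Eq, Show)

-- Hypothetical record mirroring the four items listed above (field names assumed).
data Line a = Line
  { lineOffset     :: Int         -- byte offset of the line start, from the start of the file
  , lineLength     :: Int         -- length in bytes, terminator included
  , lineTerminator :: Terminator  -- what ended the line
  , lineContent    :: a           -- raw bytes (assumed here to exclude the terminator)
  } deriving (Eq, Show)

-- The lines of "hi\nyo": the first is ended by LF, the second by end of input.
exampleLines :: [Line BC.ByteString]
exampleLines =
  [ Line 0 3 Unix (BC.pack "hi")
  , Line 3 2 Eof (BC.pack "yo")
  ]

-- Offsets and lengths tile the file: each line starts where the previous one ends.
contiguous :: [Line a] -> Bool
contiguous ls =
  and (zipWith (\a b -> lineOffset a + lineLength a == lineOffset b) ls (drop 1 ls))
```

Because lengths include terminators, the lengths of all lines sum to the file size. That property is what makes the offsets usable for seeking.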
There is an associated file format, based on CSV, for serializing this data. See the documentation for display.
Currently, we only support UTF-8-encoded text with Unix line endings (LF).
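The "fast seeking later" step can be sketched with base and bytestring alone. Given an offset and a length taken from a linemap built over the same file, a reader can jump straight to the line instead of scanning from the beginning. readLineAt is a hypothetical helper for illustration and is not part of Text.Newline.LineMap.

```haskell
import qualified Data.ByteString as B
import System.IO (IOMode (ReadMode), SeekMode (AbsoluteSeek), hSeek, withBinaryFile)

-- Hypothetical helper: read one line's bytes given its (offset, length) from a linemap.
-- Assumes the linemap was built from the same, unmodified file.
readLineAt :: FilePath -> Int -> Int -> IO B.ByteString
readLineAt path offset len =
  withBinaryFile path ReadMode $ \h -> do
    hSeek h AbsoluteSeek (fromIntegral offset) -- jump directly to the start of the line
    B.hGet h len                               -- read the line, terminator included
```

Opening the file in binary mode matters here. Offsets and lengths count raw bytes, so no newline translation or decoding should happen between the file and the reader.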
Synopsis
- data Line a = Line {}
- display :: [Line a] -> String
- breakLines_unixUtf8 :: ByteString -> [Line ByteString]
- breakLine_unixUtf8 :: Int -> ByteString -> (Line ByteString, ByteString)
Documentation
data Line a
Holds a detected line. This is the main result type of this module.
Constructors
Line
display :: [Line a] -> String
Render the contents of a linemap file.
The format is simply a three-column CSV with a header row. The columns are offset, length, and terminator, as described above. Offset and length are decimal-encoded unsigned integers. The terminator column must hold one of the following strings:
- unix for LF (ASCII 0x0A),
- dos for CRLF (ASCII 0x0D 0x0A),
- eof for end of file/input.
No field in the output requires quoting, so the output actually abides by RFC 4180 (with the exception that I'm using LF instead of CRLF as the record separator, sigh).
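The format above is simple enough to sketch in a few lines. This is a minimal illustration, not the module's implementation. It assumes that the header row reads offset,length,terminator and that every record, including the last, ends with LF. The Line record, its field names, Terminator, terminatorName and displaySketch are hypothetical stand-ins.

```haskell
-- Hypothetical stand-ins for the module's types (names assumed).
data Terminator = Unix | Dos | Eof

data Line a = Line
  { lineOffset     :: Int
  , lineLength     :: Int
  , lineTerminator :: Terminator
  , lineContent    :: a
  }

-- Spelling of each terminator in the third column, as listed above.
terminatorName :: Terminator -> String
terminatorName Unix = "unix"
terminatorName Dos  = "dos"
terminatorName Eof  = "eof"

-- Sketch of a linemap renderer: a header row, then one record per line.
-- `unlines` ends every record (including the last) with LF rather than CRLF.
displaySketch :: [Line a] -> String
displaySketch ls = unlines (header : map row ls)
  where
    header = "offset,length,terminator" -- assumed header spelling
    row l = show (lineOffset l) ++ "," ++ show (lineLength l) ++ "," ++ terminatorName (lineTerminator l)
```

Note that the line bytes are not written to the file. Only the position and terminator data are serialized, which is why the result needs no quoting.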
breakLines_unixUtf8
Arguments
:: ByteString | all bytes of a file |
-> [Line ByteString] |
Split input into lines.
Assumes UTF-8-encoded text with LF (ASCII 0x0A) line terminators.
See breakLine_unixUtf8 to take a single line.
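A minimal sketch of splitting a whole input into lines, assuming LF terminators. The sketch tracks each line's byte offset by adding up the lengths of the preceding lines. breakLinesSketch, the Line field names and Terminator are hypothetical. It also assumes two behaviours that this page does not spell out: the stored bytes exclude the LF, and a trailing LF does not produce an empty final line.

```haskell
import qualified Data.ByteString as B

-- Hypothetical stand-ins for the module's types (names assumed).
data Terminator = Unix | Dos | Eof deriving (Eq, Show)

data Line a = Line
  { lineOffset     :: Int
  , lineLength     :: Int
  , lineTerminator :: Terminator
  , lineContent    :: a
  } deriving (Eq, Show)

-- Sketch: split all bytes of a file into lines, starting at offset 0.
breakLinesSketch :: B.ByteString -> [Line B.ByteString]
breakLinesSketch = go 0
  where
    go off bs
      | B.null bs = [] -- nothing left: no further lines
      | otherwise = case B.elemIndex 0x0A bs of
          Just i ->
            -- content is the i bytes before the LF; the line length also counts the LF
            let (content, rest) = B.splitAt i bs
            in Line off (i + 1) Unix content : go (off + i + 1) (B.drop 1 rest)
          Nothing ->
            -- no LF left: the final line is ended by end of input
            [Line off (B.length bs) Eof bs]
```

The result is produced lazily, one line at a time. A consumer that only needs the first few lines does not pay for scanning the rest.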
breakLine_unixUtf8
Arguments
:: Int | byte offset within file of input |
-> ByteString | non-empty input bytes |
-> (Line ByteString, ByteString) | resulting line and remaining input |
Take one line of input, and also return the remaining input.
Assumes UTF-8-encoded text with LF (ASCII 0x0A) line terminators.
See breakLines_unixUtf8 to produce a list of all lines.
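A sketch of the single-step version and of how the two functions can relate. breakLineSketch takes the byte offset of its input and non-empty input, then returns one line and the remainder. breakLinesVia applies it repeatedly, advancing the offset by each line's length. Both names, the Line field names and Terminator are hypothetical. The sketch also assumes that the line's bytes exclude the LF. It does not describe how breakLines_unixUtf8 is actually implemented.

```haskell
import qualified Data.ByteString as B

-- Hypothetical stand-ins for the module's types (names assumed).
data Terminator = Unix | Dos | Eof deriving (Eq, Show)

data Line a = Line
  { lineOffset     :: Int
  , lineLength     :: Int
  , lineTerminator :: Terminator
  , lineContent    :: a
  } deriving (Eq, Show)

-- Sketch: take one line from non-empty input that starts at byte offset `off`.
breakLineSketch :: Int -> B.ByteString -> (Line B.ByteString, B.ByteString)
breakLineSketch off bs =
  let (content, rest) = B.break (== 0x0A) bs
  in if B.null rest
       then (Line off (B.length content) Eof content, B.empty)          -- no LF: runs to end of input
       else (Line off (B.length content + 1) Unix content, B.drop 1 rest) -- count the LF, then drop it

-- Repeatedly take lines; the non-empty precondition is checked before each step.
breakLinesVia :: B.ByteString -> [Line B.ByteString]
breakLinesVia = go 0
  where
    go off bs
      | B.null bs = []
      | otherwise =
          let (l, rest) = breakLineSketch off bs
          in l : go (off + lineLength l) rest
```

Returning the remainder alongside the line lets the caller keep the running offset itself. That is why the single-line function takes the offset as its first argument.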