-- Hoogle documentation, generated by Haddock
-- See Hoogle, http://www.haskell.org/hoogle/
-- | A rope type based on a finger tree over UTF-8 fragments
--
-- A rope data type for text, built as a finger tree over UTF-8 text
-- fragments. The package also includes utility functions for breaking
-- and re-wrapping lines, conveniences for pretty printing and
-- colourizing terminal output, and a simple mechanism for multi-line
-- Rope literals.
--
-- The main Rope type and its usage are described at
-- Core.Text.Rope in this package.
--
-- This is part of a library intended to ease interoperability and assist
-- in building command-line programs, both tools and longer-running
-- daemons. A list of features and some background to the library's
-- design is contained in the README on GitHub.
@package core-text
@version 0.2.3.3
-- | Binary (as opposed to textual) data is encountered in weird corners of
-- the Haskell ecosystem. We tend to forget (for example) that the
-- content received from a web server is not text until we convert
-- it from UTF-8 (if that's what it is); and of course that glosses over
-- the fact that something of content-type image/jpeg is not
-- text in any way, shape, or form.
--
-- Bytes also show up when working with crypto algorithms, taking hashes,
-- and when doing serialization to external binary formats. Although we
-- frequently display these in terminals (and in URLs!) as text, we
-- take for granted that we have actually deserialized the data or
-- rendered it in hexadecimal or base64 or...
--
-- This module presents a simple wrapper around various representations
-- of binary data to make it easier to interoperate with libraries
-- supplying or consuming bytes.
module Core.Text.Bytes
-- | A block of data in binary form.
data Bytes
-- | Conversion to and from various types containing binary data into our
-- convenience Bytes type.
--
-- As often as not these conversions are expensive; these methods
-- are here just to wrap calling the relevant functions in a uniform
-- interface.
class Binary α
fromBytes :: Binary α => Bytes -> α
intoBytes :: Binary α => α -> Bytes
-- | Output the content of the Bytes to the specified Handle.
--
--
-- hOutput h b
--
--
-- output provides a convenient way to write a Bytes to a
-- file or socket handle from within the Program monad.
--
-- Don't use this function to write to stdout if you are using
-- any of the other output or logging facilities of this library as
-- you will corrupt the ordering of output on the user's terminal.
-- Instead do:
--
--
-- write (intoRope b)
--
--
-- on the assumption that the bytes in question are UTF-8 (or plain
-- ASCII) encoded.
hOutput :: Handle -> Bytes -> IO ()
-- | Read the (entire) contents of a handle into a Bytes object.
--
-- If you want to read the entire contents of a file, you can do:
--
--
-- contents <- withFile name ReadMode hInput
--
--
-- At any kind of scale, streaming I/O is almost always better, but
-- for small files that you need to pick apart this is fine.
hInput :: Handle -> IO Bytes
-- | Access the strict ByteString underlying the Bytes
-- type.
unBytes :: Bytes -> ByteString
instance GHC.Generics.Generic Core.Text.Bytes.Bytes
instance GHC.Classes.Ord Core.Text.Bytes.Bytes
instance GHC.Classes.Eq Core.Text.Bytes.Bytes
instance GHC.Show.Show Core.Text.Bytes.Bytes
instance Core.Text.Bytes.Binary Core.Text.Bytes.Bytes
instance Core.Text.Bytes.Binary Data.ByteString.Internal.ByteString
instance Core.Text.Bytes.Binary Data.ByteString.Lazy.Internal.ByteString
instance Core.Text.Bytes.Binary Data.ByteString.Builder.Internal.Builder
instance Core.Text.Bytes.Binary [GHC.Word.Word8]
instance Data.Hashable.Class.Hashable Core.Text.Bytes.Bytes
-- | If you're accustomed to working with text in almost any other
-- programming language, you'd be aware that a "string" typically refers
-- to an in-memory array of characters. Traditionally this was a
-- single ASCII byte per character; more recently it has been a UTF-8
-- variable-byte encoding, which dramatically complicates finding offsets
-- but gives efficient support for the entire Unicode character space. In
-- Haskell, the original text type, String, is implemented as a
-- list of Char which, because a Haskell list is implemented as a
-- linked-list of boxed values, is wildly inefficient at any kind
-- of scale.
--
-- In modern Haskell there are two primary ways to represent text.
--
-- First is via the [rather poorly named] ByteString from the
-- bytestring package (which is an array of bytes in pinned
-- memory). The Data.ByteString.Char8 submodule gives you ways to
-- manipulate those arrays as if they were ASCII characters.
-- Confusingly there are both strict (Data.ByteString) and lazy
-- (Data.ByteString.Lazy) variants which are often hard to tell
-- apart when reading function signatures or haddock
-- documentation. The performance problem an immutable array backed data
-- type runs into is that appending a character (that is, ASCII byte) or
-- concatenating a string (that is, another array of ASCII bytes) is very
-- expensive and requires allocating a new larger array and copying the
-- whole thing into it. This led to the development of "builders" which
-- amortize this reallocation cost over time, but it can be cumbersome to
-- switch between Builder, the lazy ByteString that
-- results, and then inevitably having to convert to a strict
-- ByteString because that's what the next function in your
-- sequence requires.
--
-- The second way is through the opaque Text type of
-- Data.Text from the text package, which is well tuned and
-- high-performing but suffers from the same design; it is likewise
-- backed by arrays. Rather surprisingly, the storage backing Text
-- objects is encoded in UTF-16 (in versions of text before 2.0),
-- meaning every time you want to work
-- with Unicode characters that came in from anywhere else and
-- which inevitably are UTF-8 encoded you have to convert to UTF-16 and
-- copy into a new array, wasting time and memory.
--
-- In this package we introduce Rope, a text type backed by the
-- 2-3 FingerTree data structure from the fingertree
-- package. This is not an uncommon solution in many languages as finger
-- trees support exceptionally efficient appending to either end and good
-- performance inserting anywhere else (you often find them as the
-- backing data type underneath text editors for this reason). Rather
-- than Char the pieces of the rope are ShortText from the
-- text-short package, which are UTF-8 encoded and in normal
-- memory managed by the Haskell runtime.
-- Conversion from other Haskell
-- text types is not O(1) (UTF-8 validity must be checked, or
-- UTF-16 decoded, or...), but in our benchmarking the performance has
-- been comparable to the established types and you may find the
-- resultant interface for combining chunks is comparable to using a
-- Builder, without being forced to use a Builder.
--
-- Rope is used as the text type throughout this library. If you
-- use the functions within this package (rather than converting to other
-- text types) operations are quite efficient. When you do need to
-- convert to another type you can use fromRope or intoRope
-- from the Textual typeclass.
--
-- Note that we haven't tried to cover the entire gamut of operations or
-- customary convenience functions you would find in the other libraries;
-- so far Rope is concentrated on aiding interoperation, being
-- good at appending (lots of) small pieces, and then efficiently taking
-- the resultant text object out to a file handle, be that the terminal
-- console, a file, or a network socket.
module Core.Text.Rope
-- | A type for textual data. A rope is text backed by a tree data
-- structure, rather than a single large contiguous array, as is the
-- case for strings.
--
-- There are three use cases:
--
-- Referencing externally sourced data
--
-- Often we interpret large blocks of data sourced from external systems
-- as text. Ideally we would hold onto this without copying the memory,
-- but (as in the case of ByteString which is the most common
-- source of data) before we can treat it as text we have to validate the
-- UTF-8 content. Safety first. We also copy it out of pinned memory,
-- allowing the Haskell runtime to manage the storage.
--
-- Interoperating with other libraries
--
-- The only constant of the Haskell universe is that you won't have the
-- right combination of {strict, lazy} × {Text,
-- ByteString, String, [Word8], etc} you need
-- for the next function call.
-- The Textual typeclass provides for
-- moving between different text representations. To convert from a
-- Rope to something else use fromRope; to construct a
-- Rope from textual content in another type use
-- intoRope.
--
-- You can get at the underlying finger tree with the unRope
-- function.
--
-- Assembling text to go out
--
-- This involves considerable appending of data and only very occasionally
-- inserting it. Often the pieces are tiny. To add text to a
-- Rope use the appendRope method as below or the
-- (<>) operator from Data.Monoid (like you would
-- have with a Builder).
--
-- Output to a Handle can be done efficiently with
-- hWrite.
data Rope
-- | A zero-length Rope. You can also use "" presuming the
-- OverloadedStrings language extension is turned on in
-- your source file.
emptyRope :: Rope
-- | A Rope with but a single character.
singletonRope :: Char -> Rope
-- | Repeat the input Rope n times. This follows the same
-- semantics as other replicate functions; if you ask for zero
-- copies you'll get an empty text and if you ask for lots of ""
-- you'll get ... an empty text.
--
-- Implementation note
--
-- Rather than copying the input n times, this will simply add
-- structure to hold n references to the provided input text.
replicateRope :: Int -> Rope -> Rope
-- | Repeat the input Char n times. This is a special case
-- of replicateRope above.
--
-- Implementation note
--
-- Rather than making a huge FingerTree full of single characters, this
-- function will allocate a single ShortText comprised of the repeated
-- input character.
replicateChar :: Int -> Char -> Rope
-- | Get the length of this text, in characters.
widthRope :: Rope -> Int
-- | Break the text into two pieces at the specified offset.
--
-- Examples:
--
--
-- λ> splitRope 0 "abcdef"
-- ("", "abcdef")
-- λ> splitRope 3 "abcdef"
-- ("abc", "def")
-- λ> splitRope 6 "abcdef"
-- ("abcdef","")
--
--
-- Going off either end behaves sensibly:
--
--
-- λ> splitRope 7 "abcdef"
-- ("abcdef","")
-- λ> splitRope (-1) "abcdef"
-- ("", "abcdef")
--
splitRope :: Int -> Rope -> (Rope, Rope)
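-- The results documented for splitRope coincide with what the Prelude's
-- splitAt gives on an ordinary String, which can serve as a small mental
-- model of the offsets, including the two out-of-range cases. The sketch
-- below is only that model: it assumes nothing about how splitRope is
-- implemented, and splitModel, rejoins and examples are hypothetical names
-- that are not part of this package.

```haskell
-- A String-based model of the documented splitRope examples.
-- splitModel is a hypothetical helper, not part of core-text.
splitModel :: Int -> String -> (String, String)
splitModel = splitAt

-- Splitting and then re-joining gives back the original text,
-- whatever the offset (in range or not).
rejoins :: Int -> String -> Bool
rejoins n s =
    let (before, after) = splitModel n s
     in before ++ after == s

-- The offsets used in the documentation: 0, 3, 6, then one past the
-- end and one before the start.
examples :: [(String, String)]
examples = [splitModel n "abcdef" | n <- [0, 3, 6, 7, -1]]
```

-- Offsets past the end are treated as the end and negative offsets as
-- zero, so neither case raises an error in this model.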
-- | Insert a new piece of text into an existing Rope at the
-- specified offset.
--
-- Examples:
--
--
-- λ> insertRope 3 "Con" "Def 1"
-- "DefCon 1"
-- λ> insertRope 0 "United " "Nations"
-- "United Nations"
--
insertRope :: Int -> Rope -> Rope -> Rope
-- | Does the text contain this character?
--
-- We've used it to ask whether there are newlines present in a
-- Rope, for example:
--
--
-- if containsCharacter '\n' text
--     then handleComplexCase
--     else keepItSimple
--
containsCharacter :: Char -> Rope -> Bool
findIndexRope :: (Char -> Bool) -> Rope -> Maybe Int
-- | Machinery to interpret a type as containing valid Unicode that can be
-- represented as a Rope object.
--
-- Implementation notes
--
-- Given that Rope is backed by a finger tree, append
-- is relatively inexpensive, plus whatever the cost of conversion is.
-- There is a subtle trap, however: if we add small fragments that
-- were obtained by slicing (for example) a large ByteString, we would end
-- up holding on to a reference to the entire underlying block of memory.
-- This module is optimized to reduce heap fragmentation by letting the
-- Haskell runtime and garbage collector manage the memory, so instances
-- are expected to copy these substrings out of pinned memory.
--
-- The ByteString instance requires that its content be valid
-- UTF-8. If it is not, an empty Rope will be returned.
--
-- Several of the fromRope implementations are expensive and
-- involve a lot of intermediate allocation and copying. If you're
-- ultimately writing to a handle, prefer hWrite which will write
-- directly to the output buffer.
class Textual α
-- | Convert a Rope into another text-like type.
fromRope :: Textual α => Rope -> α
-- | Take another text-like type and convert it to a Rope.
intoRope :: Textual α => α -> Rope
-- | Append some text to this Rope. The default implementation is
-- basically a convenience wrapper around calling intoRope and
-- mappending it to your text (which will work just fine, but for
-- some types more efficient implementations are possible).
appendRope :: Textual α => α -> Rope -> Rope
-- | Write the Rope to the given Handle.
--
--
-- import Core.Text
-- import Core.System -- re-exports stdout
--
-- main :: IO ()
-- main =
--     let
--         text :: Rope
--         text = "Hello World"
--     in
--         hWrite stdout text
--
--
-- because it's tradition.
--
-- Uses hPutBuilder internally which saves all kinds of
-- intermediate allocation and copying because we can go from the
-- ShortTexts in the finger tree to ShortByteString to
-- Builder to the Handle's output buffer in one go.
--
-- If you're working in the Program monad, then write
-- provides an efficient way to write a Rope to stdout.
hWrite :: Handle -> Rope -> IO ()
-- | Access the finger tree underlying the Rope. You'll want the
-- following imports:
--
--
-- import qualified Data.FingerTree as F -- from the fingertree package
-- import qualified Data.Text.Short as S -- from the text-short package
--
unRope :: Rope -> FingerTree Width ShortText
nullRope :: Rope -> Bool
-- | If you know the input bytes are valid UTF-8 encoded characters,
-- then you can use this function to convert to a piece of Rope.
unsafeIntoRope :: ByteString -> Rope
-- | The length of the Rope, in characters. This is the monoid
-- used to structure the finger tree underlying the Rope.
newtype Width
Width :: Int -> Width
instance GHC.Generics.Generic Core.Text.Rope.Rope
instance GHC.Generics.Generic Core.Text.Rope.Width
instance GHC.Num.Num Core.Text.Rope.Width
instance GHC.Show.Show Core.Text.Rope.Width
instance GHC.Classes.Ord Core.Text.Rope.Width
instance GHC.Classes.Eq Core.Text.Rope.Width
instance GHC.Show.Show Core.Text.Rope.Rope
instance Core.Text.Rope.Textual (Data.FingerTree.FingerTree Core.Text.Rope.Width Data.Text.Short.Internal.ShortText)
instance Core.Text.Rope.Textual Core.Text.Rope.Rope
instance Core.Text.Rope.Textual Data.Text.Short.Internal.ShortText
instance Core.Text.Rope.Textual Data.Text.Internal.Text
instance Core.Text.Rope.Textual Data.Text.Internal.Lazy.Text
instance Core.Text.Rope.Textual Data.ByteString.Internal.ByteString
instance Core.Text.Rope.Textual Data.ByteString.Lazy.Internal.ByteString
instance Core.Text.Rope.Textual Core.Text.Bytes.Bytes
instance Core.Text.Bytes.Binary Core.Text.Rope.Rope
instance Core.Text.Rope.Textual [GHC.Types.Char]
instance Control.DeepSeq.NFData Core.Text.Rope.Rope
instance GHC.Classes.Eq Core.Text.Rope.Rope
instance GHC.Classes.Ord Core.Text.Rope.Rope
instance Data.Text.Prettyprint.Doc.Internal.Pretty Core.Text.Rope.Rope
instance Data.String.IsString Core.Text.Rope.Rope
instance GHC.Base.Semigroup Core.Text.Rope.Rope
instance GHC.Base.Monoid Core.Text.Rope.Rope
instance Data.Hashable.Class.Hashable Core.Text.Rope.Rope
instance Data.FingerTree.Measured Core.Text.Rope.Width Data.Text.Short.Internal.ShortText
instance GHC.Base.Semigroup Core.Text.Rope.Width
instance GHC.Base.Monoid Core.Text.Rope.Width
-- | Useful tools for working with Ropes. Support for pretty
-- printing, multi-line strings, and...
module Core.Text.Utilities
-- | Types which can be rendered "prettily", that is, formatted by a pretty
-- printer and embossed with beautiful ANSI colours when printed to the
-- terminal.
--
-- Use render to build a text object for later use or
-- Control.Program.Logging's writeR if you're writing
-- directly to console now.
class Render α where {
    -- | Which type are the annotations of your Doc going to be expressed in?
    type family Token α :: *;
}
-- | Convert semantic tokens to specific ANSI escape tokens
colourize :: Render α => Token α -> AnsiStyle
-- | Arrange your type as a Doc ann, annotated with your
-- semantic tokens.
intoDocA :: Render α => α -> Doc (Token α)
-- | Given an object of a type with a Render instance, transform it
-- into a Rope saturated with ANSI escape codes representing syntax
-- highlighting or similar colouring, wrapping at the specified
-- width.
--
-- The obvious expectation is that the next thing you're going to do is
-- send the Rope to console with:
--
--
-- write (render 80 thing)
--
--
-- However, the better thing to do is to instead use:
--
--
-- writeR thing
--
--
-- which is able to pretty print the document text respecting the
-- available width of the terminal.
render :: Render α => Int -> α -> Rope
-- | Having gone to all the trouble to colourize your rendered types...
-- sometimes you don't want that. This function is like render,
-- but removes all the ANSI escape codes so it comes out formatted but as
-- plain black & white text.
renderNoAnsi :: Render α => Int -> α -> Rope
-- | Render "a" or "an" in front of a word depending on English's idea of
-- whether it's a vowel or not.
indefinite :: Rope -> Rope
-- | Split a passage of text into a list of words. A line is broken
-- wherever there is one or more whitespace characters, as defined by
-- Data.Char's isSpace.
--
-- Examples:
--
--
-- λ> breakWords "This is a test"
-- ["This","is","a","test"]
-- λ> breakWords ("St" <> "op and " <> "go left")
-- ["Stop","and","go","left"]
-- λ> breakWords emptyRope
-- []
--
breakWords :: Rope -> [Rope]
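-- The splitting rule described for breakWords (break on runs of one or
-- more isSpace characters, producing no empty entries) can be sketched on
-- plain String. This is an illustration of the rule under that reading,
-- not the package's implementation; wordsModel is a hypothetical name.

```haskell
import Data.Char (isSpace)

-- Skip any leading whitespace, take characters up to the next
-- whitespace as one word, then carry on with the remainder.
-- wordsModel is a hypothetical helper, not part of core-text.
wordsModel :: String -> [String]
wordsModel s =
    case dropWhile isSpace s of
        "" -> []
        s' ->
            let (word, rest) = break isSpace s'
             in word : wordsModel rest
```

-- Because leading whitespace is dropped before each word is taken,
-- consecutive spaces, tabs and newlines never yield empty words, and an
-- empty input yields an empty list.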
-- | Split a paragraph of text into a list of its individual lines. The
-- paragraph will be broken wherever there is a '\n' character.
--
-- Blank lines will be preserved. Note that as a special case you do
-- not get a blank entry at the end of a list of newline-terminated
-- strings.
--
--
-- λ> breakLines "Hello\n\nWorld\n"
-- ["Hello","","World"]
--
breakLines :: Rope -> [Rope]
-- | Break a Rope into pieces wherever the given predicate function
-- returns True. If found, that character will not be included
-- on either side. Empty runs, however, *will* be preserved.
breakPieces :: (Char -> Bool) -> Rope -> [Rope]
-- | Predicate testing whether a character is a newline. After
-- isSpace et al in Data.Char.
isNewline :: Char -> Bool
-- | Often the input text represents a paragraph, but does not have any
-- internal newlines (representing word wrapping). This function takes a
-- line of text and inserts newlines to simulate such folding, keeping
-- the line under the supplied maximum width.
--
-- A single word that is excessively long will be included as-is on its
-- own line (that line will exceed the desired maximum width).
--
-- Any trailing newlines will be removed.
wrap :: Int -> Rope -> Rope
-- | Calculate the line number and column number of a Rope (interpreting it
-- as if it is a block of text in a file). By the convention observed by all
-- leading brands of text editor, lines and columns are 1
-- origin, so an empty Rope is position (1,1).
calculatePositionEnd :: Rope -> (Int, Int)
-- | Pad a piece of text on the left with a specified character to the
-- desired width. This function is named in homage to the famous result
-- from Computer Science known as leftPad which has a glorious
-- place in the history of the world-wide web.
leftPadWith :: Char -> Int -> Rope -> Rope
-- | Right pad a text with the specified character.
rightPadWith :: Char -> Int -> Rope -> Rope
-- | Multi-line string literals.
--
-- To use these you need to enable the QuasiQuotes language
-- extension in your source file:
--
--
-- {-# LANGUAGE OverloadedStrings #-}
-- {-# LANGUAGE QuasiQuotes #-}
--
--
-- you are then able to easily write a string stretching over several
-- lines.
--
-- How best to format a multi-line string literal within your source
-- code is an aesthetic judgement. Sometimes you don't care about the
-- whitespace leading a passage (8 spaces in this example):
--
--
-- let message = [quote|
--         This is a test of the Emergency Broadcast System. Do not be
--         alarmed. If this were a real emergency, someone would have tweeted
--         about it by now.
--     |]
--
--
-- because you are feeding it into a Doc for pretty printing and
-- know the renderer will convert the whole text into a single line and
-- then re-flow it. Other times you will want to have the string as is,
-- literally:
--
--
-- let poem = [quote|
-- If the sun
-- rises
-- in the
-- west
-- you drank
-- too much
-- last week.
-- |]
--
--
-- Leading whitespace from the first line and trailing whitespace from
-- the last line will be trimmed, so this:
--
--
-- let value = [quote|
-- Hello
--     |]
--
--
-- is translated to:
--
--
-- let value = fromString "Hello\n"
--
--
-- without the leading newline or trailing four spaces. Note that as
-- string literals they are presented to your code with fromString
-- :: String -> α so any type with an IsString
-- instance (as Rope has) can be constructed from a multi-line
-- [quote| ... |] literal.
quote :: QuasiQuoter
instance Core.Text.Utilities.Render Core.Text.Rope.Rope
instance Core.Text.Utilities.Render GHC.Types.Char
instance Core.Text.Utilities.Render a => Core.Text.Utilities.Render [a]
instance Core.Text.Utilities.Render Data.Text.Internal.Text
instance Core.Text.Utilities.Render Core.Text.Bytes.Bytes
-- | A unified Text type providing interoperability between various text
-- back-ends present in the Haskell ecosystem.
--
-- This is intended to be used directly:
--
--
-- import Core.Text
--
--
-- as this module re-exports all of the various components making up this
-- library's text handling subsystem.
module Core.Text
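-- As a rough illustration of how the pieces documented in this file fit
-- together, the sketch below assembles a Rope from fragments of different
-- types, converts it back to a String, and writes it to a handle. It
-- assumes the core-text package is installed and that Core.Text re-exports
-- the Rope functions as stated above; stdout is taken from System.IO here,
-- and greeting, greetingString and demo are hypothetical names.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Core.Text
import System.IO (stdout)

-- Assemble a Rope from pieces of different types. String literals become
-- Ropes through the IsString instance, and (<>) comes from the Semigroup
-- instance listed in this file.
greeting :: Rope
greeting = "Hello" <> intoRope (", " :: String) <> singletonRope 'W' <> "orld\n"

-- Convert back out to an ordinary String via the Textual typeclass.
greetingString :: String
greetingString = fromRope greeting

-- Write the Rope to the terminal, then print its width in characters.
demo :: IO ()
demo = do
    hWrite stdout greeting
    print (widthRope greeting)
```

-- Calling demo from main (or from GHCi) is enough to try it out. If the
-- text is only ever going to a handle, the documentation for Textual
-- suggests preferring hWrite over converting with fromRope first.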