A paragraph formatting utility. Provided with input text that is arbitrarily split amongst several strings, this utility will reformat the text into paragraphs which do not exceed the specified width. Paragraphs are delimited by blank lines in the input.
This function is roughly equivalent to the Unix fmt utility.
Features:
- An indentation/prefix text may be specified. This prefix is used on the first paragraph line and determines the standard indentation for all subsequent lines. If no indentation is specified, the leading whitespace of the first line of the first paragraph becomes the default indentation for all paragraphs.
- Subsequent paragraphs may increase their indentation over the default as determined by the indentation level of their first line. Indentation values less than that of the primary paragraph are ignored.
- Paragraph text is reformatted to fit the paragraph layout.
- Extra whitespace is removed.
- "French spacing" is used: if the current word is capitalized and the previous word ended in a punctuation character, then two spaces are used between the words instead of a single space which is the default elsewhere.
- Avoids orphan words. The last line of a paragraph will usually be formatted to contain at least 2 words, pulling from the line above it.
- Recognizes lists of items, where each item starts with * or - or alphanumeric characters followed by a ) or . character. Uses list-oriented per-item indentation independent of paragraph indentation.
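Two of the rules above are simple enough to sketch directly. The following is a minimal illustration that reads the descriptions literally; it is not the module's actual implementation, and the names joinFrench and isListMarker are hypothetical. Note that a literal reading of the list rule also accepts an ordinary first word such as "Right." as a marker, so the module itself presumably applies further checks.

```haskell
import Data.Char (isAlphaNum, isPunctuation, isUpper)

-- | Join words using the French spacing rule described above: two spaces
-- when the previous word ends in punctuation and the next word starts with
-- a capital letter, one space otherwise. (Hypothetical helper.)
joinFrench :: [String] -> String
joinFrench []  = ""
joinFrench [w] = w
joinFrench (w : next : rest) = w ++ sep ++ joinFrench (next : rest)
  where
    sep
      | endsWithPunct w && startsUpper next = "  "
      | otherwise                           = " "

endsWithPunct :: String -> Bool
endsWithPunct "" = False
endsWithPunct w  = isPunctuation (last w)

startsUpper :: String -> Bool
startsUpper (c : _) = isUpper c
startsUpper _       = False

-- | Literal reading of the list rule: the first word of a line is "*" or
-- "-", or a run of alphanumeric characters followed by exactly one ')' or
-- '.'. (Hypothetical helper.)
isListMarker :: String -> Bool
isListMarker w
  | w `elem` ["*", "-"] = True
  | otherwise =
      case span isAlphaNum w of
        (_ : _, [c]) -> c `elem` ").\0" && c /= '\0'
        _            -> False
```

Because the input is split with words before joining, runs of extra whitespace collapse automatically: joinFrench (words "a   test. This") keeps a single space after "a" and a double space after "test.".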
Documentation
formatParas
  :: Int           -- Width
  -> Maybe String  -- Prefix (defines indent); Nothing means the indent is taken from the first input line
  -> [String]      -- Text to format, in arbitrarily divided strings. Blank lines separate paragraphs. Paragraphs are indented the same as the first line if the second argument is Nothing.
  -> [String]      -- Formatted text
The formatParas function accepts an arbitrarily divided list of Strings along with a width and an optional indentation/prefix, and returns a list of strings representing paragraphs formatted to fit the specified width and indentation.
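To make the basic contract concrete, here is a rough sketch of the two core steps: splitting the input into paragraphs at blank lines, then greedily filling lines up to the width. It is an illustration only, not the code behind formatParas. It assumes each input string holds one or more whole lines, and it omits the prefix, indentation, French spacing, orphan avoidance, and list handling. The names formatSketch, paragraphs, splitOnBlank, and fill are hypothetical.

```haskell
import Data.Char (isSpace)
import Data.List (intercalate)

-- | Split a list of lines into groups separated by blank lines.
splitOnBlank :: [String] -> [[String]]
splitOnBlank ls =
  case break (all isSpace) ls of
    (para, [])       -> [para]
    (para, _ : rest) -> para : splitOnBlank rest

-- | Turn arbitrarily divided input into paragraphs, each a list of words.
-- Empty paragraphs (from consecutive blank lines) are dropped.
paragraphs :: [String] -> [[String]]
paragraphs =
  filter (not . null) . map (concatMap words) . splitOnBlank . concatMap lines

-- | Greedy fill: append words to the current line while it still fits.
-- A single word longer than the width is placed on a line of its own.
fill :: Int -> [String] -> [String]
fill _ [] = []
fill width (w : ws) = go w ws
  where
    go line [] = [line]
    go line (x : xs)
      | length line + 1 + length x <= width = go (line ++ " " ++ x) xs
      | otherwise                           = line : go x xs

-- | Format every paragraph and separate paragraphs with one blank line.
formatSketch :: Int -> [String] -> [String]
formatSketch width = intercalate [""] . map (fill width) . paragraphs
```

For example, formatSketch 10 ["one two three", "four", "", "five"] evaluates to ["one two", "three four", "", "five"].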
Example
The following shows example uses and output of the Para formatter.
Here is a simple program that takes 2 or more arguments.
- A width
- One or more filenames
The program will read the specified files and then use Para to format them with the specified width and display them on stdout.
    import Text.Format.Para
    import System.Environment
    import Data.List

    main = do
      args <- getArgs
      let width = head args
      bodies <- mapM readFile $ tail args
      putStrLn $ unlines $ formatParas (read width) (Just "Example: ") $ intersperse "\n" bodies
This program is usable in a similar manner to the Unix fmt application. It also provides a convenient way to test and experiment with the output of the Para module.
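The test program relies on head and read, so it fails with a runtime exception when run without arguments or with a non-numeric width. A more defensive variant could validate the arguments first. The sketch below is one possible approach and is not part of the Para module; parseArgs is a hypothetical helper, and the program name ptest in the usage message is borrowed from the sample input below.

```haskell
import Text.Read (readMaybe)

-- | Validate the command line: a positive integer width followed by at
-- least one filename. Returns an error message on failure.
-- (Hypothetical helper, not part of Text.Format.Para.)
parseArgs :: [String] -> Either String (Int, [FilePath])
parseArgs (w : files@(_ : _)) =
  case readMaybe w of
    Just n | n > 0 -> Right (n, files)
    _              -> Left ("invalid width: " ++ w)
parseArgs _ = Left "usage: ptest WIDTH FILE [FILE...]"
```

In main, the Left case can be reported on stderr, and the Right case can pass the width and file list to the same readFile and formatParas pipeline shown above.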
The following represents an example input file that demonstrates most of the capabilities of the Para formatter:
This is a test. This is line 2. Note: double spacing (a.k.a. french spacing) between sentences, but elsewhere only single spacing is used; i.e. whitespace compression is performed. This is the second paragraph. Note that all indentation is based on the initial indentation string specified, although that string only introduces the first paragraph on the sequence. * Here is a list * This is another list item. It is fairly long, so when it wraps the subline should be indented. This is the third paragraph. And it is indented. It is followed by a command-line example: $ ghc --make -o ptest ptest.hs $ ./ptest The fourth paragraph is indented even more. Birdtracks are verbatim, even if the line is long. > main = do > args <- getArgs > putStrLn $ "Hello! Hi. Greetings. I think you said " ++ intercalate ", " args The list can also be numbered or use other indicators: 1) Here is a list 2) Item #2 10) Item 10 20a) This is a longer item with a mixed representation of the item count. 4. Can use standard decimals as well for numbering elements. 5. And it doesn't really matter if all elements of the list are the same. Just as long as it's recognized as a list element. a) But it does have to be at the same indentation? b) Right. Multi-level lists are supported. Each list item is handled as a paragraph. 6. Level is based on initial character indentation. And that's it!
If this example is saved to an input file and then processed with the test application above and a width of 80, the output might look like the following:
Example: This is a test. This is line 2. Note: double spacing (a.k.a. french spacing) between sentences, but elsewhere only single spacing is used; i.e. whitespace compression is performed. This is the second paragraph. Note that all indentation is based on the initial indentation string specified, although that string only introduces the first paragraph on the sequence. * Here is a list * This is another list item. It is fairly long, so when it wraps the subline should be indented. This is the third paragraph. And it is indented. It is followed by a command-line example: $ ghc --make -o ptest ptest.hs $ ./ptest The fourth paragraph is indented even more. Birdtracks are verbatim, even if the line is long. > main = do > args <- getArgs > putStrLn $ "Hello! Hi. Greetings. I think you said " ++ intercalate ", " args The list can also be numbered or use other indicators: 1) Here is a list 2) Item #2 10) Item 10 20a) This is a longer item with a mixed representation of the item count. 4. Can use standard decimals as well for numbering elements. 5. And it doesn't really matter if all elements of the list are the same. Just as long as it's recognized as a list element. a) But it does have to be at the same indentation? b) Right. Multi-level lists are supported. Each list item is handled as a paragraph. 6. Level is based on initial character indentation. And that's it!
If the same input file were processed with a width of 50 instead, the output would look like the following:
Example: This is a test. This is line 2. Note: double spacing (a.k.a. french spacing) between sentences, but elsewhere only single spacing is used; i.e. whitespace compression is performed. This is the second paragraph. Note that all indentation is based on the initial indentation string specified, although that string only introduces the first paragraph on the sequence. * Here is a list * This is another list item. It is fairly long, so when it wraps the subline should be indented. This is the third paragraph. And it is indented. It is followed by a command-line example: $ ghc --make -o ptest ptest.hs $ ./ptest The fourth paragraph is indented even more. Birdtracks are verbatim, even if the line is long. > main = do > args <- getArgs > putStrLn $ "Hello! Hi. Greetings. I think you said " ++ intercalate ", " args The list can also be numbered or use other indicators: 1) Here is a list 2) Item #2 10) Item 10 20a) This is a longer item with a mixed representation of the item count. 4. Can use standard decimals as well for numbering elements. 5. And it doesn't really matter if all elements of the list are the same. Just as long as it's recognized as a list element. a) But it does have to be at the same indentation? b) Right. Multi-level lists are supported. Each list item is handled as a paragraph. 6. Level is based on initial character indentation. And that's it!