para-1.1: Text paragraph formatting

Text.Format.Para

Description

A paragraph formatting utility. Provided with input text that is arbitrarily split amongst several strings, this utility will reformat the text into paragraphs which do not exceed the specified width. Paragraphs are delimited by blank lines in the input.

This function is roughly equivalent to the Unix fmt utility.
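
The basic reflow idea described above (paragraphs split at blank lines, words packed into lines no wider than the given width) can be illustrated with a minimal sketch. This is not the package's implementation: the names `paragraphs`, `fill` and `reflow` are hypothetical, each input string is assumed to hold a single line, and indentation, prefixes, sentence spacing, orphan control and lists are all ignored.

```haskell
import Data.Char (isSpace)
import Data.List (intercalate)

-- | Split input lines into paragraphs at blank (whitespace-only) lines.
paragraphs :: [String] -> [[String]]
paragraphs ls =
  case break isBlank (dropWhile isBlank ls) of
    ([], _)      -> []
    (para, rest) -> para : paragraphs rest
  where
    isBlank = all isSpace

-- | Greedily pack words into lines of at most @width@ columns.
-- A single word longer than @width@ is placed on a line of its own.
fill :: Int -> [String] -> [String]
fill width = go []
  where
    -- @cur@ holds the words of the line being built, in reverse order.
    go cur [] = [unwords (reverse cur) | not (null cur)]
    go cur (w : ws)
      | null cur                                      = go [w] ws
      | length (unwords (reverse (w : cur))) <= width = go (w : cur) ws
      | otherwise = unwords (reverse cur) : go [w] ws

-- | Reflow every paragraph and separate paragraphs with one blank line.
reflow :: Int -> [String] -> [String]
reflow width =
  intercalate [""] . map (fill width . concatMap words) . paragraphs
```

Loaded into GHCi, `mapM_ putStrLn (reflow 20 ["This is a test.", "This is line 2.", "", "Second   paragraph here."])` prints two reflowed paragraphs separated by a blank line. The greedy strategy is chosen here only for brevity.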

Features:

  • An indentation/prefix text may be specified. This prefix is used on the first paragraph line and determines the standard indentation for all subsequent lines. If no indentation is specified, the blank indentation of the first line of the first paragraph becomes the default indentation for all paragraphs.
  • Subsequent paragraphs may increase their indentation over the default as determined by the indentation level of their first line. Indentation values less than that of the primary paragraph are ignored.
  • Paragraph text is reformatted to fit the paragraph layout.
  • Extra whitespace is removed.
  • "French spacing" is used: if the current word is capitalized and the previous word ended in a punctuation character, then two spaces are used between the words instead of a single space which is the default elsewhere.
  • Avoids orphan words. The last line of a paragraph will usually be formatted to contain at least 2 words, pulling from the line above it.
  • Recognizes lists of items, where each item starts with * or - or alphanumeric characters followed by a ) or . character. Uses list-oriented per-item indentation independent of paragraph indentation.
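
As an illustration of the spacing rule in the list above, the following sketch joins words with two spaces when the previous word ends in punctuation and the current word is capitalized, and with one space otherwise. The names `separator` and `joinWords` are hypothetical and the code is only an approximation of the rule as stated here, not the library's own code.

```haskell
import Data.Char (isPunctuation, isUpper)

-- | Separator placed between two adjacent words.
separator :: String -> String -> String
separator prev cur
  | endsWithPunctuation prev && capitalized cur = "  "
  | otherwise                                   = " "
  where
    endsWithPunctuation w = not (null w) && isPunctuation (last w)
    capitalized (c : _)   = isUpper c
    capitalized []        = False

-- | Join the words of one line using the rule above.
joinWords :: [String] -> String
joinWords []            = ""
joinWords ws@(w : rest) =
  w ++ concat (zipWith (\prev cur -> separator prev cur ++ cur) ws rest)
```

For example, `joinWords (words "This is a test. This is line 2. Note: double")` yields `"This is a test.  This is line 2.  Note: double"`: the colon after `Note` does not trigger a double space because `double` is not capitalized.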

Documentation

formatParas

Arguments

 :: Int            Width
 -> Maybe String   Prefix (defines indent). Nothing means the indent is taken
                   from the first input line.
 -> [String]       Text to format, in arbitrarily divided strings. Blank lines
                   separate paragraphs. Paragraphs are indented the same as
                   the first line if the second argument is Nothing.
 -> [String]       Formatted text

The formatParas function accepts an arbitrarily divided list of Strings along with a width and an optional indentation/prefix, and returns a list of Strings representing paragraphs formatted to fit the specified width and indentation.
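
A minimal call might look like the sketch below. It assumes the `para` package is installed and passes the text as a single string with embedded newlines, much as file contents would arrive from `readFile`. The names `note` and `noteLines` and the sample text are made up for illustration, and no particular output is claimed.

```haskell
import Text.Format.Para (formatParas)

-- | Sample text with uneven line breaks, extra spaces and a blank line
-- between its two paragraphs.
note :: String
note = unlines
  [ "The width argument bounds"
  , "every output line.    Extra"
  , "whitespace is dropped."
  , ""
  , "A blank line starts a new paragraph."
  ]

-- | The note reformatted to 40 columns behind a "Note: " prefix.
noteLines :: [String]
noteLines = formatParas 40 (Just "Note: ") [note]
```

In GHCi, `mapM_ putStrLn noteLines` displays the result. Passing `Nothing` instead of `Just "Note: "` takes the indentation from the first input line.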

Example

The following shows example uses of the Para formatter and the output it produces.

Here is a simple program that takes 2 or more arguments.

  • A width
  • One or more filenames

The program will read the specified files and then use Para to format them with the specified width and display them on stdout.

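 import Text.Format.Para (formatParas)
 import System.Environment (getArgs)
 import Data.List (intersperse)

 -- Usage: ptest WIDTH FILE [FILE ...]
 main :: IO ()
 main = do
      args <- getArgs
      case args of
        (width : files@(_ : _)) -> do
          -- Read every file, separate the bodies with a newline, and reformat.
          bodies <- mapM readFile files
          putStrLn $ unlines $ formatParas (read width) (Just "Example: ") $
                   intersperse "\n" bodies
        _ -> putStrLn "usage: ptest WIDTH FILE [FILE ...]"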

This program is usable in a similar manner to the Unix fmt application. It also provides a convenient way to test and experiment with the output of the Para module.

The following represents an example input file that demonstrates most of the capabilities of the Para formatter:

 This is a test.
 This is line 2. Note: double spacing (a.k.a. french spacing) between sentences, but
     elsewhere    only     single   spacing
 is used; i.e. whitespace compression is performed.
 
 This is the second paragraph. Note that all indentation is based on
 the initial indentation string specified, although that string only
 introduces the first paragraph on the sequence.
 
    * Here is a list
    * This is another list item.  It is fairly long, so when it wraps the subline should be indented.
 
    This is the third paragraph.
    And it is indented.
    It is followed by a command-line example:
      $ ghc --make -o ptest ptest.hs
      $ ./ptest
 
       The fourth paragraph
 is indented even more.
 
 Birdtracks are verbatim, even if the line is long.
    > main = do
    >    args <- getArgs
    >    putStrLn $ "Hello!  Hi. Greetings.   I think you said " ++ intercalate ", " args
 
 The list can also be
 numbered or use other
 indicators:
 
     1) Here is a list
     2) Item #2
     10) Item 10
     20a) This is a longer item with a mixed representation of the item count.
     4. Can use standard decimals as well for numbering elements.
     5. And it doesn't really matter if all elements of the list are the same.  Just as long as it's recognized as a list element.
        a) But it does have to be at the same indentation?
        b) Right.  Multi-level lists are supported.  Each list item is handled as a paragraph.
     6. Level is based on initial character indentation. 
 
 And that's it!

If this example is saved to an input file and then processed with the test application above and a width of 80, the output might look like the following:

 Example: This is a test.  This is line 2.  Note: double spacing (a.k.a. french
          spacing) between sentences, but elsewhere only single spacing is used;
          i.e. whitespace compression is performed.
 
          This is the second paragraph.  Note that all indentation is based on
          the initial indentation string specified, although that string only
          introduces the first paragraph on the sequence.
 
             * Here is a list
 
             * This is another list item.  It is fairly long, so when it wraps
               the subline should be indented.
 
             This is the third paragraph.  And it is indented.  It is followed by
             a command-line example:
 
               $ ghc --make -o ptest ptest.hs
               $ ./ptest
 
                The fourth paragraph is indented even more.
 
          Birdtracks are verbatim, even if the line is long.
 
             > main = do
             >    args <- getArgs
             >    putStrLn $ "Hello!  Hi. Greetings.   I think you said " ++ intercalate ", " args
 
          The list can also be numbered or use other indicators:
 
              1) Here is a list
 
              2) Item #2
 
              10) Item 10
 
              20a) This is a longer item with a mixed representation of the
                   item count.
 
              4. Can use standard decimals as well for numbering elements.
 
              5. And it doesn't really matter if all elements of the list are the
                 same.  Just as long as it's recognized as a list element.
 
                 a) But it does have to be at the same indentation?
 
                 b) Right.  Multi-level lists are supported.  Each list item is
                    handled as a paragraph.
 
              6. Level is based on initial character indentation.
 
          And that's it!

If the same input file were processed with a width of 50 instead, the output would look like the following:

 Example: This is a test.  This is line 2.  Note:
          double spacing (a.k.a. french spacing)
          between sentences, but elsewhere only
          single spacing is used; i.e. whitespace
          compression is performed.
 
          This is the second paragraph.  Note that
          all indentation is based on the initial
          indentation string specified, although
          that string only introduces the first
          paragraph on the sequence.
 
             * Here is a list
 
             * This is another list item.  It is
               fairly long, so when it wraps the
               subline should be indented.
 
             This is the third paragraph.  And it
             is indented.  It is followed by a
             command-line example:
 
               $ ghc --make -o ptest ptest.hs
               $ ./ptest
 
                The fourth paragraph is indented
                even more.
 
          Birdtracks are verbatim, even if the line
          is long.
 
             > main = do
             >    args <- getArgs
             >    putStrLn $ "Hello!  Hi. Greetings.   I think you said " ++ intercalate ", " args
 
          The list can also be numbered or use
          other indicators:
 
              1) Here is a list
              
              2) Item #2
              
              10) Item 10
              
              20a) This is a longer item with a
                   mixed representation of the
                   item count.
              
              4. Can use standard decimals as well
                 for numbering elements.
              
              5. And it doesn't really matter if
                 all elements of the list are the
                 same.  Just as long as it's
                 recognized as a list element.
                 
                 a) But it does have to be at the
                    same indentation?
                 
                 b) Right.  Multi-level lists are
                    supported.  Each list item is
                    handled as a paragraph.
              
              6. Level is based on initial
                 character indentation.
          
          And that's it!
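
The list handling visible in the two outputs above (hanging indentation under markers such as `*`, `1)`, `20a)` and `4.`) can be approximated as follows. This is a sketch under assumptions: `listMarker` and `hangingIndent` are hypothetical names, and the test is deliberately naive, so a line whose first word is something like `Right.` would also be treated as a list item. The library's actual recognition rules may be stricter.

```haskell
import Data.Char (isAlphaNum, isSpace)

-- | Return the list marker that starts the line, if there is one:
-- "*", "-", or alphanumeric characters followed by ')' or '.'.
listMarker :: String -> Maybe String
listMarker line =
  case words line of
    (tok : _) | isMarker tok -> Just tok
    _                        -> Nothing
  where
    isMarker "*" = True
    isMarker "-" = True
    isMarker tok =
      case reverse tok of
        (c : revLabel) ->
          c `elem` ")." && not (null revLabel) && all isAlphaNum revLabel
        [] -> False

-- | Column at which wrapped lines of a list item would start:
-- the item's indentation, plus the marker, plus one space.
hangingIndent :: String -> Maybe Int
hangingIndent line = do
  marker <- listMarker line
  let indent = length (takeWhile isSpace line)
  pure (indent + length marker + 1)
```

For instance, `listMarker "     20a) This is a longer item"` gives `Just "20a)"` and `hangingIndent "    1) Here is a list"` gives `Just 7`, while `listMarker "And that's it!"` gives `Nothing`.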