{-# OPTIONS_GHC -fno-warn-unused-imports #-}

{- |

DSV ("delimiter-separated values") is a simple file format used to save
tabular data such as you might see in a spreadsheet. Rows are separated by
newline characters, and the fields within each row are separated by the
/delimiter/ (such as a comma or a tab). Most often, the delimiter is a comma,
in which case we call the file a CSV file ("comma-separated values").

For example, a CSV file might contain a list of expenses. We will use
variations of the following example CSV file throughout the documentation:

> Date,Vendor,Price,Product
> 2019-03-24,Acme Co,$599.89,Dehydrated boulders
> 2019-04-18,Acme Co,$24.95,Earthquake pills

-}

module DSV
  (
  -- * Reading a CSV file as a Vector
  -- ** @readCsvFileStrict@...
  -- $readingCsvFilesStrictly
    readCsvFileStrictWithZippedHeader, readCsvFileStrictWithoutHeader, readCsvFileStrictIgnoringHeader

  -- ** What is a Vector
  -- $vector
  , Vector, nthVectorElement, vectorLookup, listToVector, vectorToList, emptyVector

  -- ** What is a ByteString
  -- $bytestring
  , ByteString

  -- ** A read ends with a ParseStop
  , ParseStop (..), requireCompleteParse, completely

  -- * Other delimiters
  -- ** @readDsvFileStrict@...
  -- $readingDsvFilesStrictly
  , readDsvFileStrictWithZippedHeader, readDsvFileStrictWithoutHeader, readDsvFileStrictIgnoringHeader

  -- ** What is a Delimiter
  , Delimiter (..), comma, tab, delimiterWord8, charDelimiter

  -- * Reading with a custom row type
  -- ** @mapCsvFileStrict@...
  -- $readingCsvFilesStrictlyWithAnyRowType
  , mapCsvFileStrictWithoutHeader, mapCsvFileStrictIgnoringHeader, mapCsvFileStrictUsingHeader

  -- ** Using other delimiters
  -- $readingDsvFilesStrictlyWithAnyRowType
  , mapDsvFileStrictWithoutHeader, mapDsvFileStrictIgnoringHeader, mapDsvFileStrictUsingHeader

  -- * Iterating over a file with a Fold
  -- ** @foldCsvFile@...
  -- $foldingCsvFiles
  , foldCsvFileWithZippedHeader, foldCsvFileWithZippedHeaderM
  , foldCsvFileWithoutHeader, foldCsvFileWithoutHeaderM
  , foldCsvFileIgnoringHeader, foldCsvFileIgnoringHeaderM

  -- ** What is a Fold
  -- $fold
  , Fold (..), FoldM (..)

  -- ** Using other delimiters
  -- $foldingDsvFiles
  , foldDsvFileWithZippedHeader, foldDsvFileWithZippedHeaderM
  , foldDsvFileWithoutHeader, foldDsvFileWithoutHeaderM
  , foldDsvFileIgnoringHeader, foldDsvFileIgnoringHeaderM

  -- * Functions that can fail
  -- ** What is a View
  , View (..)

  -- ** What is Validation
  -- $validation
  , Validation (..)

  -- ** Constructing views
  , constView, maybeView

  -- ** Modifying views
  , overViewError, inputAsViewError, discardViewError

  -- ** Composing views
  -- $composingViews
  , (>>>), (<<<)
  , (>>>-), (<<<-)

  -- ** Using views
  , applyView, viewOrThrow, viewOrThrowInput, viewMaybe, viewOr, viewOr'

  -- ** Viewing strings as numbers
  , byteStringNatView, textNatView, InvalidNat (..)
  , byteStringNatView_, textNatView_
  , byteStringRationalView, textRationalView, InvalidRational (..)
  , byteStringRationalView_, textRationalView_
  , byteStringDollarsView, textDollarsView, InvalidDollars (..)
  , byteStringDollarsView_, textDollarsView_

  -- ** Viewing a position of a vector
  , columnNumberView, TooShort (..), IndexError (..)
  , columnNumberView_

  -- ** Finding something in a vector
  , lookupView, lookupView_, Duplicate (..), Missing (..), LookupError (..)

  -- ** Finding something in a vector of UTF-8 byte strings
  , lookupTextViewUtf8, lookupStringViewUtf8, LookupErrorUtf8 (..)
  , lookupTextViewUtf8_, lookupStringViewUtf8_

  -- * Header-and-row views
  -- ** What is a ZipView
  , ZipView (..)

  -- ** Basic zip view operations
  , overZipViewError, overHeaderError, overRowError

  -- ** Converting a ZipView to a Pipe
  , zipViewPipe, zipViewPipeIgnoringAllErrors, zipViewPipeThrowFirstError

  -- ** Some zip views
  , byteStringZipView, textZipViewUtf8, textZipViewUtf8', byteStringZipViewPosition, entireRowZipView

  -- ** Refining a ZipView with a View
  , refineZipView

  -- ** Combining a ZipView with a Fold
  , zipViewFold, zipViewFoldM, ZipViewError (..)

  -- ** Reading strictly from CSV files using ZipView
  , zipViewCsvFileStrict
  , zipViewCsvFileStrictIgnoringAllErrors
  , zipViewCsvFileStrictThrowFirstError

  -- ** A read ends with a ZipViewStop
  , ZipViewStop (..)

  -- ** Using other delimiters
  , zipViewDsvFileStrict
  , zipViewDsvFileStrictIgnoringAllErrors
  , zipViewDsvFileStrictThrowFirstError

  -- * Pipes
  -- ** Pipes that parse DSV rows
  , csvRowPipe, dsvRowPipe

  -- ** Creating row producers from file handles
  , handleCsvRowProducer, handleDsvRowProducer

  -- ** Pipes that combine the header with subsequent rows
  , zipHeaderPipe, zipHeaderWithPipe

  -- ** What are Pipes
  -- $pipes
  , Pipe, Producer, Consumer, Effect, runEffect, (>->), await, yield

  -- * Attoparsec
  -- $attoparsec
  , AttoParser, attoPipe, handleAttoProducer, ParseError (..)

  -- * Position types
  , Position (..), RowNumber (..), ColumnNumber (..)
  , ColumnName (..), Positive (..), At (..)

  -- * Text
  -- ** What is Text
  -- $text
  , Text

  -- ** Relationship to String
  , stringToText, textToString

  -- ** Relationship to ByteString
  , encodeTextUtf8, utf8TextView, InvalidUtf8 (..)

  ) where

import DSV.AttoParser
import DSV.AttoPipe
import DSV.ByteString
import DSV.CommonDelimiters
import DSV.DelimiterSplice
import DSV.DelimiterType
import DSV.UTF8
import DSV.FileFold
import DSV.FileFoldCsv
import DSV.FileStrictCsvMap
import DSV.FileStrictCsvRead
import DSV.FileStrictCsvZipView
import DSV.FileStrictMap
import DSV.FileStrictRead
import DSV.FileStrictZipView
import DSV.Fold
import DSV.Header
import DSV.IndexError
import DSV.LookupError
import DSV.LookupUtf8
import DSV.ZipViews
import DSV.Numbers
import DSV.NumberViews
import DSV.ParseError
import DSV.ParseStop
import DSV.Parsing
import DSV.Pipes
import DSV.Position
import DSV.Prelude
import DSV.RequireCompleteParse
import DSV.Text
import DSV.Validation
import DSV.Vector
import DSV.VectorViews
import DSV.ViewType
import DSV.ZipViewError
import DSV.ZipViewFold
import DSV.ZipViewPipe
import DSV.ZipViewStop
import DSV.ZipViewType

import qualified Control.Foldl as L

{- $readingCsvFilesStrictly

We present these functions first because they require the least effort to
use. Each function in this section:

  1. Assumes that the delimiter is a comma;
  2. Reads from a file (specified by a 'FilePath');
  3. Reads all of the results into memory at once ("strictly").

Read on to the subsequent sections if:

  - you need to use a different delimiter;
  - your input source is something other than a file;
  - you need streaming to control memory usage; or
  - you would like assistance in converting the data from 'Vector's of
    'ByteString's to other types.

-}

{- $readingDsvFilesStrictly

\"CSV\" stands for "comma-separated values". But sometimes you may encounter
CSV-like files in which the values are separated by some other character; for
example, a file may use tabs instead of commas. We therefore refer to such
files more generally as DSV files ("delimiter-separated values"). Functions
that have a 'Delimiter' parameter, such as 'readDsvFileStrictWithoutHeader',
let you specify what kind of DSV file you want to read.
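
As an illustration, here is a minimal sketch of reading a tab-separated file
using the 'tab' delimiter. The file name @expenses.tsv@ is hypothetical, and
the sketch assumes that the result is a pair of a 'ParseStop' and a 'Vector'
of rows, as for 'readCsvFileStrictWithoutHeader'; check the signature of
'readDsvFileStrictWithoutHeader' before relying on it.

> import DSV
>
> main :: IO ()
> main = do
>     -- The first component (the ParseStop) is ignored in this sketch.
>     (_, rows) <- readDsvFileStrictWithoutHeader tab "expenses.tsv"
>     -- Convert the nested vectors to lists only for printing.
>     print (vectorToList (fmap vectorToList rows))
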

-}

{- $readingCsvFilesStrictlyWithAnyRowType

Most likely, you don't just want to get 'Vector's of 'ByteString' values from
a CSV file; you want to interpret the meaning of those bytes somehow,
converting each row into some type that is specific to the kind of data that
your particular CSV file represents. These functions are parameterized on a
function of type @(Vector ByteString -> IO row)@ which will get applied to
each row as it is read. Then instead of getting each row as a
@Vector ByteString@, each row will be represented in the result as a value of
type @row@ (where @row@ is a type parameter that stands for whatever type your
conversion function returns).

-}

{- $readingDsvFilesStrictlyWithAnyRowType

This section is the same as the previous, but generalized with a 'Delimiter'
parameter.

-}

{- $foldingCsvFiles

The functions in this section are all parameterized on:

  1. A 'FilePath', which specifies what CSV file to read;
  2. Either a 'L.Fold' or a 'L.FoldM', which specifies what action to take
     upon each row from the CSV file.

Use one of the functions with a 'L.Fold' parameter if you only need to collect
information from the rows and aggregate it into some @result@ value. Use a
function with a 'L.FoldM' parameter if your fold also needs to perform some
kind of /effect/ as the rows are read from the file.

See the "Control.Foldl" module for much more on what folds are and how to
construct them.

-}

{- $foldingDsvFiles

This section is the same as the previous, but generalized with a 'Delimiter'
parameter.

-}

{- $miscellania

These functions are not directly relevant to this library's primary purpose of
consuming DSV files, but we include them because you might find some of them
useful for reading particular kinds of values.

-}

{- $validation

See the "Data.Validation" module for more on the 'Validation' type.

-}

{- $vector

See the "Data.Vector" module for more on the 'Vector' type.

-}

{- $bytestring

See the "Data.ByteString" module for more on the 'ByteString' type.

-}

{- $fold

See the "Control.Foldl" module for more on the 'Fold' and 'FoldM' types.

-}

{- $text

See the "Data.Text" module for more on the 'Text' type.

-}

{- $attoparsec

See the "Data.Attoparsec.ByteString" module for more on parsing byte strings.

-}

{- $pipes

See the "Pipes" module for more on pipes.

-}

{- $composingViews

'View' has a 'Category' instance, so you can chain views together using '>>>'
and '<<<'. See the "Control.Category" module for more on categories.

The two views being sequenced must have the same error type, which is often
inconvenient. To chain views together while converting their error type to
@()@, you can use '>>>-' and '<<<-' instead.

-}