DBFunctor-0.1.0.0: DBFunctor - Functional Data Management => ETL/ELT Data Processing in Haskell

Copyright: (c) Nikos Karagiannidis 2018
License: BSD3
Maintainer: nkarag@gmail.com
Stability: stable
Portability: POSIX
Safe Haskell: None
Language: Haskell2010

RTable.Data.CSV

Description

This module implements the RTabular instance of the CSV data type, i.e., implements the interface by which a CSV file can be transformed to/from an RTable. It is required when we want to do ETL/ELT over CSV files with the DBFunctor package (i.e., with the Julius EDSL for ETL/ELT found in the Etl.Julius module).

The minimum requirement for implementing an RTabular instance for a data type is to implement the toRTable and fromRTable functions. Apart from these two functions, this module also exports functions for reading and writing CSV data from/to CSV files. It also supports delimiters other than the comma, as well as CSV files with or without a header line (see CSVOptions).

For the CSV data type, this module uses the Cassava library (Data.Csv).

The CSV data type

type CSV = Vector Row

Definition of a CSV file. CSV data are treated as opaque byte strings (compare the Csv type in the Cassava library, Data.Csv: type Csv = Vector Record).

type Row = Vector Column

Definition of a CSV Row. Essentially a Row is just a Vector of ByteString (type Record = Vector Field)

type Column = Field

Definition of a CSV Record column. (type Field = ByteString)
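To make the nesting of these three synonyms concrete, here is a minimal sketch that redefines them locally (so it compiles without DBFunctor) and builds a small value by hand. The names sampleCSV and cellAt are hypothetical helpers introduced only for illustration; they are not part of this module.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.ByteString (ByteString)
import qualified Data.Vector     as V

-- Local stand-ins that mirror the synonyms documented above.
type Column = ByteString
type Row    = V.Vector Column
type CSV    = V.Vector Row

-- A hypothetical two-row, two-column CSV value built by hand.
sampleCSV :: CSV
sampleCSV = V.fromList
  [ V.fromList ["id", "name"]
  , V.fromList ["1",  "alice"]
  ]

-- Hypothetical helper: look up one cell by (row, column) position,
-- returning Nothing when either index is out of range.
cellAt :: Int -> Int -> CSV -> Maybe Column
cellAt r c csv = csv V.!? r >>= (V.!? c)

main :: IO ()
main = print (cellAt 1 1 sampleCSV)
```

Because every level is a plain Vector, the usual Data.Vector functions (map, filter, indexing) apply directly to rows and columns.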

data CSVOptions

Options for a CSV file (e.g., delimiter specification, header specification etc.)

Constructors

CSVOptions 

Fields

data YesNo

Yes or No sum type

Constructors

Yes 
No 

Read/Write CSV

readCSV

Arguments

:: FilePath

the CSV file

-> IO CSV

the output CSV type

Reads a CSV file and returns a CSV value, treating the CSV data as opaque byte strings.

readCSVwithOptions

Arguments

:: CSVOptions 
-> FilePath

the CSV file

-> IO CSV

the output CSV type

Reads a CSV file according to the input options (delimiter and header option) and returns a CSV value, treating the CSV data as opaque byte strings.
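The following sketch shows how a delimiter and a header flag can be passed to Cassava itself through decodeWith. It is an illustration of the underlying library mechanism, not the implementation of readCSVwithOptions: the function decodeDelimited and its Bool parameter are hypothetical, and whether this module drops or keeps the header line is an assumption not confirmed by the text above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.ByteString      (ByteString)
import qualified Data.ByteString.Lazy as BL
import           Data.Char            (ord)
import           Data.Csv             (DecodeOptions (..), HasHeader (..),
                                       decodeWith, defaultDecodeOptions)
import qualified Data.Vector          as V

-- Local stand-in for the CSV synonym documented above.
type CSV = V.Vector (V.Vector ByteString)

-- Hypothetical helper: decode with a caller-chosen single-byte delimiter.
-- When skipHeader is True, Cassava drops the first line (assumed behaviour
-- for this sketch only).
decodeDelimited :: Char -> Bool -> BL.ByteString -> Either String CSV
decodeDelimited delim skipHeader =
  decodeWith opts (if skipHeader then HasHeader else NoHeader)
  where
    opts = defaultDecodeOptions { decDelimiter = fromIntegral (ord delim) }

main :: IO ()
main = print (decodeDelimited ';' True "id;name\n1;alice\n")
```

Note that Cassava stores the delimiter as a single byte (Word8), so this approach only covers single-byte delimiters such as ';', '|' or a tab.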

readCSVFile

Arguments

:: FilePath

the CSV file

-> IO ByteString

the output CSV

Reads a CSV file and returns its contents as a lazy bytestring.

writeCSV

Arguments

:: FilePath

the csv file to be created

-> CSV

input CSV

-> IO () 

Writes a CSV value to a newly created CSV file.
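As a rough sketch of what writing a Vector-of-Vectors CSV value involves, the code below serialises it with Cassava's encode and writes the bytes to a file. The names renderCSV and writeSketch are hypothetical, and this is one plausible approach under the assumption that default encoding options are acceptable; it is not claimed to be how writeCSV is implemented.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.ByteString      (ByteString)
import qualified Data.ByteString.Lazy as BL
import           Data.Csv             (encode)
import qualified Data.Vector          as V

-- Local stand-in for the CSV synonym documented above.
type CSV = V.Vector (V.Vector ByteString)

-- Hypothetical helper: serialise every row with Cassava's default options.
renderCSV :: CSV -> BL.ByteString
renderCSV = encode . V.toList

-- Hypothetical helper: write the serialised bytes to the given path.
writeSketch :: FilePath -> CSV -> IO ()
writeSketch outPath = BL.writeFile outPath . renderCSV

main :: IO ()
main = BL.putStr (renderCSV (V.fromList
  [ V.fromList ["id", "name"]
  , V.fromList ["1",  "alice"]
  ]))
```

With Cassava's default encode options, records are terminated with CRLF and fields are quoted only when necessary.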

writeCSVFile

Arguments

:: FilePath

the csv file to be created

-> ByteString

input CSV

-> IO () 

Writes a CSV (as a bytestring) to a newly created CSV file.

CSV as Tabular data

CSV I/O

printCSV

Arguments

:: CSV

input CSV to be printed on screen

-> IO () 

Prints the input CSV on screen.

printCSVFile

Arguments

:: ByteString

input CSV to be printed on screen

-> IO () 

Prints the input CSV (as a bytestring) on screen.

Basic CSV processing

copyCSV

Arguments

:: FilePath

input csv file

-> FilePath

output csv file

-> IO () 

Copies the input CSV file to the specified output CSV file.

selectNrows

Arguments

:: Int

Number of rows to select

-> ByteString

Input csv

-> ByteString

Output csv

Returns the first N rows from a CSV file.
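A simple way to picture row selection on a raw bytestring is to cut it at line boundaries. The sketch below does exactly that; takeLines is a hypothetical name, and the approach is an assumption made for illustration. In particular, it counts a header line (if any) as a row and would split incorrectly on quoted fields that contain embedded newlines, which may differ from what selectNrows does.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BLC

-- Hypothetical helper: keep the first n lines of a lazy bytestring.
-- Each kept line is re-terminated with a single '\n'.
takeLines :: Int -> BLC.ByteString -> BLC.ByteString
takeLines n = BLC.unlines . take n . BLC.lines

main :: IO ()
main = BLC.putStr (takeLines 2 (BLC.pack "id,name\n1,alice\n2,bob\n"))
```

Because the input is a lazy bytestring, only as much of the file as is needed to produce the first n lines has to be read.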

projectByIndex

Arguments

:: [Int]

input list of column indexes

-> CSV

input csv

-> CSV

output CSV

Column projection on an input CSV file where desired columns are defined by position (index) in the CSV.
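Projection by position can be sketched as a map over rows that picks the requested cells in the requested order. The helper projectCols below is hypothetical; it assumes zero-based indexes and silently drops indexes that are out of range, neither of which is stated for projectByIndex above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.ByteString (ByteString)
import           Data.Maybe      (mapMaybe)
import qualified Data.Vector     as V

-- Local stand-in for the CSV synonym documented above.
type CSV = V.Vector (V.Vector ByteString)

-- Hypothetical helper: keep only the columns at the given zero-based
-- positions, in the order listed. Out-of-range positions are skipped.
projectCols :: [Int] -> CSV -> CSV
projectCols idxs = V.map pick
  where
    pick row = V.fromList (mapMaybe (row V.!?) idxs)

main :: IO ()
main = print (projectCols [2, 0] (V.fromList
  [ V.fromList ["id", "name", "city"]
  , V.fromList ["1",  "alice", "Athens"]
  ]))
```

Listing an index twice duplicates that column, and the order of the index list determines the order of the output columns.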

headCSV :: CSV -> Row

O(1) First row

tailCSV :: CSV -> CSV

O(1) Yield all but the first row without copying. The CSV may not be empty.
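Since the tail operation requires a non-empty CSV, callers may want to check for emptiness first. The sketch below assumes that headCSV and tailCSV behave like head and tail from Data.Vector (an assumption based on the wording of their descriptions) and shows a hypothetical total wrapper, unconsRows, written against plain vectors.

```haskell
import qualified Data.Vector as V

-- Hypothetical helper: split a vector of rows into its first row and the
-- remaining rows, returning Nothing instead of failing on empty input.
unconsRows :: V.Vector a -> Maybe (a, V.Vector a)
unconsRows rows
  | V.null rows = Nothing
  | otherwise   = Just (V.head rows, V.tail rows)

main :: IO ()
main = do
  print (unconsRows (V.fromList [1, 2, 3 :: Int]))
  print (unconsRows (V.empty :: V.Vector Int))
```

A typical use is to separate a header row from the data rows in one step while handling the empty-file case explicitly.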

Misc

csvHeaderFromRtable :: RTable -> Header

Creates a Header (as defined in Data.Csv) from an RTable, where type Header = Vector Name and type Name = ByteString.

Exceptions

Orphan instances

FromField RDataType

Necessary instance in order to convert a CSV file column value to an RDataType value.
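To illustrate what such an instance involves, the sketch below defines a FromField instance for a small hypothetical type, Cell, that stands in for RDataType (whose actual constructors are not shown on this page). The parsing order (empty, then integer, then double, then text) is an assumption chosen for the example, not the rule this module applies.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Applicative ((<|>))
import qualified Data.ByteString     as B
import           Data.Csv            (FromField (..), runParser)
import           Data.Text           (Text)

-- Hypothetical stand-in for RDataType, for illustration only.
data Cell = CNull | CInt Int | CDouble Double | CText Text
  deriving (Eq, Show)

-- Try progressively more permissive interpretations of the raw field.
instance FromField Cell where
  parseField s
    | B.null s  = pure CNull
    | otherwise = (CInt    <$> parseField s)
              <|> (CDouble <$> parseField s)
              <|> (CText   <$> parseField s)

-- Hypothetical helper: run the field parser on a single raw value.
parseCell :: B.ByteString -> Either String Cell
parseCell = runParser . parseField

main :: IO ()
main = mapM_ (print . parseCell) ["", "42", "3.5", "hello"]
```

With an instance like this in place, Cassava can decode whole records into vectors of Cell without any per-column code.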


ToField RDataType

To encode an input RTable into a CSV bytestring, RTuple must be an instance of the ToNamedRecord typeclass, which means implementing the toNamedRecord function, where:

             toNamedRecord :: a -> NamedRecord
             type NamedRecord = HashMap ByteString ByteString

             -- Construct a named record from a list of name-value ByteString pairs.
             -- Use .= to construct such a pair from a name and a value.
             namedRecord :: [(ByteString, ByteString)] -> NamedRecord

             (.=) :: ToField a => ByteString -> a -> (ByteString, ByteString)

In our case, we do not need to do this, because an RTuple is just a synonym for HM.HashMap ColumnName RDataType and the data type HashMap a b is already an instance of ToNamedRecord.

We also need to make RDataType an instance of ToField (CV.ToField RDataType) by implementing toField, so that an RDataType can be converted into a ByteString, where:

             toField :: a -> Field
             type Field = ByteString

Methods

toField :: RDataType -> Field
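The reasoning above can be sketched end to end with a hypothetical stand-in type: once the value type has a ToField instance, a HashMap from column names to values can be passed to Cassava's encodeByName directly, because HashMap already has a ToNamedRecord instance. The names Cell, Tuple and renderTuples are assumptions introduced for this example, and rendering a null as an empty field is likewise an assumption.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy  as BL
import           Data.Csv              (ToField (..), encodeByName, header)
import qualified Data.HashMap.Strict   as HM
import           Data.Text             (Text)

-- Hypothetical stand-in for RDataType, for illustration only.
data Cell = CNull | CInt Int | CText Text

-- Convert each value to its raw field bytes; nulls become empty fields.
instance ToField Cell where
  toField CNull     = BC.empty
  toField (CInt n)  = toField n
  toField (CText t) = toField t

-- Hypothetical stand-in for RTuple: column name to value.
type Tuple = HM.HashMap Text Cell

-- Hypothetical helper: encode tuples under the given column order.
-- No ToNamedRecord instance is written here; the HashMap one is reused.
renderTuples :: [Text] -> [Tuple] -> BL.ByteString
renderTuples cols = encodeByName (header (map toField cols))

main :: IO ()
main = BL.putStr (renderTuples ["id", "name"]
  [HM.fromList [("id", CInt 1), ("name", CText "alice")]])
```

The header argument fixes the column order in the output, which matters because a HashMap has no inherent ordering of its keys.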

RTabular CSV

CSV data are "tabular" data and therefore implement the RTabular interface.
