Maintainer: Jaap Weel <weel at ugcs dot caltech dot edu>
This module parses and dumps documents that are formatted more or less according to RFC 4180, "Common Format and MIME Type for Comma-Separated Values (CSV) Files", http://www.rfc-editor.org/rfc/rfc4180.txt.
There are some issues with this RFC. I will describe what these issues are and how I deal with them.
First, the RFC prescribes CRLF, the standard network line break, but you are likely to run across CSV files with other line endings, so I accept any sequence of CRs and LFs as a line break.
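The lenient line-break rule can be illustrated with a small sketch. This is not the module's actual parser: `splitRecords` is a hypothetical helper that uses only the Prelude, and it ignores quoting, so a line break inside a quoted field would be split incorrectly.

```haskell
-- Hypothetical helper, not part of the module: split raw text into
-- record lines, treating any run of CR and LF characters as one break.
-- Assumption: no quoted fields, so every CR or LF is a record separator.
splitRecords :: String -> [String]
splitRecords s =
  case break isBreak s of
    (line, [])   -> [line]                                    -- no break left
    (line, rest) -> line : splitRecords (dropWhile isBreak rest)
  where
    isBreak c = c == '\r' || c == '\n'
```

For example, `splitRecords "a,b\r\nc,d\n\n\re,f"` yields `["a,b","c,d","e,f"]`. One consequence of treating a whole run as a single break is that blank lines disappear.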
Second, there is an optional header line, but its format is exactly that of a regular record, and you can only tell whether it exists from the MIME type, which may not be available. I ignore the issue of header lines and simply treat a header as a regular record.
Third, there is an inconsistency: the formal grammar specifies that fields can contain only certain US-ASCII characters, but the specification of the MIME type allows for other character sets. I allow any character in a field, except that unquoted fields cannot contain commas, CRs or LFs. This should make it possible to parse CSV files in any encoding, but it also admits characters such as tabs that the RFC may be interpreted to forbid even in non-US-ASCII character sets.
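The character rule for unquoted fields, as stated above, amounts to a one-line predicate. The sketch below is an illustration only: `isUnquotedFieldChar` and `splitUnquoted` are hypothetical names, and the splitter assumes a single line with no quoted fields.

```haskell
-- Hypothetical predicate: every character is allowed in an unquoted
-- field except the comma, CR and LF.
isUnquotedFieldChar :: Char -> Bool
isUnquotedFieldChar c = c `notElem` ",\r\n"

-- Split one line (assumed to contain no quoted fields) into its fields.
-- Anything after a CR or LF is dropped, since the input is meant to be
-- a single record.
splitUnquoted :: String -> [String]
splitUnquoted s =
  case span isUnquotedFieldChar s of
    (field, ',' : rest) -> field : splitUnquoted rest
    (field, _)          -> [field]
```

Tabs and non-ASCII characters pass through untouched, so `splitUnquoted "x\ty,\946,42"` gives `["x\ty","\946","42"]`.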
A CSV file is a series of records. According to the RFC, all records must have the same number of fields. As an extension, I allow records of varying length.
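Because variable-length records are accepted, a caller that needs the RFC's stricter shape has to check for it afterwards. The following is a minimal sketch of such a check, assuming records are represented as lists of strings; the `Record` synonym and both function names are defined here for illustration and are not taken from the text above.

```haskell
-- Assumed representation for this sketch: a record is a list of fields.
type Record = [String]

-- True when every record has the same number of fields, which is what
-- RFC 4180 requires.
isRectangular :: [Record] -> Bool
isRectangular []       = True
isRectangular (r : rs) = all ((== length r) . length) rs

-- Pad short records with empty fields up to the longest record's length.
padRecords :: [Record] -> [Record]
padRecords rs = map pad rs
  where
    width = maximum (0 : map length rs)   -- 0 guards against an empty list
    pad r = r ++ replicate (width - length r) ""
```

Whether to reject ragged input or pad it is a decision for the caller; `padRecords [["a","b"],["c"]]` returns `[["a","b"],["c",""]]`.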
Given a file name (used only for error messages) and a string to parse, run the parser.
Given a file name, read from that file and run the parser.
Given a string, run the parser, and print the result on stdout.
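As a rough usage sketch of the first entry point described above (a name for error messages plus the string to parse): the function names are not shown in this text, so `parseCSV` and its module `Text.CSV` are assumed here from the `csv` package published on Hackage, which must be installed. `parseRecords` is a hypothetical wrapper that renders any parse error with `show`.

```haskell
import Text.CSV (parseCSV)  -- assumption: the `csv` package from Hackage

-- Hypothetical wrapper: parse a string and turn a parse error into a
-- message. "<inline>" is only the name reported in error messages; no
-- file is read.
parseRecords :: String -> Either String [[String]]
parseRecords input =
  case parseCSV "<inline>" input of
    Left err      -> Left (show err)
    Right records -> Right records
```

In GHCi, `parseRecords "a,b\nc,d,e"` should give `Right [["a","b"],["c","d","e"]]`, with the bare LF accepted as a line break and the second record allowed to be longer than the first.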