hsemail-1.2: Internet Message Parsers
Text.ParserCombinators.Parsec.Rfc2822
Portability: portable
Stability: provisional
Maintainer: simons@cryp.to
Contents
Useful parser combinators
Primitive Tokens (section 3.2.1)
Quoted characters (section 3.2.2)
Folding white space and comments (section 3.2.3)
Atom (section 3.2.4)
Quoted strings (section 3.2.5)
Miscellaneous tokens (section 3.2.6)
Date and Time Specification (section 3.3)
Address Specification (section 3.4)
Addr-spec specification (section 3.4.1)
Overall message syntax (section 3.5)
Field definitions (section 3.6)
The origination date field (section 3.6.1)
Originator fields (section 3.6.2)
Destination address fields (section 3.6.3)
Identification fields (section 3.6.4)
Informational fields (section 3.6.5)
Resent fields (section 3.6.6)
Trace fields (section 3.6.7)
Optional fields (section 3.6.8)
Miscellaneous obsolete tokens (section 4.1)
Obsolete folding white space (section 4.2)
Obsolete Date and Time (section 4.3)
Obsolete Addressing (section 4.4)
Obsolete header fields (section 4.5)
Obsolete origination date field (section 4.5.1)
Obsolete originator fields (section 4.5.2)
Obsolete destination address fields (section 4.5.3)
Obsolete identification fields (section 4.5.4)
Obsolete informational fields (section 4.5.5)
Obsolete resent fields (section 4.5.6)
Obsolete trace fields (section 4.5.7)
Description

This module provides parsers for the grammar defined in RFC2822, "Internet Message Format", http://www.faqs.org/rfcs/rfc2822.html.

Please note: The module is not particularly well tested.

Synopsis
maybeOption :: GenParser tok st a -> GenParser tok st (Maybe a)
unfold :: CharParser a b -> CharParser a b
header :: String -> CharParser a b -> CharParser a b
obs_header :: String -> CharParser a b -> CharParser a b
no_ws_ctl :: CharParser a Char
text :: CharParser a Char
specials :: CharParser a Char
quoted_pair :: CharParser a String
fws :: CharParser a String
ctext :: CharParser a Char
comment :: CharParser a String
cfws :: CharParser a String
atext :: CharParser a Char
atom :: CharParser a String
dot_atom :: CharParser a String
dot_atom_text :: CharParser a String
qtext :: CharParser a Char
qcontent :: CharParser a String
quoted_string :: CharParser a String
word :: CharParser a String
phrase :: CharParser a [String]
utext :: CharParser a Char
unstructured :: CharParser a String
date_time :: CharParser a CalendarTime
day_of_week :: CharParser a Day
day_name :: CharParser a Day
date :: CharParser a (Int, Month, Int)
year :: CharParser a Int
month :: CharParser a Month
month_name :: CharParser a Month
day :: CharParser a Int
time :: CharParser a (TimeDiff, Int)
time_of_day :: CharParser a TimeDiff
hour :: CharParser a Int
minute :: CharParser a Int
second :: CharParser a Int
zone :: CharParser a Int
data NameAddr = NameAddr {
nameAddr_name :: Maybe String
nameAddr_addr :: String
}
address :: CharParser a [NameAddr]
mailbox :: CharParser a NameAddr
name_addr :: CharParser a NameAddr
angle_addr :: CharParser a String
group :: CharParser a [NameAddr]
display_name :: CharParser a String
mailbox_list :: CharParser a [NameAddr]
address_list :: CharParser a [NameAddr]
addr_spec :: CharParser a String
local_part :: CharParser a String
domain :: CharParser a String
domain_literal :: CharParser a String
dcontent :: CharParser a String
dtext :: CharParser a Char
data Message = Message [Field] String
message :: CharParser a Message
body :: CharParser a String
data Field
= OptionalField String String
| From [NameAddr]
| Sender NameAddr
| ReturnPath String
| ReplyTo [NameAddr]
| To [NameAddr]
| Cc [NameAddr]
| Bcc [NameAddr]
| MessageID String
| InReplyTo [String]
| References [String]
| Subject String
| Comments String
| Keywords [[String]]
| Date CalendarTime
| ResentDate CalendarTime
| ResentFrom [NameAddr]
| ResentSender NameAddr
| ResentTo [NameAddr]
| ResentCc [NameAddr]
| ResentBcc [NameAddr]
| ResentMessageID String
| ResentReplyTo [NameAddr]
| Received ([(String, String)], CalendarTime)
| ObsReceived [(String, String)]
fields :: CharParser a [Field]
orig_date :: CharParser a CalendarTime
from :: CharParser a [NameAddr]
sender :: CharParser a NameAddr
reply_to :: CharParser a [NameAddr]
to :: CharParser a [NameAddr]
cc :: CharParser a [NameAddr]
bcc :: CharParser a [NameAddr]
message_id :: CharParser a String
in_reply_to :: CharParser a [String]
references :: CharParser a [String]
msg_id :: CharParser a String
id_left :: CharParser a String
id_right :: CharParser a String
no_fold_quote :: CharParser a String
no_fold_literal :: CharParser a String
subject :: CharParser a String
comments :: CharParser a String
keywords :: CharParser a [[String]]
resent_date :: CharParser a CalendarTime
resent_from :: CharParser a [NameAddr]
resent_sender :: CharParser a NameAddr
resent_to :: CharParser a [NameAddr]
resent_cc :: CharParser a [NameAddr]
resent_bcc :: CharParser a [NameAddr]
resent_msg_id :: CharParser a String
return_path :: CharParser a String
path :: CharParser a String
received :: CharParser a ([(String, String)], CalendarTime)
name_val_list :: CharParser a [(String, String)]
name_val_pair :: CharParser a (String, String)
item_name :: CharParser a String
item_value :: CharParser a String
optional_field :: CharParser a (String, String)
field_name :: CharParser a String
ftext :: CharParser a Char
obs_qp :: CharParser a String
obs_text :: CharParser a String
obs_char :: CharParser a Char
obs_utext :: CharParser a String
obs_phrase :: CharParser a [String]
obs_phrase_list :: CharParser a [String]
obs_fws :: CharParser a String
obs_day_of_week :: CharParser a Day
obs_year :: CharParser a Int
obs_month :: CharParser a Month
obs_day :: CharParser a Int
obs_hour :: CharParser a Int
obs_minute :: CharParser a Int
obs_second :: CharParser a Int
obs_zone :: CharParser a Int
obs_angle_addr :: CharParser a String
obs_route :: CharParser a [String]
obs_domain_list :: CharParser a [String]
obs_local_part :: CharParser a String
obs_domain :: CharParser a String
obs_mbox_list :: CharParser a [NameAddr]
obs_addr_list :: CharParser a [NameAddr]
obs_fields :: GenParser Char a [Field]
obs_orig_date :: CharParser a CalendarTime
obs_from :: CharParser a [NameAddr]
obs_sender :: CharParser a NameAddr
obs_reply_to :: CharParser a [NameAddr]
obs_to :: CharParser a [NameAddr]
obs_cc :: CharParser a [NameAddr]
obs_bcc :: CharParser a [NameAddr]
obs_message_id :: CharParser a String
obs_in_reply_to :: CharParser a [String]
obs_references :: CharParser a [String]
obs_id_left :: CharParser a String
obs_id_right :: CharParser a String
obs_subject :: CharParser a String
obs_comments :: CharParser a String
obs_keywords :: CharParser a [String]
obs_resent_from :: CharParser a [NameAddr]
obs_resent_send :: CharParser a NameAddr
obs_resent_date :: CharParser a CalendarTime
obs_resent_to :: CharParser a [NameAddr]
obs_resent_cc :: CharParser a [NameAddr]
obs_resent_bcc :: CharParser a [NameAddr]
obs_resent_mid :: CharParser a String
obs_resent_reply :: CharParser a [NameAddr]
obs_return :: CharParser a [Char]
obs_received :: CharParser a [(String, String)]
obs_path :: CharParser a String
obs_optional :: CharParser a (String, String)
Useful parser combinators
maybeOption :: GenParser tok st a -> GenParser tok st (Maybe a)
Return Nothing if the given parser doesn't match. This combinator is included in the latest parsec distribution as optionMaybe, but ghc-6.6.1 apparently doesn't have it.
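A minimal sketch of how such a combinator can be written on top of Parsec's option. The name maybeOption' and the signedDigits parser are hypothetical and exist only for illustration; this is not necessarily how the module defines it. As with option, the wrapped parser must fail without consuming input for Nothing to be returned.

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical local re-implementation (primed name), not the library source.
maybeOption' :: GenParser tok st a -> GenParser tok st (Maybe a)
maybeOption' p = option Nothing (fmap Just p)

-- Illustrative use: an optional sign followed by one or more digits.
signedDigits :: CharParser st (Maybe Char, String)
signedDigits = do
  sign <- maybeOption' (oneOf "+-")
  ds   <- many1 digit
  return (sign, ds)

main :: IO ()
main = do
  print (parse signedDigits "" "-42")  -- Right (Just '-',"42")
  print (parse signedDigits "" "42")   -- Right (Nothing,"42")
```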
unfold :: CharParser a b -> CharParser a b
unfold = between (optional cfws) (optional cfws)
header :: String -> CharParser a b -> CharParser a b
Construct a parser for a message header line from the header's name and a parser for the body.
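The following sketch shows one way a header-line combinator of this shape could be built, assuming that header names are matched case-insensitively and that a line ends in CRLF. The names caseString and headerSketch are hypothetical; the module's own definition may differ.

```haskell
import Data.Char (toLower)
import Text.ParserCombinators.Parsec

-- Hypothetical helper: match a string regardless of letter case.
caseString :: String -> CharParser st String
caseString = mapM (\c -> satisfy (\x -> toLower x == toLower c))

-- Hypothetical stand-in for 'header': name plus colon, then the body
-- parser, then the terminating CRLF.
headerSketch :: String -> CharParser st a -> CharParser st a
headerSketch name body =
  between (caseString (name ++ ":")) (string "\r\n") body

main :: IO ()
main =
  -- The body parser here keeps everything up to the line end,
  -- including the leading space.
  print (parse (headerSketch "Subject" (many (noneOf "\r\n"))) ""
               "subject: hello\r\n")
  -- Right " hello"
```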
obs_header :: String -> CharParser a b -> CharParser a b
Like header, but allows the obsolete white-space rules.
Primitive Tokens (section 3.2.1)
no_ws_ctl :: CharParser a Char
Match any US-ASCII non-whitespace control character.
text :: CharParser a Char
Match any US-ASCII character except for \r, \n.
specials :: CharParser a Char
Match any of the RFC's "special" characters: ()<>[]:;@,.\".
Quoted characters (section 3.2.2)
quoted_pair :: CharParser a String
Match a "quoted pair". All characters matched by text may be quoted. Note that the parser returns both characters, the backslash and the actual content.
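To illustrate the return value, here is a small sketch of a quoted-pair parser that keeps the backslash in its result. The name quotedPairSketch is hypothetical, and the character range is an approximation of the RFC's text rule (US-ASCII 1 to 127 without CR and LF).

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical stand-in for 'quoted_pair': a backslash followed by one
-- character from (approximately) the RFC's 'text' range.
quotedPairSketch :: CharParser st String
quotedPairSketch = do
  _ <- char '\\'
  c <- satisfy (\x -> x >= '\1' && x <= '\127' && x /= '\r' && x /= '\n')
  return ['\\', c]

main :: IO ()
main =
  -- The input is the two characters backslash and double quote;
  -- both are returned.
  print (parse quotedPairSketch "" "\\\"")
```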
Folding white space and comments (section 3.2.3)
fws :: CharParser a String
Match "folding whitespace". That is any combination of wsp and crlf followed by wsp.
ctext :: CharParser a Char

Match any non-whitespace, non-control character except for "(", ")", and "\". This is used to describe the legal content of comments.

Note: This parser accepts 8-bit characters, even though this is not legal according to the RFC. Unfortunately, 8-bit content in comments has become fairly common in the real world, so we'll just accept the fact.

comment :: CharParser a String
Match a "comment". That is any combination of ctext, quoted_pairs, and fws between parentheses. Comments may nest.
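Nesting falls out naturally when the comment parser refers to itself. The sketch below is a simplified illustration: ctextSketch and commentSketch are hypothetical names, and quoted pairs and folding are left out for brevity.

```haskell
import Text.ParserCombinators.Parsec

-- Simplified stand-in for 'ctext': anything except parentheses,
-- backslash, CR, and LF.
ctextSketch :: CharParser st Char
ctextSketch = noneOf "()\\\r\n"

-- A comment is a parenthesised run of plain text and nested comments.
commentSketch :: CharParser st String
commentSketch = do
  _     <- char '('
  parts <- many (many1 ctextSketch <|> commentSketch)
  _     <- char ')'
  return ("(" ++ concat parts ++ ")")

main :: IO ()
main = do
  print (parse commentSketch "" "(a (nested) note)")
  -- Right "(a (nested) note)"
  print (either (const "unbalanced") id (parse commentSketch "" "(a (b)"))
  -- "unbalanced"
```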
cfws :: CharParser a String
Match any combination of fws and comments.
Atom (section 3.2.4)
atext :: CharParser a Char
Match any US-ASCII character except for control characters, specials, or space. atom and dot_atom are made up of this.
atom :: CharParser a String
Match one or more atext characters and skip any preceding or trailing cfws.
dot_atom :: CharParser a String
Match dot_atom_text and skip any preceding or trailing cfws.
dot_atom_text :: CharParser a String
Match two or more atexts interspersed by dots.
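As a rough illustration, the RFC 2822 grammar for this token is 1*atext *("." 1*atext), which can be sketched with sepBy1. The names atextSketch and dotAtomTextSketch are hypothetical. Following the RFC grammar, the sketch also accepts a single run of atext without any dot, which may differ from the module's behaviour.

```haskell
import Data.Char (isAscii, isControl)
import Data.List (intercalate)
import Text.ParserCombinators.Parsec

-- Hypothetical stand-in for 'atext': printable US-ASCII without space
-- and without the RFC's special characters.
atextSketch :: CharParser st Char
atextSketch = satisfy ok
  where
    ok c = isAscii c && not (isControl c) && c /= ' '
           && c `notElem` "()<>[]:;@,.\\\""

-- Runs of atext separated by single dots, re-joined with dots.
dotAtomTextSketch :: CharParser st String
dotAtomTextSketch =
  fmap (intercalate ".") (sepBy1 (many1 atextSketch) (char '.'))

main :: IO ()
main = print (parse dotAtomTextSketch "" "mail.example.org")
-- Right "mail.example.org"
```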
Quoted strings (section 3.2.5)
qtext :: CharParser a Char
Match any non-whitespace, non-control US-ASCII character except for "\" and """.
qcontent :: CharParser a String
Match either qtext or quoted_pair.
quoted_string :: CharParser a String
Match any number of qcontent between double quotes. Any cfws preceding or following the quoted string is skipped automatically.
Miscellaneous tokens (section 3.2.6)
word :: CharParser a String
Match either atom or quoted_string.
phrase :: CharParser a [String]
Match either one or more words or an obs_phrase.
utext :: CharParser a Char
Match any non-whitespace, non-control US-ASCII character except for "\" and """.
unstructured :: CharParser a String

Match any number of utext tokens.

"Unstructured text" is used in free text fields such as subject. Please note that any comments or whitespace that prefaces or follows the actual utext is included in the returned string.

Date and Time Specification (section 3.3)
date_time :: CharParser a CalendarTime

Parse a date and time specification of the form

   Thu, 19 Dec 2002 20:35:46 +0200

where the weekday specification "Thu," is optional. The parser returns a CalendarTime, which is set to the appropriate values. Note, though, that not all fields of CalendarTime will necessarily be set correctly! When no weekday has been provided, the parser will set this field to Monday, regardless of whether the day actually is a Monday or not. Similarly, the day of the year will always be returned as 0. The timezone name will always be empty: "".

Nor will the date_time parser perform any consistency checking. It will accept

    40 Apr 2002 13:12 +0100

as a perfectly valid date.

In order to get all fields set to meaningful values, and in order to verify the date's consistency, you will have to feed it into any of the conversion routines provided in System.Time, such as toClockTime. (When doing this, keep in mind that most functions return local time. This will not necessarily be the time you're expecting.)
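If only a quick plausibility test is needed, a check like the following can catch inputs such as "40 Apr 2002". This is a sketch that is independent of System.Time: the functions daysInMonth and plausibleDate are hypothetical, and months are passed as plain numbers 1 to 12 rather than as Month values.

```haskell
-- Days in a month of the Gregorian calendar (month given as 1..12).
daysInMonth :: Int -> Int -> Int
daysInMonth y m
  | m == 2                 = if isLeap then 29 else 28
  | m `elem` [4, 6, 9, 11] = 30
  | otherwise              = 31
  where
    isLeap = (y `mod` 4 == 0 && y `mod` 100 /= 0) || y `mod` 400 == 0

-- True when year, month, and day describe an existing calendar date.
plausibleDate :: Int -> Int -> Int -> Bool
plausibleDate y m d =
  m >= 1 && m <= 12 && d >= 1 && d <= daysInMonth y m

main :: IO ()
main = do
  print (plausibleDate 2002 12 19)  -- True
  print (plausibleDate 2002 4 40)   -- False
```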

day_of_week :: CharParser a Day
This parser will match a day_name, optionally wrapped in folding whitespace, or an obs_day_of_week and return its Day value.
day_name :: CharParser a Day
This parser will match the abbreviated weekday names ("Mon", "Tue", ...) and return the appropriate Day value.
date :: CharParser a (Int, Month, Int)
This parser will match a date of the form "dd:mm:yyyy" and return a triple of the form (Int,Month,Int) - corresponding to (year,month,day).
year :: CharParser a Int
This parser will match a four digit number and return its integer value. No range checking is performed.
month :: CharParser a Month
This parser will match a month_name, optionally wrapped in folding whitespace, or an obs_month and return its Month value.
month_name :: CharParser a Month
This parser will match the abbreviated month names ("Jan", "Feb", ...) and return the appropriate Month value.
day :: CharParser a Int
Match either an obs_day, or a one or two digit number and return it.
time :: CharParser a (TimeDiff, Int)
This parser will match a time_of_day specification followed by a zone. It returns the tuple (TimeDiff,Int) corresponding to the return values of either parser.
time_of_day :: CharParser a TimeDiff
This parser will match a time-of-day specification of "hh:mm" or "hh:mm:ss" and return the corresponding time as a TimeDiff.
hour :: CharParser a Int
This parser will match a two-digit number and return its integer value. No range checking is performed.
minute :: CharParser a Int
This parser will match a two-digit number and return its integer value. No range checking is performed.
second :: CharParser a Int
This parser will match a two-digit number and return its integer value. No range checking is performed.
zone :: CharParser a Int
This parser will match a timezone specification of the form "+hhmm" or "-hhmm" and return the zone's offset to UTC in seconds as an integer. obs_zone is matched as well.
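The conversion from "+hhmm" to an offset in seconds is simple arithmetic: hours times 3600 plus minutes times 60, negated for a leading minus. A minimal sketch, assuming only the non-obsolete form; zoneSketch is a hypothetical name and does not handle obs_zone.

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical stand-in for 'zone': sign, two hour digits, two minute
-- digits, converted to seconds relative to UTC.
zoneSketch :: CharParser st Int
zoneSketch = do
  sign <- (char '+' >> return 1) <|> (char '-' >> return (-1))
  hh   <- count 2 digit
  mm   <- count 2 digit
  return (sign * (read hh * 3600 + read mm * 60))

main :: IO ()
main = do
  print (parse zoneSketch "" "+0200")  -- Right 7200
  print (parse zoneSketch "" "-0330")  -- Right (-12600)
```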
Address Specification (section 3.4)
data NameAddr
A NameAddr is composed of an optional realname and a mandatory e-mail address.
Constructors
NameAddr
nameAddr_name :: Maybe String
nameAddr_addr :: String
address :: CharParser a [NameAddr]
Parse a single mailbox or an address group and return the address(es).
mailbox :: CharParser a NameAddr
Parse a name_addr or an addr_spec and return the address.
name_addr :: CharParser a NameAddr
Parse an angle_addr, optionally prefaced with a display_name, and return the address.
angle_addr :: CharParser a String
Parse an angle_addr or an obs_angle_addr and return the address.
group :: CharParser a [NameAddr]

Parse a "group" of addresses. That is a display_name, followed by a colon, optionally followed by a mailbox_list, followed by a semicolon. The found address(es) are returned, which may be none. Here is an example:

    parse group "" "my group: user1@example.org, user2@example.org;"

This input comes out as:

    Right ["user1@example.org","user2@example.org"]
display_name :: CharParser a String
Parse and return a phrase.
mailbox_list :: CharParser a [NameAddr]
Parse a list of mailbox addresses, every two addresses being separated by a comma, and return the list of found address(es).
address_list :: CharParser a [NameAddr]
Parse a list of addresses, every two addresses being separated by a comma, and return the list of found address(es).
Addr-spec specification (section 3.4.1)
addr_spec :: CharParser a String
Parse an "address specification". That is a local_part, followed by an "@" character, followed by a domain. Return the complete address as String, ignoring any whitespace or any comments.
local_part :: CharParser a String
Parse and return a "local part" of an addr_spec. That is either a dot_atom or a quoted_string.
domain :: CharParser a String
Parse and return a "domain part" of an addr_spec. That is either a dot_atom or a domain_literal.
domain_literal :: CharParser a String
Parse a "domain literal". That is a "[" character, followed by any amount of dcontent, followed by a terminating "]" character. The complete string is returned verbatim.
dcontent :: CharParser a String
Parse and return any characters that are legal in a domain_literal. That is dtext or a quoted_pair.
dtext :: CharParser a Char
Parse and return any ASCII characters except "[", "]", and "\".
Overall message syntax (section 3.5)
data Message
This data type represents a parsed Internet Message as defined in this RFC. It consists of an arbitrary number of header lines, represented in the Field data type, and a message body, which may be empty.
Constructors
Message [Field] String
message :: CharParser a Message

Parse a complete message as defined by this RFC and return it broken down into the separate header fields and the message body. Header lines that contain syntax errors will not cause the parser to abort. Rather, these headers will appear as OptionalFields (which are unparsed) in the resulting Message. A message must be really, really badly broken for this parser to fail.

This behaviour was chosen because it is impossible to predict what the user of this module considers to be a fatal error; traditionally, parsers are very forgiving when it comes to Internet messages.

If you want to implement a really strict parser, you'll have to put the appropriate parser together yourself. You'll find that this is rather easy to do. Refer to the fields parser for further details.

body :: CharParser a String
This parser will return a message body as specified by this RFC; that is basically any number of text characters, which may be divided into separate lines by crlf.
Field definitions (section 3.6)
data Field
This data type represents any of the header fields defined in this RFC. Each of the various constructors contains the return value of the corresponding parser.
Constructors
OptionalField String String
From [NameAddr]
Sender NameAddr
ReturnPath String
ReplyTo [NameAddr]
To [NameAddr]
Cc [NameAddr]
Bcc [NameAddr]
MessageID String
InReplyTo [String]
References [String]
Subject String
Comments String
Keywords [[String]]
Date CalendarTime
ResentDate CalendarTime
ResentFrom [NameAddr]
ResentSender NameAddr
ResentTo [NameAddr]
ResentCc [NameAddr]
ResentBcc [NameAddr]
ResentMessageID String
ResentReplyTo [NameAddr]
Received ([(String, String)], CalendarTime)
ObsReceived [(String, String)]
fields :: CharParser a [Field]

This parser will parse an arbitrary number of header fields as defined in this RFC. For each field, an appropriate Field value is created, all of them making up the Field list that this parser returns.

If you look at the implementation of this parser, you will find that it uses Parsec's try modifier around all of the fields. The idea behind this is that fields that contain syntax errors fall back to the catch-all optional_field. Thus, this parser will hardly ever return a syntax error, which conforms with the idea that any message that can possibly be accepted should be.
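The fallback idea can be sketched in a few lines. Everything here is hypothetical and heavily simplified (one strict field parser, a toy FieldSketch type, no folding); it only demonstrates how try lets a malformed structured header be re-read as an unparsed optional field instead of aborting the parse.

```haskell
import Text.ParserCombinators.Parsec

-- Toy result type for this sketch, not the module's Field type.
data FieldSketch
  = MessageIDS String
  | OptionalS String String
  deriving (Eq, Show)

-- Strict parser: "Message-ID:" followed by an id in angle brackets.
messageIdS :: CharParser st FieldSketch
messageIdS = do
  _ <- string "Message-ID:"
  skipMany (char ' ')
  _ <- char '<'
  i <- many1 (noneOf "<>\r\n")
  _ <- char '>'
  _ <- string "\r\n"
  return (MessageIDS i)

-- Catch-all parser: any "name: value" line, value kept verbatim.
optionalS :: CharParser st FieldSketch
optionalS = do
  n <- many1 (noneOf ":\r\n")
  _ <- char ':'
  v <- many (noneOf "\r\n")
  _ <- string "\r\n"
  return (OptionalS n v)

-- 'try' rewinds the input when the strict parser fails part-way through,
-- so the same line can be consumed by the catch-all parser.
fieldsS :: CharParser st [FieldSketch]
fieldsS = many (try messageIdS <|> optionalS)

main :: IO ()
main = print (parse fieldsS "" "Message-ID: <1@x>\r\nMessage-ID: broken\r\n")
-- Right [MessageIDS "1@x",OptionalS "Message-ID" " broken"]
```

A strict variant would simply drop the catch-all alternative, so that the first malformed header makes the whole parse fail.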

The origination date field (section 3.6.1)
orig_date :: CharParser a CalendarTime
Parse a "Date:" header line and return the date it contains as a CalendarTime.
Originator fields (section 3.6.2)
from :: CharParser a [NameAddr]
Parse a "From:" header line and return the mailbox_list address(es) contained in it.
sender :: CharParser a NameAddr
Parse a "Sender:" header line and return the mailbox address contained in it.
reply_to :: CharParser a [NameAddr]
Parse a "Reply-To:" header line and return the address_list address(es) contained in it.
Destination address fields (section 3.6.3)
to :: CharParser a [NameAddr]
Parse a "To:" header line and return the address_list address(es) contained in it.
cc :: CharParser a [NameAddr]
Parse a "Cc:" header line and return the address_list address(es) contained in it.
bcc :: CharParser a [NameAddr]
Parse a "Bcc:" header line and return the address_list address(es) contained in it.
Identification fields (section 3.6.4)
message_id :: CharParser a String
Parse a "Message-Id:" header line and return the msg_id contained in it.
in_reply_to :: CharParser a [String]
Parse an "In-Reply-To:" header line and return the list of msg_ids contained in it.
references :: CharParser a [String]
Parse a "References:" header line and return the list of msg_ids contained in it.
msg_id :: CharParser a String
Parse a "message ID:" and return it. A message ID is almost identical to an angle_addr, but with stricter rules about folding and whitespace.
id_left :: CharParser a String
Parse a "left ID" part of a msg_id. This is almost identical to the local_part of an e-mail address, but with stricter rules about folding and whitespace.
id_right :: CharParser a String
Parse a "right ID" part of a msg_id. This is almost identical to the domain of an e-mail address, but with stricter rules about folding and whitespace.
no_fold_quote :: CharParser a String
Parse one or more occurrences of qtext or quoted_pair and return the concatenated string. This makes up the id_left of a msg_id.
no_fold_literal :: CharParser a String
Parse one or more occurrences of dtext or quoted_pair and return the concatenated string. This makes up the id_right of a msg_id.
Informational fields (section 3.6.5)
subject :: CharParser a String
Parse a "Subject:" header line and return its contents verbatim.
comments :: CharParser a String
Parse a "Comments:" header line and return its contents verbatim.
keywords :: CharParser a [[String]]
Parse a "Keywords:" header line and return the list of phrases found. Please note that each phrase is again a list of atoms, as returned by the phrase parser.
Resent fields (section 3.6.6)
resent_date :: CharParser a CalendarTime
Parse a "Resent-Date:" header line and return the date it contains as CalendarTime.
resent_from :: CharParser a [NameAddr]
Parse a "Resent-From:" header line and return the mailbox_list address(es) contained in it.
resent_sender :: CharParser a NameAddr
Parse a "Resent-Sender:" header line and return the mailbox address contained in it.
resent_to :: CharParser a [NameAddr]
Parse a "Resent-To:" header line and return the address_list address(es) contained in it.