hledger-lib-1.20.4: A reusable library providing the core functionality of hledger

Description

File reading/parsing utilities used by multiple readers, and a good amount of the parsers for journal format, to avoid import cycles when JournalReader imports other readers.

# Documentation

A hledger journal reader is a triple of storage format name, a detector of that format, and a parser from that format to Journal. The type variable m appears here so that rParser can hold a journal parser, which depends on it.

Constructors

Reader

Fields

• rFormat :: StorageFormat
• rExtensions :: [String]
• rReadFn :: InputOpts -> FilePath -> Text -> ExceptT String IO Journal
• rParser :: MonadIO m => ErroringJournalParser m ParsedJournal

#### Instances

Show (Reader m). Defined in Hledger.Read.Common.

Methods

• showsPrec :: Int -> Reader m -> ShowS
• show :: Reader m -> String
• showList :: [Reader m] -> ShowS

data InputOpts

Various options to use when reading journal files. This is similar to CliOptions.inputflags, and simplifies the journal-reading functions.

Constructors

InputOpts

Fields

• mformat_ :: Maybe StorageFormat
  a file/storage format to try, unless overridden by a filename prefix. Nothing means try all.
• mrules_file_ :: Maybe FilePath
  a conversion rules file to use (when reading CSV)
• aliases_ :: [String]
  account name aliases to apply
• anon_ :: Bool
  do light anonymisation/obfuscation of the data
• ignore_assertions_ :: Bool
  don't check balance assertions
• new_ :: Bool
  read only new transactions since this file was last read
• new_save_ :: Bool
  save latest new transactions state for next time
• pivot_ :: String
  use the given field's value as the account name
• auto_ :: Bool
  generate automatic postings when journal is parsed
• commoditystyles_ :: Maybe (Map CommoditySymbol AmountStyle)
  optional commodity display styles affecting all files
• strict_ :: Bool
  do extra error checking (eg, all posted accounts are declared)

#### Instances

Show InputOpts. Defined in Hledger.Read.Common.

Methods

• showList :: [InputOpts] -> ShowS

A second instance is also defined in Hledger.Read.Common.

# parsing utilities

runJournalParser :: Monad m => JournalParser m a -> Text -> m (Either (ParseErrorBundle Text CustomErr) a)

rjp :: Monad m => JournalParser m a -> Text -> m (Either (ParseErrorBundle Text CustomErr) a)

runErroringJournalParser :: Monad m => ErroringJournalParser m a -> Text -> m (Either FinalParseError (Either (ParseErrorBundle Text CustomErr) a))

rejp :: Monad m => ErroringJournalParser m a -> Text -> m (Either FinalParseError (Either (ParseErrorBundle Text CustomErr) a))

Construct a generic start and end line parse position from start and end megaparsec SourcePos values.

Given a parser to ParsedJournal, input options, file path and content: run the parser on the content, and finalise the result to get a Journal; or throw an error.

Like parseAndFinaliseJournal but takes a (non-Erroring) JournalParser. Also, applies command-line account aliases before finalising. Used for timeclock/timedot. TODO: get rid of this, use parseAndFinaliseJournal instead

Post-process a Journal that has just been parsed or generated, in this order:

• apply canonical amount styles,
• save misc info and reverse transactions into their original parse order,
• evaluate balance assignments and balance each transaction,
• apply transaction modifiers (auto postings) if enabled,
• check balance assertions if enabled,
• infer transaction-implied market prices from transaction prices.
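
The ordering above can be pictured as a chain of steps, each of which may fail. The following is only a sketch under stated assumptions: `ToyJournal`, `Step`, `record`, `whenEnabled`, `runSteps` and `finaliseSketch` are hypothetical stand-ins, not hledger-lib definitions, and each step merely records its name instead of doing real work.

```haskell
import Control.Monad (foldM)

-- Hypothetical stand-in for a journal: a log of the steps applied so far.
newtype ToyJournal = ToyJournal [String] deriving (Eq, Show)

-- A finalisation step either fails with a message or returns an updated journal.
type Step = ToyJournal -> Either String ToyJournal

-- A step that always succeeds and records its name.
record :: String -> Step
record name (ToyJournal done) = Right (ToyJournal (done ++ [name]))

-- A step that runs only when its option is enabled; otherwise the journal is unchanged.
whenEnabled :: Bool -> Step -> Step
whenEnabled enabled step = if enabled then step else Right

-- Run the steps left to right, stopping at the first failure.
runSteps :: [Step] -> ToyJournal -> Either String ToyJournal
runSteps steps j0 = foldM (\j step -> step j) j0 steps

-- The order listed above, with the two optional steps controlled by flags.
finaliseSketch :: Bool -> Bool -> ToyJournal -> Either String ToyJournal
finaliseSketch autoPostings checkAssertions = runSteps
  [ record "styles"
  , record "reorder"
  , record "balance"
  , whenEnabled autoPostings (record "auto postings")
  , whenEnabled checkAssertions (record "assertions")
  , record "infer prices"
  ]
```

Modelling each step as a function into `Either String` keeps the order explicit and lets any step abort the rest, which is one plausible way to read "or throw an error" in the description above.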

Get amount style associated with default currency.

Returns the AmountStyle defined by the latest default commodity directive prior to the current position within this file or its parents.

Get the AmountStyle declared by the most recently parsed (in the current or parent files, prior to the current position) commodity directive for the given commodity, if any.

# parsers

## dates

Parse a date in YYYY-MM-DD format. Slash (/) and period (.) are also allowed as separators. The year may be omitted if a default year has been set. Leading zeroes may be omitted.
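
As a rough illustration of the rules just described (three accepted separators, optional year, optional leading zeroes), here is a small sketch using only base. The names `splitOnSeps`, `digits` and `parseSimpleDate` are hypothetical and are not part of hledger-lib. The sketch is deliberately looser than a real date parser: it does not require the two separators to match and only does coarse range checks.

```haskell
import Data.Char (isDigit)

-- Split a string on any of the accepted date separators.
splitOnSeps :: String -> [String]
splitOnSeps s = case break (`elem` "-/.") s of
  (part, [])       -> [part]
  (part, _ : rest) -> part : splitOnSeps rest

-- Read a non-empty run of digits as a number (leading zeroes are optional).
digits :: String -> Maybe Integer
digits s
  | not (null s) && all isDigit s = Just (read s)
  | otherwise                     = Nothing

-- Parse "YYYY-MM-DD" or "MM-DD"; the latter needs a default year.
-- Returns (year, month, day).
parseSimpleDate :: Maybe Integer -> String -> Either String (Integer, Int, Int)
parseSimpleDate defaultYear s = case mapM digits (splitOnSeps s) of
  Just [y, m, d] -> check y m d
  Just [m, d]    -> case defaultYear of
    Just y  -> check y m d
    Nothing -> Left "partial date found, but the current year is unknown"
  _              -> Left "not a date"
  where
    check y m d
      | m < 1 || m > 12 = Left "invalid month"
      | d < 1 || d > 31 = Left "invalid day"
      | otherwise       = Right (y, fromInteger m, fromInteger d)
```

For example, `parseSimpleDate (Just 2000) "3/4"` yields `Right (2000,3,4)`, while the same input with `Nothing` as the default year is rejected.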

Parse a date and time in YYYY-MM-DD HH:MM[:SS][+-ZZZZ] format. Slash (/) and period (.) are also allowed as date separators. The year may be omitted if a default year has been set. Seconds are optional. The timezone is optional and ignored (the time is always interpreted as a local time). Leading zeroes may be omitted (except in a timezone).

## account names

Parse an account name (plus one following space if present), then apply any parent account prefix and/or account aliases currently in effect, in that order (i.e., first add the parent account prefix, then rewrite with aliases). This calls error if any account alias with an invalid regular expression exists.

Parse an account name, plus one following space if present. Account names have one or more parts separated by the account separator character, and are terminated by two or more spaces (or end of input). Each part is at least one character long, may have single spaces inside it, and starts with a non-whitespace. Note, this means "{account}", "%^!" and ";comment" are all accepted (parent parsers usually prevent/consume the last). It should have required parts to start with an alphanumeric; for now it remains as-is for backwards compatibility.
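
The termination and part rules above can be sketched with plain string functions. This is an illustration only: `takeUntilDoubleSpace`, `splitParts` and `accountParts` are hypothetical names, and the sketch assumes `:` as the account separator character. It does not handle the optional single trailing space separately.

```haskell
-- Take text up to (not including) the first run of two or more spaces.
takeUntilDoubleSpace :: String -> String
takeUntilDoubleSpace (' ' : ' ' : _) = ""
takeUntilDoubleSpace (c : cs)        = c : takeUntilDoubleSpace cs
takeUntilDoubleSpace []              = ""

-- Split an account name into its parts on the ':' separator (an assumption).
splitParts :: String -> [String]
splitParts s = case break (== ':') s of
  (part, [])       -> [part]
  (part, _ : rest) -> part : splitParts rest

-- Parse a leading account name; fail if any part is empty or starts with a space.
accountParts :: String -> Maybe [String]
accountParts input
  | all validPart parts = Just parts
  | otherwise           = Nothing
  where
    parts = splitParts (takeUntilDoubleSpace input)
    validPart (c : _) = c /= ' '
    validPart []      = False
```

Note that, like the parser described above, this accepts inputs such as `";comment"` as a single-part name, since the only requirement on a part is that it starts with a non-whitespace character.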

## amounts

Parse whitespace then an amount, with an optional left or right currency symbol and optional price, or return the special "missing" marker amount.

Parse a single-commodity amount, with optional symbol on the left or right, followed by, in any order: an optional transaction price, an optional ledger-style lot price, and/or an optional ledger-style lot date. A lot price and lot date will be ignored.

To parse the amount's quantity (number) we need to know which character represents a decimal mark. We find it in one of three ways:

1. If a decimal mark has been set explicitly in the journal parse state, we use that
2. Or if the journal has a commodity declaration for the amount's commodity, we get the decimal mark from that
3. Otherwise we will parse any valid decimal mark appearing in the number, as long as the number appears well formed.

Note that case 3 is the default zero-config case. It means we automatically handle files with any supported decimal mark, but it also allows different decimal marks in different amounts, which is a bit too loose. There's an open issue.
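
The precedence among the three cases can be written as a fallback chain over `Maybe`. This is a minimal sketch, not the library's code: `expectedDecimalMark` and its two arguments are hypothetical, standing in for whatever the parse state and the commodity declarations actually provide.

```haskell
import Control.Applicative ((<|>))

-- Pick the decimal mark to expect, following the precedence described above.
-- Nothing means "accept whichever valid decimal mark the number itself uses".
expectedDecimalMark
  :: Maybe Char  -- mark set explicitly in the parse state (hypothetical input)
  -> Maybe Char  -- mark from the commodity's declaration (hypothetical input)
  -> Maybe Char
expectedDecimalMark explicitMark declaredMark = explicitMark <|> declaredMark
```

`<|>` on `Maybe` returns the first `Just`, so an explicit setting wins over a commodity declaration, and a result of `Nothing` corresponds to the zero-config case.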

Parse an amount from a string, or get an error.

Parse a mixed amount from a string, or get an error.

Parse a string representation of a number for its value and display attributes.

Some international number formats are accepted, eg either period or comma may be used for the decimal mark, and the other of these may be used for separating digit groups in the integer part. See http://en.wikipedia.org/wiki/Decimal_separator for more examples.

This returns: the parsed numeric value, the precision (number of digits seen following the decimal mark), the decimal mark character used if any, and the digit group style if any.

fromRawNumber :: RawNumber -> Maybe Integer -> Either String (Quantity, Word8, Maybe Char, Maybe DigitGroupStyle)

Interpret a raw number as a decimal number.

Returns:

• the decimal number
• the precision (number of digits after the decimal point)
• the decimal point character, if any
• the digit group style, if any (digit group character and sizes of digit groups)

rawnumberp :: TextParser m (Either AmbiguousNumber RawNumber)

Parse and interpret the structure of a number without external hints. Numbers are digit strings, possibly separated into digit groups by one of two types of separators. (1) Numbers may optionally have a decimal mark, which may be either a period or comma. (2) Numbers may optionally contain digit group marks, which must all be either a period, a comma, or a space.

It is our task to deduce the characters used as decimal mark and digit group mark, based on the allowed syntax. For instance, we make use of the fact that a decimal mark can occur at most once and must be to the right of all digit group marks.

>>> parseTest rawnumberp "1,234,567.89"
Right (WithSeparators ',' ["1","234","567"] (Just ('.',"89")))
>>> parseTest rawnumberp "1,000"
Left (AmbiguousNumber "1" ',' "000")
>>> parseTest rawnumberp "1 000"
Right (WithSeparators ' ' ["1","000"] Nothing)
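
The deduction described above can be approximated by looking only at which marks occur and in what order. The sketch below is an assumption-laden simplification: `Marks` and `classifyMarks` are hypothetical names, digit group sizes are not validated, and the rule "a lone period or comma followed by exactly three digits is ambiguous" is inferred from the `"1,000"` example rather than taken from the library.

```haskell
import Data.Char (isDigit)
import Data.List (nub)

-- Outcome of inspecting the marks in a number, without external hints.
data Marks
  = NoMarks                     -- plain digits
  | DecimalOnly Char            -- one mark, taken as the decimal mark
  | GroupsOnly Char             -- repeated (or space) mark: digit groups
  | GroupsAndDecimal Char Char  -- group mark, then a single decimal mark
  | Ambiguous Char              -- a single mark that could be either
  deriving (Eq, Show)

classifyMarks :: String -> Maybe Marks
classifyMarks s
  | null s || not (all allowed s) = Nothing
  | otherwise = case marks of
      [] -> Just NoMarks
      [m]
        | m == ' '              -> Just (GroupsOnly m)
        | length afterLast == 3 -> Just (Ambiguous m)
        | otherwise             -> Just (DecimalOnly m)
      _ -> case nub marks of
        [g] -> Just (GroupsOnly g)
        [g, d]
          -- the decimal mark occurs once, last, and is never a space
          | d /= ' ', last marks == d, length (filter (== d) marks) == 1
          -> Just (GroupsAndDecimal g d)
        _ -> Nothing
  where
    allowed c = isDigit c || c `elem` "., "
    marks     = filter (not . isDigit) s
    afterLast = reverse (takeWhile isDigit (reverse s))
```

With these rules, `"1,234,567.89"` is classified as comma groups with a period decimal mark, `"1 000"` as space-separated groups, and `"1,000"` as ambiguous, in line with the three examples above.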


A blank or comment line in journal format: a line that's empty or containing only whitespace or whose first non-whitespace character is semicolon, hash, or star.
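
The rule above translates directly into a predicate on a single line of text. This is a sketch for illustration; `isBlankOrCommentLine` is a hypothetical name, not a function exported by this module.

```haskell
import Data.Char (isSpace)

-- True for lines that are empty, whitespace-only, or whose first
-- non-whitespace character is a semicolon, hash, or star.
isBlankOrCommentLine :: String -> Bool
isBlankOrCommentLine line = case dropWhile isSpace line of
  []      -> True
  (c : _) -> c `elem` ";#*"
```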

Parse the text of a (possibly multiline) comment following a journal item.

>>> rtp followingcommentp ""   -- no comment
Right ""
>>> rtp followingcommentp ";"    -- just a (empty) same-line comment. newline is added
Right "\n"
>>> rtp followingcommentp ";  \n"
Right "\n"
>>> rtp followingcommentp ";\n ;\n"  -- a same-line and a next-line comment
Right "\n\n"
>>> rtp followingcommentp "\n ;\n"  -- just a next-line comment. Insert an empty same-line comment so the next-line comment doesn't become a same-line comment.
Right "\n\n"


Parse a transaction comment and extract its tags.

The first line of a transaction may be followed by comments, which begin with semicolons and extend to the end of the line. Transaction comments may span multiple lines, but comment lines below the transaction must be preceded by leading whitespace.

2000/1/1 ; a transaction comment starting on the same line ...
  ; extending to the next line
  account1  $1
  account2

Tags are name-value pairs.

>>> let getTags (_,tags) = tags
>>> let parseTags = fmap getTags . rtp transactioncommentp

>>> parseTags "; name1: val1, name2:all this is value2"
Right [("name1","val1"),("name2","all this is value2")]

A tag's name must be immediately followed by a colon, without separating whitespace. The corresponding value consists of all the text following the colon up until the next colon or newline, stripped of leading and trailing whitespace.

Parse a posting comment and extract its tags and dates.

Postings may be followed by comments, which begin with semicolons and extend to the end of the line. Posting comments may span multiple lines, but comment lines below the posting must be preceded by leading whitespace.

2000/1/1
  account1  $1 ; a posting comment starting on the same line ...
  ; extending to the next line

  account2
  ; a posting comment beginning on the next line

Tags are name-value pairs.

>>> let getTags (_,tags,_,_) = tags
>>> let parseTags = fmap getTags . rtp (postingcommentp Nothing)

>>> parseTags "; name1: val1, name2:all this is value2"
Right [("name1","val1"),("name2","all this is value2")]


A tag's name must be immediately followed by a colon, without separating whitespace. The corresponding value consists of all the text following the colon up until the next colon or newline, stripped of leading and trailing whitespace.
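
To make the tag rules concrete, here is a rough single-line sketch that reproduces the `parseTags` example above. It is not the library's parser: `trim`, `splitOnComma`, `pieceTag` and `extractTags` are hypothetical helpers, and splitting candidate tags on commas is an assumption inferred from that example's output.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)
import Data.Maybe (mapMaybe)

-- Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Split a comment body into comma-separated pieces (assumed tag boundary).
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (piece, [])       -> [piece]
  (piece, _ : rest) -> piece : splitOnComma rest

-- Interpret one piece as a tag if it looks like "name: value".
-- The name is the whitespace-free word immediately before the first colon.
pieceTag :: String -> Maybe (String, String)
pieceTag piece = case break (== ':') piece of
  (before, _ : after)
    | not (null name) -> Just (name, trim after)
    where name = reverse (takeWhile (not . isSpace) (reverse before))
  _ -> Nothing

-- Extract tags from a one-line comment that starts with a semicolon.
extractTags :: String -> [(String, String)]
extractTags comment =
  mapMaybe pieceTag (splitOnComma (dropWhile (== ';') (dropWhile isSpace comment)))
```

Because the name is taken as the word directly before the colon, a piece such as `name : val` yields no tag, matching the requirement that no whitespace separates the name from its colon.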

Posting dates may be expressed with "date"/"date2" tags or with bracketed date syntax. Posting dates will inherit their year from the transaction date if the year is not specified. We throw parse errors on invalid dates.

>>> let getDates (_,_,d1,d2) = (d1, d2)
>>> let parseDates = fmap getDates . rtp (postingcommentp (Just 2000))

>>> parseDates "; date: 1/2, date2: 1999/12/31"
Right (Just 2000-01-02,Just 1999-12-31)
>>> parseDates "; [1/2=1999/12/31]"
Right (Just 2000-01-02,Just 1999-12-31)


Example: tags, date tags, and bracketed dates

>>> rtp (postingcommentp (Just 2000)) "; a:b, date:3/4, [=5/6]"
Right ("a:b, date:3/4, [=5/6]\n",[("a","b"),("date","3/4")],Just 2000-03-04,Just 2000-05-06)

Example: extraction of dates from date tags ignores trailing text

>>> rtp (postingcommentp (Just 2000)) "; date:3/4=5/6"
Right ("date:3/4=5/6\n",[("date","3/4=5/6")],Just 2000-03-04,Nothing)

## bracketed dates

Parse Ledger-style bracketed posting dates ([DATE=DATE2]), as "date" and/or "date2" tags. Anything that looks like an attempt at this (a square-bracketed sequence of 0123456789/-.= containing at least one digit and one date separator) is also parsed, and will throw an appropriate error.

The dates are parsed in full here so that errors are reported in the right position. A missing year in DATE can be inferred if a default date is provided. A missing year in DATE2 will be inferred from DATE.

>>> either (Left . customErrorBundlePretty) Right $ rtp (bracketeddatetagsp Nothing) "[2016/1/2=3/4]"
Right [("date",2016-01-02),("date2",2016-03-04)]

>>> either (Left . customErrorBundlePretty) Right $ rtp (bracketeddatetagsp Nothing) "[1]"
Left ...not a bracketed date...

>>> either (Left . customErrorBundlePretty) Right $ rtp (bracketeddatetagsp Nothing) "[2016/1/32]"
Left ...1:2:...well-formed but invalid date: 2016/1/32...

>>> either (Left . customErrorBundlePretty) Right $ rtp (bracketeddatetagsp Nothing) "[1/31]"
Left ...1:2:...partial date 1/31 found, but the current year is unknown...

>>> either (Left . customErrorBundlePretty) Right $ rtp (bracketeddatetagsp Nothing) "[0123456789/-.=/-.=]"
Left ...1:13:...expecting month or day...
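
The year inference described in this section (DATE falls back to a default year, DATE2 falls back to DATE's year) can be sketched without any parser library. Everything here is hypothetical and simplified: `PartialDate`, `splitOn`, `partialDate`, `withYear` and `bracketedDates` are illustrative names, only `/` is accepted as a date separator, dates are plain tuples, and calendar validity (for example day 32) is not checked.

```haskell
import Data.Char (isDigit)

-- A date whose year may be missing: (maybe year, month, day).
type PartialDate = (Maybe Integer, Int, Int)

splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (part, [])       -> [part]
  (part, _ : rest) -> part : splitOn sep rest

-- Parse "Y/M/D" or "M/D" (slash-separated only, to keep the sketch small).
partialDate :: String -> Maybe PartialDate
partialDate s = case splitOn '/' s of
  [y, m, d] | all num [y, m, d] -> Just (Just (read y), read m, read d)
  [m, d]    | all num [m, d]    -> Just (Nothing, read m, read d)
  _                             -> Nothing
  where num t = not (null t) && all isDigit t

-- Fill in a missing year from the given fallback, or fail if there is none.
withYear :: Maybe Integer -> PartialDate -> Either String (Integer, Int, Int)
withYear _        (Just y, m, d)  = Right (y, m, d)
withYear (Just y) (Nothing, m, d) = Right (y, m, d)
withYear Nothing  (Nothing, _, _) = Left "partial date found, but the current year is unknown"

-- Parse "[DATE]", "[DATE=DATE2]" or "[=DATE2]" into "date"/"date2" tags.
bracketedDates :: Maybe Integer -> String -> Either String [(String, (Integer, Int, Int))]
bracketedDates defaultYear s = case s of
  '[' : rest | not (null rest), last rest == ']' -> go (splitOn '=' (init rest))
  _ -> Left "not a bracketed date"
  where
    go [d1]     = tag "date" <$> one defaultYear d1
    go ["", d2] = tag "date2" <$> one defaultYear d2
    go [d1, d2] = do
      a@(y, _, _) <- one defaultYear d1
      b <- one (Just y) d2          -- DATE2 inherits DATE's year
      Right [("date", a), ("date2", b)]
    go _        = Left "not a bracketed date"
    tag name v = [(name, v)]
    one fallback t = maybe (Left "not a bracketed date") (withYear fallback) (partialDate t)
```

Under these assumptions, `bracketedDates Nothing "[2016/1/2=3/4]"` gives both dates in 2016, while `bracketedDates Nothing "[1/31]"` fails because no year is available, mirroring the first and fourth examples above.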


## misc

Parse any text beginning with a non-whitespace character, until a double space or the end of input. TODO: this includes characters which normally start a comment (;#); should those be excluded?

Similar to singlespacedtextp, except that the text must only contain characters satisfying the given predicate.

Parse one non-newline whitespace character that is not followed by another one.
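
The "not followed by another one" condition is a one-character lookahead. A minimal sketch on plain strings follows; `isNonNewlineSpace` and `singleSpace` are hypothetical names used for illustration, not the library's parser.

```haskell
import Data.Char (isSpace)

-- Whitespace other than a newline.
isNonNewlineSpace :: Char -> Bool
isNonNewlineSpace c = isSpace c && c /= '\n'

-- Consume one non-newline whitespace character, unless another one follows.
-- Returns the remaining input on success.
singleSpace :: String -> Maybe String
singleSpace (c : rest)
  | isNonNewlineSpace c, not (startsWithSpace rest) = Just rest
  where startsWithSpace = any isNonNewlineSpace . take 1
singleSpace _ = Nothing
```

This lookahead is what lets a single space sit inside a piece of text while a double space still terminates it.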

Get the account name aliases from options, if any.