Language.Preprocessor.Cpphs.HashDefine
Copyright 2004 Malcolm Wallace; license LGPL; maintainer Malcolm Wallace <Malcolm.Wallace@cs.york.ac.uk>; stability experimental; portability All; Safe Haskell: Safe

ArgOrText: Macro expansion text is divided into sections, each of which is classified as one of three kinds: a formal argument (Arg), plain text (Text), or a stringised formal argument (Str).
symbolReplacement: Smart constructor to avoid warnings from ghc (undefined fields).
expandMacro: Expand an instance of a macro. Precondition: got a match on the macro name.
parseHashDefine: Parse a #define or #undef, ignoring other # directives.
simplifyHashDefines: Pretty-print hash defines to a simpler format, as key-value pairs.

Language.Preprocessor.Cpphs.SymTab
Copyright 2000-2004 Malcolm Wallace; license LGPL; maintainer Malcolm Wallace <Malcolm.Wallace@cs.york.ac.uk>; stability Stable; portability All; Safe Haskell: Safe

IndTree: Index trees (storing indexes at nodes).
SymTab: Symbol table. Stored values are polymorphic, but the keys are always strings.

Language.Preprocessor.Unlit
Safe Haskell: Safe

unlit: Takes a filename (for error reports) and transforms the given string to eliminate the literate comments from the program text.

Language.Preprocessor.Cpphs.Options
Copyright 2006 Malcolm Wallace; license LGPL; maintainer Malcolm Wallace <Malcolm.Wallace@cs.york.ac.uk>; stability experimental; portability All; Safe Haskell: Safe

RawOption: Raw command-line options. This is an internal intermediate data structure, used during option parsing only.
BoolOptions: Options representable as Booleans.
  macros: Leave #define and #undef in the output of ifdef?
  locations: Place #line droppings in the output?
  hashline: Write #line or {-# LINE #-}?
  pragma: Keep #pragma in the final output?
  stripEol: Remove C end-of-line (//) comments everywhere?
  stripC89: Remove C inline (/**/) comments everywhere?
  lang: Lex input as Haskell code?
  ansi: Permit the stringise (#) and catenate (##) operators?
  layout: Retain newlines in macro expansions?
  literate: Remove literate markup?
  warnings: Issue warnings?
CpphsOptions: Cpphs options structure.
  preInclude: Files to #include before anything else.
defaultCpphsOptions: Default options.
defaultBoolOptions: Default settings of boolean options.
rawOption: Parse a single raw command-line option. Parse failure is indicated by the result Nothing.
trailing: Trim trailing elements of the second list that match any from the first list.
Typically used to remove trailing forward/back slashes from a directory path.
boolOpts: Convert a list of RawOption to a BoolOptions structure.
parseOptions: Parse all command-line options.

Language.Preprocessor.Cpphs.Position
Copyright 2000-2004 Malcolm Wallace; license LGPL; maintainer Malcolm Wallace <Malcolm.Wallace@cs.york.ac.uk>; stability experimental; portability All; Safe Haskell: Safe

Posn: Source positions contain a filename, line, column, and an inclusion point, which is itself another source position, recursively.
newfile: Constructor. The argument is the filename.
addcol: Increment the column number by the given quantity.
newline: Increment the row number and reset the column to 1.
tab: Increment the column number; tab stops are every 8 characters.
newlines: Increment the row number by the given quantity.
newpos: Update the position with a new row and, possibly, a new filename.
lineno: Project the line number.
filename: Project the filename.
directory: Project the directory of the filename.
cppline: cpp-style printing of a file position.
haskline: Haskell-style printing of a file position.
cpp2hask: Conversion from a cpp-style "#line" to a Haskell-style pragma.
dirname: Strip the non-directory suffix from a file name (analogous to the shell command of the same name).
cleanPath: Sigh. Mixing Windows filepaths with Unix ones is bad. Make sure there is a canonical path separator.

Language.Preprocessor.Cpphs.Tokenise
Copyright 2004 Malcolm Wallace; license LGPL; maintainer Malcolm Wallace <Malcolm.Wallace@cs.york.ac.uk>; stability experimental; portability All; Safe Haskell: Safe

WordStyle: Each token is classified as one of Ident, Other, or Cmd:
  * Ident is a word that could potentially match a macro name.
  * Cmd is a complete cpp directive (#define etc.).
  * Other is anything else.
SubMode: Submodes are required to deal correctly with nesting of lexical structures.
Mode: A Mode value describes whether to tokenise a la Haskell or a la Cpp. The main difference is that in Cpp mode we should recognise line continuation characters.
linesCpp: linesCpp is, broadly speaking, Prelude.lines, except that on a line beginning with a #, line continuation characters are recognised.
In a line continuation, the newline character is preserved, but the backslash is not.
reslash: Put back the line-continuation characters.
tokenise: tokenise is, broadly speaking, Prelude.words, except that:
  * the input is already divided into lines
  * each word-like "token" is categorised as one of {Ident, Other, Cmd}
  * #defines are parsed and returned out-of-band using the Cmd variant
  * all whitespace is preserved intact as tokens
  * C comments are converted to whitespace (depending on the first parameter)
  * parens and commas are tokens in their own right
  * any cpp line continuations are respected
  No errors can be raised. The inverse of tokenise is (concatMap deWordStyle).
parseMacroCall: Parse a possible macro call, returning the argument list and the remaining input.

Language.Preprocessor.Cpphs.MacroPass
Copyright 2004 Malcolm Wallace; license LGPL; maintainer Malcolm Wallace <Malcolm.Wallace@cs.york.ac.uk>; stability experimental; portability All; Safe Haskell: None

macroPass: Walk through the document, replacing calls of macros with the expanded RHS.
onlyRights: Auxiliary.
macroPassReturningSymTab: Walk through the document, replacing calls of macros with the expanded RHS. Additionally returns the active symbol table after processing.
preDefine: Turn command-line definitions (from -D) into HashDefines.
defineMacro: Turn a string representing a macro definition into a HashDefine.
macroProcess: Trundle through the document one word at a time, using the WordStyle classification introduced by tokenise to decide whether to expand a word or macro. Encountering a #define or #undef causes that symbol to be overwritten in the symbol table. Any other remaining cpp directives are discarded and replaced with blanks, except for #line markers. All valid identifiers are checked for a definition of that name in the symbol table and, if one exists, expanded appropriately. (The Bool arguments are: keep pragmas? retain layout? Haskell language?) The result lazily intersperses output text with symbol tables. Lines are emitted as they are encountered. A symbol table is emitted after each change to the defined symbols, and always at the end of processing.
emit: Useful helper function.
emitSymTab: Useful helper function.
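The tokenise and macroProcess descriptions above can be illustrated with a deliberately simplified sketch. Everything in it is hypothetical. The names tokeniseLine and expandIdents, and this three-constructor WordStyle, are invented for the example; cpphs's own WordStyle also carries a source position in Ident and a parsed directive in Cmd. Comments and string literals are not recognised, and directive lines are kept verbatim as Cmd. Expansion is a single, non-recursive substitution of object-like macros.

```haskell
import Data.Char (isAlpha, isAlphaNum, isSpace)

-- Hypothetical, simplified token type (not cpphs's own WordStyle).
data WordStyle = Ident String | Other String | Cmd String
  deriving (Eq, Show)

-- Tokenise one line. A line starting with '#' is kept whole as Cmd;
-- otherwise identifier-like words become Ident, and whitespace runs,
-- single parens/commas, and runs of any other characters become Other.
tokeniseLine :: String -> [WordStyle]
tokeniseLine line@('#' : _) = [Cmd line]
tokeniseLine line = go line
  where
    go [] = []
    go s@(c : cs)
      | isIdentStart c = let (w, rest) = span isIdentChar s in Ident w : go rest
      | isSpace c      = let (w, rest) = span isSpace s in Other w : go rest
      | c `elem` "(),"  = Other [c] : go cs
      | otherwise      = let (w, rest) = break isBoundary s in Other w : go rest
    isIdentStart x = isAlpha x || x == '_'
    isIdentChar x  = isAlphaNum x || x == '_' || x == '\''
    isBoundary x   = isIdentStart x || isSpace x || x `elem` "(),"

-- Every token keeps its exact text, so concatMap deWordStyle undoes tokeniseLine.
deWordStyle :: WordStyle -> String
deWordStyle (Ident w) = w
deWordStyle (Other w) = w
deWordStyle (Cmd w)   = w

-- Replace each Ident that names a definition with its replacement text;
-- all other tokens are copied through unchanged (no rescanning).
expandIdents :: [(String, String)] -> [WordStyle] -> String
expandIdents defs = concatMap expand
  where
    expand (Ident w) = maybe w id (lookup w defs)
    expand t         = deWordStyle t
```

In GHCi, expandIdents [("N", "10")] (tokeniseLine "f N (N, M)") yields "f 10 (10, M)". Because whitespace is kept as Other tokens, the round trip through deWordStyle reproduces the input exactly. The simplified sketch preserves that property, which the tokenise description above states for the real function.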
macroPass parameters: pre-defined symbols and their values; options that alter processing style; the input file content. Result: the file after processing.
macroPassReturningSymTab parameters: pre-defined symbols and their values; options that alter processing style; the input file content. Result: the file and symbol table after processing.

Language.Preprocessor.Cpphs.ReadFirst
Copyright 2004 Malcolm Wallace; license LGPL; maintainer Malcolm Wallace <Malcolm.Wallace@cs.york.ac.uk>; stability experimental; portability All; Safe Haskell: Safe

readFirst: Attempt to read the given file from any location within the search path. The first location found is returned, together with the file content. (The directory of the calling file is always searched first, then the current directory, and finally any specified search path.)
  Parameters: filename; inclusion point; search path; report warnings? Result: the discovered filepath and the file contents.

Language.Preprocessor.Cpphs.CppIfdef
Copyright 1999-2004 Malcolm Wallace; license LGPL; maintainer Malcolm Wallace <Malcolm.Wallace@cs.york.ac.uk>; stability experimental; portability All; Safe Haskell: None

KeepState: Internal state for whether lines are being kept or dropped. In Drop n b ps, n is the depth of nesting, b is whether we have already succeeded in keeping some lines in a chain of #elifs, and ps is the stack of positions of open #if contexts, used for error messages in case EOF is reached too soon.
cppIfdef: Run a first pass of cpp, evaluating #ifdefs and processing #includes, while taking account of #defines and #undefs as they are encountered.
  Parameters: file for error reports; pre-defined symbols and their values; search path for #includes; options controlling output style; the input file content. Result: the file after processing (in lines).
cpp: Return just the list of lines that the real cpp would decide to keep.
emitOne: Auxiliary IO functions.
preExpand: The preprocessor must expand all macros (recursively) before evaluating the conditional.
expandSymOrCall: Expansion of symbols.
parseSymOrCall: Return the expansion of the symbol (if there is one).
parenthesis: The standard "parens" parser does not work for us here. Define our own.
file: Determine the filename in #include.

Language.Preprocessor.Cpphs.RunCpphs
Safe Haskell: None

Language.Preprocessor.Cpphs
Copyright 2000-2006 Malcolm Wallace; license LGPL; maintainer Malcolm Wallace <Malcolm.Wallace@cs.york.ac.uk>; stability experimental; portability All; Safe Haskell: None
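The search order stated for readFirst above can be sketched as a list of candidate paths tried in turn. This is an illustrative approximation, not cpphs's code. The names takeDir, candidates and findFirst are hypothetical. The sketch also omits the inclusion-point and warning parameters. doesFileExist comes from the directory package that ships with GHC.

```haskell
import System.Directory (doesFileExist)

-- Hypothetical helper: the directory part of a path, or "." if it has none.
takeDir :: FilePath -> FilePath
takeDir path =
  case reverse (dropWhile (`notElem` "/\\") (reverse path)) of
    ""  -> "."
    "/" -> "/"
    dir -> init dir            -- drop the trailing separator

-- Candidate locations, in the order described for readFirst: the including
-- file's directory, then the current directory, then each search-path entry.
candidates :: FilePath -> FilePath -> [FilePath] -> [FilePath]
candidates file includer searchPath =
  [ dir ++ "/" ++ file | dir <- takeDir includer : "." : searchPath ]

-- Return the first candidate that exists, together with its contents.
findFirst :: [FilePath] -> IO (Maybe (FilePath, String))
findFirst [] = return Nothing
findFirst (p : ps) = do
  exists <- doesFileExist p
  if exists
    then do content <- readFile p
            return (Just (p, content))
    else findFirst ps
```

For example, candidates "foo.h" "src/Main.hs" ["include"] lists "src/foo.h", "./foo.h" and "include/foo.h" in that order. findFirst then stops at the first of them that exists. Trying the including file's own directory first lets a local header shadow one of the same name further along the search path.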
Package: cpphs (unit ID cpphs_GFZcQ7jDfUx2dsujmc7HXk)

Modules: Language.Preprocessor.Unlit, Language.Preprocessor.Cpphs, Language.Preprocessor.Cpphs.HashDefine, Language.Preprocessor.Cpphs.SymTab, Language.Preprocessor.Cpphs.Options, Language.Preprocessor.Cpphs.Position, Language.Preprocessor.Cpphs.Tokenise, Language.Preprocessor.Cpphs.MacroPass, Language.Preprocessor.Cpphs.ReadFirst, Language.Preprocessor.Cpphs.CppIfdef, Language.Preprocessor.Cpphs.RunCpphs

Identifiers, by defining module:
Unlit: unlit, Classified (Program, Blank, Comment, Include, Pre), classify, unclassify, adjacent, message, inlines
HashDefine: ArgOrText (Arg, Text, Str), HashDefine (LineDrop, Pragma, AntiDefined, SymbolReplacement, MacroExpansion; fields name, linebreaks, replacement, arguments, expansion), symbolReplacement, expandMacro, parseHashDefine, simplifyHashDefines
SymTab: IndTree (Leaf, Fork), SymTab, Hashable (hashWithMax), hash, emptyST, insertST, deleteST, lookupST, definedST, flattenST, itgen, itiap, itind, itfold, maxHash, a Hashable instance for lists
Options: BoolOptions (macros, locations, hashline, pragma, stripEol, stripC89, lang, ansi, layout, literate, warnings), CpphsOptions (infiles, outfiles, defines, includes, preInclude, boolopts), defaultCpphsOptions, defaultBoolOptions, parseOptions, RawOption (NoMacro, NoLine, LinePragma, Pragma, Text, Strip, StripEol, Ansi, Layout, Unlit, SuppressWarnings, Macro, Path, PreInclude, IgnoredForCompatibility), rawOption, trailing, boolOpts, flags
Position: Posn (Pn), newfile, addcol, newline, tab, newlines, newpos, lineno, filename, directory, cppline, haskline, cpp2hask, cleanPath, dirname, instance Show Posn
Tokenise: WordStyle (Ident, Other, Cmd), tokenise, SubMode (Any, Pred, String, LineComment, NestComment, CComment, CLineComment), Mode (Haskell, Cpp), linesCpp, reslash, parseMacroCall, other, deWordStyle
MacroPass: macroPass, macroPassReturningSymTab, onlyRights, preDefine, defineMacro, macroProcess, emit, emitSymTab, noPos
ReadFirst: readFirst
CppIfdef: cppIfdef, KeepState (Keep, Drop), cpp, emitOne, emitMany, preExpand, expandSymOrCall, parseSymOrCall, parenthesis, file, gatherDefined, notComment, parseBoolExp, parseExp1, parseExp0, parseArithExp1, parseArithExp0, parseNumber, parseCmpOp, parseArithOp1, parseArithOp0, recursivelyExpand, parseSym, notIdent, skip
RunCpphs: runCpphs, runCpphsPass1, runCpphsPass2, runCpphsReturningSymTab
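The SymTab documentation in this interface describes index trees that store indexes at nodes and a symbol table with string keys and polymorphic values. It lists identifiers such as IndTree, Leaf, Fork, itgen, itind, itiap, maxHash, emptyST, insertST and lookupST. Together these suggest a hashed bucket table. The following is a minimal sketch under stated assumptions, not the library's definitions:

* each Fork is assumed to record how many leaves its left subtree holds;
* the hash function and the bucket count of 101 are invented;
* the signatures are illustrative only.

```haskell
import Data.Char (ord)

-- Assumed representation: a Fork stores the leaf count of its left subtree,
-- so reaching a leaf by index walks a single root-to-leaf path.
data IndTree t = Leaf t | Fork Int (IndTree t) (IndTree t)

-- Build a tree of n leaves (n >= 1), each holding x.
itgen :: Int -> t -> IndTree t
itgen n x
  | n <= 1    = Leaf x
  | otherwise = Fork half (itgen half x) (itgen (n - half) x)
  where half = n `div` 2

-- Read the leaf at 0-based index i.
itind :: Int -> IndTree t -> t
itind _ (Leaf x) = x
itind i (Fork n l r)
  | i < n     = itind i l
  | otherwise = itind (i - n) r

-- Apply f to the leaf at index i, rebuilding only the path to it.
itiap :: Int -> (t -> t) -> IndTree t -> IndTree t
itiap _ f (Leaf x) = Leaf (f x)
itiap i f (Fork n l r)
  | i < n     = Fork n (itiap i f l) r
  | otherwise = Fork n l (itiap (i - n) f r)

-- String keys, polymorphic values: one association list per bucket.
type SymTab v = IndTree [(String, v)]

maxHash :: Int
maxHash = 101                      -- assumed bucket count

-- Assumed hash; the result always lies in [0, maxHash).
hash :: String -> Int
hash = foldr (\c acc -> (ord c + 31 * acc) `mod` maxHash) 0

emptyST :: SymTab v
emptyST = itgen maxHash []

-- Insert or replace a binding in the key's bucket.
insertST :: (String, v) -> SymTab v -> SymTab v
insertST (k, v) = itiap (hash k) (((k, v) :) . filter ((/= k) . fst))

deleteST :: String -> SymTab v -> SymTab v
deleteST k = itiap (hash k) (filter ((/= k) . fst))

lookupST :: String -> SymTab v -> Maybe v
lookupST k = lookup k . itind (hash k)

definedST :: String -> SymTab v -> Bool
definedST k = maybe False (const True) . lookupST k
```

Usage: lookupST "DEBUG" (insertST ("DEBUG", "1") emptyST) gives Just "1". Hashing a key picks one leaf, and the Fork counts route the lookup to that leaf in logarithmic depth. Each update then copies only the nodes along that one path, leaving the rest of the tree shared.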