Type alias to ease documentation.
Describes the different kinds of content parser usable.
Do what you want with it for now.
You can go ahead and use a rather strict parser.
Indicates a parser which must be tolerant enough to parse HTML.
Content-type field of the HTTP header.
Associates extensions with parsers, used for local file type recognition.
Given a MimeType, return the kind of parser to use for the given data.
Given a file name, return a ParseableType, explaining the kind of parser to use on the given content.
Given a content type (the same as isParseable) and a filepath, add a type extension if the filepath doesn't have any. The intent is to add a valid extension given a valid MIME type, to get correct OS behaviour.
Mimetype/extension association.
The aim of this typeclass is to permit the use of a different HTML/XML parser if the first one is found to be bad. All the logic should use this interface. Minimal implementation: everything.
Get back an attribute of the node if it exists.
If the current node is named, return its name, otherwise return Nothing.
Get all the children of the current node.
Retrieve the value of the tag (textual).
Retrieve all the indirectly linked content of a node; can be used for elements like an HTML link or a linked image/object.
The idea behind link following. The graph engine may have another name for the resource, so an updated name can be given. The first given function is there to log information, the second is to log errors.
Tell if the associated history is fixed or not. If the history is not fixed and can change (if you are querying the filesystem, for example), it should return False.
Result of an indirect access demand.
Cannot access the resource.
We got something, but we can't interpret it, so we return a binary blob.
We got a result and parsed it; it may have changed location, so we give back the location.
Represents indirect links, or links which necessitate the use of the IO monad to walk around the graph.
Combine two paths together; you can think of the </> operator for an equivalence.
Conversion to be used to import a path from attributes/documents (not really well specified).
Move semantics: try to dump the pointed resource to the current folder.
Given a graph path, transform it to a filepath which can be used to store a node.
Normal/Err/verbose loggers.
Type used to propagate different logging levels across the software.
Represents the path used to find the node from the starting point of the graph.
Return a list of all the children/linked nodes of a given node. The given node is not included in the list. A list of nodes with the taken path is returned.
Given a tag and a name, retrieve the first matching tags in the hierarchy. It must return the list of ancestors permitting access to the path used to find the children. The returned list must contain the node itself if it matches the name, and all the children with the right name.
Return the first found node, if any.
Internal data context.
Context stack used in breadth-first evaluation.
States waiting to be executed in a depth-first execution.
State used to implement branches in the depth-first evaluator.
Buckets used for uniqueness pruning, for all evaluation kinds.
Counters used for range evaluation in DFS.
Current log level.
Just an index to a state in the automaton.
Number of elements which arrived by a true transition to a state in the automaton.
Number of elements seen at a state in the automaton.
An int used as a counter.
Type used to represent the current logging level. Default is Normal.
Display much debugging information.
Display dumped information and IOs.
Only display the dumped information.
This type represents the temporary results of the evaluation of a regexp.
Represents a binary blob, often downloaded.
The last indirect path used to get to this blob.
The binary data.
Represents a graph node and the path used to go up to it.
Path from the root of the document to this node.
Real node value.
The last indirect path used to get to this node.
Records a graph path in a document, from the last indirect node to this one.
If the graph is susceptible to move under our feet, we have to search again for the position of the node in the parent node.
A path in an immutable graph. The graph doesn't move under our feet, so we store the index of the following node in the path.
WebContext is WebContextT used as a simple monad.
Typical use of the WebContextT monad transformer, allowing information to be downloaded.
Fuse two histories together; equivalent to the ++ operator for lists.
Add an element at the beginning of a history; equivalent to the : operator for lists.
Function useful if used in combination with a union node:
 - A function produces a node context for a specific type
 - You want to generalise it for a complex union
 - Use this function :)
For example, to produce a simple union node:
   repurposeNode UnionRight $ initialSimpleNode
Setter for the wait time between two indirect operations. The value is stored but not used yet.
Return the value set by setHttpDelay.
Define the text output for written text.
Retrieve the default file output used for text.
Set the user agent which must be used for indirect operations. The value is stored but not used yet.
Return the value set by setUserAgent.
Set the value of the logging level.
Tell if the current logging level is set to Verbose.
TODO: write documentation
TODO: write documentation
Internally the monad stores a stack of states: the list of currently evaluated elements. Pushing this context will store all the current nodes in it, waiting for later retrieval.
Inverse operation of pushCurrentState; retrieve the stored nodes.
Helper function used to start the evaluation of a webrexp with a default context, with sane defaults.
Return the normal, error, and verbose loggers.
Record a node in the context for the DFS evaluation.
Get the last record from the top of the stack.
Add a 'frame' context to the current DFS evaluation. A frame context possesses a node to revert to and two counters: a counter for seen nodes which must be evaluated before backtracking, and a counter for valid nodes, to keep track of whether the whole frame has a valid result or not. You can look at popBranchContext and addToBranchContext for other frame manipulation functions.
Retrieve the frame on the top of the stack. For more information regarding frames, see pushToBranchContext.
Add seen node count and valid node count to the current frame. For more information regarding frames, see pushToBranchContext.
Initialisation function which must be called before the beginning of a webrexp execution. Informs the monad of the number of Unique buckets in the expression, permitting the allocation of the required number of Sets to hold them.
  Unique bucket count
  Range counter count
Used for node ranges; return the current value of the counter and increment it.
Tell if a string has already been recorded for a bucket ID. Used for the implementation of the Unique constructor of a webrexp. Returns False unless setResourceVisited has been called with the same string before.
Record the visit of a string. hasResourceBeenVisited will return True for the same string after this call.
Debugging function, only displayed in verbose logging mode.
If a webrexp outputs some text, it must go through this function. It ensures the text is written to the correct file.
  Argument list
  Pipeline argument
  Result
Type used to describe evaluators for functions inside webrexp actions.
  Argument list
  Pipeline argument
  Result
Data used for the evaluation of actions. Represents the whole set of representable data at runtime.
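The Unique bucket behaviour described above (a string is reported as already seen only after its visit has been recorded for the same bucket ID) can be modelled as a small pure sketch. The names `Buckets`, `hasBeenVisited` and `setVisited` are hypothetical and the state is passed explicitly; the package itself keeps this state inside its monad, so this is an illustration of the idea rather than the package's code.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical pure model: one set of seen strings per bucket ID.
type Buckets = Map.Map Int (Set.Set String)

-- True only if the string was recorded earlier for this bucket.
hasBeenVisited :: Int -> String -> Buckets -> Bool
hasBeenVisited bucket str =
  maybe False (Set.member str) . Map.lookup bucket

-- Record the visit of a string in the given bucket.
setVisited :: Int -> String -> Buckets -> Buckets
setVisited bucket str =
  Map.insertWith Set.union bucket (Set.singleton str)
```

Keeping one set per bucket means that the same string seen under two different Unique indices is tracked independently.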
Typecast operation, from:
 - string to int
 - Bool to int
Convert any value to a string.
Remove blank space before and after a string.
This function takes a string as first parameter (the template string) and a list of strings to be inserted at some points.
The format string is made up of some tagged indices; for example '{0}' references the first inserted content and '{2}' the third one. The '}' character can be escaped by prefixing it with a '\'.
   format "da {0} bu {1} \\ {0} do {1}" ["head", "second"]
   -> Just "da head bu second {0} do second"
It works as intended: there is no syntax error in the formatted string, and all indices are in bounds.
   format "da {0} bu {1} \\ {0} do {2}" ["head", "second"]
   -> Nothing
The '2' index is out of bounds, so the function returns Nothing.
   format "da {0} bu {1} \\ {0} do {1a}" ["head", "second"]
   -> Nothing
Nothing is returned because '1a' is not a valid index.
  Template string
  Inserted content
  The formatted string
Format a string given a list of action values; if the first parameter is not a string, return a type error.
Given a prefix and a list, return the rest of the list.
Replace globally (for each repetition) a sublist by another one in a given list.
  The substituted list
  The replaced sublist
  The replacement
Main data type.
Represents a resource spread on the internet.
Represents a file stored on the hard drive of this machine.
Given a resource, transform it to a string representation. This function should be used instead of the Show instance, which is aimed at debugging only.
Resource path combiner, similar to </> in use, but also handles URIs.
Helper function to grab a resource on the internet, returning its binary representation and its real location, if any.
Data type which is an instance of GraphWalker. Use it to combine two other node types.
Extension of the GraphWalker class to be able to query the type about its ability to parse. Very ad hoc.
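The template behaviour documented for the format function (tagged indices such as '{0}', with Nothing returned for an out-of-bounds or malformed index) can be sketched as follows. This is a simplified illustration under stated assumptions: `formatSketch` is a hypothetical name, and the backslash escape mentioned in the documentation is deliberately not handled.

```haskell
import Data.Char (isDigit)

-- Hypothetical simplified variant of the documented behaviour:
-- every "{n}" is replaced by the n-th inserted string.
-- Escapes are NOT handled in this sketch.
formatSketch :: String -> [String] -> Maybe String
formatSketch [] _ = Just []
formatSketch ('{' : rest) args =
  case break (== '}') rest of
    (idx, '}' : after)
      -- the index must be a non-empty run of digits and in bounds
      | not (null idx), all isDigit idx, n < length args ->
          fmap ((args !! n) ++) (formatSketch after args)
      where
        n = read idx :: Int
    -- unterminated tag, invalid index or out-of-bounds index
    _ -> Nothing
formatSketch (c : rest) args = fmap (c :) (formatSketch rest args)
```

Returning Maybe keeps the failure cases (bad index, missing closing brace) explicit for the caller instead of raising an error at runtime.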
Provide a dummy element just to be passed to isResourceParseable. Forcing a Monoid instance was not ideal, so here is the hack.
Tell if a node type can parse a given document; used in the node type decision.
The real parsing function.
Allow recursion of union nodes, so a tree of multi-domain nodes can be built.
Type representing a local directory as a node (and not as a path).
Type introduced to avoid stupid positional errors in the DirectoryNode type.
Transform a filepath into a valid directory node if the path is valid on the current system.
Create a node rooted in the current directory.
The problem with this instance is that it's a "sink" instance: it accepts everything.
Given a resource path, do the required loading.
Given a resource path, do the required loading.
Type representation of web-regexp, main type.
'<' operator in the language. Select the parent node.
'~' operator in the language, used to select the previous sibling node.
'+' operator in the language, used to select the next sibling node.
'>>' operator in the language, used to follow hyperlinks.
This constructor is an optimisation: it combines a Ref followed by an action, where every action is a predicate. Helps to quickly prune the evaluation tree in DFS evaluation.
Find children which are the different descendants of the current nodes.
Every tag / class name.
'[ ... ]' Filtering range. The Int is used as an index for a counter in the DFS evaluator.
"{ ... }"
"..." A string constant in the source expression.
'!' Possesses a unique index to differentiate all the different uniques. Negative values are considered invalid; all positive or null ones are accepted.
'|' Represents two alternative paths; if the first fails, the second one is taken.
... #{ }
... *
... ... (each action followed, no rollback)
( ... , ... , ... )
( ... ; ... ; ... )
Represents an action. Each production of the grammar more or less maps to a data constructor of this type.
funcName(..., ...)
The '.' action. Dump the content of the current element.
'$' ... operator. Used to put the action value back into the evaluation pipeline.
A string constant.
An integer constant.
Find the value of a given attribute for the current element.
Basic binary operator application.
{ ... ; ... ; ... ; ... } A list of actions to execute; each one must return a valid value to continue the evaluation.
Type used to index built-in functions in actions.
Definitions of the operators available in the actions of the webrexp.
':' concatenates two strings.
'|=' op beginning, as the CSS3 operator.
'^=' op beginning, as the CSS3 operator.
'$=' op ending, as the CSS3 operator.
'^=' op beginning, as the CSS3 operator.
'~=' op contain, as the CSS3 operator.
'=~' regexp matching.
'|' (|| in Haskell)
'&' (&& in Haskell)
'!=' (/= in Haskell)
'=' in webrexp (== in Haskell)
>=
>
<=
<
div
*
-
+
Ranges to be able to filter nodes by position.
min-max
...
Represents an element.
#... Check the value of the 'id' attribute.
@... Check for the presence of an attribute.
... . ... Check the value of the 'class' attribute.
... Search for a named element.
'*' Any subelement.
This function is a helper to simplify the handling of node ranges. After simplification, the ranges are sorted in ascending order and no node ranges overlap.
Tell if an action operator returns a boolean. Useful to tell if an action is a predicate. See isActionPredicate.
Tell if an action is a predicate and is only used to filter nodes. Expressions can be modified with this information to help pruning as soon as possible with the DFS evaluator.
This function permits the rewriting of a webrexp in a depth-first fashion while carrying an accumulator.
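The node range simplification described above (ranges sorted in ascending order, with no overlap left) can be illustrated with a short sketch. The `NodeRange` type below is an assumed stand-in with an index and an interval constructor, and `simplifySketch` is a hypothetical name; the package's own definitions may differ.

```haskell
import Data.List (sort)

-- Assumed stand-in for the package's range type.
data NodeRange = Index Int | Interval Int Int
  deriving (Eq, Show)

-- View every range as an inclusive (low, high) pair.
bounds :: NodeRange -> (Int, Int)
bounds (Index i) = (i, i)
bounds (Interval a b) = (min a b, max a b)

-- Convert a pair back, collapsing single-element intervals.
fromBounds :: (Int, Int) -> NodeRange
fromBounds (a, b)
  | a == b = Index a
  | otherwise = Interval a b

-- Sort the ranges, then merge every range that overlaps its predecessor.
simplifySketch :: [NodeRange] -> [NodeRange]
simplifySketch = map fromBounds . merge . sort . map bounds
  where
    merge ((a, b) : (c, d) : rest)
      | c <= b = merge ((a, max b d) : rest)
      | otherwise = (a, b) : merge ((c, d) : rest)
    merge xs = xs
```

With sorted, disjoint ranges, a membership check can stop at the first range whose lower bound exceeds the index being tested.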
Preparation function for webrexp; assigns all indices used for evaluation as an automaton.
Set the index for every unique; return the new webrexp and the count of unique elements.
Set the indices for the Range constructor (filtering by ID).
Pretty printing for WebRef. It should be reparsable by the WebRexp parser.
Helper function to check if a given index is within all the ranges.
Little shortcut.
Parser used to parse a webrexp. Use just like any Parsec 3.0 parser.
Parse some range.
  Function to call
  Result
Evaluate embedded action in WebRexp.
Given a node, search for valid children, checking their validity against the requirement.
  Do we recurse?
  Ref to find
  The root node for the search
  The found nodes
Evaluate the leaf nodes of a webrexp; this way the code can be shared between the breadth-first evaluator and the depth-first one.
Gives access to sibling nodes with a predefined index.
Simply the index of the state in a table.
The automaton representing a WebRexp, ready to be executed.
  Action to perform, action on True, action on False
General function to translate a webrexp to an evaluation automaton.
Debug function dumping the automaton in the form of a graphviz file. The file can be used with the 'dot' tool to produce a visualisation of the graph.
  Text used as title for the automaton
  Where the graphviz representation will be written
  Automaton to dump
Main transformation function. Assumes that each state has two outputs, one for true and one for false, simplifying the design of the function.
The idea is to be able to store the automaton in an array after the generation, hence the propagation of different indexes.
  Expression to be transformed into an automaton
  Last free index
  The input/output for the current automaton
  The first unused index, the index of the beginning state of the converted webrexp, and finally the list of states
Simple function performing a depth-first evaluation.
Main evaluation function.
  Automaton to evaluate
  State to evaluate
  Are we coming from a true link?
  Current evaluated element
Pop a record and start evaluation for it.
Evaluation function for an element.
  Evaluation automaton
  Current state in the automaton
  If we are coming from a True link or a False one
  Currently evaluated element
Main function to evaluate the expression in breadth-first order.
  Automaton to evaluate
  State to evaluate
  Are we coming from a true link?
  Current evaluated element
Main evaluation function for BFS evaluation.
  Evaluation automaton
  Current state in the automaton
  If we are coming from a True link or a False one
  Currently evaluated elements
For the current state, filter the values to keep only those which are included in the node range.
Prepare a webrexp. This function is useful if the expression has to be applied many times.
Evaluation for pre-parsed webrexp. Best method if a webrexp has to be evaluated many times.
Simple evaluation function; the evaluation is of the breadth-first type.
Simplest function to eval a webrexp. Returns the evaluation status of the webrexp: True for full evaluation success.
Webrexp-1.0
Webrexp.GraphWalker Webrexp.ResourcePath Webrexp.Exprtypes Webrexp.Parser Webrexp.Eval Webrexp Webrexp.Remote.MimeTypes Webrexp.ProjectByteString Webrexp.WebContext Webrexp.Log Webrexp.Eval.ActionFunc Webrexp.UnionNode Webrexp.DirectoryNode Webrexp.JsonNode Webrexp.HaXmlNode Webrexp.Eval.Action Webrexp.WebRexpAutomata
GraphWalker attribOf nameOf childrenOf valueOf indirectLinks accessGraph isHistoryMutable AccessResult AccessError DataBlob Result GraphPath importPath dumpDataAtPath localizePath Loggers Logger NodePath descendants findNamed findFirstNamed
ResourcePath Remote Local rezPathToString downloadBinary
WebRexp Parent PreviousSibling NextSibling DiggLink ConstrainedRef DirectChild Ref Range Action Str Unique Alternative Repeat Star List Unions Branch RepeatCount RepeatBetween RepeatAtLeast RepeatTimes ActionExpr Call OutputAction NodeReplace CstS CstI ARef BinOp ActionExprs BuiltinFunc BuiltinSystem BuiltinFormat BuiltinToString BuiltinToNum BuiltinSubsitute BuiltinTrim Op OpConcat OpHyphenBegin OpSubstring OpEnd OpBegin OpContain OpMatch OpOr OpAnd OpNe OpEq OpGe OpGt OpLe OpLt OpDiv OpMul OpSub OpAdd NodeRange Interval Index WebRef OfName Attrib OfClass Elem Wildcard simplifyNodeRanges isOperatorBoolean isActionPredicate foldWebRexp assignWebrexpIndices packRefFiltering prettyShowWebRef isInNodeRange
webRexpParser evalAction evalWebRexpFor
Conf hammeringDelay userAgent output verbose quiet expr showHelp depthEvaluation outputGraphViz defaultConf parseWebRexp evalParsedWebRexp evalWebRexp evalWebRexpDepthFirst evalWebRexpWithConf
ContentType ParseableType ParseableJson ParseableXML ParseableHTML findContentTypeOf fileExtension getParserForMimeType getParseKind addContentTypeExtension mimeExtension
bytestring-0.9.1.10 Data.ByteString empty Data.ByteString.Internal ByteString Data.ByteString.Char8 appendFile writeFile readFile readInteger readInt unwords words unlines lines unzip zipWith zip find filter notElem elem count
findIndices findIndex elemIndices elemIndexEnd elemIndex index groupBy splitWith split breakEnd spanEnd span break dropWhile takeWhile unfoldrN unfoldr replicate scanr1 scanr scanl1 scanl mapAccumR mapAccumL minimum maximum all any concatMap foldr1' foldr1 foldl1' foldl1 foldr' foldr foldl' foldl intersperse map last head uncons snoc cons unpack pack singleton interact getContents hGetContents hGetNonBlocking hGet putStrLn putStr hPutStrLn hPutStr hPut hGetLine getLine copy packCStringLen packCString useAsCStringLen useAsCString sort tails inits findSubstrings findSubstring breakSubstring isInfixOf isSuffixOf isPrefixOf intercalate group splitAt drop take concat transpose reverse append init tail length null
WebContextT runWebContextT Context contextStack waitingStates branchContext uniqueBucket countBucket logLevel httpDelay httpUserAgent defaultOutput StateNumber ValidSeenCounter SeenCounter Counter LogLevel Normal Verbose Quiet EvalState Blob Text Node BinBlob sourcePath blobData NodeContext parents this rootRef HistoryPath MutableHistory ImmutableHistory WebContext WebCrawler ^+ base GHC.Base ++ ^: repurposeNode emptyContext setHttpDelay getHttpDelay setOutput getOutput setUserAgent getUserAgent setLogLevel isVerbose accumulateCurrentState popAccumulation pushCurrentState popCurrentState evalWithEmptyContext prepareLogger recordNode popLastRecord pushToBranchContext popBranchContext addToBranchContext setBucketCount incrementGetRangeCounter hasResourceBeenVisited setResourceVisited
debugLog textOutput
ActionFuncM ActionFunc ActionValue ATypeError AString ABool AInt toNum toString funToString trimString format Data.Maybe Nothing formatString dropPrefix substitute substituteFunc funcSysCall GHC.Show Show
toRezPath combinePath extractFileName dumpResourcePath
UnionNode UnionRight UnionLeft PartialGraph dummyElem isResourceParseable parseResource parseUnion loadData $fPartialGraphUnionNoderezPath
DirectoryNode File Directory FullPath FileName extractPath buildParentList toDirectoryNode currentDirectoryNode listDirectory' $fPartialGraphDirectoryNodeResourcePath
JsonNode
parseJson loadJson
HaXmLNode pureChildren parserOfKind loadHtml
. GHC.Classes || && /= == >= > <= < GHC.Real div GHC.Num * - + simplifySortedNodeRanges setUniqueIndices setRangeIndices
Parsed parsec-3.1.1 Text.Parsec.Prim Parsec reservedOp natural stringLiteral parens brackets whiteSpace lexer webrexpCombinator operatorDefs functionMap noderange rangeParser webrexpOp repeatCount repeatOperator webident webrefop webref actionCall actionTerm actionExpr actionList webrexp exprUnion exprPath expterm spaceSurrounded
binary prefix postfix binArith intOnly stringOnly stringPredicate intComp binComp boolComp isActionResultValid dumpActionVal dumpContent actionFunEval actionFunEvalM searchRefIn downLinks diggLinks siblingAccessor
StateIndex FirstState FreeId StateListBuilder Automata autoStates beginState AutomataState AutoState AutomataAction Gather Scatter AutoSimple AutoTrue Pop PopPush Push AutomataSink nodeCount buildAutomata dumpAutomata toAutomata
evalDepthFirst evalAutomataDFS scheduleNextElement evalStateDFS evalBreadthFirst evalAutomataBFS evalStateBFS filterNodes
Crawled CrawledNode initialState evalWebRexpWithEvaluator