Defaulting Map: a Map that returns a default value when queried for a key that does not exist.

- Create an empty defaulting map.
- Query the map for a value. Returns the default if the key is not found.
- Create a defaulting map from a default value and a list.
- Access the keys as a list.
- Access the non-default values as a list.
- Map a function over the values in a map.
- Fold over the values in the map. Note that this does *not* fold over the default value; this fold behaves in the same way as a standard fold.
- Compute the union of two maps using the specified per-value combination function and the specified new-map default value. Arguments: the function used to combine values; the new map's default value; the first map to combine; the second map to combine.

Path to the directory containing all the PLUG archives.

Boolean type to indicate case sensitivity for textual comparisons.

Just a handy alias for Text.

A fallback POS tag instance.

The class of POS tags. We use a typeclass here because POS tags just need a few things in excess of equality (they also need to be serializable and human readable). Passing around all the constraints everywhere becomes a hassle, and it's handy to have a uniform interface to the different kinds of tag types.

This typeclass also allows corpus-specific tags to be distinguished; they have different semantics, so they should not be merged. That said, if you wish to create a unifying POS tag set, and mappings into that set, you can use the type system to ensure that that is done correctly.
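The defaulting-map behavior described above can be sketched in Python (hypothetical names; chatter's actual type is a Haskell map, so this is only an illustration of the semantics):

```python
class DefaultMap:
    """A map that returns a fixed default for missing keys (sketch)."""

    def __init__(self, default, entries=()):
        self.default = default          # value returned for absent keys
        self.entries = dict(entries)    # the non-default key/value pairs

    def lookup(self, key):
        # Query the map; fall back to the default when the key is absent.
        return self.entries.get(key, self.default)

    def union_with(self, combine, new_default, other):
        # Union of two maps: overlapping keys are combined per-value,
        # and the result gets an explicitly chosen new default.
        merged = dict(self.entries)
        for k, v in other.entries.items():
            merged[k] = combine(merged[k], v) if k in merged else v
        return DefaultMap(new_default, merged.items())

m1 = DefaultMap(0, {"a": 1, "b": 2}.items())
m2 = DefaultMap(0, {"b": 10}.items())
m3 = m1.union_with(lambda x, y: x + y, -1, m2)
print(m3.lookup("b"), m3.lookup("zzz"))  # combined value, then the new default
```

Note that, as in the description above, the fold and map operations would act only on the stored (non-default) values; the default participates only in lookups and unions.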
This may get renamed to POSTag at some later date.

Check if a tag is a determiner tag.

The class of things that can be regarded as chunks. Chunk tags are much like POS tags, but should not be confused with them. Generally, chunks distinguish between different phrasal categories (e.g. noun phrases, verb phrases, prepositional phrases, etc.).

The class of named entity sets. This typeclass can be defined entirely in terms of the required class constraints.

Tag instance for unknown tagsets.

Raw tokenized text. This type has a convenience instance to simplify use.

A POS-tagged token.

A tagged sentence has POS tags. Generated by a part-of-speech tagger:

    tagger :: Tag tag => Sentence -> TaggedSentence tag

A Chunk that strictly contains chunks or POS tags.

A data type to represent the portions of a parse tree for chunks. Note that this part of the parse tree could be a POS tag with no chunk.

A chunked sentence has POS tags and chunk tags. Generated by a chunker:

    chunker :: (Chunk chunk, Tag tag) => TaggedSentence tag -> ChunkedSentence chunk tag

A sentence of tokens without tags. Generated by the tokenizer:

    tokenizer :: Text -> Sentence

Extract the token list from a sentence.

Apply a parallel list of tags to a sentence.

Generate a Text representation of a TaggedSentence in the common tagged format, e.g.: "the/at dog/nn jumped/vbd ./."

Remove the tags from a tagged sentence.

Extract the tags from a tagged sentence, returning a parallel list of tags along with the underlying Sentence.

Combine the results of POS taggers, using the second parameter to fill in unknown entries, where possible.

Merge two tagged sentences, preferring the tags in the first. Delegates to the per-token merge below.

Returns the first parameter, unless it is tagged as unknown. Throws an error if the text does not match.

Helper to create chunk parse-tree values.

Helper to create chunk parse-tree values that just hold POS-tagged data.

Show the underlying text token only.

Show the text and tag.
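The common tagged format mentioned above ("the/at dog/nn jumped/vbd ./.") is simple enough to sketch as plain string munging (hypothetical helper names, not chatter's API):

```python
def render_tagged(pairs):
    # Produce the common "word/tag" representation of a tagged sentence.
    return " ".join(f"{word}/{tag}" for word, tag in pairs)

def strip_tags(text):
    # Drop the tags, keeping only the tokens. rsplit on "/" so tokens
    # like "./." keep their text half intact.
    return [tok.rsplit("/", 1)[0] for tok in text.split()]

sent = [("the", "at"), ("dog", "nn"), ("jumped", "vbd"), (".", ".")]
print(render_tagged(sent))                # the/at dog/nn jumped/vbd ./.
print(strip_tags(render_tagged(sent)))   # ['the', 'dog', 'jumped', '.']
```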
Extract the text of a token.

Extract the last three characters of a token, if the token is long enough; otherwise returns the full token text.

Extract the list of POS tags from a tagged sentence.

Calculate the length of a tagged sentence (in terms of the number of tokens).

Brutally concatenate two tagged sentences.

True if the input sentence contains the given text token. Does not do partial or approximate matching, and compares details in a fully case-sensitive manner.

True if the input sentence contains the given POS tag. Does not do partial matching (such as prefix matching).

Compare the POS-tag token with a supplied tag string.

Compare the POS-tagged token with a text string.

Compare a token with a text string.

Data type to indicate IOB tags for chunking:
- Not in a chunk.
- In-chunk tag.
- Begin marker.

Turn an IOB result into a tree.

Parse an IOB-encoded corpus.

These tags may actually be the Penn Treebank tags, but I have not (yet?) seen the punctuation tags added to the Penn set. This particular list was compiled from the union of:
- All tags used in the Conll2000 training corpus (contributing the punctuation tags).
- The Penn Treebank tags, listed here: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html (which contributed LS over the items in the corpus).
- The tags START, END, and Unk, which are used by Chatter.
- Wh-adverb
- Possessive wh-pronoun
- Wh-pronoun
- Wh-determiner
- Verb, 3rd person singular present
- Verb, non-3rd person singular present
- Verb, past participle
- Verb, gerund or present participle
- Verb, past tense
- Verb, base form
- Interjection
- to
- Symbol
- Particle
- Adverb, superlative
- Adverb, comparative
- Adverb
- Possessive pronoun
- Personal pronoun
- Possessive ending
- Predeterminer
- Proper noun, plural
- Proper noun, singular
- Noun, plural
- Noun, singular or mass
- Modal
- List item marker
- Adjective, superlative
- Adjective, comparative
- Adjective
- Preposition or subordinating conjunction
- Foreign word
- Existential there
- Determiner
- Cardinal number
- Coordinating conjunction
- Punctuation tags: : (colon), . (sentence terminator), , ) ( `` '' $ #
- END tag, used in training.
- START tag, used in training.

Phrase chunk tags defined for the Conll task:
- Out; not a chunk.
- Verb phrase.
- Prepositional phrase.
- Noun phrase.

Named entity categories defined for the Conll 2003 task.

Order matters here: the patterns are replaced in reverse order when parsing tags, and in top-to-bottom order when generating tags.

- Unknown.
- WH-adverb + modal auxiliary, e.g. where'd
- WH-adverb + preposition, e.g. why'n
- WH-adverb + verb to do, present tense, 3rd person singular, e.g. how's
- WH-adverb + verb to do, past tense, negated, e.g. whyn't
- WH-adverb + verb to do, past tense, e.g. where'd how'd
- WH-adverb + verb to do, present, not 3rd person singular, e.g. howda
- WH-adverb + verb to be, present, 3rd person singular, e.g. how's where's
- WH-adverb + verb to be, present, 2nd person singular or all persons plural, e.g. where're
- WH-adverb, e.g. however when where why whereby wherever how whenever whereon wherein wherewith wheare wherefore whereof howsabout
- WH-qualifier, e.g. however how
- WH-pronoun, nominative + modal auxiliary, e.g. who'll that'd who'd that'll
- WH-pronoun, nominative + verb to have, present tense, 3rd person singular, e.g. who's that's
- WH-pronoun, nominative + verb to have, past tense, e.g. who'd
- WH-pronoun, nominative + verb to be, present, 3rd person singular, e.g. that's who's
- WH-pronoun, nominative, e.g. that who whoever whosoever what whatsoever
- WH-pronoun, accusative, e.g. whom that who
- WH-pronoun, genitive, e.g. whose whosever
- WH-determiner + verb to have, present tense, 3rd person singular, e.g. what's
- WH-determiner + verb to do, past tense, e.g. what'd
- WH-determiner + verb to do, uninflected present tense + pronoun, personal, nominative, not 3rd person singular, e.g. whaddya
- WH-determiner + verb to be, present tense, 3rd person singular, e.g. what's
- WH-determiner + verb to be, present, 2nd person singular or all persons plural + pronoun, personal, nominative, not 3rd person singular, e.g. whaddya
- WH-determiner + verb to be, present tense, 2nd person singular or all persons plural, e.g. what're
- WH-determiner, e.g. which what whatever whichever whichever-the-hell
- verb, present tense, 3rd person singular, e.g. deserves believes receives takes goes expires says opposes starts permits expects thinks faces votes teaches holds calls fears spends collects backs eliminates sets flies gives seeks reads ...
- verb, past participle + infinitival to, e.g. gotta
- verb, past participle, e.g. conducted charged won received studied revised operated accepted combined experienced recommended effected granted seen protected adopted retarded notarized selected composed gotten printed ...
- verb, present participle + infinitival to, e.g. gonna
- verb, present participle or gerund, e.g. modernizing improving purchasing Purchasing lacking enabling pricing keeping getting picking entering voting warning making strengthening setting neighboring attending participating moving ...
- verb, past tense, e.g. said produced took recommended commented urged found added praised charged listed became announced brought attended wanted voted defeated received got stood shot scheduled feared promised made ...
- verb, base: uninflected present, imperative or infinitive; hyphenated pair, e.g. say-speak
- verb, base: uninflected present, imperative or infinitive + infinitival to, e.g. wanta wanna
- verb, imperative + adverbial particle, e.g. g'ahn c'mon
- verb, uninflected present tense + pronoun, personal, accusative, e.g. let's lemme gimme
- verb, base: uninflected present, imperative or infinitive + adjective, e.g. die-dead
- verb, base: uninflected present, imperative or infinitive + preposition, e.g. lookit
- verb, base: uninflected present or infinitive + article, e.g. wanna
- verb, base: uninflected present, imperative or infinitive, e.g. investigate find act follow inure achieve reduce take remedy re-set distribute realize disable feel receive continue place protect eliminate elaborate work permit run enter force ...
- interjection, e.g. Hurrah bang whee hmpf ah goodbye oops oh-the-pain-of-it ha crunch say oh why see well hello lo alas tarantara rum-tum-tum gosh hell keerist Jesus Keeeerist boy c'mon 'mon goddamn bah hoo-pig damn ...
- infinitival to + verb, infinitive, e.g. t'jawn t'lah
- infinitival to, e.g. to t'
- adverb, particle + preposition, e.g. out'n outta
- adverb, particle, e.g. up out off down over on in about through across after
- adverb, nominal, e.g. here afar then
- adverb, superlative, e.g. most best highest uppermost nearest brightest hardest fastest deepest farthest loudest ...
- adverb, comparative + conjunction, coordinating, e.g. more'n
- adverb, comparative, e.g. further earlier better later higher tougher more harder longer sooner less faster easier louder farther oftener nearer cheaper slower tighter lower worse heavier quicker ...
- adverb + conjunction, coordinating, e.g. well's soon's
- adverb + verb to be, present tense, 3rd person singular, e.g. here's there's
- adverb, genitive, e.g. else's
- adverb, e.g. only often generally also nevertheless upon together back newly no likely meanwhile near then heavily there apparently yet outright fully aside consistently specifically formally ever just ...
- qualifier, post, e.g. indeed enough still 'nuff
- qualifier, pre, e.g. well less very most so real as highly fundamentally even how much remarkably somewhat more completely too thus ill deeply little overly halfway almost impossibly far severely such ...
- pronoun, personal, nominative, not 3rd person singular + verb, uninflected present tense, e.g. y'know
- pronoun, personal, nominative, not 3rd person singular + modal auxiliary, e.g. you'll we'll I'll we'd I'd they'll they'd you'd
- pronoun, personal, nominative, not 3rd person singular + verb to have, past tense, e.g. I'd you'd we'd they'd
- pronoun, personal, nominative, not 3rd person singular + verb to have, uninflected present tense, e.g. I've we've they've you've
- pronoun, personal, nominative, not 3rd person singular + verb to be, present tense, 3rd person singular, negated, e.g. taint
- pronoun, personal, nominative, not 3rd person singular + verb to be, present tense, 3rd person singular, e.g. you's
- pronoun, personal, nominative, not 3rd person singular + verb to be, present tense, 2nd person singular or all persons plural, e.g. we're you're they're
- pronoun, personal, nominative, not 3rd person singular + verb to be, present tense, 1st person singular, e.g. I'm Ahm
- pronoun, personal, nominative, not 3rd person singular, e.g. they we I you ye thou you'uns
- pronoun, personal, nominative, 3rd person singular + modal auxiliary, e.g. he'll she'll it'll he'd it'd she'd
- pronoun, personal, nominative, 3rd person singular + verb to have, present tense, 3rd person singular, e.g. it's he's she's
- pronoun, personal, nominative, 3rd person singular + verb to have, past tense, e.g. she'd he'd it'd
- pronoun, personal, nominative, 3rd person singular + verb to be, present tense, 3rd person singular, e.g. it's he's she's
- pronoun, personal, nominative, 3rd person singular, e.g. it he she thee
- pronoun, personal, accusative, e.g. them it him me us you 'em her thee we'uns
- pronoun, plural, reflexive, e.g. themselves ourselves yourselves
- pronoun, singular, reflexive, e.g. itself himself myself yourself herself oneself ownself
- pronoun, possessive, e.g. ours mine his hers theirs yours
- determiner, possessive, e.g. our its his their my your her out thy mine thine
- pronoun, nominal + modal auxiliary, e.g. someone'll somebody'll anybody'd
- pronoun, nominal + verb to have, present tense, 3rd person singular, e.g. nobody's somebody's one's
- pronoun, nominal + verb to have, past tense, e.g. nobody'd
- pronoun, nominal + verb to be, present tense, 3rd person singular, e.g. nothing's everything's somebody's nobody's someone's
- pronoun, nominal, genitive, e.g. one's someone's anybody's nobody's everybody's anyone's everyone's
- pronoun, nominal, e.g. none something everything one anyone nothing nobody everybody everyone anybody anything someone no-one nothin
- numeral, ordinal, e.g. first 13th third nineteenth 2d 61st second sixth eighth ninth twenty-first eleventh 50th eighteenth Thirty-ninth 72nd 1/20th twentieth mid-19th thousandth 350th sixteenth 701st ...
- noun, plural, adverbial, e.g. Sundays Mondays Saturdays Wednesdays Souths Fridays
- noun, singular, adverbial + modal auxiliary, e.g. today'll
- noun, singular, adverbial, genitive, e.g. Saturday's Monday's yesterday's tonight's tomorrow's Sunday's Wednesday's Friday's today's Tuesday's West's Today's South's
- noun, singular, adverbial, e.g. Friday home Wednesday Tuesday Monday Sunday Thursday yesterday tomorrow tonight West East Saturday west left east downtown north northeast southeast northwest North South right ...
- noun, plural, proper, genitive, e.g. Republicans' Orioles' Birds' Yanks' Redbirds' Bucs' Yankees' Stevenses' Geraghtys' Burkes' Wackers' Achaeans' Dresbachs' Russians' Democrats' Gershwins' Adventists' Negroes' Catholics' ...
- noun, plural, proper, e.g. Chases Aderholds Chapelles Armisteads Lockies Carbones French Marskmen Toppers Franciscans Romans Cadillacs Masons Blacks Catholics British Dixiecrats Mississippians Congresses ...
- noun, singular, proper + modal auxiliary, e.g. Gyp'll John'll
- noun, singular, proper + verb to have, present tense, 3rd person singular, e.g. Bill's Guardino's Celie's Skolman's Crosson's Tim's Wally's
- noun, singular, proper + verb to be, present tense, 3rd person singular, e.g. W.'s Ike's Mack's Jack's Kate's Katharine's Black's Arthur's Seaton's Buckhorn's Breed's Penny's Rob's Kitty's Blackwell's Myra's Wally's Lucille's Springfield's Arlene's
- noun, singular, proper, genitive, e.g. Green's Landis' Smith's Carreon's Allison's Boston's Spahn's Willie's Mickey's Milwaukee's Mays' Howsam's Mantle's Shaw's Wagner's Rickey's Shea's Palmer's Arnold's Broglio's ...
- noun, singular, proper, e.g. Fulton Atlanta September-October Durwood Pye Ivan Allen Jr. Jan. Alpharetta Grady William B. Hartsfield Pearl Williams Aug. Berry J. M. Cheshire Griffin Opelika Ala. E. Pelham Snodgrass ...
- noun, plural, common + modal auxiliary, e.g. duds'd oystchers'll
- noun, plural, common, genitive, e.g. taxpayers' children's members' States' women's cutters' motorists' steelmakers' hours' Nations' lawyers' prisoners' architects' tourists' Employers' secretaries' Rogues' ...
- noun, plural, common, e.g. irregularities presentments thanks reports voters laws legislators years areas adjustments chambers $100 bonds courts sales details raises sessions members congressmen votes polls calls ...
- noun, singular, common, hyphenated pair, e.g. stomach-belly
- noun, singular, common + modal auxiliary, e.g. cowhand'd sun'll
- noun, singular, common + preposition, e.g. buncha
- noun, singular, common + verb to have, present tense, 3rd person singular, e.g. guy's Knife's boat's summer's rain's company's
- noun, singular, common + verb to have, past tense, e.g. Pa'd
- noun, singular, common + verb to be, present tense, 3rd person singular, e.g. water's camera's sky's kid's Pa's heat's throat's father's money's undersecretary's granite's level's wife's fat's Knife's fire's name's hell's leg's sun's roulette's cane's guy's kind's baseball's ...
- noun, singular, common, genitive, e.g. season's world's player's night's chapter's golf's football's baseball's club's U.'s coach's bride's bridegroom's board's county's firm's company's superintendent's mob's Navy's ...
- noun, singular, common, e.g. failure burden court fire appointment awarding compensation Mayor interim committee fact effect airport management surveillance jail doctor intern extern night weekend duty legislation Tax Office ...
- modal auxiliary + infinitival to, e.g. oughta
- modal auxiliary + pronoun, personal, nominative, not 3rd person singular, e.g. willya
- modal auxiliary + verb to have, uninflected form, e.g. shouldda musta coulda must've woulda could've
- modal auxiliary, negated, e.g. cannot couldn't wouldn't can't won't shouldn't shan't mustn't musn't
- modal auxiliary, e.g. should may might will would must can could shall ought need wilt
- adjective, superlative, e.g. best largest coolest calmest latest greatest earliest simplest strongest newest fiercest unhappiest worst youngest worthiest fastest hottest fittest lowest finest smallest staunchest ...
- adjective, semantically superlative, e.g. top chief principal northernmost master key head main tops utmost innermost foremost uppermost paramount topmost
- adjective + conjunction, coordinating, e.g. lighter'n
- adjective, comparative, e.g. greater older further earlier later freer franker wider better deeper firmer tougher faster higher bigger worse younger lighter nicer slower happier frothier Greater newer Elder ...
- adjective, hyphenated pair, e.g. big-large long-far
- adjective, genitive, e.g. Great's
- adjective, e.g. recent over-all possible hard-fought favorable hard meager fit such widespread outmoded inadequate ambiguous grand clerical effective orderly federal foster general proportionate ...
- preposition + pronoun, personal, accusative, e.g. t'hi-im
- preposition, hyphenated pair, e.g. f'ovuh
- preposition, e.g. of in for by considering to on among at through with under into regarding than since despite according per before toward against as after during including between without except upon out over ...
- verb to have, present tense, 3rd person singular, negated, e.g. hasn't ain't
- verb to have, present tense, 3rd person singular, e.g. has hath
- verb to have, past participle, e.g. had
- verb to have, present participle or gerund, e.g. having
- verb to have, past tense, negated, e.g. hadn't
- verb to have, past tense, e.g. had
- verb to have, uninflected present tense + infinitival to, e.g. hafta
- verb to have, uninflected present tense or imperative, negated, e.g. haven't ain't
- verb to have, uninflected present tense, infinitive or imperative, e.g. have hast
- foreign word: WH-pronoun, nominative, e.g. qui
- foreign word: WH-pronoun, accusative, e.g. quibusdam
- foreign word: WH-determiner, e.g. quo qua quod que quok
- foreign word: verb, present tense, 3rd person singular, e.g. gouverne sinkt sigue diapiace
- foreign word: verb, past participle, e.g. vue verstrichen rasa verboten engages
- foreign word: verb, present participle or gerund, e.g. nolens volens appellant seq. obliterans servanda dicendi delenda
- foreign word: verb, past tense, e.g. stabat peccavi audivi
- foreign word: verb, present tense, not 3rd person singular, imperative or infinitive, e.g. nolo contendere vive fermate faciunt esse vade noli tangere dites duces meminisse iuvabit gosaimasu voulez habla ksuu'peli afo lacheln miuchi say allons strafe portant
- foreign word: interjection, e.g. sayonara bien adieu arigato bonjour adios bueno tchalo ciao o
- foreign word: infinitival to + verb, infinitive, e.g. d'entretenir
- foreign word: adverb + conjunction, coordinating, e.g. forisque
- foreign word: adverb, e.g. bas assai deja um wiederum cito velociter vielleicht simpliciter non zu domi nuper sic forsan olim oui semper tout despues hors
- foreign word: qualifier, e.g. minus
- foreign word: pronoun, personal, nominative, not 3rd person singular + verb to have, present tense, not 3rd person singular, e.g. j'ai
- foreign word: pronoun, personal, nominative, not 3rd person singular, e.g. ich vous sie je
- foreign word: pronoun, personal, nominative, 3rd person singular, e.g. il
- foreign word: pronoun, personal, accusative + preposition, e.g. mecum tecum
- foreign word: pronoun, personal, accusative, e.g. lui me moi mi
- foreign word: pronoun, singular, reflexive + verb, present tense, 3rd person singular, e.g. s'excuse s'accuse
- foreign word: pronoun, singular, reflexive, e.g. se
- foreign word: determiner, possessive, e.g. mea mon deras vos
- foreign word: pronoun, nominal, e.g. hoc
- foreign word: numeral, ordinal, e.g. 18e 17e quintus
- foreign word: noun, singular, adverbial, e.g. heute morgen aujourd'hui hoy
- foreign word: noun, plural, proper, e.g. Svenskarna Atlantes Dieux
- foreign word: noun, singular, proper, e.g. Karshilama Dieu Rundfunk Afrique Espanol Afrika Spagna Gott Carthago deus
- foreign word: noun, plural, common, e.g. al culpas vopos boites haflis kolkhozes augen tyrannis alpha-beta-gammas metis banditos rata phis negociants crus Einsatzkommandos kamikaze wohaws sabinas zorrillas palazzi engages coureurs corroborees yori Ubermenschen ...
- foreign word: noun, singular, common, genitive, e.g. corporis intellectus arte's dei aeternitatis senioritatis curiae patronne's chambre's
- foreign word: noun, singular, common, e.g. ballet esprit ersatz mano chatte goutte sang Fledermaus oud def kolkhoz roi troika canto boite blutwurst carne muzyka bonheur monde piece force ...
- foreign word: adjective, superlative, e.g. optimo
- foreign word: adjective, comparative, e.g. fortiori
- foreign word: adjective, e.g. avant Espagnol sinfonica Siciliana Philharmonique grand publique haute noire bouffe Douce meme humaine bel serieuses royaux anticus presto Sovietskaya Bayerische comique schwarzen ...
- foreign word: preposition + noun, singular, proper, e.g. d'Yquem d'Eiffel
- foreign word: preposition + noun, singular, common, e.g. d'etat d'hotel d'argent d'identite d'art
- foreign word: preposition + article, e.g. della des du aux zur d'un del dell'
- foreign word: preposition, e.g. ad de en a par con dans ex von auf super post sine sur sub avec per inter sans pour pendant in di
- foreign word: verb to have, present tense, not 3rd person singular, e.g. habe
- foreign word: determiner/pronoun, plural, e.g. haec
- foreign word: determiner + verb to be, present tense, 3rd person singular, e.g. c'est
- foreign word: determiner/pronoun, singular, e.g. hoc
- foreign word: conjunction, subordinating, e.g. bevor quam ma
- foreign word: numeral, cardinal, e.g. une cinq deux sieben unam zwei
- foreign word: conjunction, coordinating, e.g. et ma mais und aber och nec y
- foreign word: verb to be, present tense, 3rd person singular, e.g. ist est
- foreign word: verb to be, present tense, 2nd person singular or all persons plural, e.g. sind sunt etes
- foreign word: verb to be, infinitive or imperative, e.g. sit
- foreign word: article + noun, singular, proper, e.g. L'Astree L'Imperiale
- foreign word: article + noun, singular, common, e.g. l'orchestre l'identite l'arcade l'ange l'assistance l'activite L'Universite l'independance L'Union L'Unita l'osservatore
- foreign word: article, e.g. la le el un die der ein keine eine das las les Il
- foreign word: negator, e.g. pas non ne
- existential there + modal auxiliary, e.g. there'll there'd
- existential there + verb to have, present tense, 3rd person singular, e.g. there's
- existential there + verb to have, past tense, e.g. there'd
- existential there + verb to be, present tense, 3rd person singular, e.g. there's
- existential there, e.g. there
- determiner, pronoun or double conjunction, e.g. neither either one
- pronoun, plural + verb to be, present tense, 3rd person singular, e.g. them's
- determiner/pronoun, plural, e.g. these those them
- determiner/pronoun, singular or plural, e.g. any some
- determiner/pronoun + modal auxiliary, e.g. that'll this'll
- determiner/pronoun + verb to be, present tense, 3rd person singular, e.g. that's
- determiner/pronoun, singular, genitive, e.g. another's
- determiner/pronoun, singular, e.g. this each another that 'nother
- verb to do, present tense, 3rd person singular, negated, e.g. doesn't don't
- verb to do, present tense, 3rd person singular, e.g. does
- verb to do, past tense, negated, e.g. didn't
- verb to do, past tense, e.g. did done
- verb to do, past or present tense + pronoun, personal, nominative, not 3rd person singular, e.g. d'you
- verb to do, uninflected present tense or imperative, negated, e.g. don't
- verb to do, uninflected present tense, infinitive or imperative, e.g. do dost
- conjunction, subordinating, e.g. that as after whether before while like because if since for than altho until so unless though providing once lest sposin' till whereas whereupon supposing tho' albeit then so's 'fore
- numeral, cardinal, genitive, e.g. 1960's 1961's .404's
- numeral, cardinal, e.g. two one 1 four 2 1913 71 74 637 1937 five three million 87-31 29-5 seven 1,119 fifty-three 7.5 billion hundred 125,000 1,700 60 100 six ...
- conjunction, coordinating, e.g. and or but plus & either neither nor yet n and/or minus an'
- verb to be, present tense, 3rd person singular, negated, e.g. isn't ain't
- verb to be, present tense, 3rd person singular, e.g. is
- verb to be, present tense, 2nd person singular or all persons plural, negated, e.g. aren't ain't
- verb to be, present tense, 2nd person singular or all persons plural, e.g. are art
- verb to be, past participle, e.g. been
- verb to be, present tense, 1st person singular, negated, e.g. ain't
- verb to be, present tense, 1st person singular, e.g. am
- verb to be, present participle or gerund, e.g. being
- verb to be, past tense, 1st and 3rd person singular, negated, e.g. wasn't
- verb to be, past tense, 1st and 3rd person singular, e.g. was
- verb to be, past tense, 2nd person singular or all persons plural, negated, e.g. weren't
- verb to be, past tense, 2nd person singular or all persons plural, e.g. were
- verb to be, infinitive or imperative, e.g. be
- article, e.g. the an no a every th' ever' ye
- determiner/pronoun, post-determiner, hyphenated pair, e.g. many-much
- determiner/pronoun, post-determiner, genitive, e.g. other's
- determiner/pronoun, post-determiner, e.g. many other next more last former little several enough most least only very few fewer past same Last latter less single plenty 'nough lesser certain various manye next-to-last particular final previous present nuf
- determiner/pronoun, double conjunction or pre-quantifier, e.g. both
- determiner/pronoun, pre-quantifier, e.g. all half many nary
- determiner/pronoun, pre-qualifier, e.g. quite such rather
- : (colon)
- . (sentence terminator)
- , (comma)
- not, n't
- ) (
- END tag, used in training.
- START tag, used in training.

Chunk tags:
- Out; not a chunk.
- Clause.
- Prepositional phrase.
- Verb phrase.
- Noun phrase.

Order matters here: the patterns are replaced in reverse order when parsing tags, and in top-to-bottom order when generating tags.
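The chunk tags above are what an IOB-encoded corpus assigns per token: B- marks the beginning of a chunk, I- a token inside it, and O a token outside any chunk. A minimal decoding sketch in Python (a hypothetical helper, not chatter's actual IOB parser):

```python
def iob_to_chunks(tagged):
    """Group (token, iob_tag) pairs into (chunk_label, tokens) chunks.

    tagged: list of (token, tag) where tag is "O", "B-XXX", or "I-XXX".
    """
    chunks, current = [], None
    for token, tag in tagged:
        if tag == "O":
            current = None          # outside any chunk
        elif tag.startswith("B-") or current is None or current[0] != tag[2:]:
            # A B- marker (or a stray I- with no open chunk) starts a new chunk.
            current = (tag[2:], [token])
            chunks.append(current)
        else:
            current[1].append(token)  # I- continues the open chunk
    return chunks

sent = [("the", "B-NP"), ("dog", "I-NP"), ("jumped", "B-VP"), (".", "O")]
print(iob_to_chunks(sent))  # [('NP', ['the', 'dog']), ('VP', ['jumped'])]
```

chatter builds a parse tree rather than a flat chunk list, but the grouping decision per token is the same.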
Document corpus. This is a simple hashed corpus; the document content is not stored.
- The number of documents in the corpus.
- A count of the number of documents each term occurred in.

Part-of-speech tagger, with back-off tagger.

A sequence of POS taggers can be assembled by using backoff taggers. When tagging text, the first tagger is run on the input, possibly tagging some tokens as unknown ('Tag Unk'). The first backoff tagger is then recursively invoked on the text to fill in the unknown tags, but that may still leave some tokens marked with 'Tag Unk'. This process repeats until no more taggers are found. (The current implementation is not very efficient in this respect.)

Back-off taggers are particularly useful when there is a set of domain-specific vernacular that a general-purpose statistical tagger does not know of. A LiteralTagger can be created to map terms to fixed POS tags, and then delegate the bulk of the text to a statistical back-off tagger, such as an AvgPerceptronTagger.

POSTagger values can be serialized and deserialized by using NLP.POS.serialize and NLP.POS.deserialize. This is a bit tricky because the POSTagger abstracts away the implementation details of the particular tagging algorithm, and the model for that tagger (if any). To support serialization, each POSTagger value must provide a serialize value that can be used to generate a ByteString representation of the model, as well as a unique id (also a ByteString).
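The backoff chaining described above can be sketched as follows. This is a toy illustration, not chatter's API: each "tagger" here is just a dict lookup, and "Unk" stands in for the library's unknown tag.

```python
UNK = "Unk"

def tag_with_backoff(taggers, tokens):
    # Run the first tagger, then recursively let the backoff taggers
    # fill in any tokens still tagged as unknown. (Like the description
    # above notes, re-tagging the whole input each time is inefficient.)
    if not taggers:
        return [UNK] * len(tokens)
    first, rest = taggers[0], taggers[1:]
    tags = [first.get(tok, UNK) for tok in tokens]  # toy "tagger": dict lookup
    if rest and UNK in tags:
        fallback = tag_with_backoff(rest, tokens)
        tags = [f if t == UNK else t for t, f in zip(tags, fallback)]
    return tags

domain = {"chatter": "NN"}            # e.g. a literal tagger for vernacular
general = {"the": "DT", "dog": "NN"}  # e.g. a statistical tagger
print(tag_with_backoff([domain, general], ["the", "chatter", "dog", "xyzzy"]))
# ['DT', 'NN', 'NN', 'Unk']
```

Tokens no tagger knows remain tagged "Unk", exactly as in the description above.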
Furthermore, that ID must be added to a `Map ByteString (ByteString -> Maybe POSTagger -> Either String POSTagger)` that is provided to deserialize. The function in the map takes the serialized output, and possibly a backoff tagger, and reconstitutes the POSTagger that was serialized (assigning the proper functions, setting up closures as needed, etc.). Look at the source for the existing taggers for examples.

The POSTagger record fields:
- The initial part-of-speech tagger.
- Training function to train the immediate POS tagger.
- A tagger to invoke on unknown tokens.
- A tokenizer (the default tokenizer will work).
- A sentence splitter. If your input is formatted as one sentence per line, then use a line-based splitter; otherwise try Erik Kow's fullstop library.
- Store this POS tagger to a bytestring. This does not serialize the backoff taggers.
- A unique id that will identify the algorithm used for this POS tagger. This is used in deserialization.

Get the number of documents that a term occurred in.

Add a document to the corpus. This can be dangerous if the documents are pre-processed differently. All corpus-related functions assume that the documents have all been tokenized and the tokens normalized, in the same way.

Create a corpus from a list of documents, represented by normalized tokens.

Create a Literal Tagger using the specified back-off tagger as a fall-back, if one is specified. This uses a tokenizer adapted from the tokenize package, and Erik Kow's fullstop sentence segmenter as a sentence splitter.

Create a tokenizer that protects the provided terms (to tokenize multi-word terms).

Deserialization for Literal Taggers. The serialization logic is in the posSerialize record of the POSTagger created in mkTagger.

Create an unambiguous tagger, using the supplied tagger as a source of tags.

Trainer method for unambiguous taggers.

The perceptron model.
Perceptron fields:
- weights: Each feature gets its own weight vector, so weights is a dict-of-dicts.
- totals: The accumulated values, for the averaging. These will be keyed by feature/class tuples.
- tstamps: The last time the feature was changed, for the averaging. Also keyed by feature/class tuples. (tstamps is short for timestamps.)
- instances: Number of instances seen.

Weight: Typedef for doubles to make the code easier to read, and to make this simple to change if necessary.

Class: The classes that the perceptron assigns are represented with a newtype-wrapped String. Eventually, I think this should become a typeclass, so the classes can be defined by the users of the Perceptron (such as custom POS tag ADTs, or more complex classes).

emptyPerceptron: An empty perceptron, used to start training.

predict: Predict a class given a feature vector. Ported from Python:

    def predict(self, features):
        '''Dot-product the features and current weights and return the best label.'''
        scores = defaultdict(float)
        for feat, value in features.items():
            if feat not in self.weights or value == 0:
                continue
            weights = self.weights[feat]
            for label, weight in weights.items():
                scores[label] += value * weight
        # Do a secondary alphabetic sort, for stability
        return max(self.classes, key=lambda label: (scores[label], label))

update: Update the perceptron with a new example.

    def update(self, truth, guess, features):
        ...
        self.i += 1
        if truth == guess:
            return None
        for f in features:
            # setdefault is Map.findWithDefault, and destructive
            weights = self.weights.setdefault(f, {})
            upd_feat(truth, f, weights.get(truth, 0.0), 1.0)
            upd_feat(guess, f, weights.get(guess, 0.0), -1.0)
        return None

upd_feat: Ported from Python:

    def update(self, truth, guess, features):
        '''Update the feature weights.'''
        def upd_feat(c, f, w, v):
            param = (f, c)
            self._totals[param] += (self.i - self._tstamps[param]) * w
            self._tstamps[param] = self.i
            self.weights[f][c] = w + v

averageWeights: Average the weights. Ported from Python:

    def average_weights(self):
        for feat, weights in self.weights.items():
            new_feat_weights = {}
            for clas, weight in weights.items():
                param = (feat, clas)
                total = self._totals[param]
                total += (self.i - self._tstamps[param]) * weight
                averaged = round(total / float(self.i), 3)
                if averaged:
                    new_feat_weights[clas] = averaged
            self.weights[feat] = new_feat_weights
        return None

roundTo: Round a fractional number to a specified decimal place.

    >>> roundTo 2 3.1459
    3.15

NLP.Chunk.AvgPerceptronChunker

Chunker: The type of Chunkers; incorporates chunking, training, serialization, and unique IDs for deserialization.
- chId: The unique ID for this implementation of a Chunker.

readChunker: Deserialize an AvgPerceptronChunker from a ByteString.

mkChunker: Create a chunker from a Perceptron.

chunk: Chunk a list of POS-tagged sentences, generating a parse tree.

chunkSentence: Chunk a single POS-tagged sentence.

toTree: Turn an IOB result into a tree.

trainInt: Copied directly from the AvgPerceptronTagger; should be generalized?

startToks: Start markers to ensure all features in context are valid, even for the first real tokens.

endToks: End markers to ensure all features are valid, even for the last real tokens.

trainSentence: Train on one sentence.

Training parameters:
- The number of times to iterate over the training data, randomly shuffling after each iteration. (5 is a reasonable choice.)
- The Chunker to train.
- The training data. (A list of [(Text, Tag)]s.)
- Returns a trained perceptron. IO is needed for randomization.

Per-token context:
- The full sentence that this word is located in.
- The index of the current word.
- The current word/tag pair.
- The predicted class of the previous word.
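The roundTo helper shown above (used when averaging weights) is a one-liner. A direct Python transcription for reference — `round_to` is an illustrative name, and this assumes Python's default rounding behavior is acceptable for the values involved:

```python
def round_to(places, x):
    """Round a fractional number to the given number of decimal places."""
    factor = 10 ** places
    return round(x * factor) / factor
```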
NLP.Similarity.VectorSim

Document: An efficient (ish) representation for documents in the bag-of-words sense.

mkDocument: Make a document from a list of tokens.

fromTV: Access the underlying DefaultMap used to store term vector details.

mkVector: Generate a TermVector from a tokenized document.

sim: Invokes similarity on full strings, using the chatter tokenizer for tokenization, and no stemming. The return value will be in the range [0, 1]. There *must* be at least one document in the corpus.

similarity: Determine how similar two documents are. This function assumes that each document has been tokenized and (if desired) stemmed/case-normalized. This is a wrapper around tvSim, which is a *much* more efficient implementation. If you need to run similarity against any single document more than once, then you should create TermVectors for each of your documents and use tvSim instead of similarity. The return value will be in the range [0, 1]. There *must* be at least one document in the corpus.

tvSim: Determine how similar two documents are. Calculates the similarity between two documents, represented as TermVectors, returning a double in the range [0, 1], where 1 represents most similar.

tf: Return the raw frequency of a term in a body of text. The first argument is the term to find; the second is a tokenized document. This function does not do any stemming or additional text modification.

idf: Calculate the inverse document frequency. The IDF is, roughly speaking, a measure of how popular a term is.

tf_idf: Calculate the tf*idf measure for a term given a document and a corpus.

addVectors: Add two term vectors. When a term is added, its value in each vector is used (or that vector's default value is used if the term is absent from the vector). The new term vector resulting from the addition always uses a default value of zero.

zeroVector: A zero term vector (i.e. addVectors v zeroVector = v).

negate: Negate a term vector.

sum: Add a list of term vectors.

magnitude: Calculate the magnitude of a vector.

dotProd: Find the dot product of two vectors.

NLP.Extraction.Parsec

posTok: Consume a token with the given POS tag.

matches: Text equality matching with optional case sensitivity.

txtTok: Consume a token with the given lexical representation.
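Before moving on to the extraction parsers: the term-vector arithmetic above boils down to standard tf-idf and cosine similarity. A minimal sketch of the math, not chatter's implementation — in particular the `idf` formula `log(N / doc_count)` is an assumption consistent with the "roughly, how popular a term is" description, and may differ from chatter's exact smoothing:

```python
import math
from collections import Counter

def tf(term, doc_tokens):
    """Raw frequency of a term in a tokenized document (no stemming)."""
    return doc_tokens.count(term)

def idf(corpus_size, doc_count):
    """Assumed IDF formula: log(N / document frequency)."""
    return math.log(corpus_size / doc_count)

def mk_vector(doc_tokens):
    """Bag-of-words term vector as a dict of raw counts."""
    return dict(Counter(doc_tokens))

def cosine(v1, v2):
    """Cosine similarity: dot product over the product of magnitudes.

    For count vectors the result is in [0, 1]; 1 means most similar.
    """
    dot = sum(v1[t] * v2.get(t, 0.0) for t in v1)
    mag = lambda v: math.sqrt(sum(x * x for x in v.values()))
    denom = mag(v1) * mag(v2)
    return dot / denom if denom else 0.0
```

Treating a missing term as 0.0 in `v2.get(t, 0.0)` plays the same role as the term vector's zero default value described above.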
anyToken: Consume any one non-empty token.

followedBy: Skips any number of fill tokens, ending with the end parser, and returning the last parsed result. This is useful when you know what you're looking for and (for instance) don't care what comes first.

NLP.Extraction.Examples.ParsecExamples

findClause: Find a clause in a larger collection of text. A clause is defined by the clause extractor, and is a Noun Phrase followed (immediately) by a Verb Phrase. findClause skips over leading tokens, if needed, to locate a clause.

clause: Find a Noun Phrase followed by a Verb Phrase.

NLP.Corpora.Parsing

readPOS: Read a POS-tagged corpus out of a Text string of the form: token/tag token/tag ...

    >>> readPOS "Dear/jj Sirs/nns :/: Let/vb"
    [("Dear",JJ),("Sirs",NNS),(":",Other ":"),("Let",VB)]

safeInit: Returns all but the last element of a string, unless the string is empty, in which case it returns that string.

NLP.POS.AvgPerceptronTagger

mkTagger: Create an Averaged Perceptron Tagger using the specified back-off tagger as a fall-back, if one is specified. This uses a tokenizer adapted from the tokenize package for a tokenizer, and Erik Kow's fullstop sentence segmenter (http://hackage.haskell.org/package/fullstop) as a sentence splitter.

trainNew: Train a new POSTagger. The training corpus should be a collection of sentences, one sentence on each line, and with each token tagged with a part of speech. For example, the input:

    "The/DT dog/NN jumped/VB ./.\nThe/DT cat/NN slept/VB ./."

defines two training sentences.

    >>> tagger <- trainNew "Dear/jj Sirs/nns :/: Let/vb\nUs/nn begin/vb\n"
    >>> tag tagger $ map T.words $ T.lines "Dear sir"
    "Dear/jj Sirs/nns :/: Let/vb"

trainOnFiles: Train a new POSTagger on a corpus of files.

train: Add training examples to a perceptron.

    >>> tagger <- train emptyPerceptron "Dear/jj Sirs/nns :/: Let/vb\nUs/nn begin/vb\n"
    >>> tag tagger $ map T.words $ T.lines "Dear sir"
    "Dear/jj Sirs/nns :/: Let/vb"

If you're using multiple input files, this can be useful to improve performance (by folding over the files).

startToks: Start markers to ensure all features in context are valid, even for the first real tokens.
endToks: End markers to ensure all features are valid, even for the last real tokens.

tag: Tag a document (represented as a list of Sentences) with a trained POSTagger. Ported from Python:

    def tag(self, corpus, tokenize=True):
        '''Tags a string `corpus`.'''
        # Assume untokenized corpus has \n between sentences and ' ' between words
        s_split = nltk.sent_tokenize if tokenize else lambda t: t.split('\n')
        w_split = nltk.word_tokenize if tokenize else lambda s: s.split()
        def split_sents(corpus):
            for s in s_split(corpus):
                yield w_split(s)
        prev, prev2 = self.START
        tokens = []
        for words in split_sents(corpus):
            context = self.START + [self._normalize(w) for w in words] + self.END
            for i, word in enumerate(words):
                tag = self.tagdict.get(word)
                if not tag:
                    features = self._get_features(i, word, context, prev, prev2)
                    tag = self.model.predict(features)
                tokens.append((word, tag))
                prev2 = prev
                prev = tag
        return tokens

tagSentence: Tag a single sentence.

trainInt: Train a model from sentences. Ported from Python:

    def train(self, sentences, save_loc=None, nr_iter=5):
        self._make_tagdict(sentences)
        self.model.classes = self.classes
        prev, prev2 = START
        for iter_ in range(nr_iter):
            c = 0
            n = 0
            for words, tags in sentences:
                context = START + [self._normalize(w) for w in words] + END
                for i, word in enumerate(words):
                    guess = self.tagdict.get(word)
                    if not guess:
                        feats = self._get_features(i, word, context, prev, prev2)
                        guess = self.model.predict(feats)
                        self.model.update(tags[i], guess, feats)
                    prev2 = prev; prev = guess
                    c += guess == tags[i]
                    n += 1
            random.shuffle(sentences)
            logging.info("Iter {0}: {1}/{2}={3}".format(iter_, c, n, _pc(c, n)))
        self.model.average_weights()
        # Pickle as a binary file
        if save_loc is not None:
            pickle.dump((self.model.weights, self.tagdict, self.classes),
                        open(save_loc, 'wb'), -1)
        return None

trainSentence: Train on one sentence. Adapted from this portion of the Python train method:

        context = START + [self._normalize(w) for w in words] + END
        for i, word in enumerate(words):
            guess = self.tagdict.get(word)
            if not guess:
                feats = self._get_features(i, word, context, prev, prev2)
                guess = self.model.predict(feats)
                self.model.update(tags[i], guess, feats)
            prev2 = prev; prev = guess
            c += guess == tags[i]
            n += 1

predictPos: Predict a part of speech, defaulting to the Unk tag if no classification is found.

getFeatures: Default feature set. Ported from Python:

    def _get_features(self, i, word, context, prev, prev2):
        '''Map tokens into a feature representation, implemented as a
        {hashable: float} dict. If the features change, a new model must be
        trained.
        '''
        def add(name, *args):
            features[' '.join((name,) + tuple(args))] += 1
        i += len(self.START)
        features = defaultdict(int)
        # It's useful to have a constant feature, which acts sort of like a prior
        add('bias')
        add('i suffix', word[-3:])
        add('i pref1', word[0])
        add('i-1 tag', prev)
        add('i-2 tag', prev2)
        add('i tag+i-2 tag', prev, prev2)
        add('i word', context[i])
        add('i-1 tag+i word', prev, context[i])
        add('i-1 word', context[i-1])
        add('i-1 suffix', context[i-1][-3:])
        add('i-2 word', context[i-2])
        add('i+1 word', context[i+1])
        add('i+1 suffix', context[i+1][-3:])
        add('i+2 word', context[i+2])
        return features

Parameters:
- The POS tag parser.
- The initial model.
- Training data; formatted with one sentence per line, and standard POS tags after each space-delimited token.
- The number of times to iterate over the training data, randomly shuffling after each iteration. (5 is a reasonable choice.)
- The Perceptron to train.
- The training data. (A list of [(Text, Tag)]s.)
- Returns a trained perceptron. IO is needed for randomization.

NLP.POS — Part-of-speech tagging facilities.
Stability: experimental. Maintainer: creswick@gmail.com

defaultTagger: A basic POS tagger.

conllTagger: A POS tagger that has been trained on the Conll 2000 POS tags.

brownTagger: A POS tagger trained on a subset of the Brown corpus.

taggerTable: The default table of tagger IDs to readTagger functions. Each tagger packaged with Chatter should have an entry here.
By convention, the IDs used are the fully qualified module name of the tagger package.

saveTagger: Store a POSTagger to a file.

loadTagger: Load a tagger, using the internal taggerTable. If you need to specify your own mappings for new composite taggers, you should use deserialize. This function checks the filename to determine if the content should be decompressed. If the file ends with ".gz", then we assume it is a gzipped model.

tag: Tag a chunk of input text with part-of-speech tags, using the sentence splitter, tokenizer, and tagger contained in the POSTagger.

tagStr: Tag the tokens in a string. Returns a space-separated string of tokens, each token suffixed with the part of speech. For example:

    >>> tag tagger "the dog jumped ."
    "the/at dog/nn jumped/vbd ./."

tagText: Text version of tagStr.

trainStr: Train a tagger on string input in the standard form for POS-tagged corpora:

    trainStr tagger "the/at dog/nn jumped/vbd ./."

trainText: The Text version of trainStr.

train: Train a POSTagger on a corpus of sentences. This will recurse through the POSTagger stack, training all the backoff taggers as well. In order to do that, this function has to be generic to the kind of taggers used, so it is not possible to train up a new POSTagger from nothing: train wouldn't know what tagger to create. To get around that restriction, you can use the various mkTagger implementations, such as NLP.POS.LiteralTagger.mkTagger or NLP.POS.AvgPerceptronTagger.mkTagger. For example:

    import NLP.POS.AvgPerceptronTagger as APT

    let newTagger = APT.mkTagger APT.emptyPerceptron Nothing
    posTgr <- train newTagger trainingExamples

eval: Evaluate a POSTagger. Measures accuracy over all tags in the test corpus. Accuracy is calculated as: |tokens tagged correctly| / |all tokens|

NLP.Chunk — Phrase chunking facilities.
Stability: experimental. Maintainer: creswick@gmail.com

defaultChunker: A basic phrasal chunker.

conllChunker: Convenient function to load the Conll2000 chunker.

train: Train a chunker on a set of additional examples.

chunk: Chunk a TaggedSentence that has been produced by a Chatter tagger, producing a rich representation of the Chunks and the Tags detected.
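The accuracy measure used by eval is simple to state precisely. A small Python sketch of the same calculation (the names `accuracy`, `tagger`, and `gold_sentences` are illustrative; chatter's eval works over its own corpus types):

```python
def accuracy(tagger, gold_sentences):
    """|tokens tagged correctly| / |all tokens| over a tagged test corpus.

    `gold_sentences` is a list of [(token, tag)] sentences; `tagger`
    maps a token list to a parallel tag list.
    """
    correct = total = 0
    for sent in gold_sentences:
        tokens = [tok for tok, _ in sent]
        guesses = tagger(tokens)
        correct += sum(g == t for g, (_, t) in zip(guesses, sent))
        total += len(sent)
    return correct / total if total else 0.0
```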
If you just want to see chunked output from standard text, you probably want chunkText or chunkStr.

chunkText: Convenience function to tokenize, POS-tag, then chunk the provided text, and format the result in an easy-to-read format.

    > tgr <- defaultTagger
    > chk <- defaultChunker
    > chunkText tgr chk "The brown dog jumped over the lazy cat."
    "[NP The/DT brown/NN dog/NN] [VP jumped/VBD] [NP over/IN the/DT lazy/JJ cat/NN] ./."

chunkStr: A wrapper around chunkText that packs strings.

chunkerTable: The default table of chunker IDs to readChunker functions. Each chunker packaged with Chatter should have an entry here. By convention, the IDs used are the fully qualified module name of the chunker package.

saveChunker: Store a Chunker to disk.

loadChunker: Load a Chunker from disk, optionally gunzipping if needed (based on file extension).

NLP.Corpora.WikiNer

NERTag: Different classes of Named Entity used in the WikiNER data set. (O: out; not a chunk.)

parseWikiNer: Convert WikiNER format to basic IOB (one token per line, space-separated tags, and a blank line between each sentence). Translates a WikiNER sentence into a list of IOB lines, for parsing with the IOB parser.

trainChunker: Train a chunker on a provided corpus.