>>> import pgf
Once you have imported the module, you can use the dir and help functions to see what functionality is available. dir takes an object and returns a list of the names of its attributes and methods:
>>> dir(pgf)
help is a little more advanced: it tries to produce more human-readable documentation, which moreover contains comments:
>>> help(pgf)
A grammar is loaded by calling the function readPGF:
>>> gr = pgf.readPGF("App12.pgf")
From the grammar you can query the set of available languages. It is accessible through the property languages, which is a map from language name to an object of class pgf.Concr that represents the language. For example, the following extracts the English language:
>>> eng = gr.languages["AppEng"]
>>> print eng
<pgf.Concr object at 0x7f7dfa4471d0>
The parser is invoked with the parse method of the language object:
>>> i = eng.parse("this is a small theatre")
This gives you an iterator which can enumerate all possible abstract trees. You can get the next tree by calling next:
>>> p,e = i.next()
The results are always pairs of probability and tree. The probabilities are negated logarithmic probabilities, which means that the lowest number encodes the most probable result. The possible trees are returned in decreasing probability order (i.e. increasing negated logarithm). The first tree should have the smallest p:
>>> print p
35.9166526794
and this is the corresponding abstract tree:
>>> print e
PhrUtt NoPConj (UttS (UseCl (TTAnt TPres ASimul) PPos (PredVP (DetNP (DetQuant this_Quant NumSg)) (UseComp (CompNP (DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA small_A) (UseN theatre_N)))))))) NoVoc
The parse method also has the following optional parameters:
cat        | start category
n          | maximum number of trees
heuristics | a real number from 0 to 1
callbacks  | a list of category and callback function pairs
For example, a different start category can be selected like this:
>>> i = eng.parse("a small theatre", cat="NP")
The heuristics factor can be used to trade parsing speed for quality. By default the list of trees is sorted by probability, which corresponds to factor 0.0. When the factor is increased, parsing becomes faster but the sorting becomes imprecise; the ordering is least reliable at factor 1.0. In any case the parser always returns the same set of trees, only in a different order. Our experience is that even a factor of about 0.6-0.8 with the translation grammar still puts the most probable tree at the top of the list, but further down the list the trees become shuffled.
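The optional parameters and the negated logarithmic probabilities can be combined in a small helper. The following is only a sketch: top_parses and its limit argument are hypothetical names, not part of the pgf module, and the conversion assumes that the values are negated natural logarithms, which the text above does not state. The helper works with any object whose parse method behaves as described above, and it is written to run under both Python 2 and Python 3.

```python
import itertools
import math

def top_parses(concr, sentence, limit=3, **options):
    """Return up to `limit` (probability, tree string) pairs for `sentence`.

    `concr` is expected to behave like a pgf.Concr object: parse() yields
    (negated log probability, tree) pairs, most probable first.
    `options` are passed through to parse(), e.g. cat="NP" or heuristics=0.5.
    """
    results = concr.parse(sentence, **options)
    pairs = []
    for neg_log_prob, tree in itertools.islice(results, limit):
        # Assumption: natural logarithm, so exp(-x) recovers the probability.
        pairs.append((math.exp(-neg_log_prob), str(tree)))
    return pairs
```

With the grammar loaded as above, a call might look like top_parses(eng, "a small theatre", cat="NP", heuristics=0.5). Because islice stops early, the remaining trees are never requested from the iterator.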
The callbacks parameter is a list of functions that can be used for recognizing literals. For example, we use them for recognizing names and unknown words in the translator.
To linearize, you first need an expression, for example one read from a string:
>>> e = pgf.readExpr("AdjCN (PositA red_A) (UseN theatre_N)")
and then we can linearize it:
>>> print eng.linearize(e)
red theatre
This method produces only a single linearization. If you use variants in the grammar then you might want to see all possible linearizations. For that purpose you should use linearizeAll:
>>> for s in eng.linearizeAll(e):
...     print s
...
red theatre
red theater
If, instead, you need an inflection table with all possible forms then the right method to use is tabularLinearize:
>>> eng.tabularLinearize(e)
{'s Sg Nom': 'red theatre', 's Pl Nom': 'red theatres', 's Pl Gen': "red theatres'", 's Sg Gen': "red theatre's"}
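Since the inflection table is an ordinary dictionary from form labels to strings, it can be rendered with plain Python. A minimal sketch, where format_table is a hypothetical helper and not part of pgf:

```python
def format_table(table):
    """Render a {label: form} inflection table as aligned text lines."""
    if not table:
        return ""
    # Pad every label to the width of the longest one.
    width = max(len(label) for label in table)
    lines = []
    for label in sorted(table):
        lines.append("%-*s  %s" % (width, label, table[label]))
    return "\n".join(lines)
```

Sorting the labels gives a stable order regardless of how the dictionary happens to be iterated.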
Finally, you could also get a linearization which is bracketed into a list of phrases:
>>> [b] = eng.bracketedLinearize(e)
>>> print b
(CN:4 (AP:1 (A:0 red)) (CN:3 (N:2 theatre)))
Each bracket is actually an object of type pgf.Bracket. The property cat of the object gives you the name of the category and the property children gives you a list of nested brackets. If a phrase is discontinuous then it is represented as more than one bracket with the same category name. In that case, the index that you see in the example above will have the same value for all brackets of the same phrase.
The linearization works even if there are functions in the tree that don't have linearization definitions. In that case you will just see the name of the function in the generated string. It is sometimes helpful to be able to see whether a function is linearizable or not. This can be done in this way:
>>> print eng.hasLinearization("apple_N")
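The same check can be applied to a whole list of function names to find gaps in a concrete syntax. The following is a sketch; missing_linearizations is a hypothetical helper, not part of pgf, and it assumes only an object with the hasLinearization method shown above.

```python
def missing_linearizations(concr, function_names):
    """Return the names for which concr.hasLinearization() is false."""
    return [name for name in function_names
            if not concr.hasLinearization(name)]
```

For example, missing_linearizations(eng, gr.functions) would list every abstract function of the grammar that has no linearization in English.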
An already constructed tree can be analyzed and transformed in the host application. For example you can deconstruct a tree into a function name and a list of arguments:
>>> e.unpack()
('AdjCN', [<pgf.Expr object at 0x7f7df6db78c8>, <pgf.Expr object at 0x7f7df6db7878>])
The result from unpack can be different depending on the form of the tree. If the tree is a function application then you always get a tuple of the function name and a list of arguments. If instead the tree is just a literal string then the return value is the actual literal. For example the result from:
>>> pgf.readExpr('"literal"').unpack()
'literal'
is just the string 'literal'. Situations like this can be detected in Python by checking the type of the result from unpack.
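Checking the type of the result makes it possible to walk a tree recursively. The sketch below collects the function names of a tree in prefix order; function_names is a hypothetical helper, and it treats anything other than a (name, argument list) pair as a leaf, which covers the literal case described above but is only an assumption for other forms of expression.

```python
def function_names(expr):
    """Return the function names in `expr`, in prefix order.

    `expr` is expected to behave like a pgf.Expr: unpack() returns a
    (name, [arguments]) tuple for a function application and the bare
    value for a literal.
    """
    result = expr.unpack()
    is_application = (isinstance(result, tuple) and len(result) == 2
                      and isinstance(result[1], list))
    if not is_application:
        # A literal (or another form not handled here) contributes no names.
        return []
    fun, args = result
    names = [fun]
    for arg in args:
        names.extend(function_names(arg))
    return names
```

For the tree e used above, this should produce AdjCN, PositA, red_A, UseN and theatre_N, in that order.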
For more complex analyses you can use the visitor pattern. In object-oriented languages this is just a clumsy way to do what is called pattern matching in most functional languages. You need to define a class which has one method for each function in the abstract syntax of the grammar. If the function is called f then you need a method called on_f. The method will be called each time the corresponding function is encountered, and its arguments will be the arguments from the original tree. If there is no matching method name then the runtime will call the method default. The following is an example:
>>> class ExampleVisitor:
...     def on_DetCN(self,quant,cn):
...         print "Found DetCN"
...         cn.visit(self)
...     def on_AdjCN(self,adj,cn):
...         print "Found AdjCN"
...         cn.visit(self)
...     def default(self,e):
...         pass
...
>>> e2.visit(ExampleVisitor())
Found DetCN
Found AdjCN
Here we call the method visit of the tree e2 (the DetCN tree constructed below) and we give it, as parameter, an instance of the class ExampleVisitor. ExampleVisitor has two methods, on_DetCN and on_AdjCN, which are called when the top function of the current tree is DetCN or AdjCN respectively. In this example we just print a message and call visit recursively to go deeper into the tree.
Constructing new trees is also easy. You can either use readExpr to read trees from strings, or you can construct new trees from existing pieces. This is possible by using the constructor for pgf.Expr:
>>> quant = pgf.readExpr("DetQuant IndefArt NumSg")
>>> e2 = pgf.Expr("DetCN", [quant, e])
>>> print e2
DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA red_A) (UseN theatre_N))
The grammar can also be embedded as a Python module:
>>> gr.embed("App")
<module 'App' (built-in)>
>>> import App
Now creating new trees is just a matter of calling ordinary Python functions:
>>> print App.DetCN(quant,e)
DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA red_A) (UseN theatre_N))
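Two methods give access to the morphological lexicon. The first one, fullFormLexicon, enumerates the full-form lexicon:
>>> for entry in eng.fullFormLexicon():
...     print entry
...
The second one implements a simple lookup. The argument is a word form and the result is a list of analyses:
>>> print eng.lookupMorpho("letter")
[('letter_1_N', 's Sg Nom', inf), ('letter_2_N', 's Sg Nom', inf)]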
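Each analysis is a triple, as in the output above, so the result is easy to post-process. The sketch below reduces it to the distinct lemmas; lemmas is a hypothetical helper, not part of pgf, and the names given to the three tuple positions are descriptive guesses based on that output.

```python
def lemmas(concr, word_form):
    """Return the distinct lemmas of `word_form`, in order of first appearance."""
    seen = []
    # Each analysis is assumed to be a (lemma, form label, probability) triple.
    for lemma, analysis, prob in concr.lookupMorpho(word_form):
        if lemma not in seen:
            seen.append(lemma)
    return seen
```

For the lookup shown above, lemmas(eng, "letter") should give letter_1_N and letter_2_N.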
You can get a list of the abstract functions:
>>> gr.functions
....
or a list of categories:
>>> gr.categories
....
You can also access all functions with the same result category:
>>> gr.functionsByCat("Weekday")
['friday_Weekday', 'monday_Weekday', 'saturday_Weekday', 'sunday_Weekday', 'thursday_Weekday', 'tuesday_Weekday', 'wednesday_Weekday']
The full type of a function can be retrieved as:
>>> print gr.functionType("DetCN")
Det -> CN -> NP
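These two calls can be combined to list the signatures of all functions that produce a given category. A minimal sketch, where signatures is a hypothetical helper and not part of pgf; it relies only on functionsByCat and functionType as shown above, and on the type object being printable.

```python
def signatures(gr, cat):
    """Map each function whose result category is `cat` to its type, as a string."""
    return dict((fun, str(gr.functionType(fun)))
                for fun in gr.functionsByCat(cat))
```

With the grammar above, signatures(gr, "NP") would include an entry for DetCN whose value is the string form of its type.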
The runtime type checker can do type checking and type inference for simple types. Dependent types are still not fully implemented in the current runtime. The inference is done with the method inferExpr:
>>> e,ty = gr.inferExpr(e)
>>> print e
AdjCN (PositA red_A) (UseN theatre_N)
>>> print ty
CN
The result is a potentially updated expression and its type. In this case we always deal with simple types, which means that the new expression will always be equal to the original expression. However, this wouldn't be true when dependent types are added.
Type checking is also trivial:
>>> e = gr.checkExpr(e,pgf.readType("CN"))
>>> print e
AdjCN (PositA red_A) (UseN theatre_N)
In case of type error you will get an exception:
>>> e = gr.checkExpr(e,pgf.readType("A"))
pgf.TypeError: The expected type of the expression AdjCN (PositA red_A) (UseN theatre_N) is A but CN is infered
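In a program the exception would normally be caught. The sketch below turns the check into a boolean test; has_type is a hypothetical helper, and the exception class is passed in as a parameter so that the function does not depend on the pgf module itself.

```python
def has_type(gr, expr, ty, type_error):
    """Return True if gr.checkExpr accepts `expr` at type `ty`.

    `type_error` is the exception class to treat as a type error; with the
    bindings described here this would be pgf.TypeError.
    """
    try:
        gr.checkExpr(expr, ty)
    except type_error:
        return False
    return True
```

With the session above, has_type(gr, e, pgf.readType("A"), pgf.TypeError) should be false and the same call with "CN" should be true. Any other kind of exception is deliberately left to propagate.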
A grammar can also be compiled into separate files for the abstract syntax and for each concrete syntax, using the option -split-pgf:
$ gf -make -split-pgf App12.pgf
Now you can load the grammar as usual but this time only the abstract syntax will be loaded. You can still use the languages property to get the list of languages and the corresponding concrete syntax objects:
>>> gr = pgf.readPGF("App.pgf")
>>> eng = gr.languages["AppEng"]
However, if you now try to use the concrete syntax then you will get an exception:
>>> gr.languages["AppEng"].lookupMorpho("letter")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pgf.PGFError: The concrete syntax is not loaded
Before using the concrete syntax, you need to explicitly load it:
>>> eng.load("AppEng.pgf_c")
>>> print eng.lookupMorpho("letter")
[('letter_1_N', 's Sg Nom', inf), ('letter_2_N', 's Sg Nom', inf)]
When you don't need the language anymore then you can simply unload it:
>>> eng.unload()
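Pairing load with unload by hand is easy to get wrong when an exception occurs in between. One possible way to guard against that is a context manager. This is only a sketch built on the standard contextlib module; loaded is a hypothetical name, and it assumes nothing beyond the load and unload methods shown above.

```python
import contextlib

@contextlib.contextmanager
def loaded(concr, path):
    """Load the concrete syntax from `path` and unload it again on exit."""
    concr.load(path)
    try:
        yield concr
    finally:
        # Runs even if the body of the with-statement raises.
        concr.unload()
```

It would be used as "with loaded(eng, "AppEng.pgf_c") as lang:" followed by calls such as lang.lookupMorpho("letter") in the body.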
The abstract tree of an expression can be produced in Graphviz format:
>>> print gr.graphvizAbstractTree(e)
graph {
n0[label = "AdjCN", style = "solid", shape = "plaintext"]
n1[label = "PositA", style = "solid", shape = "plaintext"]
n2[label = "red_A", style = "solid", shape = "plaintext"]
n1 -- n2 [style = "solid"]
n0 -- n1 [style = "solid"]
n3[label = "UseN", style = "solid", shape = "plaintext"]
n4[label = "theatre_N", style = "solid", shape = "plaintext"]
n3 -- n4 [style = "solid"]
n0 -- n3 [style = "solid"]
}
The concrete syntax object produces the parse tree in the same format:
>>> print eng.graphvizParseTree(e)
graph {
  node[shape=plaintext]
  subgraph {rank=same;
    n4[label="CN"]
  }
  subgraph {rank=same;
    edge[style=invis]
    n1[label="AP"]
    n3[label="CN"]
    n1 -- n3
  }
  n4 -- n1
  n4 -- n3
  subgraph {rank=same;
    edge[style=invis]
    n0[label="A"]
    n2[label="N"]
    n0 -- n2
  }
  n1 -- n0
  n3 -- n2
  subgraph {rank=same;
    edge[style=invis]
    n100000[label="red"]
    n100001[label="theatre"]
    n100000 -- n100001
  }
  n0 -- n100000
  n2 -- n100001
}
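To turn this output into a picture it first has to be saved to a file. A minimal sketch using only the standard library; save_dot is a hypothetical helper, and decoding byte strings as UTF-8 is an assumption about what the bindings return under Python 2.

```python
import io

def save_dot(dot_source, path):
    """Write Graphviz source text to `path` as UTF-8."""
    if not isinstance(dot_source, type(u"")):
        # Byte strings are assumed to be UTF-8 encoded.
        dot_source = dot_source.decode("utf-8")
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(dot_source)
```

After save_dot(gr.graphvizAbstractTree(e), "tree.dot"), the file can be rendered with the Graphviz command line tool, for example "dot -Tpng tree.dot -o tree.png", provided Graphviz is installed.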