A Bird's-Eye View of GF as a Grammar Formalism

Author: Aarne Ranta
Last update: Thu Feb 2 14:16:01 2006



Abstract. This document gives a general description of the Grammatical Framework (GF), with comparisons to other grammar formalisms such as CG (Categorial Grammar), ACG (Abstract Categorial Grammar), HPSG (Head-Driven Phrase Structure Grammar), and LFG (Lexical-Functional Grammar).

GF in a few words

Grammatical Framework (GF) is a grammar formalism based on constructive type theory.

GF makes a distinction between abstract syntax and concrete syntax.

The abstract syntax part of GF is a logical framework, with dependent types and higher-order functions.

The concrete syntax is a system of records containing strings and features.

A GF grammar defines a reversible homomorphism from an abstract syntax to a concrete syntax.

A multilingual GF grammar is a set of concrete syntaxes associated with one abstract syntax.
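
As a minimal sketch of these ideas (all names invented for illustration, not taken from any actual GF grammar), here is one abstract syntax with an English and a French concrete syntax; linearizations are records with a single string field:

    abstract Zero = {
      cat S ; NP ; VP ;
      fun Pred : NP -> VP -> S ;   -- predication
      fun John : NP ;
      fun Walk : VP ;
    }

    concrete ZeroEng of Zero = {
      lincat S, NP, VP = {s : Str} ;
      lin Pred np vp = {s = np.s ++ vp.s} ;
      lin John = {s = "John"} ;
      lin Walk = {s = "walks"} ;
    }

    concrete ZeroFre of Zero = {
      lincat S, NP, VP = {s : Str} ;
      lin Pred np vp = {s = np.s ++ vp.s} ;
      lin John = {s = "Jean"} ;
      lin Walk = {s = "marche"} ;
    }

Parsing "John walks" with ZeroEng yields the tree Pred John Walk, which ZeroFre linearizes as "Jean marche": translation is parsing followed by linearization.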

GF grammars are written in a high-level functional programming language, which is compiled into a core language (GFC).

GF grammars can be used as resources, i.e. as libraries for writing new grammars; these are compiled and optimized by the method of grammar composition.

GF has a module system that supports grammar engineering and separate compilation.

History of GF

1988. Intuitionistic Categorial Grammar; type theory as abstract syntax, playing the role of Montague's analysis trees. Grammars implemented in Prolog.

1994. Type-Theoretical Grammar. Abstract syntax organized as a system of combinators. Grammars implemented in ALF.

1996. Multilingual Type-Theoretical Grammar. Rules for generating six languages from the same abstract syntax. Grammars implemented in ALF, ML, and Haskell.

1998. The first implementation of GF as a language of its own.

2000. New version of GF: high-level functional source language, records used for concrete syntax.

2003. The module system.

2004. Ljunglöf's thesis Expressivity and Complexity of the Grammatical Framework.

Some key ingredients of GF in other grammar formalisms

                                GF   ACG   LFG   HPSG   CG
  abstract vs concrete syntax    X    X     ?     -      -
  type theory                    X    X     -     -      X
  records and features           X    -     X     X      -

Examples of descriptions in each formalism

To be written...

Lambda terms and records

In computer science (e.g. in compilers), abstract syntax is trees and concrete syntax is strings. This works more or less well for programming languages.

In CG, all syntax is lambda terms.

In Montague grammar, abstract syntax is lambda terms and concrete syntax is trees. Abstract syntax as lambda terms can be considered well-established.

In PATR and HPSG, concrete syntax is records. This can be considered well-established for natural languages.

In ACG, both are lambda terms. This is more general than GF, but reversibility requires a linearity restriction, which can be unnatural for grammar writing.

In GF, linearization from lambda terms to records is reversible, and grammar writing is not restricted to linear terms.
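
To make the non-linearity concrete, here is a sketch (invented names, continuing the style of the earlier example) in which the linearization of predication uses its NP argument twice, once for its string and once for its number feature; a strictly linear lambda term could not duplicate the argument in this way:

    param Num = Sg | Pl ;

    lincat S  = {s : Str} ;
    lincat NP = {s : Str ; n : Num} ;   -- a string together with a number feature
    lincat VP = {s : Num => Str} ;      -- an inflection table over number

    -- np occurs twice on the right-hand side: np.s and np.n
    lin Pred np vp = {s = np.s ++ vp.s ! np.n} ;

    lin Walk = {s = table {Sg => "walks" ; Pl => "walk"}} ;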

Grammar composition in ACG is just function composition. In GF, it is more restricted...

The structure of GF formalisms

The following diagram (to be drawn properly!) describes the levels.

         |   programming language design
         V
    GF source language
         |
         |   type-directed partial evaluation
         V
    GFC assembly language
         |
         |   Ljunglöf's translation
         V
    MCFG parser

The last two phases involve nontrivial mathematical properties.

In most grammar formalisms, grammarians have to work at the GFC (or MCFG) level.

They may use macros, which makes such formalisms comparable to macro assemblers. But there are no separately compiled library modules, no type checking, etc.

The expressivity of GF

Parsing complexity is the same as for MCFG: polynomial, with an exponent that depends on the grammar and is not bounded in general. This places GF between TAG (which has a fixed low-degree polynomial bound) and HPSG (which has no polynomial bound in general).
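
The connection to MCFG comes from the fact that a linearization type may contain several string fields, so a constituent can be discontinuous. A small invented sketch:

    -- a particle verb has two discontinuous parts
    lincat V2 = {verb : Str ; part : Str} ;
    lin UseUp = {verb = "uses" ; part = "up"} ;

    -- the object (an NP record with field s) lands between the two parts: "uses it up"
    lin PredV2 v2 obj = {s = v2.verb ++ obj.s ++ v2.part} ;

The maximal number of such discontinuous fields, together with rule arity, is what makes the exponent of the polynomial depend on the grammar.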

If semantic well-formedness (type theory) is taken into account, then arbitrary logic can be expressed. The well-formedness of abstract syntax is decidable, but the well-formedness of a concrete-syntax string can require an arbitrary proof construction and is therefore undecidable.
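
A sketch of how proof objects enter abstract syntax (all names invented): the category of proofs depends on the proposition proved, and a function can take a proof argument that leaves no trace in the linearized string, so that parsing the string amounts to reconstructing the proof.

    cat S ;                    -- sentences
    cat Prop ;                 -- propositions
    cat Proof Prop ;           -- proofs, indexed by the proposition they prove

    fun and  : Prop -> Prop -> Prop ;
    fun andI : (A, B : Prop) -> Proof A -> Proof B -> Proof (and A B) ;

    -- "hence A" is well-formed only if A has a proof; the proof itself
    -- is not linearized, so parsing must reconstruct it
    fun Hence : (A : Prop) -> Proof A -> S ;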

Separability between abstract syntax (AS) and concrete syntax (CS): like TAG (Tree Adjoining Grammar), GF has the goal of assigning the intended trees to strings. This is generalized to trees shared between different languages.

The high-level language strives for writability and readability, in the sense these properties have in programming language design.

Grammars and parsing

In many projects, a grammar is just seen as a declarative parsing program.

For GF, a grammar is primarily the definition of a language.

Detaching grammars from parsers is a good idea: the same grammar can then be used not only for parsing but also for generation and translation.

Separating abstract from concrete syntax is a prerequisite for this: we want parsers to return abstract syntax objects, and these must exist independently of parse trees.

A possible radical approach to parsing: use a grammar to generate a treebank and machine-learn a statistical parser from this.

Comparison: Steedman in CCG has done something like this.

Grammars as software libraries

Reuse for different purposes.

Grammar composition.
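
As a hedged sketch of what library use looks like: an application concrete syntax opens resource modules and builds its linearizations with library constructors instead of writing records and tables by hand. The application names (Health, Fact, PatientSleeps) are invented; the module and constructor names (SyntaxEng, ParadigmsEng, mkS, mkCl, mkNP, the_Det, mkN, mkV) are meant to follow the resource grammar API and should be checked against the actual library.

    concrete HealthEng of Health = open SyntaxEng, ParadigmsEng in {
      lincat Fact = S ;
      -- "the patient sleeps", built from resource library constructors
      lin PatientSleeps =
        mkS (mkCl (mkNP the_Det (mkN "patient")) (mkV "sleep")) ;
    }

Grammar composition then compiles the library away: the resource constructors are expanded at compile time, so the resulting GFC grammar does not depend on the resource at run time.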

Multilinguality

In application grammars, the AS is a semantic model, and a CS covers domain terminology and idioms.

This can give publication-quality translation on limited domains (e.g. the WebALT project).

Resource grammars with grammar composition lead to compile-time transfer.

When is run-time transfer necessary?

Cf. CLE (Core Language Engine).

Parametrized modules

This notion comes from the module system (functors) of the ML language in the 1980s.

It can be used for sharing even more code between languages than just their abstract syntax.

This is especially useful for related languages (Scandinavian, Romance).

Cf. grammar porting in CLE: what CLE does with untyped macro packages, GF does with typed interfaces.
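
A minimal sketch of the mechanism (all names invented): an interface declares operations with types but no definitions, instances give language-specific definitions, and an incomplete concrete syntax written over the interface is turned into a concrete syntax for each language by functor instantiation.

    abstract Greet = {
      cat Greeting ;
      fun Hello : Greeting ;
    }

    interface Lex = {
      oper hello_Str : Str ;
    }

    instance LexEng of Lex = {oper hello_Str = "hello" ;}
    instance LexSwe of Lex = {oper hello_Str = "hej" ;}

    -- written once, shared by all languages
    incomplete concrete GreetI of Greet = open Lex in {
      lincat Greeting = {s : Str} ;
      lin Hello = {s = hello_Str} ;
    }

    -- functor instantiation, one line per language
    concrete GreetEng of Greet = GreetI with (Lex = LexEng) ;
    concrete GreetSwe of Greet = GreetI with (Lex = LexSwe) ;

This is the analogue of applying an ML functor to different structures.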