#LyX 2.0 created this file. For more info see http://www.lyx.org/ \lyxformat 413 \begin_document \begin_header \textclass article \begin_preamble \usepackage{times} \end_preamble \use_default_options true \maintain_unincluded_children false \language english \language_package default \inputencoding auto \fontencoding global \font_roman default \font_sans default \font_typewriter default \font_default_family default \use_non_tex_fonts false \font_sc false \font_osf false \font_sf_scale 100 \font_tt_scale 100 \graphics default \default_output_format default \output_sync 0 \bibtex_command default \index_command default \paperfontsize 11 \spacing single \use_hyperref true \pdf_bookmarks true \pdf_bookmarksnumbered false \pdf_bookmarksopen false \pdf_bookmarksopenlevel 1 \pdf_breaklinks false \pdf_pdfborder false \pdf_colorlinks true \pdf_backref false \pdf_pdfusetitle true \papersize a4paper \use_geometry false \use_amsmath 1 \use_esint 1 \use_mhchem 1 \use_mathdots 1 \cite_engine natbib_authoryear \use_bibtopic false \use_indices false \paperorientation portrait \suppress_date false \use_refstyle 1 \index Index \shortcut idx \color #008000 \end_index \secnumdepth 3 \tocdepth 3 \paragraph_separation indent \paragraph_indentation default \quotes_language english \papercolumns 1 \papersides 1 \paperpagestyle default \tracking_changes false \output_changes false \html_math_output 0 \html_css_as_file 0 \html_be_strict false \end_header \begin_body \begin_layout Title A Bilingual Treebank for the FraCaS Test Suite \begin_inset Newline newline \end_inset CLT Project Report \end_layout \begin_layout Author Peter Ljunglöf and Magdalena Siverbo \begin_inset Newline newline \end_inset Centre for Language Technology \begin_inset Newline newline \end_inset University of Gothenburg \begin_inset Newline newline \end_inset E-mail: \begin_inset Flex URL status open \begin_layout Plain Layout peter.ljunglof@gu.se \end_layout \end_inset \end_layout \begin_layout Date 31st October, 2011 
\end_layout \begin_layout Abstract \noindent We have created a bilingual treebank for 99% of the sentences in the FraCaS test suite. The treebank was built together with an associated bilingual English-Swedish lexicon written in the Grammatical Framework Resource Grammar. The original FraCaS sentences are in English, and we have tested the multilinguality of the Resource Grammar by analysing the grammaticality and naturalness of the Swedish translations. 86% of the sentences are grammatically and semantically correct and sound natural. About 10% can probably be fixed by adding new lexical items or grammatical rules, and only a small number are considered difficult to fix. \end_layout \begin_layout Standard \begin_inset ERT status open \begin_layout Plain Layout \backslash thispagestyle{empty} \end_layout \end_inset \end_layout \begin_layout Section Introduction \end_layout \begin_layout Standard In this project we have created a bilingual treebank for the FraCaS test suite \begin_inset CommandInset citation LatexCommand citep key "CooperCrouchEijck1996:Using-the-Framework" \end_inset , using the Grammatical Framework Resource Grammar Library \begin_inset CommandInset citation LatexCommand citep key "Ranta2009:The-GF-Resource-Grammar-Library,Ranta2009:Grammatical-Framework:-A-Multilingual,Ranta2011:Grammatical-Framework:-Programming" \end_inset . The project consisted of two partly interwoven parts. The first aim was to construct a treebank, which involved creating a lexicon and a limited grammar specific to the FraCaS test suite, parsing the sentences, and selecting the most representative trees. The second aim was to build a FraCaS corpus in Swedish, using the treebank constructed in the first part of the project. This involved translating the English lexicon and grammar into Swedish equivalents, generating Swedish sentences for all the trees in the treebank, and evaluating the results. 
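\end_layout \begin_layout Standard As a rough illustration of what it means for one tree to serve both languages, here is a minimal Python sketch. It is neither GF code nor the grammar built in this project. The constructor names DetCN, AdjCN and IndefArt echo Resource Grammar functions. The dictionaries NOUNS, ADJECTIVES and INDEF_ART and the helpers lin_cn and lin_np are hypothetical. The sketch shows that Swedish gender agreement, as in \emph on en röd bil \emph default versus \emph on ett rött hus \emph default , can be resolved entirely on the concrete side, so the abstract tree stays identical for English and Swedish. \end_layout \begin_layout LyX-Code
```python
# Hypothetical toy model of a bilingual treebank entry: one abstract tree,
# two concrete linearisations. This is NOT the GF implementation; the names
# DetCN, AdjCN and IndefArt only mirror Resource Grammar functions.

# Swedish nouns have an inherent gender: "utr" (common) or "neutr" (neuter).
NOUNS = {
    "car_N": {"Eng": ("car", None), "Swe": ("bil", "utr")},
    "house_N": {"Eng": ("house", None), "Swe": ("hus", "neutr")},
}

# English adjectives do not agree with the noun; Swedish ones agree in gender.
ADJECTIVES = {
    "red_A": {"Eng": {None: "red"}, "Swe": {"utr": "röd", "neutr": "rött"}},
}

# The indefinite article also agrees in gender in Swedish.
INDEF_ART = {"Eng": {None: "a"}, "Swe": {"utr": "en", "neutr": "ett"}}


def lin_cn(tree, lang):
    """Linearise a common noun tree; return (string, gender)."""
    if isinstance(tree, str):  # a lexical noun such as "car_N"
        return NOUNS[tree][lang]
    fun, adj, cn = tree
    if fun != "AdjCN":
        raise ValueError("unsupported CN function: " + fun)
    noun, gender = lin_cn(cn, lang)
    # The adjective form is selected by the gender of the modified noun.
    return ADJECTIVES[adj][lang][gender] + " " + noun, gender


def lin_np(tree, lang):
    """Linearise an indefinite noun phrase tree ("DetCN", "IndefArt", cn)."""
    fun, det, cn = tree
    if (fun, det) != ("DetCN", "IndefArt"):
        raise ValueError("unsupported NP: " + fun + " " + det)
    phrase, gender = lin_cn(cn, lang)
    return INDEF_ART[lang][gender] + " " + phrase


# The same abstract tree is shared by both languages.
tree = ("DetCN", "IndefArt", ("AdjCN", "red_A", "car_N"))
for lang in ("Eng", "Swe"):
    print(lang, lin_np(tree, lang))
```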
\end_layout \begin_layout Standard \begin_inset Newpage pagebreak \end_inset \end_layout \begin_layout Subsection The FraCaS Corpus \end_layout \begin_layout Standard The FraCaS textual inference problem set \begin_inset CommandInset citation LatexCommand citep key "CooperCrouchEijck1996:Using-the-Framework" \end_inset was built in the mid-1990s by the FraCaS project, a large collaboration aimed at developing resources and theories for computational semantics. This test set was later modified and converted to XML by Bill MacCartney: \end_layout \begin_layout Standard \noindent \align center \family sans \begin_inset CommandInset href LatexCommand href target "http://www-nlp.stanford.edu/~wcmac/downloads/fracas.xml" \end_inset \end_layout \begin_layout Standard It is this modified version that has been used in this project. The corpus consists of 346 problems, each containing one or more premises and one yes/no question (except for four problems, which have no question). The corpus contains 1220 sentences in total, but since some of them are repeated in several problems, only 874 of them are unique. \end_layout \begin_layout Standard The FraCaS problems contain relatively simple sentences, and the premise and hypothesis sentences are usually syntactically similar. Despite this simplicity, the problems are intended to reflect a broad variety of semantic and inferential phenomena. For this reason, the FraCaS corpus has been used as a benchmark for evaluating different computational semantics systems \begin_inset CommandInset citation LatexCommand citep key "MacCartneyManning2008:Modeling-semantic-containment" \end_inset . \end_layout \begin_layout Standard The FraCaS corpus only contains made-up sentences, which are intended to be grammatically correct. 
We therefore took the opportunity to correct some obvious minor mistakes, such as \emph on \begin_inset Quotes eld \end_inset a executive \begin_inset Quotes erd \end_inset \emph default , \emph on \begin_inset Quotes eld \end_inset does \family typewriter [\SpecialChar \ldots{} ] \family default has \begin_inset Quotes erd \end_inset \emph default , \emph on \begin_inset Quotes eld \end_inset did \family typewriter [\SpecialChar \ldots{} ] \family default delivered \begin_inset Quotes erd \end_inset \emph default , and \emph on \begin_inset Quotes eld \end_inset Jones's \begin_inset Quotes erd \end_inset \emph default . In total, 7 sentences were corrected. \end_layout \begin_layout Standard \begin_inset Note Note status collapsed \begin_layout Subsubsection from MacCartney's thesis: \end_layout \begin_layout Plain Layout The FraCaS test suite \begin_inset CommandInset citation LatexCommand cite key "CooperCrouchEijck1996:Using-the-Framework" \end_inset (Cooper et al. 1996) of NLI problems was one product of the FraCaS Consortium, a large collaboration in the mid-1990s aimed at developing a range of resources related to computational semantics. The FraCaS problems contain comparatively simple sentences, and the premise and hypothesis sentences are usually quite similar, so that just a few edits suffice to transform p into h. Despite this simplicity, the problems are designed to reflect a broad diversity of semantic and inferential phenomena. For this reason, the FraCaS test suite has proven to be invaluable as a developmental test bed for the NatLog system and as a yardstick for evaluating its effectiveness. Indeed, the test suite was created with just such an application as its primary goal. As the authors write: \end_layout \begin_layout Quote In light of the view expressed elsewhere in this and other FraCaS deliverables ... 
that inferential ability is not only a central manifestation of semantic competence but is in fact centrally constitutive of it, it shouldn’t be a surprise that we regard inferencing tasks as the best way of testing an NLP system’s semantic capacity. \end_layout \begin_layout Subsubsection from MacCartney & Manning (2007): \end_layout \begin_layout Plain Layout The FraCaS test suite (Cooper et al., 1996) was developed as part of a collaborative research effort in computational semantics. It contains 346 inference problems reminiscent of a textbook on formal semantics. In the authors’ view, “inferencing tasks [are] the best way of testing an NLP system’s semantic capacity.” \end_layout \begin_layout Plain Layout The problems are divided into nine sections, each focused on a category of semantic phenomena, such as quantifiers or anaphora (see table 2). Each problem consists of one or more premise sentences, followed by a one-sentence question. For this project, the questions were converted into declarative hypotheses. \end_layout \begin_layout Plain Layout Each problem also has an answer, which (usually) takes one of three values: yes (the hypothesis can be inferred from the premise(s)), no (the negation of the hypothesis can be inferred), or unk (neither the hypothesis nor its negation can be inferred). \end_layout \begin_layout Subsubsection from MacCartney & Manning (2008): \end_layout \begin_layout Plain Layout The FraCaS test suite (Cooper et al., 1996) contains 346 NLI problems, divided into nine sections, each focused on a specific category of semantic phenomena (listed in table 3). Each problem consists of one or more premise sentences, a question sentence, and one of three answers: yes, no, or unknown. \end_layout \end_inset \end_layout \begin_layout Subsubsection Examples from the FraCaS Corpus \end_layout \begin_layout Standard The FraCaS problems are divided into nine broad categories that cover many aspects of semantic inference. 
The categories are called \emph on quantifiers \emph default , \emph on plurals \emph default , \emph on anaphora \emph default , \emph on ellipsis \emph default , \emph on adjectives \emph default , \emph on comparatives \emph default , \emph on temporal reference \emph default , \emph on verbs \emph default , and \emph on attitudes \emph default , and they are further divided into a hierarchy of subcategories and sub-subcategories of semantic phenomena. Each problem starts with one or more premises, followed by a question that can be answered with yes, no, or unknown. Here are two similar examples from the \emph on anaphora \emph default category that lead to different inferences: \end_layout \begin_layout Labeling \labelwidthstring (999) (135) P: Every customer who owns a computer has a service contract for it. \begin_inset Newline newline \end_inset P: MFI is a customer that owns several computers. \begin_inset Newline newline \end_inset Q: Does MFI have a service contract for all its computers? \begin_inset Newline newline \end_inset A: Yes. \end_layout \begin_layout Labeling \labelwidthstring (999) (136) P: Every executive who had a laptop computer brought it to take notes at the meeting. \begin_inset Newline newline \end_inset P: Smith is an executive who owns five different laptop computers. \begin_inset Newline newline \end_inset Q: Did Smith take five laptop computers to the meeting? \begin_inset Newline newline \end_inset A: Unknown. \end_layout \begin_layout Standard Some pairs of problems contain exactly the same sentences but have different answers, depending on how an ambiguity is resolved. This is the case for the following pair from the \emph on ellipsis \emph default category: \end_layout \begin_layout Labeling \labelwidthstring (160--161) (160--161) P: John owns a red car. \begin_inset Newline newline \end_inset P: Bill owns a fast one. \begin_inset Newline newline \end_inset Q: Does Bill own a fast red car? 
\begin_inset Newline newline \end_inset A: Yes or unknown, depending on the reading of \begin_inset Quotes eld \end_inset one \begin_inset Quotes erd \end_inset . \end_layout \begin_layout Subsection Grammatical Framework \end_layout \begin_layout Standard Grammatical Framework (GF) \begin_inset CommandInset citation LatexCommand citep key "Ranta2009:Grammatical-Framework:-A-Multilingual,Ranta2011:Grammatical-Framework:-Programming" \end_inset is a grammar formalism based on type theory. Its main feature is the separation of abstract and concrete syntax. The abstract syntax of a grammar defines a set of abstract syntactic structures, called abstract terms or trees, and the concrete syntax defines a relation between abstract structures and concrete structures. The concrete syntax is expressive enough to describe language-specific linguistic features such as word order, gender and case inflection, and discontinuous phrases. This makes GF well suited for writing multilingual grammars, where the abstract syntax is lifted to a more language-universal level. \end_layout \begin_layout Subsubsection Simple GF Example \end_layout \begin_layout Standard To illustrate the possibilities of GF, we define adjectives as noun-modifying functions in the spirit of categorial grammar: \end_layout \begin_layout Description (Abstract) \begin_inset Formula $\mathit{green:CN\rightarrow CN}$ \end_inset \end_layout \begin_layout Standard This means that \emph on green \emph default is a grammatical construction that creates common nouns (CN) from common nouns (CN). This says nothing about word order, which is instead defined by the linearisation rules in the concrete syntax. In English, the adjective comes before the noun: \end_layout \begin_layout Description \series bold (English) \series default \begin_inset Formula $\mathit{green\; n="\! 
green"\,+\negmedspace\negmedspace+\:\: n}$ \end_inset \end_layout \begin_layout Standard In French, on the other hand, the adjective comes after the noun: \end_layout \begin_layout Description (French) \begin_inset Formula $\mathit{green\; n=n\:+\negmedspace\negmedspace+\:\:"\! vert"}$ \end_inset \end_layout \begin_layout Standard However, since French adjectives inflect for number and gender, this is only correct for singular masculine nouns. To handle such cases, GF concrete syntax supports inflection tables, inherent attributes and discontinuous constituents, which makes the formalism as expressive as Multiple Context-Free Grammars \begin_inset CommandInset citation LatexCommand citep key "Ljunglof2004:Expressivity-and-Complexity-of-GF" \end_inset . A slightly more correct French variant of the adjective \emph on green \emph default would then be: \end_layout \begin_layout Description \series bold (French) \series default \begin_inset Formula $\mathit{green\; n=\mathbf{table}\left\{ \begin{array}{l} Sg\:\Rightarrow\: n\,!\, Sg\:+\negmedspace\negmedspace+\:\:"\! vert"\\ Pl\:\Rightarrow\: n\,!\, Pl\:+\negmedspace\negmedspace+\:\:"\! verts" \end{array}\right\} }$ \end_inset \end_layout \begin_layout Standard This still does not handle feminine nouns, although GF can of course express that as well. An even better solution is to use the GF Resource Grammar, where all these inflection paradigms are already defined. \end_layout \begin_layout Subsubsection The GF Resource Grammar \end_layout \begin_layout Standard GF has a rich module system that supports grammar writing as an engineering task, in which common grammars are reused. The abstract syntax of one grammar can be used as the concrete syntax of another grammar. This makes it possible to implement grammar resources that can be used in many different application domains. 
These features are exploited in the GF Resource Grammar Library \begin_inset CommandInset citation LatexCommand citep key "Ranta2009:The-GF-Resource-Grammar-Library,Ranta2011:Grammatical-Framework:-Programming" \end_inset , which is a multilingual GF grammar with a common abstract syntax for 20 languages, including Finnish, Persian, Russian and Urdu. The main purpose of the library is to serve as a resource for writing domain-specific grammars. \end_layout \begin_layout Standard Using the resource grammar, we can now define the French and English linearisations of the adjective function, and the resource grammar takes care of all inflection: \end_layout \begin_layout Description (French) \begin_inset Formula $\mathit{green\; n=AdjCN\:(PositA\:(mkA\;"\! vert"))\: n}$ \end_inset \end_layout \begin_layout Description (English) \begin_inset Formula $\mathit{green\; n=AdjCN\:(PositA\:(mkA\;"\! green"))\: n}$ \end_inset \end_layout \begin_layout Standard Here \emph on AdjCN \emph default is a function that modifies a common noun with an adjective phrase, \emph on PositA \emph default builds an adjective phrase from the positive form of an adjective, and \emph on mkA \emph default builds all inflected forms of a regular adjective. Note that the English and French linearisations have the same structure and differ only in their lexical entries. GF can exploit this through a language-independent concrete syntax. The FraCaS treebank is language-independent in this sense, since the tree for each sentence is the same for both English and Swedish. \end_layout \begin_layout Section The English Treebank \end_layout \begin_layout Subsection The FraCaS Grammar \end_layout \begin_layout Standard To construct a GF treebank, we need a grammar and a lexicon that can describe every sentence in the corpus. We have used the GF Resource Grammar as the underlying grammar, and added lexical items that capture the FraCaS domain. 
On top of the resource grammar, we have added a few new grammatical constructions, as well as functions for handling elliptic phrases. \end_layout \begin_layout Standard In total, we used 107 of the 189 grammatical functions defined in the resource grammar. In addition, we added four missing grammatical constructions and 7 different elliptic phrases. \end_layout \begin_layout Standard \begin_inset Note Note status collapsed \begin_layout Plain Layout In order to construct the treebank for FraCaS, two modules were written, one lexicon module and one grammar module. \end_layout \begin_layout Subsubsection Lexicon module \end_layout \begin_layout Plain Layout The FraCaS lexicon module consists of an abstract and a concrete part. \end_layout \begin_layout Description FraCaSLex Abstract lexicon for the FraCaS test suite \end_layout \begin_layout Description FraCaSLexEng Concrete lexicon for the FraCaS test suite \end_layout \begin_layout Plain Layout The lexicon was built using the functions mkN, mkA, mkV, etc., mainly from the Paradigms module. \end_layout \begin_layout Subsubsection Grammar module \end_layout \begin_layout Plain Layout The FraCaS grammar module consists of an abstract and a concrete part. \end_layout \begin_layout Description FraCaS Abstract grammar for the FraCaS test suite \end_layout \begin_layout Description FraCaSEng Concrete grammar for the FraCaS test suite \end_layout \begin_layout Plain Layout Initially, the whole Grammar module from the resource grammar was imported, but in the end only parts of the Grammar module (namely Noun, Verb, Adjective, Adverb, Numeral and Tense) were imported, while other parts were opened and the necessary functions used in the FraCaS module. A few functions were added, mainly on the clause and sentence level, in order to simplify the tree structures. 
\end_layout \end_inset \end_layout \begin_layout Subsubsection Lexicon \end_layout \begin_layout Standard The lexicon has 531 entries in total, some of which are structural words already defined in the resource grammar. Some of the lexical items denote different meanings of the same word. Examples include the word \emph on \begin_inset Quotes eld \end_inset than \begin_inset Quotes erd \end_inset \emph default , which can function both as a preposition and as a subjunction, the verb \emph on \begin_inset Quotes eld \end_inset go \begin_inset Quotes erd \end_inset \emph default , which can mean \emph on \begin_inset Quotes eld \end_inset travel \begin_inset Quotes erd \end_inset \emph default or \emph on \begin_inset Quotes eld \end_inset walk \begin_inset Quotes erd \end_inset \emph default , and the conjunction \emph on \begin_inset Quotes eld \end_inset and \begin_inset Quotes erd \end_inset \emph default , which can be either a phrase-initial conjunction or an ordinary conjunction. Other entries denote different valencies of the same meaning. This is most common for verbs, such as the verb \emph on \begin_inset Quotes eld \end_inset finish \begin_inset Quotes erd \end_inset \emph default , which can take either a noun phrase or a verb phrase argument, and the verb \emph on \begin_inset Quotes eld \end_inset know \begin_inset Quotes erd \end_inset \emph default , which can take either a question or a sentence as argument. \end_layout \begin_layout Standard The lexicon entries are divided into 63 adjectives, 77 adverbials, 20 conjunctions/subjunctions, 34 determiners, 142 nouns, 19 numerals, 40 proper nouns, 15 prepositions, 12 pronouns, and 109 verbs. Out of these, 55 adverbials and 28 nouns/proper nouns are multi-word expressions. 
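\end_layout \begin_layout Standard As a quick consistency check, the category counts above can be tallied in a few lines of Python. The dictionaries simply restate the figures given in this section, and their names and keys are our own. The sums should reproduce the 531 lexicon entries and the 83 multi-word items. \end_layout \begin_layout LyX-Code
```python
# Lexicon entries per category, as reported in this section.
lexicon_counts = {
    "adjectives": 63,
    "adverbials": 77,
    "conjunctions/subjunctions": 20,
    "determiners": 34,
    "nouns": 142,
    "numerals": 19,
    "proper nouns": 40,
    "prepositions": 15,
    "pronouns": 12,
    "verbs": 109,
}

# Multi-word expressions among those entries.
multiword_counts = {"adverbials": 55, "nouns/proper nouns": 28}

total_entries = sum(lexicon_counts.values())
total_multiword = sum(multiword_counts.values())
print(total_entries, total_multiword)  # prints: 531 83
```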
\end_layout \begin_layout Subsubsection Multi-word Lexical Items \begin_inset CommandInset label LatexCommand label name "sub:Multi-word-Lexical-Items" \end_inset \end_layout \begin_layout Standard In total, 83 of the lexical items denote multi-word phrases. They fall mainly into two types: \end_layout \begin_layout Standard \begin_inset Note Note status collapsed \begin_layout Itemize P: Modified proper nouns (A + PN) could not be parsed. \begin_inset Newline newline \end_inset S: “southern Europe” was defined as PN in FraCaSLex. \end_layout \begin_layout Itemize P: Compounds constructed from a proper noun and a noun (PN + N), and hyphenated nouns (N-N) could not be parsed. \begin_inset Newline newline \end_inset S: “Labour MP”, “APCOM manager”, “stock-market” etc. were defined as N in FraCaSLex. \end_layout \begin_layout Itemize (SKIP) P: Certain indefinite pronouns were not recognized as they did not exist in the resource grammar. \begin_inset Newline newline \end_inset S: “all”, “anyone”, “everyone”, “no one” and “someone” were defined as NP in FraCaSLex. \end_layout \end_inset \begin_inset Note Note status collapsed \begin_layout Paragraph Quantifiers \end_layout \begin_layout Itemize P: Numbers written without spaces between the digits were not recognized. \begin_inset Newline newline \end_inset S: “10”, “99”, “100”, “2500” etc. defined as Det in FraCaSLex. \end_layout \begin_layout Itemize P: Certain longer numerical expressions could not be parsed. \begin_inset Newline newline \end_inset S: “one or more”, “the other 99” and “two out of ten” were defined as Det in FraCaSLex. \end_layout \begin_layout Itemize P: Certain quantifiers were not recognized as they did not exist in the resource grammar. \begin_inset Newline newline \end_inset S: “a few”, “both”, “either”, “most of the”, “several” etc. were defined as Det in FraCaSLex. 
\end_layout \begin_layout Paragraph Conjunctions \end_layout \begin_layout Itemize P: Sentences starting with a conjunction could not be parsed. \begin_inset Newline newline \end_inset S: The functions SentencePAnd and SentencePBut were added in FraCaS. \end_layout \begin_layout Itemize P: Conjunctions preceded by comma or semicolon could not be parsed. \begin_inset Newline newline \end_inset S: “, and” and “; and” were defined as Conj in FraCaSLex. \end_layout \end_inset \end_layout \begin_layout Description Compounds Compound noun phrases such as \emph on \begin_inset Quotes eld \end_inset southern Europe \begin_inset Quotes erd \end_inset \emph default (adjective + proper noun), \emph on \begin_inset Quotes eld \end_inset APCOM manager \begin_inset Quotes erd \end_inset \emph default (proper noun + noun) and \emph on \begin_inset Quotes eld \end_inset university student \begin_inset Quotes erd \end_inset \emph default (noun + noun) were problematic, partly because the Resource Grammar currently cannot handle all kinds of compounding, but mostly because many of the corresponding Swedish phrases are single compound words. In total, there were 28 multi-word compounds, divided between nouns, proper nouns and adjectives. \end_layout \begin_layout Description Time \begin_inset space ~ \end_inset and \begin_inset space ~ \end_inset Date \begin_inset space ~ \end_inset Expressions Time and date expressions were problematic for two reasons. First, although a generic multilingual time and date resource grammar is under development, it is not yet finished. Second, different languages use different syntactic constructions for times and dates. 
In particular, the use of prepositions differs considerably: \emph on \begin_inset Quotes eld \end_inset in 1990 \begin_inset Quotes erd \end_inset \emph default , \emph on \begin_inset Quotes eld \end_inset in February \begin_inset Quotes erd \end_inset \emph default and \emph on \begin_inset Quotes eld \end_inset in two years \begin_inset Quotes erd \end_inset \emph default are translated into Swedish as \emph on \begin_inset Quotes eld \end_inset 1990 \begin_inset Quotes erd \end_inset \emph default , \emph on \begin_inset Quotes eld \end_inset i februari \begin_inset Quotes erd \end_inset \emph default and \emph on \begin_inset Quotes eld \end_inset om två år \begin_inset Quotes erd \end_inset \emph default , respectively. For these reasons, we have defined all time and date expressions as multi-word adverbials. In total, we defined 55 different time and date phrases. \end_layout \begin_layout Subsubsection Grammar Additions \end_layout \begin_layout Standard Three different grammatical constructions were added to the grammar. They are natural extensions or slight modifications of existing functions, and we intend to add them to the resource grammar in the near future. Examples include the idiom \emph on \begin_inset Quotes eld \end_inset so do I \begin_inset Quotes erd \end_inset \emph default / \emph on \begin_inset Quotes eld \end_inset so did she \begin_inset Quotes erd \end_inset \emph default , and questions preceded by an adverbial clause, such as \emph on \begin_inset Quotes eld \end_inset if Smith signed the contract, did Jones sign the contract? \begin_inset Quotes erd \end_inset \emph default . \end_layout \begin_layout Subsubsection Elliptic Phrases \end_layout \begin_layout Standard The resource grammar cannot handle all kinds of conjunctions and elliptical phrases. In the FraCaS corpus, there are 35 sentences with more advanced elliptical constructions. 
Examples include \emph on \begin_inset Quotes eld \end_inset Bill did \family typewriter [\SpecialChar \ldots{} ] \family default too \begin_inset Quotes erd \end_inset \emph default and \emph on \begin_inset Quotes eld \end_inset Smith saw Jones sign the contract and \family typewriter [\SpecialChar \ldots{} ] \family default his secretary make a copy \begin_inset Quotes erd \end_inset \emph default . Our solution was to introduce empty phrases, one for each grammatical category. For example, in the first sentence the ellipsis is an empty verb phrase, while the second sentence contains an empty ditransitive verb. \end_layout \begin_layout Subsection Coverage \end_layout \begin_layout Standard Of the 874 unique sentences, 812 could be parsed directly with the Resource Grammar and the implemented lexicon, as shown in table \begin_inset CommandInset ref LatexCommand ref reference "tab:coverage" \end_inset . With the three additional grammatical constructions, 14 more sentences could be parsed. The addition of elliptical phrases increased the number of parsed sentences by another 34. Of the 14 remaining sentences, we could parse 6 more by making minor reformulations, such as moving a comma or adding a preposition. 
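\end_layout \begin_layout Standard The cumulative figures in table \begin_inset CommandInset ref LatexCommand ref reference "tab:coverage" \end_inset follow from the increments given above. The Python sketch below recomputes the running totals and their share of the 874 unique sentences, rounded to one decimal place. The step labels are taken from the table, and the variable names are our own. \end_layout \begin_layout LyX-Code
```python
# Recompute the cumulative coverage figures from the per-step increments.
UNIQUE_SENTENCES = 874

# (table row label, number of sentences newly covered at this step)
steps = [
    ("Accepted by the RG", 812),
    ("with grammar extensions", 14),
    ("with elliptic phrases", 34),
    ("with slight reformulation of sentence", 6),
]

covered = 0
rows = []
for label, increment in steps:
    covered += increment  # running total after this step
    share = round(100 * covered / UNIQUE_SENTENCES, 1)
    rows.append((label, covered, share))

# Whatever is left after the last step could not be parsed at all.
unparsed = UNIQUE_SENTENCES - covered
rows.append(("Unable to parse", unparsed, round(100 * unparsed / UNIQUE_SENTENCES, 1)))

for label, count, share in rows:
    print(label, count, str(share) + "%")
```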
\end_layout \begin_layout Standard \begin_inset Float table wide false sideways false status open \begin_layout Plain Layout \align center \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Total \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout % of sentences \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Unique sentences \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 874 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 100% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Accepted by the RG \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 812 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 92.9% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - with grammar extensions \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 826 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 94.5% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - with elliptic phrases \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 860 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 98.4% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - with slight reformulation of sentence \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 866 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 99.1% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Unable to parse \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 8 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0.9% \end_layout \end_inset \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout The coverage of the English FraCaS grammar \begin_inset CommandInset label LatexCommand label 
name "tab:coverage" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Standard \begin_inset Note Note status collapsed \begin_layout Plain Layout Grammatical extensions: RelNP_nocomma, SoDoI, ExtAdvQS, ConjQS. \end_layout \begin_layout Plain Layout Note that these statistics are very strict in the sense that punctuation (in particular commas) is included and has to be handled by the grammar. \end_layout \begin_layout Plain Layout After having taken measures to solve the problems described in section 2.2, the parsing rate was 84,6%. Some of these sentences could be parsed but returned no representative trees, which gave a lower percentage of correctly parsed sentences (83,2%). There were various reasons why certain sentences could not be parsed, with various degrees of severity. The table below shows the results after changing the corpus by giving substitutions for problematic sentences on each of these levels. The first number is the number of sentences out of 1220, while the percentage is on the next line. \end_layout \begin_layout Plain Layout These are explanations for the different levels: \end_layout \begin_layout Enumerate the original corpus with no changes. \end_layout \begin_layout Enumerate substitution for simple spelling or grammar mistakes, such as double punctuation or incorrect verb forms. The change also involved using only uncontracted negation, for the sake of conformity and simplicity. There were only a few sentences of these types, so changing them did not make a major difference to the results. \end_layout \begin_layout Enumerate rewriting of certain constructions that could not be handled by the parser. These were constructions like “the people [...] all voted...”, changed to “all the people [...] voted...”. \end_layout \begin_layout Enumerate filling of gaps in gap constructions, e.g. adding “spoken to Mary” to “Bill has”, rendering “Bill has spoken to Mary”. 
\end_layout \begin_layout Plain Layout \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout FraCaS version \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Parsed \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Correctly parsed \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1. original \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1032 84,6% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1015 83,2% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 2. mistakes corrected; uncontracted negation \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1037 85,0% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1020 83,6% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 3. reconstructions \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1040 85,2% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1026 84,1% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4. gap filling \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1045 85,7% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1043 85,5% \end_layout \end_inset \end_inset \end_layout \begin_layout Plain Layout As we can see, the changes made in the corpus did not cause any major increase in the percentage of parsed sentences, and only a slightly higher increase in the percentage of correctly parsed sentences. It would take more radical changes for a more radical increase. In the following section, we will look into what those changes would concern. \end_layout \end_inset \begin_inset Note Note status collapsed \begin_layout Plain Layout The following are a few examples of tree structures resulting from parsing FraCaS sentences using this grammar. 
\end_layout \begin_layout Description Positive \begin_inset space ~ \end_inset declarative: \begin_inset Quotes eld \end_inset No delegate finished the report. \begin_inset Quotes erd \end_inset \end_layout \begin_deeper \begin_layout Plain Layout Sentence (DeclPos TPast ASimul (PredVP (DetCN (DetQuant no_Quant NumSg) (UseN delegate_N)) (ComplSlash (SlashV2a finish_V2) (DetCN (DetQuant DefArt NumSg) (UseN report_N))))) \end_layout \end_deeper \begin_layout Description Negative \begin_inset space ~ \end_inset declarative: \begin_inset Quotes eld \end_inset Bill did not speak to Mary on Monday. \begin_inset Quotes erd \end_inset \end_layout \begin_deeper \begin_layout Plain Layout Sentence (DeclNeg TPast ASimul (PredVP (UsePN bill_PN) (AdvVP (ComplSlash (SlashV2a speak_to_V2) (UsePN mary_PN)) on_monday_Adv))) \end_layout \end_deeper \begin_layout Description Question: \begin_inset Quotes eld \end_inset Did a Swede win a Nobel prize? \begin_inset Quotes erd \end_inset \end_layout \begin_deeper \begin_layout Plain Layout Sentence (Question TPast ASimul (PredVP (DetCN (DetQuant IndefArt NumSg) (UseN swede_N)) (ComplSlash (SlashV2a win_V2) (DetCN (DetQuant IndefArt NumSg) (UseN nobel_prize_N))))) \end_layout \end_deeper \begin_layout Description Clause \begin_inset space ~ \end_inset conjunction: \begin_inset Quotes eld \end_inset Smith took a machine on Tuesday, and Jones took a machine on Wednesday. \begin_inset Quotes erd \end_inset \end_layout \begin_deeper \begin_layout Plain Layout Sentence (DeclConj comma_and_Conj TPast ASimul (PredVP (UsePN smith_PN) (AdvVP (ComplSlash (SlashV2a take_V2) (DetCN (DetQuant IndefArt NumSg) (UseN machine_N))) on_tuesday_Adv)) (PredVP (UsePN jones_PN) (AdvVP (ComplSlash (SlashV2a take_V2) (DetCN (DetQuant IndefArt NumSg) (UseN machine_N))) on_wednesday_Adv))) \end_layout \end_deeper \begin_layout Description Sentence-initial \begin_inset space ~ \end_inset conjunction: \begin_inset Quotes eld \end_inset But only one woman. 
\begin_inset Quotes erd \end_inset \end_layout \begin_deeper \begin_layout Plain Layout SentencePBut (UttNP (PredetNP only_Predet (DetCN (DetQuant IndefArt (NumCard (NumNumeral (num (pot2as3 (pot1as2 (pot0as1 pot01))))))) (UseN woman_N)))) \end_layout \end_deeper \begin_layout Description Noun \begin_inset space ~ \end_inset phrase \begin_inset space ~ \end_inset conjunction: \begin_inset Quotes eld \end_inset John and his colleagues went to a meeting. \begin_inset Quotes erd \end_inset \end_layout \begin_deeper \begin_layout Plain Layout Sentence (DeclPos TPast ASimul (PredVP (ConjNP2 and_Conj (UsePN john_PN) (DetCN (DetQuant (PossPron he_Pron) NumPl) (UseN colleague_N))) (AdvVP (UseV go8walk_V) (PrepNP to_Prep (DetCN (DetQuant IndefArt NumSg) (UseN meeting_N)))))) \end_layout \end_deeper \end_inset \begin_inset Note Note status collapsed \begin_layout Plain Layout Three of the sentences that are encoded as synonyms have attachment ambiguities that can be encoded in the grammar. This means that they have different trees in different problems (169.1.p/170.1.p, 175.1.p/176.1.p, 244.1.p/245.1.p). But we do not count them in these statistics. \end_layout \end_inset \end_layout \begin_layout Subsection Syntactic Ambiguity \end_layout \begin_layout Standard All trees in the FraCaS treebank are implemented in the GF grammar described above. This grammar can also be used on its own for parsing and analysing similar sentences. It is useful to know how ambiguous the grammar is, so we have parsed the 866 sentences that are covered by the grammar and counted the number of trees for each sentence. Table \begin_inset CommandInset ref LatexCommand ref reference "tab:ambiguity" \end_inset shows that the grammar is moderately ambiguous: almost 70% of the sentences have fewer than 10 different parse trees, and over 90% have fewer than 100 trees. The median number of parse trees per sentence is 5, and the largest number of trees for a single sentence is 33,048. 
The most ambiguous sentence is: \emph on \begin_inset Quotes eld \end_inset Since APCOM bought its present office building it has been paying mortgage interest on it for more than 10 years. \begin_inset Quotes erd \end_inset \end_layout \begin_layout Standard Note that the number of parse trees is misleading for the 34 sentences with elliptic phrases, since ellipsis is linearised as \emph on \begin_inset Quotes eld \end_inset \family typewriter [\SpecialChar \ldots{} ] \family default \begin_inset Quotes erd \end_inset \emph default in the FraCaS grammar. If the elliptic phrases had been made invisible, the number of parse trees would have increased dramatically. \end_layout \begin_layout Standard \begin_inset Float table wide false sideways false status open \begin_layout Plain Layout \align center \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout No. parse trees \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout No. sentences \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 -- 9 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 598 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 69.1% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 10 -- 99 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 203 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 23.4% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 100 -- 999 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 49 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5.7% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \begin_inset Formula $\geq$ \end_inset 1000 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 16 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1.8% \end_layout \end_inset \end_inset \end_layout 
\begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout Ambiguity of the FraCaS treebank \begin_inset CommandInset label LatexCommand label name "tab:ambiguity" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Standard \begin_inset Note Note status collapsed \begin_layout Subsection Remaining problems \end_layout \begin_layout Plain Layout Some problems could not be solved, due to their complexity and/or the time limitations of the project. The remaining problems are listed below, categorised according to their nature. Examples from the FraCaS corpus are given with the relevant parts italicised. For each type of problem, the number of affected sentences is given in parentheses (out of the 177 sentences that were not correctly parsed). A few sentences had more than one problem, but each was counted in only one category. \end_layout \begin_layout Paragraph Adverbials (46) \end_layout \begin_layout Plain Layout Certain kinds and uses of adverbials were problematic. 
\end_layout \begin_layout Itemize Verb phrase adverbials (1) \end_layout \begin_deeper \begin_layout Plain Layout “Every executive who had a laptop computer brought it to take notes at the meeting.” \end_layout \end_deeper \begin_layout Itemize Noun phrase adverbials (3) \end_layout \begin_deeper \begin_layout Plain Layout “It lasted 2 days.” \end_layout \begin_layout Plain Layout “Smith had been travelling the day before she arrived in Katmandu.” \end_layout \end_deeper \begin_layout Itemize Sentence-initial adverbials (34) \end_layout \begin_deeper \begin_layout Plain Layout “Since 1992 ITEL has been in Birmingham.” \end_layout \begin_layout Plain Layout “Yesterday APCOM signed the contract.” \end_layout \begin_layout Plain Layout “Then she took a taxi to the station.” \end_layout \begin_layout Plain Layout “Two years from now Smith will have been to Florence at least four times.” \end_layout \end_deeper \begin_layout Itemize To this group also belong sentence-initial subordinate clauses. (Subordinate clauses following the main clause are treated as adverbials, so it is only natural to treat subordinate clauses preceding the main clause as adverbials too.) \end_layout \begin_deeper \begin_layout Plain Layout “If Smith and Anderson did not sign the contract, Jones signed the contract.” \end_layout \begin_layout Plain Layout “When Smith arrived in Katmandu she had been travelling for three days.” \end_layout \begin_layout Plain Layout “Before APCOM bought its present office building, it had been paying mortgage interest [...].” \end_layout \end_deeper \begin_layout Itemize Adverbials with copula (8) \end_layout \begin_deeper \begin_layout Plain Layout “It is now 1996.” \end_layout \begin_layout Plain Layout “Today is Saturday, July 14th.” \end_layout \end_deeper \begin_layout Paragraph Verb phrase conjunctions (5) \end_layout \begin_layout Plain Layout The grammar could handle conjunction at the noun phrase and clause levels, but not verb phrase conjunction. 
\end_layout \begin_layout Plain Layout “ICM is one of the companies and owns 150 computers.” \end_layout \begin_layout Plain Layout “She took a taxi to the station and caught the first train to Luxembourg.” \end_layout \begin_layout Plain Layout “Jones graduated in March and has been employed ever since.” \end_layout \begin_layout Paragraph Auxiliary verbs (17) \end_layout \begin_layout Plain Layout Auxiliary verbs used independently could not be parsed. \end_layout \begin_layout Plain Layout “John wanted to buy a car, and he did.” \end_layout \begin_layout Plain Layout “Bill spoke to everyone that John did.” \end_layout \begin_layout Plain Layout “She finished before he did.” \end_layout \begin_layout Paragraph Complex comparisons (23) \end_layout \begin_layout Plain Layout Simple comparatives worked well, but comparatives embedded in a noun phrase and other complex comparisons did not. \end_layout \begin_layout Plain Layout “John is a fatter politician than Bill.” \end_layout \begin_layout Plain Layout “ITEL won more orders than APCOM lost.” \end_layout \begin_layout Plain Layout “ITEL sold 3000 more computers than APCOM.” \end_layout \begin_layout Plain Layout “APCOM has a more important customer than ITEL.” \end_layout \begin_layout Plain Layout “Mary's story lasted as long as Jones's updating the program.” \end_layout \begin_layout Paragraph Relative clauses (11) \end_layout \begin_layout Plain Layout Some relative clauses could not be parsed at all, or were not parsed correctly. 
\end_layout \begin_layout Itemize Relative clauses using a present participle (1) \end_layout \begin_deeper \begin_layout Plain Layout “No one gambling seriously stops until he is broke.” \end_layout \end_deeper \begin_layout Itemize Relative clauses modifying a pronoun (8) \end_layout \begin_deeper \begin_layout Plain Layout “No one who starts gambling seriously stops until he is broke.” \end_layout \begin_layout Plain Layout “Everyone who starts gambling seriously continues until he is broke.” \end_layout \begin_layout Plain Layout “Nobody who is asleep ever knows that he is asleep.” \end_layout \end_deeper \begin_layout Itemize Relative clauses with an object gap (2) \end_layout \begin_deeper \begin_layout Plain Layout “There is a representative that Smith wrote to every week.” \end_layout \end_deeper \begin_layout Paragraph Complement infinitive clauses (17) \end_layout \begin_layout Plain Layout The verb “see”, as in “see someone do something”, is defined as V2V, but this does not work: it requires an infinitive marker, which should not be present in this case. \end_layout \begin_layout Plain Layout “Smith saw Jones sign the contract.” \end_layout \begin_layout Plain Layout “Smith saw Jones' heart beat.” \end_layout \begin_layout Paragraph Other (58) \end_layout \begin_layout Plain Layout Apart from the problems in the categories above, there are other problems that are harder to classify. Some of these could have been solved had time permitted, while others are more intricate. Each problem is exemplified by one sentence from the FraCaS corpus, followed by the number of affected sentences. 
\end_layout \begin_layout Plain Layout “Mary represents her own company.” (15) \end_layout \begin_layout Plain Layout “APCOM sold exactly 2500 computers.” (1) \end_layout \begin_layout Plain Layout “Smith spent two hours writing the report.” (12) \end_layout \begin_layout Plain Layout “No representative took less than half a day to read the report.” (1) \end_layout \begin_layout Plain Layout “The conference was over on July 8th, 1994.” (2) \end_layout \begin_layout Plain Layout “Bill owns a blue one.” (6) \end_layout \begin_layout Plain Layout “That is, there was one lawyer who signed all the reports.” (1) \end_layout \begin_layout Plain Layout “Bill is going to speak to Mary.” (1) \end_layout \begin_layout Plain Layout “It is the case that Jones is not and will never be allowed to write his memoirs.” (4) \end_layout \begin_layout Plain Layout “It took the representatives more than a week to read the report.” (2) \end_layout \begin_layout Plain Layout “Smith represents his company and so does Jones.” (13) \end_layout \begin_layout Subsection Tree selection \end_layout \begin_layout Plain Layout After parsing the whole corpus, we had to select, for each sentence, the tree structure that represented it most adequately. Most of the time there was a clear choice, but sometimes two trees were kept, since it was not clear which one was the most suitable representation of the sentence. This was especially common for sentences using a copula with an indefinite noun phrase as complement. In these cases, we kept both the tree in which the indefinite article is represented and the one without it. \end_layout \end_inset \end_layout \begin_layout Section The Swedish Corpus \end_layout \begin_layout Standard \begin_inset Note Note status collapsed \begin_layout Subsection Modules \end_layout \begin_layout Plain Layout In order to build the Swedish version of the FraCaS corpus, two modules were written, one lexicon module and one grammar module. 
\end_layout \begin_layout Subsubsection Lexicon module \end_layout \begin_layout Plain Layout FraCaSLexSwe is the Swedish concrete lexicon. It was built in much the same way as its English counterpart, using the functions mkN, mkA, mkV, etc., mainly from the Paradigms module. \end_layout \begin_layout Subsubsection Grammar module \end_layout \begin_layout Plain Layout FraCaSSwe is the Swedish concrete grammar. Just as for the English counterpart, parts of the Grammar module (namely Noun, Verb, Adjective, Adverb, Numeral and Tense) were imported, while other parts were opened and the necessary functions used in FraCaSSwe. \end_layout \end_inset \begin_inset Note Note status collapsed \begin_layout Plain Layout Some of the FraCaS sentences depend on lexical ambiguity that cannot be expressed adequately in Swedish. \end_layout \end_inset \end_layout \begin_layout Standard A long-term goal of this project is that the treebank should be truly multilingual for all the languages in the GF resource grammar. Of course, this is not possible in the general case, since some of the sentences cannot even be translated without changing their semantic content. But we can at least try to create a multilingual treebank covering as many sentences as possible. \end_layout \begin_layout Standard As a first step, we created Swedish translations of the sentences by writing a new Swedish lexicon. Then we evaluated the translations and iteratively modified the trees to improve the translations. Note that since we use exactly the same syntax trees for the Swedish and English sentences, we had to make sure that the English sentences did not change when we modified the trees. \end_layout \begin_layout Standard This means that the Swedish corpus was not created by manually translating the English sentences; instead, we translated the lexicon and let the Swedish resource grammar take care of the syntactic translation. 
Currently, out of the 866 sentences in the treebank, 748 are translated into grammatically correct and comprehensible Swedish sentences. \end_layout \begin_layout Subsection The Swedish Lexicon \end_layout \begin_layout Standard \begin_inset Note Note status collapsed \begin_layout Plain Layout When creating the Swedish lexicon \end_layout \begin_layout Plain Layout As was the case for the parsing part of the project, certain problems were also discovered in the process of generating into Swedish. Often these problems had to be solved by going back to the English lexicon and making changes so that more suitable, often more general, trees would be constructed. This is where the two project parts were interwoven. \end_layout \begin_layout Plain Layout Some of the problems could be solved and some remain. The solutions are presented in this section, while remaining problems are listed in the next section on statistics (3.3). \end_layout \begin_layout Plain Layout The problems encountered have been divided into categories as seen below. The explanations follow P (Problem) and S (Solution). FraCaSLex here refers to both the abstract lexicon and the two concrete lexicons (FraCaSLexEng and FraCaSLexSwe). In the same way, FraCaS refers to both the abstract grammar and the two concrete grammars (FraCaSEng and FraCaSSwe). \end_layout \end_inset \end_layout \begin_layout Standard When we created the Swedish lexicon, we often had to go back to the English lexicon and make changes so that more suitable trees could be constructed. Sometimes we merged several lexical entries into one multi-word entry, and sometimes we split one entry into different meanings. 
Most of the changes consisted of the following types: \end_layout \begin_layout Description Compounds Many compound noun phrases, such as \emph on “company car” \emph default , \emph on “mortgage interest” \emph default and \emph on \begin_inset Quotes eld \end_inset APCOM manager \begin_inset Quotes erd \end_inset \emph default , are single words in Swedish ( \emph on \begin_inset Quotes eld \end_inset tjänstebil \begin_inset Quotes erd \end_inset \emph default , \emph on \begin_inset Quotes eld \end_inset hypoteksränta \begin_inset Quotes erd \end_inset \emph default and \emph on \begin_inset Quotes eld \end_inset APCOM-direktör \begin_inset Quotes erd \end_inset \emph default , respectively). We solved this by defining them as multi-word nouns, as described in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Multi-word-Lexical-Items" \end_inset . \end_layout \begin_layout Description Lexical \begin_inset space ~ \end_inset ambiguity Several words in English are translated into different Swedish words, depending on the context. Such words were split into different lexical entries. The adjective \emph on “poor” \emph default , for example, was handled by creating two different functions, one with the meaning \emph on \begin_inset Quotes eld \end_inset not good \begin_inset Quotes erd \end_inset \emph default (Swedish \emph on \begin_inset Quotes eld \end_inset dålig \begin_inset Quotes erd \end_inset \emph default ), and one with the meaning \emph on \begin_inset Quotes eld \end_inset not rich \begin_inset Quotes erd \end_inset \emph default (Swedish \emph on \begin_inset Quotes eld \end_inset fattig \begin_inset Quotes erd \end_inset \emph default ). \end_layout \begin_layout Description Prepositions Prepositions are often translated differently in different contexts. 
For example, \emph on \begin_inset Quotes eld \end_inset inhabitant of \begin_inset Quotes erd \end_inset \emph default is translated as \emph on \begin_inset Quotes eld \end_inset invånare i \begin_inset Quotes erd \end_inset \emph default if the argument is a country or a town, but as \emph on \begin_inset Quotes eld \end_inset invånare på \begin_inset Quotes erd \end_inset \emph default if the argument is an island. We solved this either by creating different lexical entries or by making the preposition part of the main verb. \end_layout \begin_layout Description Adverbials Most of the multi-word adverbials are time and date expressions, since such expressions are often translated very differently between languages. For example, the English preposition \emph on \begin_inset Quotes eld \end_inset in \begin_inset Quotes erd \end_inset \emph default is translated differently in different time and date expressions: \emph on \begin_inset Quotes eld \end_inset in March \begin_inset Quotes erd \end_inset \emph default becomes \emph on \begin_inset Quotes eld \end_inset i mars \begin_inset Quotes erd \end_inset \emph default and \emph on \begin_inset Quotes eld \end_inset in a month \begin_inset Quotes erd \end_inset \emph default translates to \emph on \begin_inset Quotes eld \end_inset om en månad \begin_inset Quotes erd \end_inset \emph default , whereas \emph on “in 1994” \emph default is best rendered as the bare word \emph on \begin_inset Quotes eld \end_inset 1994 \begin_inset Quotes erd \end_inset \emph default in Swedish. As already explained, we defined all time and date expressions as multi-word adverbials. 
\end_layout \begin_layout Subsection Coverage \end_layout \begin_layout Standard \begin_inset Float table wide false sideways false status open \begin_layout Plain Layout \align center \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Total \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout % of sentences \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Sentences in treebank \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 866 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 100% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Correct Swedish translation \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 748 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 86.4% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Problematic sentences \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 118 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 13.6% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \begin_inset space ~ \end_inset \begin_inset space ~ \end_inset -- idioms \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 31 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 3.6% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \begin_inset space ~ \end_inset \begin_inset space ~ \end_inset -- agreement \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 24 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 2.8% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \begin_inset space ~ \end_inset \begin_inset space ~ \end_inset -- future tense \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 12 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1.4% \end_layout 
\end_inset \begin_inset Text \begin_layout Plain Layout \begin_inset space ~ \end_inset \begin_inset space ~ \end_inset -- elliptical \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 19 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 2.2% \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \begin_inset space ~ \end_inset \begin_inset space ~ \end_inset -- incomprehensible \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 32 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 3.7% \end_layout \end_inset \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout The coverage of the Swedish FraCaS grammar \begin_inset CommandInset label LatexCommand label name "tab:swedish-coverage" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Standard Table \begin_inset CommandInset ref LatexCommand ref reference "tab:swedish-coverage" \end_inset gives an overview of the coverage of the Swedish lexicon and grammar. Of the 866 unique sentences in the treebank, we consider 748 to have good Swedish translations. The remaining 118 sentences have problems, which we divided into five classes: idioms, agreement, future tense, elliptical phrases, and more difficult errors. Table \begin_inset CommandInset ref LatexCommand ref reference "tab:swedish-problems" \end_inset gives examples of some of the problems we encountered, and the following section briefly describes each class. 
\end_layout \begin_layout Standard \begin_inset Float table wide false sideways false status open \begin_layout Plain Layout \align center \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout English original \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Direct translation \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Better idiom \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Literally in English \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \series bold idioms \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X is likely to Y \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X \series bold är trolig \series default att Y \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \series bold \emph on det är troligt \series default att X Y \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on it is likely that X Y \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on members of the committee \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \series bold \emph on medlemmar av \series default kommittén \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on kommitté \series bold medlemmar \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on committee-members \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X is asleep \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X \series bold är sovande \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X \series bold sover \end_layout \end_inset \begin_inset Text \begin_layout 
Plain Layout \emph on X sleeps \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on the previous one \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on den förra \series bold en \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on den förra \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on the previous \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \series bold agreement \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X has the right to Y \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X har \series bold rätten \series default att Y \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X har \series bold rätt \series default att Y \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X has right to Y \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on traffic increased \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \series bold \emph on trafik \series default ökade \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \series bold \emph on trafiken \series default ökade \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on the traffic increased \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on one of the tenors \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \series bold \emph on ett \series default av tenorerna \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \series bold \emph on en \series default av tenorerna \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on --- 
\end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on everyone continues until he is broke \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on alla fortsätter tills \series bold han \series default är pank \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on alla fortsätter tills \series bold de \series default är panka \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on all continue until they are broke \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on clients at the demonstration \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \series bold \emph on klienter \series default på presentationen \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \series bold \emph on klienterna \series default på presentationen \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on the clients at the demonstration \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \series bold future tense \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X will make a poor stock market trader \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X \series bold ska \series default bli en dålig aktiehandlare \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X \series bold kommer att \series default bli en dålig aktiehandlare \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on --- \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \series bold elliptical phrases \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain 
Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X wanted to buy a car, and he did \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X ville köpa en bil, och han gjorde \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X ville köpa en bil, och han gjorde \series bold det \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X wanted to buy a car, and he did it \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X did too \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X gjorde också \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X gjorde \series bold det \series default också \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X did it too \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \series bold more difficult \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X took less than half a day to Y \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X tog mindre än en halv dag att Y \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \emph on X tog mindre än en halv dag \series bold på sig för \series default att Y \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout --- \end_layout \end_inset \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout Examples of encountered problems with the Swedish translation \begin_inset CommandInset label LatexCommand label name "tab:swedish-problems" \end_inset \end_layout \end_inset \end_layout \end_inset 
\end_layout \begin_layout Subsubsection Types of translation problems \end_layout \begin_layout Description Idioms We encountered 10 problematic idioms in 31 sentences, where the direct translation of a phrase is not the most natural one and a different syntactic construction should be used instead. \end_layout \begin_layout Description Agreement There were 7 different noun phrase agreement problems in 24 of the sentences, where the Swedish translation would be more natural with a different number, definiteness or gender of the noun phrase. \end_layout \begin_layout Description Future \begin_inset space ~ \end_inset tense The Swedish future tense takes two different forms, either \emph on \begin_inset Quotes eld \end_inset ska \begin_inset Quotes erd \end_inset \emph default or \emph on \begin_inset Quotes eld \end_inset kommer att \begin_inset Quotes erd \end_inset \emph default . The resource grammar defaults to \emph on \begin_inset Quotes eld \end_inset ska \begin_inset Quotes erd \end_inset \emph default , but \emph on \begin_inset Quotes eld \end_inset kommer att \begin_inset Quotes erd \end_inset \emph default is the more natural translation in all 12 FraCaS sentences that use the future tense. One example is \emph on \begin_inset Quotes eld \end_inset Bill will talk to Mary \begin_inset Quotes erd \end_inset \emph default , which should be translated to \emph on \begin_inset Quotes eld \end_inset Bill kommer att prata med Mary \begin_inset Quotes erd \end_inset \emph default . \end_layout \begin_layout Description Elliptical \begin_inset space ~ \end_inset phrases 19 sentences have problems with elliptical phrases in Swedish. 
15 of them have to do with the auxiliary verb \emph on \begin_inset Quotes eld \end_inset do/does/did \begin_inset Quotes erd \end_inset \emph default , which sounds very awkward when it is translated to the Swedish verb \emph on \begin_inset Quotes eld \end_inset gör/gjorde \begin_inset Quotes erd \end_inset \emph default . E.g., \emph on \begin_inset Quotes eld \end_inset Bill did too \begin_inset Quotes erd \end_inset \emph default is translated as \emph on \begin_inset Quotes eld \end_inset Bill gjorde också \begin_inset Quotes erd \end_inset \emph default . In Swedish we also need an object \emph on \begin_inset Quotes eld \end_inset det \begin_inset Quotes erd \end_inset \emph default (lit. \emph on \begin_inset Quotes eld \end_inset it \begin_inset Quotes erd \end_inset \emph default ), so a better translation is \emph on \begin_inset Quotes eld \end_inset Bill gjorde det också \begin_inset Quotes erd \end_inset \emph default (lit. \emph on \begin_inset Quotes eld \end_inset Bill did it too \begin_inset Quotes erd \end_inset \emph default ). The remaining four problematic elliptical sentences are more difficult to analyse. \end_layout \begin_layout Description Serious 32 of the sentences had more serious problems in Swedish. Some of them did not translate at all, since one of their grammatical constructions had not yet been implemented for Swedish. Others translated, but with a very strange word order or inflection, since the corresponding grammatical construction did not function as expected. \end_layout \begin_layout Standard All in all, out of the 118 problematic Swedish sentences, we believe that more than two thirds should be possible to correct without too much trouble. 
\end_layout \begin_layout Standard \begin_inset Note Note status collapsed \begin_layout Paragraph Idioms \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset in business \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset i affärsverksamhet \begin_inset Quotes erd \end_inset ? (3) \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset Bill is likely to [..] \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset är sannolik/trolig att \begin_inset Quotes erd \end_inset ? [better: \begin_inset Quotes eld \end_inset det är troligt att Bill [..] \begin_inset Quotes erd \end_inset ] (2) \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset Mary is female \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset Mary är kvinnlig \begin_inset Quotes erd \end_inset ? [better: \begin_inset Quotes eld \end_inset Mary är kvinna \begin_inset Quotes erd \end_inset ] (2) \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset members of the committee \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset medlemmar av kommittén \begin_inset Quotes erd \end_inset [better: \begin_inset Quotes eld \end_inset kommittémedlem \begin_inset Quotes erd \end_inset ] (2) \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset had his paper accepted \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset hade sin uppsats godkänd \begin_inset Quotes erd \end_inset [better: \begin_inset Quotes eld \end_inset fick sin uppsats godkänd \begin_inset Quotes erd \end_inset ] (3) \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset made a loss \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset gjorde en förlust \begin_inset Quotes erd \end_inset [better: \begin_inset Quotes eld \end_inset gick med förlust \begin_inset Quotes erd \end_inset ] (4) \end_layout \begin_layout 
Plain Layout \begin_inset Quotes eld \end_inset a chain of businesses \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset en kedja av affärsverksamheter \begin_inset Quotes erd \end_inset [better: \begin_inset Quotes eld \end_inset en affärskedja \begin_inset Quotes erd \end_inset ] (7) \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset be sleeping \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset är sovande \begin_inset Quotes erd \end_inset [better: \begin_inset Quotes eld \end_inset sover \begin_inset Quotes erd \end_inset ] (4) \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset no one stops until \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset everyone continues until \begin_inset Quotes erd \end_inset => [ \begin_inset Quotes eld \end_inset ingen slutar förrän \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset alla fortsätter tills \begin_inset Quotes erd \end_inset ] \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset a blue one \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset en blå en \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset en blå \begin_inset Quotes erd \end_inset (3) \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset the previous one \begin_inset Quotes erd \end_inset => ?? / \begin_inset Quotes eld \end_inset den förra \begin_inset Quotes erd \end_inset (1) \end_layout \begin_layout Plain Layout \series bold OK \series default : \begin_inset Quotes eld \end_inset comes cheap \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset fås billigt \begin_inset Quotes erd \end_inset ? 
[better: \begin_inset Quotes eld \end_inset är billig \begin_inset Quotes erd \end_inset ] (3) \end_layout \begin_layout Plain Layout \series bold OK \series default : (group_N2) \begin_inset Quotes eld \end_inset a group of people \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset en grupp av människor \begin_inset Quotes erd \end_inset [ \begin_inset Quotes eld \end_inset en grupp människor \begin_inset Quotes erd \end_inset ] (2) \end_layout \begin_layout Paragraph OK: Passive form \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset was blamed \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset blev beskyllda \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset beskylldes \begin_inset Quotes erd \end_inset (3) \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset was used \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset blev använd \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset användes \begin_inset Quotes erd \end_inset (2) \end_layout \begin_layout Paragraph Agreement \end_layout \begin_layout Plain Layout 16 of these contained variations of the definite noun phrase \begin_inset Quotes eld \end_inset \emph on the right \begin_inset Quotes erd \end_inset \emph default (used in the context \emph on \begin_inset Quotes eld \end_inset \emph default X \emph on has the right to live in \emph default Y \emph on \begin_inset Quotes erd \end_inset \emph default ), which is translated as \begin_inset Quotes eld \end_inset \emph on rätten \begin_inset Quotes erd \end_inset \emph default , but in Swedish it sounds more natural to say \emph on \begin_inset Quotes eld \end_inset rätt \begin_inset Quotes erd \end_inset \emph default (lit. \emph on \begin_inset Quotes eld \end_inset right \begin_inset Quotes erd \end_inset \emph default ), at least in this context. 
In other cases, English indefinite noun phrases are better translated into definite form, such as \emph on \begin_inset Quotes eld \end_inset traffic \begin_inset Quotes erd \end_inset \emph default , which should be translated to \emph on \begin_inset Quotes eld \end_inset trafiken \begin_inset Quotes erd \end_inset \emph default (lit. \emph on \begin_inset Quotes eld \end_inset the traffic \begin_inset Quotes erd \end_inset \emph default ). Since Swedish has two grammatical genders, there are also gender problems, such as \emph on \begin_inset Quotes eld \end_inset one of the tenors \begin_inset Quotes erd \end_inset \emph default , where the gender of \emph on \begin_inset Quotes eld \end_inset one \begin_inset Quotes erd \end_inset \emph default should depend on the gender of \emph on \begin_inset Quotes eld \end_inset tenor \begin_inset Quotes erd \end_inset \emph default . Problems with number were mostly due to the singular pronoun \emph on \begin_inset Quotes eld \end_inset everyone \begin_inset Quotes erd \end_inset \emph default , which was translated to the plural pronoun \emph on \begin_inset Quotes eld \end_inset alla \begin_inset Quotes erd \end_inset \emph default . 
\end_layout \begin_layout Paragraph Agreement examples \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset one of the tenors \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset ett av tenorerna \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset en av tenorerna \begin_inset Quotes erd \end_inset (1) \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset everyone continues until he is broke \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset alla fortsätter tills han är pank \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset \SpecialChar \ldots{} tills de är panka \begin_inset Quotes erd \end_inset (1) \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset clients at the demonstration \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset klienter på presentationen \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset klienterna \SpecialChar \ldots{} \begin_inset Quotes erd \end_inset (2) \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset traffic increased \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset trafik ökade \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset trafiken ökade \begin_inset Quotes erd \end_inset (1) \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset is the chairman of ITEL \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset är ordföranden för ITEL \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset ordförande \begin_inset Quotes erd \end_inset (1) \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset every customer who owns a computer has a service contract for it \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset varje kund som äger en dator har ett servicekontrakt för det \begin_inset Quotes erd 
\end_inset / \begin_inset Quotes eld \end_inset \SpecialChar \ldots{} för den \begin_inset Quotes erd \end_inset (2) \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset the right to \SpecialChar \ldots{} \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset rätten att \SpecialChar \ldots{} \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset rätt att \SpecialChar \ldots{} \begin_inset Quotes erd \end_inset (16) \end_layout \begin_layout Paragraph OK: (remove ProgrVP for Swedish) Progressive \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset Smith was writing a report \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset Smith höll på att skriva en rapport \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset skrev en rapport \begin_inset Quotes erd \end_inset (24) \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset APCOM has been paying mortgage \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset APCOM har hållit på att betala hypoteksränta \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset betalat \begin_inset Quotes erd \end_inset \end_layout \begin_layout Paragraph Reflexive pronouns \end_layout \begin_layout Plain Layout \series bold OK \series default : (add refl_Pron) \begin_inset Quotes eld \end_inset his/her/their \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset hans/hennes/deras \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset sin \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset sitt \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset sina \begin_inset Quotes erd \end_inset (~30) \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset himself \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset sig \begin_inset Quotes erd \end_inset / 
\begin_inset Quotes eld \end_inset sig själv \begin_inset Quotes erd \end_inset (but not always) (1) \end_layout \begin_layout Paragraph Incomprehensible \end_layout \begin_layout Plain Layout prepositions/subjunctions: 2 \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset twice as many than \SpecialChar \ldots{} \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset dubbelt så många än \SpecialChar \ldots{} \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset som \begin_inset Quotes erd \end_inset \end_layout \begin_layout Plain Layout \begin_inset Quotes eld \end_inset Bill suggested to Frank's boss that \SpecialChar \ldots{} , and Carl to Alan's wife \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset Bill föreslog för Franks chef att \SpecialChar \ldots{} , och Carl till Alans fru \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset för Alans fru \begin_inset Quotes erd \end_inset \end_layout \begin_layout Plain Layout \series bold OK \series default : (arrive_in_V2) \begin_inset Quotes eld \end_inset arrived in Katmandu \begin_inset Quotes erd \end_inset => \begin_inset Quotes eld \end_inset anlände i Katmandu \begin_inset Quotes erd \end_inset / \begin_inset Quotes eld \end_inset till \begin_inset Quotes erd \end_inset (2) \end_layout \begin_layout Plain Layout Incomprehensible/difficult to fix: 6 \end_layout \begin_layout Plain Layout No linearisation: 24 \end_layout \begin_layout Plain Layout \begin_inset Note Note status collapsed \begin_layout Subsection Statistics \end_layout \begin_layout Plain Layout Out of 1220 original sentences, 1043 could eventually be correctly parsed and their tree representations be used for generating the equivalent Swedish sentences. Also, the changes listed in section 3.2 were performed, resulting in better linearizations. The generated Swedish sentences were checked for accuracy and divided into a few different groups. 
The number of sentences in each group is given in the left-most column. Descriptions and examples for each group are given on the right and can be viewed as a list of remaining problems to be solved. \end_layout \begin_layout Plain Layout \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \end_inset \end_layout \begin_layout Plain Layout \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout unique sentences \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 874 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout (som förut) \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 599 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout (skiljer sig) \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 89 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 
\end_layout \end_inset \begin_inset Text \begin_layout Plain Layout (hade inte förut) \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 150 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout no linearisation \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 36 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \end_inset \end_layout \begin_layout Paragraph Number Type Description Result Desired result \end_layout \begin_layout Itemize 811 correct & natural \end_layout \begin_layout Itemize 120 considered correct but could be more natural \end_layout \begin_deeper \begin_layout Itemize “each” / “every”: “varje europé” “alla européer” \end_layout \begin_layout Itemize \begin_inset Note Note status open \begin_layout Plain Layout proper inclusion -- indefinite article: “Mary är en student” “Mary är student” \end_layout \end_inset \end_layout \begin_layout Itemize \begin_inset Note Note status open \begin_layout Plain Layout infinitive marker desired: “John sade Bill hade skadat sig” “John sade att Bill hade skadat sig” \end_layout \end_inset \end_layout \begin_layout Itemize \begin_inset Note Note status open \begin_layout Plain Layout infinitive marker not desired: “lyckades att vinna” “lyckades vinna” \end_layout \end_inset \end_layout \begin_layout Itemize passive constructions: “blev använd” “användes” \end_layout \begin_layout Itemize gender of pronoun referring to previous sentence: “Bill äger ett också” (referring to “bil”) “Bill äger en också” \end_layout \begin_layout Itemize definite form: “ordföranden för” “ordförande för” \end_layout \begin_layout Itemize meaning of “female”: “Mary är kvinnlig” “Mary är kvinna” \end_layout \end_deeper \begin_layout Itemize 28 requiring changes in the FraCaS lexicon \end_layout \begin_deeper \begin_layout Itemize “of” constructions: \end_layout \begin_deeper 
\begin_layout Itemize “medlemmar av kommittén” “medlemmar i kommittén” \end_layout \begin_layout Itemize “kedja av affärsverksamhet” “affärskedja” \end_layout \begin_layout Itemize “grupp av människor” “grupp människor” \end_layout \begin_layout Itemize \begin_inset Note Note status open \begin_layout Plain Layout “alla av dem” “alla” / “allihop” \end_layout \end_inset \end_layout \end_deeper \begin_layout Itemize \begin_inset Note Note status open \begin_layout Plain Layout translation of “should”: “föreslog [...] att de borde” “föreslog [...] att de skulle” \end_layout \end_inset \end_layout \begin_layout Itemize translation of “make a loss”: “gjorde en förlust” “gick med förlust” \end_layout \begin_layout Itemize translation of “have been to”: “har varit till” “har varit i” \end_layout \begin_layout Itemize translation of “be asleep”: “har varit sovande” “har sovit” \end_layout \end_deeper \begin_layout Itemize 30 requiring changes in the English and/or Swedish general grammar(s) \end_layout \begin_deeper \begin_layout Itemize gender: “ett av de ledande tenorerna” “en av de ledande tenorerna” \end_layout \begin_layout Itemize translation of “come cheap”: “fås billigt” “vara billig (att anlita)” \end_layout \begin_layout Itemize \begin_inset Note Note status open \begin_layout Plain Layout “both” with adjective -- definite article: “båda ledande tenorerna” “båda de ledande tenorerna” \end_layout \end_inset \end_layout \begin_layout Itemize “will” -- difference in modality: “ska bli” “kommer att bli” (sometimes) \end_layout \begin_layout Itemize AdV position of “also”: “hon gav också dem en faktura” “hon gav dem också en faktura” \end_layout \begin_layout Itemize translation of “awarded himself”: “tilldelade sig” “tilldelade sig själv” \end_layout \begin_layout Itemize \begin_inset Note Note status open \begin_layout Plain Layout translation of “used to be”: “brukade att vara” e.g. 
“var tidigare” \end_layout \end_inset \end_layout \end_deeper \begin_layout Itemize 54 difficult to correct \end_layout \begin_deeper \begin_layout Itemize \begin_inset Note Note status open \begin_layout Plain Layout “were blamed for” (non-human subject): “blev anklagade för” [difficult to find Swedish equivalent] \end_layout \end_inset \end_layout \begin_layout Itemize reflexive possessive: “skrev hans första roman” “skrev sin första roman” \end_layout \begin_layout Itemize progressive aspect: “höll på att” (sometimes meaning “nearly”) [difficult to find Swedish equivalent] \end_layout \begin_layout Itemize singular / plural: “alla italienska män vill vara en framstående tenor” “alla italienska män vill vara framstående tenorer” \end_layout \begin_layout Itemize “be likely to”: “Smith är sannolik att bli” “det är sannolikt att Smith blir” \end_layout \begin_layout Itemize \begin_inset Note Note status open \begin_layout Plain Layout “some”: “snabbare än någon ITEL-dator” “snabbare än någon viss ITEL-dator” \end_layout \end_inset \end_layout \begin_layout Itemize “lose one's temper”: “Smith förlorade hans humör” “Smith tappade humöret” \end_layout \begin_layout Itemize “have something accepted”: “John hade hans uppsats godkänd” “John fick sin uppsats godkänd” \end_layout \end_deeper \end_inset \end_layout \end_inset \end_layout \begin_layout Section Discussion \end_layout \begin_layout Standard The FraCaS treebank was a small project financed by the Centre for Language Technology (CLT) at the University of Gothenburg. In less than three person-months, the project created a treebank for the FraCaS test suite, together with a bilingual GF grammar for the trees. The coverage of the English grammar is 95--99%, depending on whether elliptic phrases are included or not. The Swedish grammar is less developed and currently covers 86% of the FraCaS sentences. 
\end_layout \begin_layout Standard The treebank is released under an open-source license and can be downloaded as part of the Gothenburg CLT Toolkit: \end_layout \begin_layout Standard \noindent \align center \family sans \begin_inset CommandInset href LatexCommand href target "http://www.clt.gu.se/clt-toolkit" \end_inset \end_layout \begin_layout Subsection Implications for the FraCaS Test Suite \end_layout \begin_layout Standard From a corpus point of view, the FraCaS test suite is not very interesting. It is a small corpus (fewer than 1000 sentences) of artificial, made-up sentences. Furthermore, it uses fairly standard syntax and is monolingual. \end_layout \begin_layout Standard However, the main value of FraCaS is as a resource for testing semantic inference algorithms \begin_inset CommandInset citation LatexCommand citep key "MacCartneyManning2007:Natural-logic-for-textual,MacCartneyManning2008:Modeling-semantic-containment" \end_inset . This project adds syntactic structures to the test sentences, which we hope will be beneficial, since the semantics of a sentence depends closely on its syntax. \end_layout \begin_layout Standard In addition, we have added a new language to the test set, albeit one whose translations are not yet perfect. And since we are using the multilingual GF resource grammar, more languages should be relatively easy to add. \end_layout \begin_layout Subsection Implications for GF \end_layout \begin_layout Standard The making of this treebank has been a stress test, both for GF and for the resource grammar. The main work in this project was done by an experienced computational linguist who had never used GF before. This means that the project has also tested how easy it is to learn GF and to start using it together with its resource grammar. Furthermore, the project tested the coverage of the existing grammatical constructions in the resource grammar. 
\end_layout \begin_layout Subsection Future Work \end_layout \begin_layout Standard There are several remaining problems and possible extensions for the FraCaS treebank; some examples follow: \end_layout \begin_layout Itemize First and most important is to get most of the remaining Swedish sentences to work, by factoring out idioms and other constructions from the treebank and putting them in the grammars instead. \end_layout \begin_layout Itemize A good treatment of elliptical phrases, by implementing more coordination constructions in the resource grammar. \end_layout \begin_layout Itemize We would like to add new languages from the resource grammar to the multilingual FraCaS grammar. Hopefully this will also benefit the two existing languages, by requiring us to abstract away from language-specific details, thus making the grammar more abstract. \end_layout \begin_layout Itemize A long-term goal would be to make the treebank and the associated grammar more \begin_inset Quotes eld \end_inset semantic \begin_inset Quotes erd \end_inset by factoring out even more syntactic constructions and putting them in a semantic resource grammar. That it is possible to formulate classic Montague semantics in GF has already been shown \begin_inset CommandInset citation LatexCommand citep key "Ranta2001:Computational-Semantics" \end_inset , but here we need to handle many more semantic and pragmatic phenomena. \end_layout \begin_layout Standard \begin_inset Note Note status open \begin_layout Subsection Related work \end_layout \begin_layout Plain Layout Converting the Penn Treebank to GF, Swedish Talbanken to GF \end_layout \end_inset \end_layout \begin_layout Standard \begin_inset CommandInset bibtex LatexCommand bibtex bibfiles "FraCaSBank" options "apalike" \end_inset \end_layout \end_body \end_document