HFST transducers based on FinnWordNet dictionary data ===================================================== This package contains various HFST transducers based on FinnWordNet lexical data. The transducers can be used as (inflecting) Finnish or English thesauri or translation dictionaries. FinnWordNet ----------- FinnWordNet is a wordnet for Finnish. It was created by having professional translators translate the word senses of the Princeton WordNet (PWN) 3.0 into Finnish and by combining the translations with the PWN structure. FinnWordNet is a part of the FIN-CLARIN project: http://www.ling.helsinki.fi/finclarin/ For more information about FinnWordNet, please visit the FinnWordNet project Web page http://www.ling.helsinki.fi/en/lt/research/finnwordnet/ HFST – Helsinki Finite-State Transducer Technology -------------------------------------------------- For more information about HFST, please see the project Web page http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ The FinnWordNet transducers use the HFST optimized lookup format: https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstOptimizedLookupFormat The transducer files have the suffix .hfstol. Using them requires either the HFST library and tools (version 3.2.0 or later) or the standalone HFST optimized lookup program with which they can be run (applied): http://sourceforge.net/projects/hfst/files/optimized-lookup/ The transducers require version 1.3 or later of the standalone optimized lookup or the Java implementation (hfst-ol.jar as of 2011-05-23); the do not work with the Python implementation of 2011-05-24. FinnWordNet transducer packages ------------------------------- The FinnWordNet transducers are divided into three packages, each with a few slightly different transducers (YYYYMMDD denotes the release date of the package): fiwnsyn-fi-YYYYMMDD.zip – Finnish thesauri fiwnsyn-en-YYYYMMDD.zip - English thesauri (based on the Princeton WordNet) fiwntransl-YYYYMMDD.zip - Finnish–English and English–Finnish translation dictionaries. This README file is common to all the packages. The names of the thesaurus transducer files have the form fiwnsyn-LG-TYPE.hfstol, where LG is the language code ‘fi’ or ‘en’ and TYPE is one of the following: infl – The transducer recognizes inflected forms of the input word and generates synonyms with the same form. Multi-word synonyms are not recognized nor generated. A word is not considered its own synonym. infl-refl – The same as above but synonymy is reflexive: a word is considered its own synonym. This makes it possible to generate alternative forms of the input word, such as ‘indices’ and ‘indexes’. noinfl - The transducer recognizes inflected forms of the input word but generates synonyms in their base form. Multi-word synonyms are recognized and generated for English and generated for Finnish. A word is not considered its own synonym. noinfl-refl – The same as above but synonymy is reflexive. The names of the translation dictionary transducer files are fiwntransl-fien.hfstol for the Finnish–English dictionary and fiwntransl-enfi.hfstol for the English–Finnish one. They recognize inflected forms of the input word but generate the base form of the translation. The English–Finnish dictionary recognizes and generates multi-word translations, whereas the Finnish–English one only generates them. Sources ------- In addition to the FinnWordNet and Princeton WordNet data, the transducers have been constructed using the Omorfi open morphology tool for Finnish (http://gna.org/projects/omorfi) and the HFST English morphology (http://sourceforge.net/projects/hfst/files/morphological-transducers/hfst-english.tar.gz/download), originally by Måns Hulden, based on Princeton WordNet data. Deficiencies ------------ * Multi-word expressions are handled somewhat inconsistently. * The Finnish thesauri, in particular the inflecting ones, often generate many identical output words. * The inflecting English thesaurus overgenerates some word forms, such as an incorrect double plural genitive (‘nets’s’) in addition to the correct one (‘nets’’). * The non-inflecting English thesauri and the English–Finnish dictionary recognize inflection in the last word of a multi-word expression, even if it would be correct to inflect a preceding word. For example, they recognize ‘arrive ated’ but not the correct ‘arrived at’. * All the synonyms or translations of an ambiguous or polysemous word form are listed together, without any sorting or grouping according to the part of speech or word sense. Licence ------- Since FinnWordNet retains the structure and glosses of Princeton WordNet, it is a derivative of PWN subject to the PWN licence: http://wordnet.princeton.edu/wordnet/license/ The PWN licence allows free use, including commercial use, provided that a copyright notice is given: WordNet Release 3.0 This software and database is being provided to you, the LICENSEE, by Princeton University under the following license. By obtaining, using and/or copying this software and database, you agree that you have read, understood, and will comply with these terms and conditions.: Permission to use, copy, modify and distribute this software and database and its documentation for any purpose and without fee or royalty is hereby granted, provided that you agree to comply with the following copyright notice and statements, including the disclaimer, and that the same appear on ALL copies of the software, database and documentation, including modifications that you make for internal use or for distribution. WordNet 3.0 Copyright 2006 by Princeton University. All rights reserved. THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND PRINCETON UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PRINCETON UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES OF MERCHANT- ABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE, DATABASE OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS. The name of Princeton University or Princeton may not be used in advertising or publicity pertaining to distribution of the software and/or database. Title to copyright in this software, database and any associated documentation shall at all times remain with Princeton University and LICENSEE agrees to preserve same. The translations of FinnWordNet are copyright of the University of Helsinki and they are licenced under Creative Commons Attribution (CC BY) 3.0, which is similar to the PWN licence: http://creativecommons.org/licenses/by/3.0/ Please cite the following paper when referring to FinnWordNet: Krister Lindén and Lauri Carlson. 2010. FinnWordNet – WordNet på finska via översättning. LexicoNordica – Nordic Journal of Lexicography, 17:119–140. HFST is licenced under the GNU Lesser General Public License, version 3.0: http://www.gnu.org/licenses/lgpl.html Contact ------- The FinnWordNet project is led by Dr Krister Lindén at the Department of Modern Languages (Language Technology) of the University of Helsinki. In technical questions, please contact Mr Jyrki Niemi. Email addresses are of the form firstname.lastname@helsinki.fi (accents removed).