The MT System of the Project KIT-FAST

The experimental MT System of the Project KIT-FAST

The project FAST, a member of the project group KIT, has developed and implemented an experimental machine translation (MT) system. For this purpose, a syntactic level, a semantic level and aspects of a conceptual level of representation have been realized. The syntactic sentential representations are structures generated by a Generalized Phrase Structure Grammar (GPSG); the semantic ones are Functor-Argument Structures (FAS). The conceptual level of representation is realized with the help of the KL-ONE based knowledge representation system BACK, which was developed by the sister project KIT-BACK and integrated into the MT system. The ABox (assertional knowledge) of the BACK system represents the text content. The TBox (terminological knowledge) contains predefined background knowledge (domain and world knowledge, linguistic knowledge).
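
The division of labour between TBox and ABox can be illustrated with a minimal sketch. It is not the interface of the BACK system: the concept names, the object identifiers and the functions `subsumes`, `tell_object` and `tell_relation` are hypothetical and chosen only for illustration.

```python
# Hypothetical miniature of a KL-ONE-style knowledge base (not the BACK interface).

# TBox: terminological knowledge, reduced here to a concept hierarchy
# that maps each concept to its direct superconcept.
TBOX = {
    "commission": "organization",
    "organization": "agent",
    "programme": "plan",
    "plan": "abstract_object",
}


def subsumes(general, specific):
    """Return True if `general` equals `specific` or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = TBOX.get(specific)
    return False


class ABox:
    """Assertional knowledge: objects introduced by the text and their relations."""

    def __init__(self):
        self.concept_of = {}   # object identifier -> concept name
        self.relations = []    # (role, source object, target object)

    def tell_object(self, name, concept):
        self.concept_of[name] = concept

    def tell_relation(self, role, source, target):
        self.relations.append((role, source, target))

    def instances_of(self, concept):
        """All objects whose concept is subsumed by `concept`, sorted by name."""
        return sorted(o for o, c in self.concept_of.items() if subsumes(concept, c))


# Content of an invented sentence: a commission proposes a programme.
abox = ABox()
abox.tell_object("c1", "commission")
abox.tell_object("p1", "programme")
abox.tell_relation("proposes", "c1", "p1")
```

In this sketch a query such as `abox.instances_of("agent")` combines both kinds of knowledge: the ABox supplies the objects, the TBox decides which of them fall under the requested concept.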

The KIT-FAST MT system is transfer-based and translates written German texts into English sentence by sentence. The translation of a sentence consists of morphological, syntactic, semantic and conceptual analysis, transfer, generation and morphological synthesis. The algorithms for morphological analysis and synthesis are based on the SUTRA system (a module of the HAM-ANS hotel information system). The syntactic analysis is realized by a GPSG parser, which interprets ID rules, LP statements and metarules directly. The semantic and conceptual analysis, the transfer and the generation are all realized by a single algorithm based on term rewriting (known from the automated proving of equations).
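
The idea of one rewriting algorithm serving several stages, each with its own rule set, can be sketched as follows. This is a minimal illustration and not the interpreter of the MT system: the term encoding (nested tuples, variables written as strings starting with `?`), the rule set `TRANSFER` and all function names are assumptions. The sketch always applies the first matching rule, whereas the publications listed below discuss non-confluent rule systems, which this simplification does not cover.

```python
def match(pattern, term, bindings):
    """Match `pattern` against `term`; return the extended bindings or None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == term else None
        return {**bindings, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for sub_pattern, sub_term in zip(pattern, term):
            bindings = match(sub_pattern, sub_term, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None


def substitute(template, bindings):
    """Replace every variable in `template` by the term bound to it."""
    if isinstance(template, str) and template.startswith("?"):
        return bindings[template]
    if isinstance(template, tuple):
        return tuple(substitute(part, bindings) for part in template)
    return template


def rewrite_once(term, rules):
    """Apply the first matching rule at the root, else in the leftmost subterm."""
    for lhs, rhs in rules:
        bindings = match(lhs, term, {})
        if bindings is not None:
            return substitute(rhs, bindings), True
    if isinstance(term, tuple):
        for i, sub in enumerate(term):
            new_sub, changed = rewrite_once(sub, rules)
            if changed:
                return term[:i] + (new_sub,) + term[i + 1:], True
    return term, False


def normalize(term, rules, max_steps=1000):
    """Rewrite until no rule applies; give up after `max_steps` steps."""
    for _ in range(max_steps):
        term, changed = rewrite_once(term, rules)
        if not changed:
            return term
    raise RuntimeError("no normal form reached within max_steps")


# Hypothetical transfer rules (German -> English) over an invented term format.
TRANSFER = [
    (("pred", "vorschlagen", "?x", "?y"), ("pred", "propose", "?x", "?y")),
    (("noun", "kommission"), ("noun", "commission")),
    (("noun", "programm"), ("noun", "programme")),
]

source = ("pred", "vorschlagen", ("noun", "kommission"), ("noun", "programm"))
target = normalize(source, TRANSFER)
```

Passing a different rule list to `normalize` would turn the same function into a different stage, which is the design point of a uniform rewriting architecture.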

After the semantic analysis of a sentence, the resulting FAS expression is conceptually analysed, i.e. it is mapped onto an expression of the ABox-Tell-Language (ATL), with the help of which the content of the sentence is added to the representation of the text content in the ABox of the BACK system.

Our first step towards the translation of German texts instead of single sentences was to interpret anaphoric relations in the source language. For this purpose, an algorithm for the evaluation of anaphoric relations has been developed and implemented. This algorithm uses the textual and background knowledge in order to determine the structural prominence of an antecedent candidate and its consistency with the anaphor.

The evaluation component of the MT system takes a FAS expression of the source language as its input and looks for antecedents in the same and preceding sentences. After the anaphoric relations have been evaluated, the FAS expression is actualized (updated), i.e. the parts of the FAS expression corresponding to an anaphor and its antecedent are made to refer to the same ABox object. An actualized FAS expression for a source language sentence is transferred into a target language FAS expression, from which the corresponding target language sentence is generated.
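
The two steps described above, collecting antecedent candidates and actualizing the expression, can be sketched as follows. This is an illustration under assumptions, not the evaluation component itself: the `Node` class, the `window` parameter, the object identifiers and the example words are hypothetical, and the sketch picks the first candidate where the real system would evaluate all of them.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a referring part of a FAS expression."""
    lemma: str
    sentence: int            # index of the sentence that contains the node
    abox_object: str         # identifier of the ABox object the node refers to
    is_anaphor: bool = False


def candidates(anaphor, nodes, window=2):
    """Non-anaphoric nodes from the same or up to `window` preceding sentences,
    nearest sentence first (the window size is an assumption)."""
    found = [
        n for n in nodes
        if n is not anaphor
        and not n.is_anaphor
        and 0 <= anaphor.sentence - n.sentence <= window
    ]
    return sorted(found, key=lambda n: anaphor.sentence - n.sentence)


def actualize(anaphor, antecedent):
    """Make the anaphor refer to the same ABox object as its antecedent."""
    anaphor.abox_object = antecedent.abox_object


anaphor = Node("sie", 1, "obj_9", is_anaphor=True)
nodes = [
    Node("Kommission", 0, "obj_1"),
    Node("Programm", 0, "obj_2"),
    Node("Forschung", 5, "obj_3"),   # later sentence, therefore not a candidate
    anaphor,
]

found = candidates(anaphor, nodes)
actualize(anaphor, found[0])         # placeholder choice: first candidate
```

After the call to `actualize`, the pronoun and its antecedent share one object identifier, so anything asserted about the pronoun is asserted about the antecedent's object.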

The MT system employs two textual representations: one for the structural information of a text and another for the text content. The textual representations are constructed incrementally from the sentential ones during translation. In principle, a textual representation is needed on every level, but this would lead to redundant representations on the syntactic and semantic levels. For that reason we decided to use the more general semantic level (FAS) for the representation of structural aspects of the text.

The components of the MT system

morphological analyser based on the SUTRA system
GPSG parser for direct interpretation of ID rules, LP statements and metarules
term-rewrite rule interpreter for semantic and conceptual analysis, transfer and generation
morphological synthesizer based on the SUTRA system
module for the evaluation of anaphoric relations
the knowledge representation system BACK
tools for the development of lexicons, grammars and term-rewrite systems
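
As a reminder of what the two rule types handled by the GPSG parser express: an ID (immediate dominance) rule states which daughters a category has without ordering them, and LP (linear precedence) statements constrain the order of sisters. The following sketch enumerates the orders that a set of LP statements admits for one ID rule. The rule, the LP statements and the function name are invented for illustration, and the enumeration is only a way to show the semantics; a parser that interprets ID rules and LP statements directly would check precedence during parsing instead of expanding the rules.

```python
from itertools import permutations


def admissible_orders(daughters, lp_statements):
    """Yield every ordering of `daughters` that violates no LP statement.

    An LP statement (a, b) means that a must precede b whenever both occur
    as sisters. The sketch assumes that all daughters are distinct.
    """
    for order in permutations(daughters):
        position = {category: i for i, category in enumerate(order)}
        if all(
            position[a] < position[b]
            for a, b in lp_statements
            if a in position and b in position
        ):
            yield order


# Hypothetical ID rule VP -> V, NP, PP with unordered daughters.
id_rule = ("VP", ["V", "NP", "PP"])

# Hypothetical LP statements: NP and PP precede V (verb-final order).
lp = [("NP", "V"), ("PP", "V")]

orders = list(admissible_orders(id_rule[1], lp))
```

Of the six possible orders of the three daughters, only the two verb-final ones remain, which shows how a few LP statements replace a larger number of ordered phrase structure rules.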

Linguistic Data

Linguistic data was developed in order to translate a German text, the Proposal of the European Commission for the ESPRIT Programme. About 100 sentences were tested successfully with the MT system. The linguistic data comprises:

a German grammar (GPSG):
- 22 main categories, 34 features
- 22 aliases
- 76 ID rules
- 23 LP statements
- 5 metarules
- 23 FCRs
- 265 lexical entries (stem forms)
134 term-rewrite rules for semantic analysis (German)
37 term-rewrite rules for conceptual analysis (German)
248 term-rewrite rules for transfer (German --> English)
182 term-rewrite rules for generation (English)
8 factors for the evaluation of anaphoric relations in German:
1. agreement
2. binding
3. proximity
4. preference for the semantic subject
5. topic preference
6. identity of roles
7. negative preference for free adjuncts
8. conceptual consistency
predefined background knowledge comprising selectional restrictions
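
How the eight factors listed above are combined is not stated here. One conceivable scheme, shown purely as an assumption, is a weighted sum over per-candidate factor values. The weights, the value ranges, the factor keys and the example candidates below are all hypothetical and are not taken from the system.

```python
# Hypothetical weights; the negative weight models the negative preference
# for antecedents inside free adjuncts.
WEIGHTS = {
    "agreement": 3.0,
    "binding": 3.0,
    "proximity": 1.0,
    "semantic_subject": 1.0,
    "topic": 1.0,
    "role_identity": 1.0,
    "free_adjunct": -1.0,
    "conceptual_consistency": 3.0,
}


def score(features):
    """Weighted sum of the factor values; missing factors count as 0."""
    return sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)


def best_antecedent(candidates):
    """Return the name of the candidate with the highest score."""
    return max(candidates, key=lambda name: score(candidates[name]))


# Invented factor values for two antecedent candidates of one pronoun.
candidates = {
    "Kommission": {
        "agreement": 1, "binding": 1, "proximity": 0.5,
        "semantic_subject": 1, "conceptual_consistency": 1,
    },
    "Vorschlag": {
        "agreement": 0, "binding": 1, "proximity": 1,
        "free_adjunct": 1, "conceptual_consistency": 1,
    },
}

winner = best_antecedent(candidates)
```

A real evaluation might instead treat some factors, such as agreement or conceptual consistency, as hard filters applied before the preferences are compared; the weighted sum is only the simplest way to make the interplay of the factors concrete.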

Implementation

The MT system is implemented in Quintus-Prolog 3.1 (commercial software) and SWI-Prolog 1.9.5 (public domain software). Both Prolog dialects run on Sun workstations under SunOS and on AT-compatible PCs under DOS (Windows 3.1). The MT system has been tested with Quintus-Prolog and SWI-Prolog under SunOS and with SWI-Prolog under Windows 3.1. It needs about 10 MB of hard disk space.

To obtain the software for the MT system for AT-compatible PCs under DOS (Windows 3.1), see http://www.cs.tu-berlin.de/~ww/mtdos.html.

To obtain the software for the MT system for Sun workstations under SunOS, see http://www.cs.tu-berlin.de/~ww/mtsun.html.

Documents related to the MT system

Wilhelm Weisweber
Ein Dominanz-Chart-Parser für generalisierte Phrasenstrukturgrammatiken
KIT-Report 45, Institute for Software and Theoretical CS, Technical University of Berlin 1987
Christa Hauenschild, Stephan Busemann
A Constructive Version of GPSG for Machine Translation
in: Erich Steiner, Paul Schmidt, Cornelia Zelinsky-Wibbelt (eds.), From Syntax to Semantics - Insights from Machine Translation, Frances Pinter, London 1988, pp. 216-238 and in KIT-Report 59, Institute for Software and Theoretical CS, Technical University of Berlin 1988
Stephan Busemann, Christa Hauenschild
A Constructive View of GPSG or How to Make it Work
Proceedings of the Coling-88, Budapest 1988, pp. 77-82 and in KIT-Report 60, Institute for Software and Theoretical CS, Technical University of Berlin 1988
Wilhelm Weisweber
Using Constraints in a Constructive Version of GPSG
Proceedings of the Coling-88, Budapest 1988, pp. 738-743 and extended version in KIT-Report 61, Institute for Software and Theoretical CS, Technical University of Berlin 1988
Wilhelm Weisweber
Transfer in Machine Translation by Non-Confluent Term-Rewrite Systems
Proceedings of the GWAI-89, Springer, Berlin 1989, pp. 264-269
Wilhelm Weisweber, Christa Hauenschild
A Model of Multi-Level Transfer for Machine Translation and Its Partial Realization
KIT-Report 77, Institute for Software and Theoretical CS, Technical University of Berlin 1990 and to appear in: Proceedings of the Seminar Computers & Translation '89, Tiflis 1989
Birte Schmitz
Zur Wissensrepräsentation in der Maschinellen Übersetzung - Die mögliche Verwendung von KL-ONE -
KIT-Report 80, Institute for Software and Theoretical CS, Technical University of Berlin 1990
Stephan Busemann
Generierung natürlicher Sprache mit Generalisierten Phrasenstruktur-Grammatiken
Informatik Fachberichte 313, Springer, Berlin 1992 and a preliminary version in KIT-Report 87, Institute for Software and Theoretical CS, Technical University of Berlin 1990
Projekt KIT-FAST
Schlußbericht des Berliner Projekts der EUROTRA-D-Begleitforschung "Transfer und Generierung auf satzsemantischer Basis"
KIT-Report 88, Institute for Software and Theoretical CS, Technical University of Berlin 1991
Birte Schmitz, Susanne Preuß, Christa Hauenschild
Textrepräsentation und Hintergrundwissen für die Anaphernresolution im Maschinellen Übersetzungssystem KIT-FAST
in: M. Kohrt, Ch. Küper (eds.), Probleme der Übersetzungswissenschaft, Working Papers in Linguistics, Department for Linguistics, Technical University of Berlin 1991, pp. 39-81 and in KIT-Report 93, Institute for Software and Theoretical CS, Technical University of Berlin 1992
Christa Hauenschild
Anapherninterpretation in der Maschinellen Übersetzung
KIT-Report 94, Institute for Software and Theoretical CS, Technical University of Berlin 1992 and Zeitschrift für Literaturwissenschaft und Linguistik 84 (1991), Vandenhoeck & Ruprecht, pp. 50-66
Susanne Preuß, Birte Schmitz, Christa Hauenschild
Anaphora Resolution Based on Semantic and Conceptual Knowledge
in: Susanne Preuß, Birte Schmitz, Workshop on Textrepresentation and Domain Modelling - Ideas from Linguistics and AI, KIT-Report 97, Institute for Software and Theoretical CS, Technical University of Berlin 1992, pp. 1-13
Wilhelm Weisweber
Term-Rewriting as a Basis for a Uniform Architecture in Machine Translation
Proceedings of the Coling-92, Nantes 1992, pp. 777-783 and extended version in KIT-Report 101, Institute for Software and Theoretical CS, Technical University of Berlin 1992
Wilhelm Weisweber, Susanne Preuß
Direct Parsing with Metarules
Proceedings of the Coling-92, Nantes 1992, pp. 1111-1115 and extended version in KIT-Report 102, Institute for Software and Theoretical CS, Technical University of Berlin 1992
Susanne Preuß, Birte Schmitz, Christa Hauenschild, Carla Umbach
Anaphora Resolution in Machine Translation
KIT-Report 104, Institute for Software and Theoretical CS, Technical University of Berlin 1992
Birte Schmitz, Joachim Quantz
Defaults in Machine Translation
KIT-Report 106, Institute for Software and Theoretical CS, Technical University of Berlin 1993
Projekt KIT-FAST
Anapherninterpretation in der Maschinellen Übersetzung - Schlußbericht des Berliner Projekts der EUROTRA-D-Begleitforschung
KIT-Report 108, Institute for Software and Theoretical CS, Technical University of Berlin 1993
Wilhelm Weisweber
Termersetzung als Basis für eine einheitliche Architektur in der maschinellen Sprachübersetzung
Sprache und Information Band 28, Niemeyer, Tübingen 1994
Wilhelm Weisweber
The experimental MT System of the Project KIT-FAST
Proceedings of the International Conference Machine Translation: Ten Years On, Cranfield 1994, pp. 12.1-12.19

User and System documentation:

Wilhelm Weisweber
Implementierungs- und Benutzerhandbuch des experimentellen Berliner MÜ-Systems
KIT-Report 116, Institute for Software and Theoretical CS, Technical University of Berlin 1994

Internal Working Papers

Please order Internal Working Papers by mail or e-mail from the address below.

Susanne Preuß
Koordination und Kongruenz in einer Verallgemeinerten Phrasenstrukturgrammatik
KIT-Internal Working Paper 25, Institute for Software and Theoretical CS, Technical University of Berlin 1989
Carla Umbach
Terminterpretation von FAS-Strukturen
KIT-Internal Working Paper 26, Institute for Software and Theoretical CS, Technical University of Berlin 1989
Guido Dunker, Carla Umbach
Verfahren zur Anaphernresolution in KIT-FAST
KIT-Internal Working Paper 28, Institute for Software and Theoretical CS, Technical University of Berlin 1993
Christian Werner-Meier
Konsistenzüberprüfung eines MÜ-Lexikons - Eine Anwendung Terminologischer Logik -
KIT-Internal Working Paper 29, Institute for Software and Theoretical CS, Technical University of Berlin 1993
Wilhelm Weisweber
Generation by Term-Rewriting
Internal Working Paper, Institute for Software and Theoretical CS, Technical University of Berlin 1994

The list of available KIT reports can be found at http://www.cs.tu-berlin.de/~kit/reportliste/kitlistehtml.html.

Further Information

Dr.-Ing. Wilhelm Weisweber
Technical University of Berlin
Department of Computer Sciences
Institute for Communication and Software Technology (IKS)
Formal Models, Logic and Programming (FLP)
Sekr.: FR 6-10
Franklinstr. 28/29
D-10587 Berlin-Charlottenburg
Federal Republic of Germany

Phone: +49-30-314-73608
Fax: +49-30-314-73622
E-mail: ww@cs.tu-berlin.de
WWW: http://www.cs.tu-berlin.de/~ww/