The experimental MT System of the Project KIT-FAST

The FAST project, a member of the KIT project group, has developed and implemented an experimental machine translation (MT) system. For this purpose, a syntactic and a semantic level of representation, together with aspects of a conceptual level, have been realized. The syntactic sentence representations are structures generated by Generalized Phrase Structure Grammars (GPSG); the semantic ones are Functor-Argument-Structures (FAS). The conceptual level of representation is realized with the KL-ONE-based knowledge representation system BACK, which was developed by the sister project KIT-BACK and integrated into the MT system. The ABox (assertional knowledge) of the BACK system represents the text content. The TBox (terminological knowledge) contains predefined background knowledge (domain and world knowledge, linguistic knowledge).
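
The division of labour between TBox and ABox can be illustrated with a minimal Python sketch. It is not the BACK system and does not use BACK's interface or syntax; the concept names, the `ABox` class and the `subsumes` function are hypothetical and chosen only for illustration.

```python
# Toy TBox: each concept maps to its (single) more general concept.
# All concept names are hypothetical examples, not taken from KIT-FAST.
TBOX = {
    "commission": "organization",
    "organization": "thing",
    "programme": "thing",
}

def subsumes(general, specific, tbox=TBOX):
    """Return True if `general` equals `specific` or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = tbox.get(specific)  # None once the top of the hierarchy is passed
    return False

class ABox:
    """Toy assertional store: one concept per individual."""

    def __init__(self):
        self.assertions = {}  # individual -> concept

    def tell(self, individual, concept):
        self.assertions[individual] = concept

    def instances_of(self, concept):
        """All individuals whose asserted concept is subsumed by `concept`."""
        return sorted(i for i, c in self.assertions.items() if subsumes(concept, c))

abox = ABox()
abox.tell("c1", "commission")
abox.tell("p1", "programme")
organizations = abox.instances_of("organization")
things = abox.instances_of("thing")
```

Here the TBox supplies the background knowledge that a commission is an organization, so the individual `c1`, which was only asserted to be a commission, is also retrieved as an organization.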

The KIT-FAST MT system is transfer-based and translates written German texts into English sentence by sentence. The translation of a sentence consists of morphological, syntactic, semantic and conceptual analysis, transfer, generation and morphological synthesis. The algorithms for morphological analysis and synthesis are based on the SUTRA system (a module of the HAM-ANS hotel information system). The syntactic analysis is performed by a GPSG parser, which interprets ID rules, LP statements and metarules directly. The semantic and conceptual analysis, the transfer and the generation are all realized by a single algorithm based on term rewriting (known from the automated proving of equations).
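
The idea of term rewriting can be sketched in a few lines of Python. This is a generic toy rewriter, not the algorithm used in the MT system: terms are nested tuples, rule variables are strings starting with `?`, and the three transfer-like rules are hypothetical.

```python
def match(pattern, term, bindings):
    """Return the extended variable bindings if pattern matches term, else None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == term else None
        return {**bindings, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None

def substitute(template, bindings):
    """Instantiate the right-hand side of a rule with the matched bindings."""
    if isinstance(template, str) and template.startswith("?"):
        return bindings[template]
    if isinstance(template, tuple):
        return tuple(substitute(t, bindings) for t in template)
    return template

def rewrite_once(term, rules):
    """Apply the first applicable rule, trying the root before the subterms."""
    for lhs, rhs in rules:
        bindings = match(lhs, term, {})
        if bindings is not None:
            return substitute(rhs, bindings), True
    if isinstance(term, tuple):
        for i, sub in enumerate(term):
            new_sub, changed = rewrite_once(sub, rules)
            if changed:
                return term[:i] + (new_sub,) + term[i + 1:], True
    return term, False

def normalize(term, rules, max_steps=1000):
    """Rewrite until no rule applies; give up after max_steps."""
    for _ in range(max_steps):
        term, changed = rewrite_once(term, rules)
        if not changed:
            return term
    raise RuntimeError("no normal form reached within the step limit")

# Hypothetical rules mapping German lexical items to English ones.
TRANSFER_RULES = [
    (("pred", "kaufen", "?x", "?y"), ("pred", "buy", "?x", "?y")),
    ("kommission", "commission"),
    ("programm", "programme"),
]

source_term = ("pred", "kaufen", "kommission", "programm")
target_term = normalize(source_term, TRANSFER_RULES)
```

Because analysis, transfer and generation differ only in their rule sets under this scheme, one rewriting procedure such as `normalize` can in principle serve all of them; the step limit guards against rule sets that do not terminate.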

After the semantic analysis of a sentence, the resulting FAS expression is conceptually analysed, i.e. it is mapped onto an expression of the ABox-Tell-Language (ATL), by means of which the content of the sentence is added to the representation of the text content in the ABox of the BACK system.

Our first step towards the translation of German texts rather than single sentences was to interpret anaphoric relations in the source language. For this purpose, an algorithm for the evaluation of anaphoric relations has been developed and implemented. This algorithm uses the textual and background knowledge to determine the structural prominence of an antecedent candidate and its consistency with the anaphor.
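
The two criteria, consistency and structural prominence, can be illustrated with a small Python sketch. It is not the project's algorithm: the agreement features, the role weights and the distance penalty are assumptions made for this example only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    text: str
    gender: str    # "m", "f" or "n"
    number: str    # "sg" or "pl"
    role: str      # "subject", "object" or "other"
    sentence: int  # index of the sentence containing the mention

# Assumed weights: subjects count as more prominent than objects.
ROLE_WEIGHT = {"subject": 2.0, "object": 1.0, "other": 0.0}

def consistent(anaphor, candidate):
    """Assumed consistency check: agreement in gender and number."""
    return anaphor.gender == candidate.gender and anaphor.number == candidate.number

def prominence(anaphor, candidate):
    """Assumed prominence score: role weight minus sentence distance."""
    return ROLE_WEIGHT[candidate.role] - (anaphor.sentence - candidate.sentence)

def resolve(anaphor, candidates):
    """Return the most prominent consistent candidate, or None if there is none."""
    viable = [
        c for c in candidates
        if c.sentence <= anaphor.sentence and consistent(anaphor, c)
    ]
    return max(viable, key=lambda c: prominence(anaphor, c), default=None)

# "Die Kommission legt das Programm vor. Sie ..."
candidates = [
    Mention("Kommission", "f", "sg", "subject", 0),
    Mention("Programm", "n", "sg", "object", 0),
]
sie = Mention("sie", "f", "sg", "subject", 1)
antecedent = resolve(sie, candidates)
```

In this example the feminine pronoun "sie" is consistent only with "Kommission", so the neuter "Programm" is discarded before prominence is considered at all.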

The evaluation component of the MT system takes a FAS expression of the source language as its input and looks for antecedents in the same and preceding sentences. After the anaphoric relations have been evaluated, the FAS expression is actualized, i.e. the parts of the FAS expression corresponding to an anaphor and its antecedent are made to refer to the same ABox object. The actualized FAS expression for a source language sentence is transferred into a target language FAS expression, from which the corresponding target language sentence is generated.
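
The actualization step can be pictured as follows. In this hypothetical Python sketch a FAS expression is a nested tuple whose leaves of the form `("obj", id)` name ABox objects; this encoding and the identifiers `x3`, `x7` and `x8` are assumptions for illustration and do not reproduce the actual FAS format.

```python
def actualize(term, anaphor_obj, antecedent_obj):
    """Make every reference to anaphor_obj point to antecedent_obj instead."""
    if term == ("obj", anaphor_obj):
        return ("obj", antecedent_obj)
    if isinstance(term, tuple):
        return tuple(actualize(t, anaphor_obj, antecedent_obj) for t in term)
    return term

# Hypothetical FAS-like expression: the pronoun "sie" initially has its own object x7.
fas = (
    "vorlegen",
    ("pron", "sie", ("obj", "x7")),
    ("np", "bericht", ("obj", "x8")),
)

# The antecedent found in a preceding sentence refers to ABox object x3.
actualized = actualize(fas, "x7", "x3")
```

After this step the pronoun and its antecedent share the object `x3`, so later stages can choose the target language pronoun from what is known about that object.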

The MT system employs two textual representations: one for the structural information of a text and another for the text content. The textual representations are constructed incrementally from the sentential ones during translation. In principle a textual representation is needed on every level, but this would lead to redundant representations on the syntactic and semantic levels. For that reason we decided to use the more general semantic level (FAS) for the representation of structural aspects of the text.

The components of the MT system

Linguistic Data

Linguistic data was developed in order to translate a German text, the Proposal of the European Commission for the ESPRIT Programme. About 100 sentences were successfully tested with the MT system. The linguistic data comprises:

Implementation

The MT system is implemented in Quintus-Prolog 3.1 (commercial software) and SWI-Prolog 1.9.5 (public domain software). Both Prolog dialects run on Sun workstations under SunOS and on AT-compatible PCs under DOS (Windows 3.1). The MT system has been tested with Quintus- and SWI-Prolog under SunOS and with SWI-Prolog under Windows 3.1, and needs about 10 MB of hard disk space.

To obtain the software for the MT system for AT-compatible PCs under DOS (Windows 3.1), see http://www.cs.tu-berlin.de/~ww/mtdos.html.

To obtain the software for the MT system for Sun workstations under SunOS, see http://www.cs.tu-berlin.de/~ww/mtsun.html.

Documents related to the MT system

User and System documentation:
Internal Working Papers

Please order Internal Working Papers by (e-)mail from the address below.

The list of available KIT reports can be found at http://www.cs.tu-berlin.de/~kit/reportliste/kitlistehtml.html.

Further Information

Dr.-Ing. Wilhelm Weisweber
Technical University of Berlin
Department of Computer Sciences
Institute for Communication and Software Technology (IKS)
Formal Models, Logic and Programming (FLP)
Sekr.: FR 6-10
Franklinstr. 28/29
D-10587 Berlin-Charlottenburg
Federal Republic of Germany

Phone: +49-30-314-73608
Fax: +49-30-314-73622
E-mail: ww@cs.tu-berlin.de
WWW: http://www.cs.tu-berlin.de/~ww/