The experimental MT System of the Project KIT-FAST
The project FAST, a member of the project group KIT, has developed
and implemented an experimental machine translation (MT) system.
For this purpose, a syntactic level, a semantic level and aspects of
a conceptual level of representation have been realized.
The syntactic and semantic sentential representations are structures
generated by Generalized Phrase Structure Grammars (GPSG) and
Functor-Argument Structures (FAS), respectively. The conceptual
level of representation is realized with the help of the KL-ONE-based
knowledge representation system BACK. The BACK system was developed
by the sister project KIT-BACK and integrated into the MT system.
The ABox (assertional knowledge) of the BACK system represents the
text content. The TBox (terminological knowledge) contains predefined
background knowledge (domain and world knowledge, linguistic knowledge).
The KIT-FAST MT system is transfer-based and translates written German
texts into English sentence by sentence. The translation of a sentence
consists of morphological, syntactic, semantic and conceptual analysis,
transfer, generation and morphological synthesis. The algorithms for
morphological analysis and synthesis are based on the SUTRA system
(a module of the HAM-ANS hotel information system).
The syntactic analysis is realized by a GPSG parser, which interprets
ID (immediate dominance) rules, LP (linear precedence) statements and
metarules directly. Semantic and conceptual analysis, transfer and
generation are all realized by a single algorithm based on
term-rewriting (known from the automated proving of equations).
After the semantic analysis of a sentence, the resulting FAS expression
is conceptually analysed, i.e. it is mapped onto an expression of the
ABox-Tell-Language (ATL), by means of which the content of the sentence
is added to the representation of the text content in the ABox of the
BACK system.
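The term-rewriting idea described above can be illustrated with a small sketch. It is not the project's rule interpreter (which is written in Prolog); the term encoding as nested tuples, the `?x` variable notation, and the three transfer rules are all assumptions made up for this example. The sketch simply applies the first matching rule, innermost subterms first, until no rule matches.

```python
# Minimal term-rewriting sketch. Terms are nested tuples such as
# ("pred", "entwickeln", arg1, arg2); strings starting with "?" are variables.
# The encoding and all rules below are hypothetical illustrations.

def match(pattern, term, bindings):
    """Match pattern against term; return extended bindings or None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == term else None
        return {**bindings, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for sub_pattern, sub_term in zip(pattern, term):
            bindings = match(sub_pattern, sub_term, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None


def substitute(template, bindings):
    """Instantiate the right-hand side of a rule with the given bindings."""
    if isinstance(template, str) and template.startswith("?"):
        return bindings[template]
    if isinstance(template, tuple):
        return tuple(substitute(sub, bindings) for sub in template)
    return template


def normalize(term, rules):
    """Rewrite subterms first, then the root, until no rule applies.

    Takes the first matching rule; does not terminate for cyclic rule sets.
    """
    if isinstance(term, tuple):
        term = tuple(normalize(sub, rules) for sub in term)
    for lhs, rhs in rules:
        bindings = match(lhs, term, {})
        if bindings is not None:
            return normalize(substitute(rhs, bindings), rules)
    return term


# Hypothetical German -> English transfer rules over FAS-like terms.
TRANSFER_RULES = [
    (("pred", "entwickeln", "?a", "?b"), ("pred", "develop", "?a", "?b")),
    (("noun", "Kommission"), ("noun", "commission")),
    (("noun", "Programm"), ("noun", "programme")),
]

source = ("pred", "entwickeln", ("noun", "Kommission"), ("noun", "Programm"))
print(normalize(source, TRANSFER_RULES))
# -> ('pred', 'develop', ('noun', 'commission'), ('noun', 'programme'))
```

Because the same `normalize` function works with any rule set, one interpreter can in principle serve several processing steps just by exchanging the rules, which is the design point made in the text. A realistic system would additionally need a strategy for choosing among competing rules.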
Our first step towards translating German texts rather than single
sentences was to interpret anaphoric relations in the source
language. For this purpose, an algorithm for the evaluation of
anaphoric relations has been developed and implemented. This
algorithm uses the textual and background knowledge to determine
the structural prominence of an antecedent candidate and its
consistency with the anaphor.
The evaluation component of the MT system takes a FAS expression
of the source language as its input and looks for antecedents in
the same and preceding sentences. After the anaphoric relations
have been evaluated, the FAS expression is updated, i.e. the parts
of the FAS expression corresponding to an anaphor and its antecedent
are made to refer to the same ABox object. The updated FAS expression
for a source language sentence is then transferred into a target
language FAS expression, from which the corresponding target language
sentence is generated.
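The evaluation step described above can be sketched roughly as follows. This is only an illustration under strong assumptions: the `Mention` record, the numeric weights, and the restriction to three factors (agreement, proximity, subject preference) are invented for the example and do not reproduce the project's actual algorithm or data structures. The last step of `resolve` mimics making the anaphor and its antecedent refer to the same ABox object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    """Hypothetical record for a noun phrase or pronoun in the text."""
    text: str
    sentence: int                      # index of the containing sentence
    gender: str
    number: str
    is_subject: bool = False
    abox_object: Optional[str] = None  # assumed identifier of an ABox object


def agrees(anaphor, candidate):
    """Agreement acts as a hard filter in this sketch."""
    return (anaphor.gender == candidate.gender
            and anaphor.number == candidate.number)


def score(anaphor, candidate):
    """Combine proximity and subject preference; the weights are made up."""
    distance = anaphor.sentence - candidate.sentence
    proximity = 1.0 / (1 + distance)
    subject_bonus = 0.5 if candidate.is_subject else 0.0
    return proximity + subject_bonus


def resolve(anaphor, candidates):
    """Pick the best antecedent from the same or preceding sentences."""
    viable = [c for c in candidates
              if c.sentence <= anaphor.sentence and agrees(anaphor, c)]
    if not viable:
        return None
    best = max(viable, key=lambda c: score(anaphor, c))
    # Anaphor and antecedent now refer to the same ABox object.
    anaphor.abox_object = best.abox_object
    return best


# "Die Kommission fördert die Forschung mit einem Programm. Sie ..."
candidates = [
    Mention("Kommission", 0, "fem", "sg", True, "commission_1"),
    Mention("Forschung", 0, "fem", "sg", False, "research_1"),
    Mention("Programm", 0, "neut", "sg", False, "programme_1"),
]
pronoun = Mention("sie", 1, "fem", "sg")
antecedent = resolve(pronoun, candidates)
print(antecedent.text, pronoun.abox_object)
# -> Kommission commission_1
```

In this toy example "Programm" is ruled out by agreement, and "Kommission" is preferred over "Forschung" because it is the subject. A factor such as conceptual consistency would require access to the background knowledge and is not modelled here.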
The MT system employs two textual representations: one for the
structural information of a text and another for the text content.
The textual representations are constructed incrementally from the
sentential ones during translation. In principle, a textual representation
is needed on every level, but this would lead to redundant
representations on the syntactic and semantic levels. For that reason
we decided to use the more general semantic level (FAS) for the
representation of structural aspects of the text.
The components of the MT system
- morphological analyser based on the SUTRA system
- GPSG parser for direct interpretation of ID rules, LP statements
and metarules
- term-rewrite rule interpreter for semantic and conceptual
analysis, transfer and generation
- morphological synthesizer based on the SUTRA system
- module for the evaluation of anaphoric relations
- the knowledge representation system BACK
- tools for the development of lexicons, grammars and term-rewrite
systems
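To make the ID/LP format mentioned in the component list more concrete, the following sketch shows what an ID rule and a set of LP statements jointly license. It is only an illustration of the rule format: the function name, the toy rule VP -> V, NP, PP and the LP statements are assumptions, and the sketch enumerates orderings explicitly, whereas a parser that interprets the rules directly would avoid exactly this expansion.

```python
from itertools import permutations


def admissible_orders(daughters, lp_statements):
    """Return all orderings of the daughters that respect the LP statements.

    daughters:     unordered right-hand side of an ID rule (distinct categories)
    lp_statements: pairs (a, b) meaning "a must precede b"
    """
    orders = []
    for order in permutations(daughters):
        position = {category: index for index, category in enumerate(order)}
        if all(position[a] < position[b]
               for a, b in lp_statements
               if a in position and b in position):
            orders.append(order)
    return orders


# Toy ID rule VP -> V, NP, PP with the LP statements V < NP and V < PP.
print(admissible_orders(("V", "NP", "PP"), [("V", "NP"), ("V", "PP")]))
# -> [('V', 'NP', 'PP'), ('V', 'PP', 'NP')]
```

Separating dominance from precedence in this way keeps the grammar compact: one ID rule together with a few LP statements stands for several ordered phrase structure rules.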
Linguistic data was developed in order to translate a German text,
the Proposal of the European Commission for the ESPRIT Programme.
About 100 sentences were tested successfully with the MT system.
The linguistic data comprises:
- a German grammar (GPSG):
  - 22 main categories, 34 features
  - 22 aliases
  - 76 ID rules
  - 23 LP statements
  - 5 metarules
  - 23 FCRs (feature co-occurrence restrictions)
- 265 lexical entries (stem forms)
- 134 term-rewrite rules for semantic analysis (German)
- 37 term-rewrite rules for conceptual analysis (German)
- 248 term-rewrite rules for transfer (German --> English)
- 182 term-rewrite rules for generation (English)
- 8 factors for the evaluation of anaphoric relations in German:
  - agreement
  - binding
  - proximity
  - preference for the semantic subject
  - topic preference
  - identity of roles
  - negative preference for free adjuncts
  - conceptual consistency
- the predefined background knowledge comprises selectional restrictions
The MT system is implemented in Quintus-Prolog 3.1 (commercial software)
and SWI-Prolog 1.9.5 (public domain software). Both Prolog dialects run
on Sun workstations under SunOS and on AT-compatible PCs under DOS
(Windows 3.1). The MT system has been tested with Quintus-Prolog and
SWI-Prolog under SunOS and with SWI-Prolog under Windows 3.1. It needs
about 10 MB of hard disk space.
To obtain the software for the MT system for AT-compatible PCs under
DOS (Windows 3.1), see
http://www.cs.tu-berlin.de/~ww/mtdos.html.
To obtain the software for the MT system for Sun workstations under
SunOS, see
http://www.cs.tu-berlin.de/~ww/mtsun.html.
Documents related to the MT system
- Wilhelm Weisweber
Ein Dominanz-Chart-Parser für generalisierte
Phrasenstrukturgrammatiken
KIT-Report 45, Institute for Software and Theoretical CS,
Technical University of Berlin 1987
- Christa Hauenschild, Stephan Busemann
A constructive Version of GPSG for Machine Translation
in: Erich Steiner, Paul Schmidt, Cornelia Zellinsky-Wibbelt (eds.),
From Syntax to Semantics - Insights from Machine Translation,
Frances Pinter, London 1988, pp. 216-238 and in KIT-Report 59,
Institute for Software and Theoretical CS, Technical University
of Berlin 1988
- Stephan Busemann, Christa Hauenschild
A Constructive View of GPSG or How to Make it Work
Proceedings of the Coling-88, Budapest 1988, pp. 77-82 and in
KIT-Report 60, Institute for Software and Theoretical CS,
Technical University of Berlin 1988
- Wilhelm Weisweber
Using Constraints in a Constructive Version of GPSG
Proceedings of the Coling-88, Budapest 1988, pp. 738-743 and
extended version in
KIT-Report 61, Institute for Software
and Theoretical CS, Technical University of Berlin 1988
- Wilhelm Weisweber
Transfer in Machine Translation by Non-Confluent
Term-Rewrite Systems
Proceedings of the GWAI-89, Springer, Berlin 1989, pp. 264-269
- Wilhelm Weisweber, Christa Hauenschild
A Model of Multi-Level Transfer for Machine Translation
and Its Partial Realization
KIT-Report 77, Institute for Software and Theoretical CS,
Technical University of Berlin 1990 and to appear in: Proceedings
of the Seminar Computers & Translation '89, Tiflis 1989
- Birte Schmitz
Zur Wissensrepräsentation in der Maschinellen Übersetzung
- Die mögliche Verwendung von KL-ONE -
KIT-Report 80, Institute for Software and Theoretical CS,
Technical University of Berlin 1990
- Stephan Busemann
Generierung natürlicher Sprache mit Generalisierten
Phrasenstruktur-Grammatiken
Informatik Fachberichte 313, Springer, Berlin 1992 and a
preliminary version in KIT-Report 87, Institute for Software
and Theoretical CS, Technical University of Berlin 1990
- Projekt KIT-FAST
Schlußbericht des Berliner Projekts der
EUROTRA-D-Begleitforschung "Transfer und Generierung auf
satzsemantischer Basis"
KIT-Report 88, Institute for Software and Theoretical CS,
Technical University of Berlin 1991
- Birte Schmitz, Susanne Preuß, Christa Hauenschild
Textrepräsentation und Hintergrundwissen für die
Anaphernresolution im Maschinellen Übersetzungssystem
KIT-FAST
in: M. Kohrt, Ch. Küper (eds.),
Probleme der Übersetzungswissenschaft,
Working Papers in Linguistics, Department for Linguistics,
Technical University of Berlin 1991, pp. 39-81 and in
KIT-Report 93, Institute for Software and Theoretical CS,
Technical University of Berlin 1992
- Christa Hauenschild
Anapherninterpretation in der Maschinellen Übersetzung
KIT-Report 94, Institute for Software and Theoretical CS,
Technical University of Berlin 1992 and
Zeitschrift für Literaturwissenschaft und Linguistik 84
(1991), Vandenhoeck & Ruprecht, pp. 50-66
- Susanne Preuß, Birte Schmitz, Christa Hauenschild
Anaphora Resolution Based on Semantic and Conceptual Knowledge
in: Susanne Preuß, Birte Schmitz,
Workshop on Textrepresentation and Domain Modelling - Ideas
from Linguistics and AI,
KIT-Report 97, Institute for Software and Theoretical CS,
Technical University of Berlin 1992, pp. 1-13
- Wilhelm Weisweber
Term-Rewriting as a Basis for a Uniform
Architecture in Machine Translation
Proceedings of the Coling-92, Nantes 1992, pp. 777-783 and
extended version in KIT-Report 101, Institute for Software
and Theoretical CS, Technical University of Berlin 1992
- Wilhelm Weisweber, Susanne Preuß
Direct Parsing with Metarules
Proceedings of the Coling-92, Nantes 1992, pp. 1111-1115 and
extended version in KIT-Report 102, Institute for Software
and Theoretical CS, Technical University of Berlin 1992
- Susanne Preuß, Birte Schmitz, Christa Hauenschild, Carla Umbach
Anaphora Resolution in Machine Translation
KIT-Report 104, Institute for Software
and Theoretical CS, Technical University of Berlin 1992
- Birte Schmitz, Joachim Quantz
Defaults in Machine Translation
KIT-Report 106, Institute for Software
and Theoretical CS, Technical University of Berlin 1993
- Projekt KIT-FAST
Anapherninterpretation in der Maschinellen Übersetzung -
Schlußbericht des Berliner Projekts der
EUROTRA-D-Begleitforschung
KIT-Report 108, Institute for Software
and Theoretical CS, Technical University of Berlin 1993
- Wilhelm Weisweber
Termersetzung als Basis für eine einheitliche Architektur
in der maschinellen Sprachübersetzung
Sprache und Information Band 28, Niemeyer, Tübingen 1994
- Wilhelm Weisweber
The experimental MT System of the Project KIT-FAST
Proceedings of the International Conference Machine Translation:
Ten Years On, Cranfield 1994, pp. 12.1-12.19
User and System documentation:
- Wilhelm Weisweber
Implementierungs- und Benutzerhandbuch des experimentellen
Berliner MÜ-Systems
KIT-Report 116, Institute for Software and Theoretical CS,
Technical University of Berlin 1994
Internal Working Papers can be ordered by mail or e-mail from the
address below:
- Susanne Preuß
Koordination und Kongruenz in einer Verallgemeinerten
Phrasenstrukturgrammatik
KIT-Internal Working Paper 25, Institute for Software and
Theoretical CS, Technical University of Berlin 1989
- Carla Umbach
Terminterpretation von FAS-Strukturen
KIT-Internal Working Paper 26, Institute for Software and
Theoretical CS, Technical University of Berlin 1989
- Guido Dunker, Carla Umbach
Verfahren zur Anaphernresolution in KIT-FAST
KIT-Internal Working Paper 28, Institute for Software and
Theoretical CS, Technical University of Berlin 1993
- Christian Werner-Meier
Konsistenzüberprüfung eines MÜ-Lexikons
- Eine Anwendung Terminologischer Logik -
KIT-Internal Working Paper 29, Institute for Software and
Theoretical CS, Technical University of Berlin 1993
- Wilhelm Weisweber
Generation by Term-Rewriting
Internal Working Paper, Institute for Software and
Theoretical CS, Technical University of Berlin 1994
The list of available KIT reports can be found at
http://www.cs.tu-berlin.de/~kit/reportliste/kitlistehtml.html.
Dr.-Ing. Wilhelm Weisweber
Technical University of Berlin
Department of Computer Sciences
Institute for Communication and Software Technology (IKS)
Formal Models, Logic and Programming (FLP)
Sekr.: FR 6-10
Franklinstr. 28/29
D-10587 Berlin-Charlottenburg
Federal Republic of Germany
Phone: +49-30-314-73608
Fax: +49-30-314-73622
E-mail: ww@cs.tu-berlin.de
WWW: http://www.cs.tu-berlin.de/~ww/