Phrase analysis of Romanian texts by computational means: the SIASTRO project

Publication: Dacoromania. Serie nouă, XI-XII, pp. 77-87
Publisher: Editura Academiei
Abstract: In the present context of the information society, each language needs technological products that connect it to the international environment of computerized communication tools and of natural-language text processing and storage techniques. The linguistic resources necessary for human language technology (HLT) applications fall into three main types: (a) theoretical resources (such as grammatical theories and formalisms); (b) linguistic data resources (textual, lexical and grammatical resources); (c) computer applications (such as automatic annotation, information extraction and information retrieval applications, authoring tools, translation authoring assistants). The article starts from an analysis of the resources currently available for the Romanian language and then presents a complex interdisciplinary project (SIASTRO), undertaken by a consortium of four partners from Cluj-Napoca, which aims to create a system for the automatic phrase analysis of Romanian texts. The system is designed to have three components: (1) a lexico-grammatical system, consisting of a lexicon whose entries correspond to Romanian words and contain the data required for automatic text processing, together with lexico-morphological analysis procedures and the necessary graphical interfaces; (2) a parser, which analyzes noun phrases, verb phrases, adjectival phrases and adverbial phrases; (3) an interactive system for term extraction from specialized texts, as a first practical application of the parser. The project's expected outcome is a prototype term extraction system, together with comprehensive scientific documentation covering both the formal aspects of Romanian grammar and the methods of implementation.
Based on these results, research can subsequently be extended towards the syntactic-semantic analysis of Romanian texts, with a wide range of applications: grammar checkers, systems for computer-assisted learning of Romanian for both native and non-native speakers, and corpus annotation systems.
Language: Romanian

