ROMTEXT – a Fundamental Instrument for the New Edition of the Dictionary of the Romanian Language

Abstract: The article is a short presentation of the ROMTEXT project, a dated and annotated corpus of selected texts from the bibliography of the Dictionary of the Romanian Language, covering the 16th to the 21st century. The project aims to support the new digital edition of the thesaurus-dictionary, developed by the Lexicology and Lexicography Department of the “Iorgu Iordan – Al. Rosetti” Institute of Linguistics, Romanian Academy (2017–2019). ROMTEXT will include over 500 literary and non-literary texts, obtained by optical character recognition of the best editions, with assisted corrections. These texts will then be annotated morphologically, syntactically and semantically by a team of lexicographers, with computer assistance. ROMTEXT will have two concordance search interfaces: one for lexicographers and one for the public. Search results can also be restricted and selected on the basis of the text metadata. Owing to its design and results, ROMTEXT will be one of the most modern and versatile linguistic corpora available for Romanian.
Keywords: corpus linguistics; Dictionary of the Romanian Language; lemmatization; annotated corpus; reference corpus; Romanian language
