“Diacronia” bibliometric database (BDD)

Représentation de Textes à l’Aide d’Étiquettes Sémantiques dans le Cadre de la Classification Automatique (English: Text Representation Using Semantic Tags for Automatic Classification)

Publication: Revue roumaine de linguistique, LI (3-4)
Publisher: Editura Academiei
Abstract: This paper describes an algorithm that represents documents in a reduced vector space through feature extraction. The algorithm is evaluated on the supervised classification of news articles. Each document is represented by a profile made of semantic tags taken from a machine-readable dictionary. Synonymy is handled by thematic conflation, and polysemy by a statistical word-sense disambiguation method that we developed. We propose four variants of profile generation, depending on whether or not a recursive system is used and whether or not a corrective factor for polysemous words is applied. In total we evaluated 32 variants, combining the algorithm type with three further parameters: grammatical-category selection, a 15% reduction of the profile, and a stop-list of semantic tags. Some parameters (such as profile reduction) have little influence on classifier performance, while others (the corrective factor for ambiguous words and the stop-list) improve it noticeably.
Language: French
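The profile-generation idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption, not the paper's method: the toy dictionary, the tag names, the stop-list contents, the 1/k corrective factor for a word with k senses, and reading "15% reduction" as dropping the lowest-weighted 15% of tags. Thematic conflation shows up only in the sense that different words ("bank", "loan") map to the same tag. The recursive system and grammatical-category selection are listed in the experimental grid but not modeled. The grid itself assumes each of the five factors is a simple on/off switch, which gives 2^5 = 32 configurations.

```python
from collections import Counter
from itertools import product
import math

# Hypothetical toy machine-readable dictionary: word -> list of senses,
# each sense being a list of semantic tags. Not the paper's actual resource.
TOY_DICTIONARY = {
    "bank": [["FINANCE"], ["GEOGRAPHY"]],
    "loan": [["FINANCE"]],
    "river": [["GEOGRAPHY"]],
    "interest": [["FINANCE"], ["EMOTION"]],
    "thing": [["GENERAL"]],
}

# Hypothetical stop-list of semantic tags considered uninformative.
TAG_STOP_LIST = {"GENERAL"}


def build_profile(words, corrective_factor=True, use_stop_list=True,
                  reduce_15=False):
    """Map a tokenized document to a weighted bag of semantic tags."""
    profile = Counter()
    for word in words:
        senses = TOY_DICTIONARY.get(word)
        if not senses:
            continue  # word unknown to the dictionary: ignored
        # Assumed corrective factor: split a polysemous word's unit weight
        # evenly over its senses instead of crediting every sense fully.
        weight = 1.0 / len(senses) if corrective_factor else 1.0
        for sense in senses:
            for tag in sense:
                profile[tag] += weight
    if use_stop_list:
        for tag in TAG_STOP_LIST:
            profile.pop(tag, None)
    if reduce_15 and profile:
        # Assumed reduction: drop the lowest-weighted 15% of tags.
        n_drop = math.floor(0.15 * len(profile))
        profile = Counter(dict(profile.most_common(len(profile) - n_drop)))
    return profile


# Hypothetical experimental grid: 2 x 2 algorithm variants (recursive or not,
# corrective factor or not) times 3 on/off parameters = 32 configurations.
CONFIGS = [
    dict(recursive=r, corrective_factor=c, pos_selection=p,
         reduce_15=red, use_stop_list=s)
    for r, c, p, red, s in product([False, True], repeat=5)
]
```

For example, build_profile(["bank", "loan", "interest", "thing"]) gives FINANCE 2.0, GEOGRAPHY 0.5 and EMOTION 0.5, with GENERAL removed by the stop-list. Without the corrective factor, FINANCE rises to 3.0, so the ambiguous senses of "bank" and "interest" weigh as much as unambiguous words.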

Citations to this publication: 1

References in this publication: 0

The citations/references list is based on indexed publications only, and may therefore be incomplete.