Conference papers

RFreeStem un raciniseur pour le malgache (RFreeStem, a stemmer for Malagasy)

Abstract : Stemming is a text pre-processing step that groups words that differ morphologically but are semantically close, so that a search-engine query containing any of them matches similar or even identical documents. For many languages, stemmers are rule-based. For languages that lack such tools, the stemming problem remains unsolved; this is the case for Malagasy. This paper analyzes the effectiveness of RFreeStem, a rule-free stemmer based on the statistical analysis of texts. We study the stemmer's hyperparameters and their influence on stemming quality for Malagasy by comparing its output against an existing test collection of manually obtained word roots.

https://hal-univ-tlse2.archives-ouvertes.fr/hal-03360868
Contributor : Romain Meunier
Submitted on : Friday, October 1, 2021 - 9:42:50 AM
Last modification on : Tuesday, October 19, 2021 - 2:23:38 PM

File

RFreeStem un raciniseur pour l...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03360868, version 1

Citation

Andonirina Andriamihasinoro, Josiane Mothe, Oihana Coustie, Olivier Teste. RFreeStem un raciniseur pour le malgache. 17ème conférence francophone en Recherche d’Information et Application (CORIA 2021), Apr 2021, Grenoble, France. pp.1-10. ⟨hal-03360868⟩
