Conference Papers Year : 2021

Morphologically Annotated Amharic Text Corpora

Tilahun Yeshambel
Josiane Mothe
Yaregal Assabie

Abstract

In information retrieval (IR), a system retrieves the documents that match a query. Search engines usually conflate word variants into a common stem at indexing time, because a query and a document need not use exactly the same word variant for the document to be relevant. Stemmers are known to be effective for IR in many languages. However, some languages still lack stemmers or morphological analyzers; this is the case for Amharic, the working language of Ethiopia. Morphological analysis is key to deriving the stems, roots (primary lexical units) and grammatical markers of words, such as person, tense and negation markers. This paper presents morphologically annotated Amharic lexicons, as well as stem-based and root-based morphologically annotated corpora, which the research community can use as benchmark collections to evaluate either morphological analyzers or information retrieval for Amharic. Such resources are expected to foster research in Amharic IR.
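The conflation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: `toy_stem`, `SUFFIXES`, `build_index` and `search` are hypothetical names, and the stemmer is a naive English suffix stripper chosen only to show how stemming at indexing time lets a query match a different word variant. Amharic morphology is far richer and would require a real morphological analyzer.

```python
from collections import defaultdict

# Hypothetical suffix list, for illustration only. A real Amharic analyzer
# would also have to handle prefixes and root-pattern morphology.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stem to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[toy_stem(token)].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing the stem of every query term."""
    postings = [index.get(toy_stem(term), set()) for term in query.split()]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "the index stores stemmed words",
    2: "indexing a document",
    3: "retrieved documents",
}
index = build_index(docs)
```

With this index, the query `documents` matches document 2 even though that document only contains the singular form, because both variants are stored under the same stem.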
Main file
Morphologically.pdf (2.11 MB)
Origin : Files produced by the author(s)

Dates and versions

hal-03362977 , version 1 (02-10-2021)

Cite

Tilahun Yeshambel, Josiane Mothe, Yaregal Assabie. Morphologically Annotated Amharic Text Corpora. SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 2021, Virtual Event, Canada. pp. 2349-2355, ⟨10.1145/3404835.3463237⟩. ⟨hal-03362977⟩