Published in

2009 Fourth Balkan Conference in Informatics

DOI: 10.1109/bci.2009.16

Bootstrapping the Albanian Information Retrieval

Proceedings article published in 2009 by Nikitas N. Karanikolas
This paper is available in a repository.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

In this paper we investigate the Albanian language and try to identify the characteristics that will allow the information retrieval (IR) community to develop IR systems adapted to it. As a result of this investigation we provide a naive, single-step (rudimentary) stemming algorithm for Albanian, together with a stopword list. Human experts were consulted to evaluate the stemming algorithm. The evaluation method and the observation of its results revealed further rules that could improve the rudimentary stemming algorithm. We believe that the approach taken for this language could become a standard way of building information retrieval functionality (tools, functions, etc.) for less-studied languages such as Albanian.