Published in

Elsevier, Procedia Social and Behavioral Sciences, (147), p. 370-375, 2014

DOI: 10.1016/j.sbspro.2014.07.113

A Methodology for Building Simple but Robust Stemmers without Language Knowledge: Stemmer Configuration

Journal article published in 2014 by Nikitas N. Karanikolas
This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

This work is part of a project that aims to define a methodology for building simple but robust stemmers without knowledge of the stemmer's target language. The methodology starts with a very simple primary stemmer, which is applied to a collection of words and returns the corresponding stems. The primary stemmer always removes the longest suffix that matches the ending of the examined word. Next, Information Retrieval (IR) experts express their arguments against the results of the primary stemmer. The methodology then allows the creation of a series of consecutive trial stemmers that conform increasingly to the arguments expressed by the IR experts. Here, we focus on the attributes and adjustable options available to the person responsible for building the consecutive trial stemmers and, finally, for creating the best trial: the stemmer that respects, as far as possible, the arguments against the primary stemmer.
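
The longest-suffix rule of the primary stemmer can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function name `primary_stem`, the suffix list `SUFFIXES`, and the guard that keeps a suffix from consuming the whole word are all assumptions made for illustration, since the abstract specifies only that the longest matching suffix is removed.

```python
def primary_stem(word, suffixes):
    """Strip the longest suffix in `suffixes` that matches the end of `word`."""
    # Try longer suffixes first so that the longest match wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Assumption: never strip a suffix that would consume the entire word.
        if len(suffix) < len(word) and word.endswith(suffix):
            return word[:-len(suffix)]
    # No suffix matched: the word is its own stem.
    return word


# Hypothetical suffix list, for illustration only (not taken from the paper).
SUFFIXES = ["s", "es", "ed", "ing", "ings"]

# Apply the primary stemmer to a small collection of words.
stems = {w: primary_stem(w, SUFFIXES) for w in ["buildings", "stemmed", "boxes", "ring"]}
```

With this suffix list, "buildings" loses "ings" rather than the shorter "s", while "ring" is reduced to "r". The second result is the kind of outcome an IR expert might argue against, which is what the later trial stemmers are meant to address.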