Moara: a Java library for extracting and normalizing gene and protein mentions

Neves, Mariana L.; Carazo, José-María; Pascual-Montano, Alberto

Published in

BioMed Central, BMC Bioinformatics, 1(11), 2010

DOI: 10.1186/1471-2105-11-157

Journal article published in 2010 by Mariana L. Neves, José-María Carazo, Alberto Pascual-Montano

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving allowed

Abstract

Background: Gene/protein recognition and normalization are important preliminary steps for many biological text mining tasks, such as information retrieval, protein-protein interactions, and extraction of semantic information, among others. Although these problems have received considerable attention and effective solutions have been reported, easily integrated tools to perform these tasks are not readily available.

Results: This study proposes a versatile and trainable Java library that implements gene/protein tagging and normalization steps based on machine learning approaches. The system has been trained for several model organisms and corpora but can be expanded to support new organisms and documents.

Conclusions: Moara is a flexible, trainable and open-source system that is not specifically orientated to any organism and therefore does not require specific tuning of the algorithms or dictionaries utilized. Moara can be used as a stand-alone application or can be incorporated into the workflow of a more general text mining system.
