Published in

Cambridge University Press, Natural Language Engineering, 01(19), p. 121-141

DOI: 10.1017/s1351324912000174

A SMS normalization system integrating multiple grammatical resources

Journal article published in 2012 by J. Oliva, J. I. Serrano, M. D. Del Castillo, Á. Iglesias
This paper was not found in any repository, but could be made available legally by the author.

Full text: Unavailable

Preprint: archiving allowed
Postprint: archiving restricted
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

SMS language presents special phenomena and important deviations from natural language. Every day, an enormous number of chat messages, SMS messages, and e-mails are sent all over the world. This widespread use makes it important to develop systems that normalize SMS language into natural language. However, typical machine translation approaches are difficult to adapt to SMS language because of the many irregularities this kind of language exhibits. This paper presents a new approach to SMS normalization that combines lexical and phonological translation techniques with disambiguation algorithms at two different levels: lexical and semantic. The proposed method does not depend on a large annotated corpus, which is difficult to build, and it is applied in two different domains, showing its ease of adaptation across different languages and domains. The results obtained by the system outperform some existing methods of SMS normalization, even though the Spanish language and the corpus created have some features that complicate the normalization task.