Information Theoretic Based Segments for Language Identification

Harbeck, Stefan; Ohler, Uwe; Nöth, Elmar; Niemann, Heinrich

Published in: Springer Verlag, Lecture Notes in Computer Science, pp. 187-192

DOI: 10.1007/3-540-48239-3_34

Journal article published in 1999 by Stefan Harbeck, Uwe Ohler, Elmar Nöth, and Heinrich Niemann.

This paper is available in a repository.

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving forbidden

Abstract

In this paper we present two new approaches to language identification. Both are based on so-called multigrams, an information-theoretic observation representation. In the first approach, we use multigram models for phonotactic modeling of phoneme or codebook sequences. The multigram model segments a new observation into larger units (e.g., word-like units) and computes a probability for the best segmentation. In the second approach, we build a fenon recognizer that uses the segments from the best segmentation of the training material as "words" in its recognition vocabulary.
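The following is a minimal sketch, not the authors' implementation, of how a multigram model can find the best segmentation of a symbol sequence and score it. It assumes segments are statistically independent, so a segmentation's log probability is the sum of its segments' log probabilities, and it finds the best one with Viterbi-style dynamic programming. The model format (a dict from symbol tuples to probabilities), the function names `best_segmentation`, `identify_language`, and `segment_vocabulary`, the argmax decision rule, and the toy probabilities are all hypothetical and chosen only for illustration. `segment_vocabulary` shows only the step of collecting best-segmentation units from training data as a word list; the fenon recognizer itself is not sketched.

```python
import math
from typing import Dict, Iterable, List, Sequence, Set, Tuple

# Hypothetical multigram model: maps a segment (tuple of 1..max_len symbols)
# to its probability. A real model would be estimated from training data.
Multigram = Dict[Tuple[str, ...], float]


def best_segmentation(
    seq: Sequence[str], model: Multigram, max_len: int
) -> Tuple[List[Tuple[str, ...]], float]:
    """Viterbi search for the most probable segmentation of `seq`.

    Assumes independent segments: the score is the sum of segment log probs.
    Returns (segments, log_prob); log_prob is -inf if no segmentation exists.
    """
    n = len(seq)
    best = [-math.inf] * (n + 1)  # best[i]: best log prob of seq[:i]
    back = [0] * (n + 1)          # back[i]: start of last segment ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            p = model.get(tuple(seq[start:end]), 0.0)
            if p > 0.0 and best[start] > -math.inf:
                score = best[start] + math.log(p)
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if best[n] == -math.inf:
        return [], -math.inf
    # Trace back from the end to recover the segment boundaries.
    segments: List[Tuple[str, ...]] = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(tuple(seq[start:end]))
        end = start
    segments.reverse()
    return segments, best[n]


def identify_language(
    seq: Sequence[str], models: Dict[str, Multigram], max_len: int
) -> str:
    """Hypothetical decision rule: pick the language whose model gives the
    best segmentation the highest log probability."""
    scores = {lang: best_segmentation(seq, m, max_len)[1] for lang, m in models.items()}
    return max(scores, key=scores.get)


def segment_vocabulary(
    corpus: Iterable[Sequence[str]], model: Multigram, max_len: int
) -> Set[Tuple[str, ...]]:
    """Collect the distinct segments of each training sequence's best
    segmentation, e.g. for use as "words" in a recognition vocabulary."""
    vocab: Set[Tuple[str, ...]] = set()
    for seq in corpus:
        segments, _ = best_segmentation(seq, model, max_len)
        vocab.update(segments)
    return vocab


# Toy usage with invented probabilities.
toy_a: Multigram = {("a",): 0.2, ("b",): 0.2, ("a", "b"): 0.6}
toy_b: Multigram = {("a",): 0.5, ("b",): 0.5}
print(best_segmentation(["a", "b", "a", "b"], toy_a, 2)[0])  # [('a', 'b'), ('a', 'b')]
print(identify_language(["a", "b", "a", "b"], {"A": toy_a, "B": toy_b}, 2))  # A
```

In the toy example, model A scores "ab ab" at 0.6 × 0.6 = 0.36, while model B can only use single symbols and scores 0.5^4 = 0.0625, so the sketch assigns the sequence to A. Limiting segment length with `max_len` keeps the search at O(n · max_len) model lookups.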
