Published in

Elsevier, Computer Speech and Language, 1(18), p. 1-23

DOI: 10.1016/s0885-2308(03)00027-5


Using rule-induction techniques to model pronunciation variation in Dutch

Journal article published in 2004 by Veronique Hoste, Walter Daelemans, Steven Gillis
This paper is available in a repository.


Preprint: archiving allowed
Postprint: archiving forbidden
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

In this paper, we present an inductive approach to the automatic extraction of knowledge about inter-regional pronunciation variation. We compare two rule-induction techniques that are both popular in language engineering applications: the rule sequence learner Transformation-Based Error-Driven Learning (TBEDL; Brill, 1995) and the decision tree learner C5.0 (Quinlan, 1993). We investigate whether both techniques detect the same regularities, and we evaluate the extracted rules in terms of both accuracy and linguistic relevance. As a case study, we apply the approach to Dutch and Flemish (the variety of Dutch spoken in Flanders, a part of Belgium), using Celex and Fonilex, pronunciation lexica for Dutch and Flemish, respectively. Our main goal is to show that this approach allows the automatic acquisition of compact, interpretable translation rules between pronunciation varieties from phonemic representations of words in both varieties (obtained as the output of phoneme recognition or, as in our case, from existing lexica). We also show that the observed differences coincide with the tendencies studied and described in comparative linguistic research on inter-regional pronunciation variation in standard Dutch.
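To make the idea of error-driven rule induction concrete, the following is a minimal sketch of a transformation-based learner in the spirit of TBEDL, not the implementation used in the paper. It assumes that source and target transcriptions are already aligned symbol by symbol and have equal length, and it uses a single hypothetical rule template (rewrite one symbol as another when preceded by a given symbol). The function names and the toy symbol strings are invented for illustration and are not taken from Celex or Fonilex.

```python
from collections import Counter

BOUNDARY = "#"  # assumed marker for the start of a word


def apply_rule(seq, rule):
    """Rewrite `old` as `new` wherever the preceding symbol is `left`."""
    old, new, left = rule
    out = list(seq)
    for i, sym in enumerate(seq):
        prev = seq[i - 1] if i > 0 else BOUNDARY
        if sym == old and prev == left:
            out[i] = new
    return out


def count_errors(current, targets):
    """Number of positions where the current guess differs from the target."""
    return sum(c != t for cur, tgt in zip(current, targets)
               for c, t in zip(cur, tgt))


def learn_rules(sources, targets, max_rules=10):
    """Greedily learn an ordered rule list that rewrites sources into targets."""
    # Initial state: predict that the target variety equals the source variety.
    current = [list(s) for s in sources]
    rules = []
    for _ in range(max_rules):
        # Propose one candidate rule for every remaining error.
        candidates = Counter()
        for cur, tgt in zip(current, targets):
            for i, (c, t) in enumerate(zip(cur, tgt)):
                if c != t:
                    prev = cur[i - 1] if i > 0 else BOUNDARY
                    candidates[(c, t, prev)] += 1
        if not candidates:
            break
        # Keep the candidate with the largest net error reduction.
        base = count_errors(current, targets)
        best_rule, best_gain = None, 0
        for rule in candidates:
            rewritten = [apply_rule(cur, rule) for cur in current]
            gain = base - count_errors(rewritten, targets)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:
            break
        current = [apply_rule(cur, best_rule) for cur in current]
        rules.append(best_rule)
    return rules


# Invented toy data: "x" becomes "y" only after "a".
sources = ["axb", "axc", "bxa", "cab"]
targets = ["ayb", "ayc", "bxa", "cab"]
rules = learn_rules(sources, targets)
```

On this toy data the learner returns a single context-sensitive rule, which illustrates the kind of compact, readable output the abstract refers to. A decision tree learner such as C5.0 would instead predict the target symbol from a window of context features; that variant is not sketched here.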