Published in

Elsevier, Procedia Computer Science, vol. 81, pp. 53-60, 2016

DOI: 10.1016/j.procs.2016.04.029

Eyra - Speech Data Acquisition System for Many Languages

Journal article published in 2016 by Matthias Petursson, Simon Klüpfel, Jon Gudnason
This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

Speech data acquisition is particularly important for under-resourced languages. Data gathering is the most labour-intensive part of developing speech technologies such as automatic speech recognizers and synthesizers. It is therefore important to support this process with as much automation and as many labour-saving tools as possible. This paper describes Eyra, a new open-source system that enables distributed speech data collection through a variety of devices. It addresses internet connectivity issues by allowing data collectors to run the back-end server on a local laptop, thereby facilitating automatic quality control and making data uploading and compiling less labour-intensive. It can also be used in a crowd-sourcing set-up in which volunteers donate voice samples through a desktop web-browser interface. An initial test shows that the system works well in offline mode with smartphones used for data collection.