A distributed approach to speech resource collection
View/ Open
Date
2013Author
Molapo, Raymond
Barnard, Etienne
De Wet, Febe
Metadata
Show full item recordAbstract
We describe the integration of several tools to enable the end-to-end development of an Automatic Speech Recognition system in a typical under-resourced language. Google App Engine is employed as the core environment for data verification, storage and distribution, and used in conjunction with existing too ls for gathering text and for speech data recording. We analyse the data acquired by each of the tools and develop an ASR system in Shona, an important under-resourced language of Southern Africa. Although unexpected logistical problems complicated the process, we were able to collect a usable Shona speech corpus for the development of the first Automatic Speech Recognition system in that language.