The semi-automated creation of stratified speech corpora

Van Heerden, Carel; Barnard, Etienne; Davel, Marelie H.

Files

prasa2013-17.pdf (195.75 KB)

Date

2013

Authors

Van Heerden, Carel

Barnard, Etienne

Davel, Marelie H.

Researcher ID

11539151 - Van Heerden, Carel Jacobus
23607955 - Davel, Marelie Hattingh
21021287 - Barnard, Etienne

Publisher

Pattern Recognition Association of South Africa (PRASA)

Abstract

Smartphones provide an efficient means for the collection of speech data; however, the quality of the corpora created in this fashion is not predictable. We describe an approach that allows us to post-process and rank utterances in a prompted speech corpus quickly and effectively. Utterance ranking makes it possible both to select those utterances with the highest likelihood of being correct and to evaluate the quality of the resulting corpus from a limited sample. This approach has been applied to a collection in the eleven official languages of South Africa, and we show that it naturally leads to the creation of stratified corpora from the same collection. Such corpora can be useful for different purposes, and corpus users are provided with the tools to extract them easily: from small, highly accurate corpora to larger corpora that are likely to contain more errors.
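The ranking-then-stratifying idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how utterances are scored, so this assumes each utterance already carries a hypothetical `score` (a confidence that the recording matches its prompt). The names `Utterance`, `stratify` and `estimate_error_rate`, and the use of score thresholds to define tiers, are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    utt_id: str
    prompt: str
    score: float  # hypothetical confidence that the recording matches its prompt


def rank_utterances(utts):
    """Sort utterances from most to least likely to be correct."""
    return sorted(utts, key=lambda u: u.score, reverse=True)


def stratify(utts, thresholds):
    """Build nested corpora, one per (assumed) score threshold.

    Each tier holds every utterance scoring at or above its threshold, in
    ranked order. Higher thresholds give smaller, more accurate corpora;
    lower thresholds give larger corpora likely to contain more errors.
    """
    ranked = rank_utterances(utts)
    return {
        t: [u for u in ranked if u.score >= t]
        for t in sorted(thresholds, reverse=True)
    }


def estimate_error_rate(sample_labels):
    """Estimate a tier's error rate from a small manually checked sample.

    sample_labels: booleans, True where a checker judged the utterance correct.
    """
    if not sample_labels:
        raise ValueError("sample must not be empty")
    return sum(1 for ok in sample_labels if not ok) / len(sample_labels)
```

For example, with scores 0.95, 0.80 and 0.40 and thresholds 0.9, 0.5 and 0.0, the tiers contain one, two and all three utterances respectively. Because the tiers are nested, checking a small random sample from each one is enough to get a rough error estimate per tier.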

Keywords

Speech corpora, Automatic speech recognition, Confidence scoring

Citation

Van Heerden, C., Barnard, E. & Davel, M.H. 2013. The semi-automated creation of stratified speech corpora. In: Conference Proceedings of the 24th Annual Symposium of the Pattern Recognition Association of South Africa. Pretoria. pp. 115-119. [http://www.prasa.org/]

URI

http://hdl.handle.net/10394/12120

Collections

Faculty of Engineering
Conference Papers - Vaal Triangle Campus
Faculty of Natural and Agricultural Sciences