Medium-vocabulary speech recognition for under-resourced languages

Van Heerden, Charl J.; Barnard, Etienne; Davel, Marelie H.

Date

2012

Authors

Van Heerden, Charl J.

Barnard, Etienne

Davel, Marelie H.

Researcher ID

11539151 - Van Heerden, Carel Jacobus
23607955 - Davel, Marelie Hattingh
21021287 - Barnard, Etienne

Publisher

SLTU

Abstract

We report on the development of speech-recognition systems that are able to perform accurate recognition on medium-vocabulary tasks (i.e. tasks that require distinctions between approximately 200 different terms). We are able to achieve error rates of less than 5% (our design goal) on four under-resourced languages as well as English, by using training corpora that contain 70–100 hours of speech per language. The majority of the errors stem from words such as abbreviations, foreign words or names, which do not adhere to the standard orthography of the target language. We also find that recognition accuracy does not depend strongly on the number of occurrences of a term in the training set or the length of the term to be recognized, and that a few problematic speakers are responsible for a disproportionate number of errors.

Description

International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, 7-9 May 2012

Keywords

Speech recognition, Under-resourced languages, Multilingual speech processing

Citation

Van Heerden, C.J. & Davel, M.H., et al. 2012. Medium-vocabulary speech recognition for under-resourced languages. In: International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, 7-9 May 2012. [http://www.mica.edu.vn/sltu2012/files/proceedings/26.pdf]

URI

http://hdl.handle.net/10394/13632
http://www.mica.edu.vn/sltu2012/files/proceedings/26.pdf

Collections

Faculty of Engineering
Conference Papers - Vaal Triangle Campus
Faculty of Natural and Agricultural Sciences
