The NCHLT Speech Corpus of the South African languages

Barnard, Etienne; Davel, Marelie H.; van Heerden, Charl; De Wet, Febe; Badenhorst, Jaco

dc.contributor.author	Barnard, Etienne
dc.contributor.author	Davel, Marelie H.
dc.contributor.author	van Heerden, Charl
dc.contributor.author	De Wet, Febe
dc.contributor.author	Badenhorst, Jaco
dc.date.accessioned	2018-03-02T13:44:09Z
dc.date.available	2018-03-02T13:44:09Z
dc.date.issued	2014
dc.description	This work was supported by the Department of Arts and Culture.	en_US
dc.description.abstract	The NCHLT speech corpus contains wide-band speech from approximately 200 speakers per language, in each of the eleven official languages of South Africa. We describe the design and development processes that were undertaken in order to develop the corpus, and report on associated materials such as orthographic transcriptions and pronunciation dictionaries that were released as part of the corpus. In order to benchmark speech recognition performance on the corpus, we have also developed both phone-recognition and word-recognition systems for all eleven languages; we find that high accuracies can be achieved for these speaker-independent but vocabulary-dependent recognition tasks in all languages.	en_US
dc.description.sponsorship	Multilingual Speech Technologies, North-West University, Vanderbijlpark, South Africa; Human Language Technologies Research Group, Meraka Institute, CSIR, Pretoria, South Africa	en_US
dc.identifier.citation	E. Barnard, M. H. Davel, C. van Heerden, F. de Wet and J. Badenhorst, “The NCHLT Speech Corpus of the South African languages”, in Proc. Int. Workshop Spoken Language Technologies for Under-resourced Languages (SLTU), pp. 194-200, St Petersburg, Russia, 2014. [http://engineering.nwu.ac.za/multilingual-speech-technologies-must/publications]	en_US
dc.identifier.uri	https://researchspace.csir.co.za/dspace/handle/10204/7549
dc.identifier.uri	http://mica.edu.vn/sltu2014/proceedings/28.pdf
dc.identifier.uri	http://hdl.handle.net/10394/26493
dc.language.iso	en	en_US
dc.publisher	Workshop Spoken Language Technologies for Under-resourced Languages (SLTU)	en_US
dc.subject	Speech Corpus	en_US
dc.subject	South African languages	en_US
dc.subject	Speech recognition	en_US
dc.subject	word-recognition	en_US
dc.subject	phone-recognition	en_US
dc.title	The NCHLT Speech Corpus of the South African languages	en_US
dc.type	Presentation	en_US

Files

Original bundle

Name: barnard-2014-speech-corpus.pdf
Size: 652.21 KB
Format: Adobe Portable Document Format
Description: barnard-2014-speech-corpus

Collections

Faculty of Engineering
Faculty of Natural and Agricultural Sciences