NWU Institutional Repository

The NCHLT Speech Corpus of the South African languages

dc.contributor.author: Barnard, Etienne
dc.contributor.author: Davel, Marelie H.
dc.contributor.author: van Heerden, Charl
dc.contributor.author: De Wet, Febe
dc.contributor.author: Badenhorst, Jaco
dc.date.accessioned: 2018-03-02T13:44:09Z
dc.date.available: 2018-03-02T13:44:09Z
dc.date.issued: 2014
dc.description: This work was supported by the Department of Arts and Culture. [en_US]
dc.description.abstract: The NCHLT speech corpus contains wide-band speech from approximately 200 speakers per language, in each of the eleven official languages of South Africa. We describe the design and development processes that were undertaken in order to develop the corpus, and report on associated materials such as orthographic transcriptions and pronunciation dictionaries that were released as part of the corpus. In order to benchmark speech recognition performance on the corpus, we have also developed both phone-recognition and word-recognition systems for all eleven languages; we find that high accuracies can be achieved for these speaker-independent but vocabulary-dependent recognition tasks in all languages. [en_US]
dc.description.sponsorship: Multilingual Speech Technologies, North-West University, Vanderbijlpark, South Africa; Human Language Technologies Research Group, Meraka Institute, CSIR, Pretoria, South Africa [en_US]
dc.identifier.citation: E. Barnard, M. H. Davel, C. van Heerden, F. de Wet and J. Badenhorst, “The NCHLT Speech Corpus of the South African languages”, in Proc. Int. Workshop Spoken Language Technologies for Under-resourced Languages (SLTU), pp. 194-200, St Petersburg, Russia, 2014. [http://engineering.nwu.ac.za/multilingual-speech-technologies-must/publications] [en_US]
dc.identifier.uri: https://researchspace.csir.co.za/dspace/handle/10204/7549
dc.identifier.uri: http://mica.edu.vn/sltu2014/proceedings/28.pdf
dc.identifier.uri: http://hdl.handle.net/10394/26493
dc.language.iso: en [en_US]
dc.publisher: Workshop Spoken Language Technologies for Under-resourced Languages (SLTU) [en_US]
dc.subject: Speech Corpus [en_US]
dc.subject: South African languages [en_US]
dc.subject: Speech recognition [en_US]
dc.subject: word-recognition [en_US]
dc.subject: phone-recognition [en_US]
dc.title: The NCHLT Speech Corpus of the South African languages [en_US]
dc.type: Presentation [en_US]

Files

Original bundle

Name: barnard-2014-speech-corpus.pdf
Size: 652.21 KB
Format: Adobe Portable Document Format
Description: barnard-2014-speech-corpus

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed upon to submission