NWU Institutional Repository

The South African directory enquiries (SADE) name corpus

dc.contributor.author: Thirion, Jan W.F.
dc.contributor.author: Van Heerden, Charl
dc.contributor.author: Giwa, Oluwapelumi
dc.contributor.author: Davel, Marelie H.
dc.contributor.researchID: 23299126 - Giwa, Oluwapelumi
dc.contributor.researchID: 23607955 - Davel, Marelie Hattingh
dc.date.accessioned: 2020-03-09T08:58:23Z
dc.date.available: 2020-03-09T08:58:23Z
dc.date.issued: 2019
dc.description.abstract: We present the design and development of a South African directory enquiries corpus. It contains audio and orthographic transcriptions of a wide range of South African names produced by first-language speakers of four languages, namely Afrikaans, English, isiZulu and Sesotho. Useful as a resource to understand the effect of name language and speaker language on pronunciation, this is the first corpus to also aim to identify the “intended language”: an implicit assumption with regard to word origin made by the speaker of the name. We describe the design, collection, annotation, and verification of the corpus. This includes an analysis of the algorithms used to tag the corpus with meta information that may be beneficial to pronunciation modelling tasks. [en_US]
dc.identifier.citation: Thirion, J.W.F. et al. 2019. The South African directory enquiries (SADE) name corpus. Language resources and evaluation, 54:155-184. [https://doi.org/10.1007/s10579-019-09448-6] [en_US]
dc.identifier.issn: 1574-020X
dc.identifier.issn: 1574-0218 (Online)
dc.identifier.uri: http://hdl.handle.net/10394/34321
dc.identifier.uri: https://link.springer.com/article/10.1007/s10579-019-09448-6
dc.identifier.uri: https://doi.org/10.1007/s10579-019-09448-6
dc.language.iso: en [en_US]
dc.publisher: Springer [en_US]
dc.subject: Speech corpus collection [en_US]
dc.subject: Pronunciation modelling [en_US]
dc.subject: Speech recognition [en_US]
dc.subject: Proper names [en_US]
dc.title: The South African directory enquiries (SADE) name corpus [en_US]
dc.type: Article [en_US]
