N-gram based language identification of individual words
(PRASA, 2013)
Various factors influence the accuracy with which the language of individual words can be classified using n-grams. We consider a South African text-based language identification (LID) task and experiment with two different ...
The semi-automated creation of stratified speech corpora
(Pattern Recognition Association of South Africa (PRASA), 2013)
Smartphones provide an efficient means for the collection of speech data; however, the quality of the corpora created in this fashion is not predictable. We describe an approach that allows us to post-process and rank ...